Optimizing JPA Performance: An EclipseLink, Hibernate, and OpenJPA Comparison

07.20.2010

Caches

A cache allows data to remain closer to an application's tier without constantly polling an RDBMS for the same data. I titled this section in the plural -- caches -- because there can be several caches involved in an application using JPA. This of course doesn't mean you have to configure or use every cache available to a JPA application, but properly configuring caches can go a long way toward enhancing an application's JPA performance.

So let's start by analyzing what each JPA implementation offers in its out-of-the-box state in terms of caching. The following table illustrates tests done by simply invoking the previous JPA queries for a second and third consecutive time, without stopping the server. Note that the same process of deploying a single application at a time was used, in addition to the server being restarted for each set of tests.

Query / Implementation     EclipseLink   Hibernate   OpenJPA
All records (1st time)     3215 ms       3558 ms     5998 ms
All records (2nd time)     507 ms        272 ms      521 ms
All records (3rd time)     439 ms        218 ms      263 ms
First name (1st time)      1265 ms       613 ms      1643 ms
First name (2nd time)      151 ms        115 ms      239 ms
First name (3rd time)      154 ms        101 ms      227 ms
Last name (1st time)       986 ms        537 ms      1452 ms
Last name (2nd time)       41 ms         41 ms       112 ms
Last name (3rd time)       65 ms         38 ms       117 ms
By ID (1st time)           521 ms        157 ms      1052 ms
By ID (2nd time)           1 ms          6 ms        3 ms
By ID (3rd time)           1 ms          3 ms        3 ms

As you can observe, on both the second and third invocations all the queries show substantial improvements over the first invocation. The primary cause of these improvements is unequivocally the use of a cache. But what type of cache exactly? Could it be the RDBMS's own caching engine? JPA? Spring? Or some other variation? To shed some light on cache usage, the following table illustrates the cache statistics generated on each of the previous JPA queries.

Query / Implementation

All records (2nd time)
EclipseLink: number of objects=17468, total time=506, local time=506, row fetch=65, object building=328, cache=112, sql execute=47, objects/second=34521
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=34936, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

All records (3rd time)
EclipseLink: number of objects=17468, total time=435, local time=435, profiling time=1, row fetch=28, object building=323, cache=106, logging=1, sql execute=27, objects/second=40156
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=52404, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

First name (2nd time)
EclipseLink: number of objects=472, total time=148, local time=148, row fetch=27, object building=106, cache=7, logging=1, sql execute=3, objects/second=3189
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=944, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

First name (3rd time)
EclipseLink: number of objects=472, total time=152, local time=152, row fetch=20, object building=121, cache=7, sql execute=3, objects/second=3105
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=1416, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

Last name (2nd time)
EclipseLink: number of objects=146, total time=40, local time=40, row fetch=7, object building=27, cache=2, logging=1, sql execute=3, objects/second=3650
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=292, queries executed to database=2, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

Last name (3rd time)
EclipseLink: number of objects=146, total time=63, local time=63, profiling time=1, row fetch=6, object building=19, cache=5, sql prepare=1, sql execute=23, objects/second=2317
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=438, queries executed to database=3, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

By ID (2nd time)
EclipseLink: number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000
Hibernate: sessions opened=2, sessions closed=2, connections obtained=2, statements prepared=2, statements closed=2, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=2, queries executed to database=0, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A

By ID (3rd time)
EclipseLink: number of objects=1, total time=1, local time=1, time/object=1, objects/second=1000
Hibernate: sessions opened=3, sessions closed=3, connections obtained=3, statements prepared=3, statements closed=3, second level cache puts=0, second level cache hits=0, second level cache misses=0, entities loaded=3, queries executed to database=0, query cache puts=0, query cache hits=0, query cache misses=0
OpenJPA: N/A


Notice the statistics generated by each JPA implementation are different. EclipseLink reports a single cache statistic; OpenJPA doesn't even report statistics unless a cache is enabled -- see the previous section on metrics for details on this behavior -- and Hibernate reports two cache-related statistics: second level cache and query cache.

At this juncture, if you look at the test results and statistics for the second and third invocations, something won't add up. How is it that OpenJPA's test results came out faster when caching is disabled by default? And how about Hibernate returning 0's on its cache-related statistics, even when its test results came out faster? The reason for this performance increase is RDBMS caching. On the first query, the RDBMS needs to read data from its own file system (i.e. perform an I/O operation); on subsequent requests the data is present in RDBMS memory (i.e. its cache), making the entire JPA query much faster. A closer look at the Hibernate statistics field 'queries executed to database' can confirm this. The counter is cumulative: it shows 2 on every second query and 3 on every third query, meaning each invocation went to the database rather than to a JPA cache. NOTE: The only exception to this occurs when a query is made on a single entity (i.e. by ID), which I will address shortly.


Next, let's start breaking down the caches you will encounter when using JPA applications. The JPA 2.0 standard defines two types of caches: a first level cache and a second level cache. The first level cache, or EntityManager cache, is used to properly handle JPA transactions. A first level cache only exists for the duration of its EntityManager. With the exception of long-lived operations performed against an RDBMS, JPA EntityManagers are short-lived and are created and destroyed per request or per transaction. In this case, given the nature of the queries, first level caches are cleared on every query. A second level cache, on the other hand, is a broader cache that can be used across transactions and users. This makes a JPA second level cache more powerful, since it can avoid constantly polling an RDBMS for the same data.
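To make the distinction between the two cache levels more concrete, here is a minimal conceptual sketch in plain Java. It is not how any JPA implementation is actually written: the class TwoLevelCacheSketch, the nested Session class (standing in for a short-lived EntityManager) and the loadFromDatabase method are all hypothetical names, and plain maps stand in for the caches a provider manages internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual sketch only: plain maps stand in for the caches a JPA provider manages.
class TwoLevelCacheSketch {

    // Shared by every Session; plays the role of the second level cache.
    static final Map<Integer, String> secondLevel = new ConcurrentHashMap<>();

    // Counts simulated round trips to the RDBMS.
    static int databaseReads = 0;

    // Hypothetical stand-in for a SELECT by primary key.
    static String loadFromDatabase(int id) {
        databaseReads++;
        return "player-" + id;
    }

    // Stand-in for a short-lived EntityManager with its own first level cache.
    static class Session {
        private final Map<Integer, String> firstLevel = new HashMap<>();

        String find(int id) {
            // 1. First level cache: only lives as long as this Session.
            String value = firstLevel.get(id);
            if (value != null) {
                return value;
            }
            // 2. Second level cache: survives across Sessions.
            value = secondLevel.get(id);
            if (value == null) {
                // 3. Fall back to the (simulated) database and remember the result.
                value = loadFromDatabase(id);
                secondLevel.put(id, value);
            }
            firstLevel.put(id, value);
            return value;
        }
    }

    public static void main(String[] args) {
        new Session().find(1); // misses both caches, reads the database
        new Session().find(1); // new Session, empty first level cache, second level hit
        System.out.println("database reads: " + databaseReads); // prints "database reads: 1"
    }
}
```

Without the shared map, the second Session would start with an empty first level cache and go back to the database, which is the situation the default Hibernate and OpenJPA tests above are in.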

But even though the JPA 2.0 standard now addresses second level cache features, this was not the case in JPA 1.0. The 1.0 version of the JPA standard only addressed a first level cache, leaving the door completely open on the topic of a second level cache. This created a fragmented approach to caching in JPA implementations. Even now, as JPA 2.0 compliant implementations emerge, some non-standard features remain part of certain implementations, given the value they provide to JPA caching in general. So as I move forward, bear in mind that just like previous JPA topics, each JPA implementation can have its own particular way of dealing with second level caching.

I will start with OpenJPA, which has the least amount of proprietary caching options. To enable OpenJPA caching (i.e. second level caching) you need to declare the following two properties in an application's persistence.xml file:

<property name="openjpa.DataCache" value="true(EnableStatistics=true)"/>
<property name="openjpa.RemoteCommitProvider" value="sjvm"/>

 

The first property ensures caching and statistics are activated, while the second property indicates that caching takes place on a single JVM. The following results and statistics were obtained with OpenJPA's second level cache enabled.

Query with OpenJPA caching

All records (2nd time)
Time: 420 ms
Statistics: read count=34936, hit count=17468, write count=17468, total read count=34936, total hit count=17468, total write count=17468
Time without statistics: 347 ms

All records (3rd time)
Time: 254 ms
Statistics: read count=52404, hit count=34936, write count=17468, total read count=52404, total hit count=34936, total write count=17468
Time without statistics: 230 ms

First name (2nd time)
Time: 125 ms
Statistics: read count=944, hit count=472, write count=472, total read count=944, total hit count=472, total write count=472
Time without statistics: 127 ms

First name (3rd time)
Time: 114 ms
Statistics: read count=1416, hit count=944, write count=472, total read count=1416, total hit count=944, total write count=472
Time without statistics: 132 ms

Last name (2nd time)
Time: 63 ms
Statistics: read count=292, hit count=146, write count=146, total read count=292, total hit count=146, total write count=146
Time without statistics: 53 ms

Last name (3rd time)
Time: 49 ms
Statistics: read count=438, hit count=292, write count=146, total read count=438, total hit count=292, total write count=146
Time without statistics: 50 ms

By ID (2nd time)
Time: 5 ms
Statistics: read count=2, hit count=1, write count=1, total read count=2, total hit count=1, total write count=1
Time without statistics: 1 ms

By ID (3rd time)
Time: 4 ms
Statistics: read count=3, hit count=2, write count=1, total read count=3, total hit count=2, total write count=1
Time without statistics: 1 ms


As these test results illustrate, executing subsequent JPA queries with OpenJPA's second level cache enabled produces superior results. Another important behavior illustrated in some of these test cases is that by simply disabling statistics -- while still using the second level cache -- query times improve even more. The OpenJPA statistics also demonstrate how the cache is being used. Notice that on each subsequent query the statistics field 'hit count' grows by the number of objects returned (e.g. from 17468 to 34936 for all records), which means data is being read from the cache (i.e. a hit). Also notice the statistics field 'write count' remains static, which means data is only written once from the RDBMS to the cache.
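As a quick worked example, the 'read count' and 'hit count' figures for the 'All records' query can be turned into hit ratios. The sketch below assumes that 'read count' counts cache lookups and 'hit count' counts the lookups served from the cache, and that both counters are cumulative across invocations; the CacheHitRatio class and its hitRatio method are hypothetical helpers, not part of OpenJPA.

```java
// Arithmetic sketch over the 'read count' and 'hit count' figures reported above.
class CacheHitRatio {

    // Fraction of cache reads that were hits; 0.0 when nothing was read.
    static double hitRatio(long hitCount, long readCount) {
        if (readCount == 0) {
            return 0.0;
        }
        return (double) hitCount / readCount;
    }

    public static void main(String[] args) {
        // Cumulative ratio after the 2nd invocation of 'All records'.
        double afterSecond = hitRatio(17468, 34936);
        // Cumulative ratio after the 3rd invocation.
        double afterThird = hitRatio(34936, 52404);
        // Ratio for the 3rd invocation alone, from the difference between counters.
        double thirdOnly = hitRatio(34936 - 17468, 52404 - 34936);

        System.out.println("after 2nd invocation: " + afterSecond);
        System.out.println("after 3rd invocation: " + afterThird);
        System.out.println("3rd invocation alone: " + thirdOnly);
    }
}
```

Under those assumptions, half of all reads were hits after the second invocation and two thirds after the third, while the third invocation taken on its own was served entirely from the cache.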

This is pretty basic functionality for a second level cache. On certain occasions a need may arise to interact directly with a cache. These interactions can range from prohibiting an entity from being cached, assigning a particular amount of memory to a cache, forcing an entity to always be cached, flushing all the data contained in a cache, or even plugging-in a third party caching solution to provide a more robust strategy, among other things.

The JPA 2.0 standard provides a very basic feature set in terms of second level caching through javax.persistence.Cache. Upon consulting this interface, you'll realize it only provides four methods charged with verifying the presence of entities and evicting them. This feature set not only proves to be limited, but also cumbersome since it can only be leveraged programmatically (i.e. through an API). In this sense, and as I've already mentioned, JPA implementations have provided a series of features ranging from persistence.xml properties to Java annotations related to second level caching.
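To give a feel for how small this feature set is, the following sketch mirrors the four methods of javax.persistence.Cache (contains, two evict variants and evictAll) in a local stand-in interface, so it compiles without a JPA provider on the classpath. The names CacheLike, InMemoryCache and CacheApiDemo are hypothetical, as are the put method and the string primary key; a real provider populates its cache internally.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring the four methods of javax.persistence.Cache (JPA 2.0).
interface CacheLike {
    boolean contains(Class<?> cls, Object primaryKey);
    void evict(Class<?> cls, Object primaryKey);
    void evict(Class<?> cls);
    void evictAll();
}

// Hypothetical in-memory implementation, for illustration only.
class InMemoryCache implements CacheLike {
    private final Map<Class<?>, Map<Object, Object>> entries = new HashMap<>();

    // Not part of the JPA interface: a provider fills its cache on its own.
    void put(Class<?> cls, Object primaryKey, Object entity) {
        entries.computeIfAbsent(cls, k -> new HashMap<>()).put(primaryKey, entity);
    }

    @Override
    public boolean contains(Class<?> cls, Object primaryKey) {
        Map<Object, Object> byKey = entries.get(cls);
        return byKey != null && byKey.containsKey(primaryKey);
    }

    @Override
    public void evict(Class<?> cls, Object primaryKey) {
        Map<Object, Object> byKey = entries.get(cls);
        if (byKey != null) {
            byKey.remove(primaryKey); // drop a single entity
        }
    }

    @Override
    public void evict(Class<?> cls) {
        entries.remove(cls); // drop every entity of one class
    }

    @Override
    public void evictAll() {
        entries.clear(); // drop everything
    }
}

class CacheApiDemo {
    // Hypothetical entity class used only as a cache key.
    static class Player {}

    public static void main(String[] args) {
        InMemoryCache cache = new InMemoryCache();
        cache.put(Player.class, "id1", new Player());
        System.out.println(cache.contains(Player.class, "id1")); // true
        cache.evict(Player.class, "id1");
        System.out.println(cache.contains(Player.class, "id1")); // false
    }
}
```

With a real JPA 2.0 provider, the equivalent object is obtained from EntityManagerFactory.getCache() rather than constructed directly, and the same four calls apply.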

OpenJPA offers several of these second level caching features, including a separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will point you directly to OpenJPA's cache documentation available at http://openjpa.apache.org/builds/apache-openjpa-2.1.0-SNAPSHOT/docs/manual/ref_guide_caching.html#ref_guide_cache_query so you can try these parameters for yourself on the accompanying application source code.

Hibernate, just like OpenJPA, has its second level cache disabled by default. To enable Hibernate's second level cache you need to add the following properties to an application's persistence.xml file:

<property name="hibernate.cache.use_second_level_cache" value="true"/>
<property name="hibernate.cache.provider_class" value="org.hibernate.cache.HashtableCacheProvider"/>

 

It's worth mentioning that Hibernate has built-in support for other second level caches. The previous properties showed how to enable the HashtableCacheProvider cache -- the simplest of the built-in second level caches -- but Hibernate also supports five additional caches: EHCache, OSCache, SwarmCache, JBoss Cache 1 and JBoss Cache 2, all of which provide distinct features, albeit requiring additional configuration. Besides these properties, Hibernate also requires that each JPA entity be declared with a caching strategy. In this case, since the Player entity is read only, a caching strategy like the following would be used:

<class name="com.webforefront.jpa.domain.Player">
  <cache usage="read-only"/>
</class>


Similar to OpenJPA, Hibernate also offers several second level caching features through proprietary annotations and configurations, as well as support for the separate and supplemental cache called a 'query cache' which can further improve JPA performance. For such cases, I will also point you directly to Hibernate's cache documentation available at http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache so you can try these parameters for yourself on the accompanying application source code.

Unlike OpenJPA and Hibernate, EclipseLink's second level cache is enabled by default, therefore there is no need to provide any additional configuration. However, similar to its counterparts, EclipseLink also has a series of proprietary second level cache features which can enhance JPA performance. You can find more information on these features by consulting EclipseLink's cache documentation available at: http://wiki.eclipse.org/Introduction_to_Cache_(ELUG)

With this we bring our discussion on object-relational mapping performance with JPA to a close. I hope you found the various tests and metrics presented here a helpful aid in making decisions about your own JPA applications. In addition, don't forget you can rely on the accompanying source code to try out several JPA variations better suited to your circumstances.

About the author

Daniel Rubio is an independent technology consultant specializing in enterprise and web-based software. He blogs regularly on these and other software areas at http://www.webforefront.com. He's also authored and co-authored three books on Java technology.  


Source code/Application installation


* Install MySQL on your workstation (Tested on MySQL 5.1.37-64 bits) - http://dev.mysql.com/downloads/

* Install data set on MySQL - Go to http://www.baseball-databank.org/ and click on the link titled 'Database in MySQL form'. This will download a zipped file with a series of MySQL data structures containing baseball statistics. First create a MySQL database to host the data using the command: 'mysqladmin -p<password> create jpaperformance'. This will create a database named 'jpaperformance'. Next, load the baseball statistics using the following command: 'mysql -p<password> -D jpaperformance < BDB-sql-2009-11-25.sql' where 'BDB-sql-2009-11-25.sql' represents the unzipped SQL script obtained by extracting the zip file you downloaded.

* Create JPA application WARs - The download includes source code, library dependencies and an Ant build file. It covers all three JPA implementations: Hibernate 3.5.3, EclipseLink 2.1 and OpenJPA 2.1.

  • To build the JPA Hibernate WAR - ant hibernate 
  • To build the JPA EclipseLink WAR - ant eclipselink
  • To build the JPA OpenJPA WAR - ant openjpa
  • All builds are placed under the dist/<jpa_implementation> directories.

* Deploy to Tomcat 6.0.26 - Copy the MySQL Java driver and Spring Tomcat Weaver -- included in the download directory 'tomcat_jar_deps' -- to Apache Tomcat's /lib directory.

   - Copy each JPA application WAR to Apache Tomcat's /webapps directory, as needed. 

* Deployment URLs


Attachment             Size
Table1.png             162.75 KB
JPA_Performance.zip    45.31 MB
Published at DZone with permission of its author, Daniel Rubio.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Jörg Buchberger replied on Wed, 2010/07/21 - 4:10am

Thanks for the detailed writeup! I'll download your setup to put DataNucleus into the equation as well, as soon as I have some spare time. Perhaps you'll be faster ;-)

Cheers

Slava Lo replied on Wed, 2010/07/21 - 5:44am

One thing to keep in mind when introducing performance monitoring probes is to minimize the impact of the probe itself.

In the current implementation of the aspects, the aspect itself may introduce a delay by virtue of using string concatenation instead of StringBuilder.

Matthew Adams replied on Wed, 2010/07/21 - 9:57am in response to: Jörg Buchberger

Disclosure:  I've been a member of the JDO & JPA expert groups.

Yes, once again, DataNucleus gets left out.  Please add this great implementation to your study.  After all, there's a reason that Google chose DataNucleus for its AppEngine for Java.

I'd also be interested to see the code migrated to have a JDO version as well, so as to compare DataNucleus/JDO against, say, Kodo and other JDO implementations.

-matthew

Gérald Quintana replied on Wed, 2010/07/21 - 10:05am

About Hibernate weaving, I don't expect a huge difference, but you could try

  • Buildtime Bytecode instrumentation (see Ant Task in http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html) ?
  • Javassist vs CGLib Bytecode provider: hibernate.bytecode.provider=cglib|javassist
  • Replacing Reflection by Bytecode Generation: hibernate.bytecode.use_reflection_optimizer=true|false

It would be interesting, to compare relation management: lazy loading, join fetching, batch fetching, etc.

 

christiaan_se replied on Wed, 2010/07/21 - 11:05am

Nice article! Great to see that you took the time to actually do the comparison. Maybe this could be the start of a more official JPA performance benchmark, similar to what we had when many JDO implementations were available (http://www.torpedo-group.org or http://www.polepos.org/). Performance is one of the major issues different implementations of the same spec can compete on.

Btw, since you are measuring the performance of an ORM I think the most interesting performance is the overall time to go from a JPA query to the actual translation from table record to object, not the time it takes to execute the underlying generated sql query (which as you noted is most likely the same for different implementations anyway). What is the overhead the JPA implementation adds to this process (parsing query, mapping meta data, filling object fields, etc.)? Moreover, what is actually being done with the query? Are all objects completely instantiated with all their fields loaded? Probably at least 2 scenarios (no fields loaded and all fields loaded) you want to know and compare.

Catalin Strimbei replied on Thu, 2010/07/22 - 8:57am

JPA is a mature and well-founded solution. But, as I see also in the introduction of this paper, the confusion between objects and tables still "persists". Relational tables are not equivalent to objects. IMHO tables could be likened to object collections at most ... This confusion is maybe caused by the fact that when a new table is created in an RDBMS, two effects appear: a new ROW type is created and also a new storage location (a logical storage location, but there are even physical locations associated). Modern RDBMSs reveal the existence of the ROW type because they provide support to define types (or even ROW types) and then create tables on these types, and the reverse: one can define a variable (as in PL/SQL) borrowing the ROW type of an existing table ... So I think it is time to stop the confusion between objects and tables ....

Raphael Miranda replied on Sun, 2010/07/25 - 6:49pm

Object orientated design isn't going away but RDBMS are at an ever increasing pace.

People are 20 years too late to realise RDBMS aren't the only way to persist data. 

 

 

 

Pascal Thivent replied on Sat, 2010/08/28 - 3:44pm

I find http://polepos.org/ to be a totally useless benchmark. Sure, it's open, but it's not independent at all (created by the db4o guys to, well, guess what?) and some persistence providers are misconfigured and their API misused (check the Hibernate part), making them under optimized. I don't want to think it's intentional, but it's not serious. So at the end, I find it totally biased and there is nothing to conclude from this bench, even if your application is "similar" to the polepos database.

Doug Clarke replied on Thu, 2010/07/29 - 11:58am

Object-Relational benchmarks are very challenging and are typically biased in one way or another. I help lead the EclipseLink project and Oracle TopLink and struggle to publish our internal performance comparisons where we see superior results against leading competitors.

We need to review this benchmark (thanks for the source) and compare its configuration and approach to measurement with our internal performance efforts to identify why we get such different results. We'll publish what we find or suggestions to improve this comparison back to this thread.

 Doug Clarke

EclipseLink Project co-lead

chris colman replied on Sat, 2010/08/07 - 3:52pm

I was at first excited by your article but got disappointed as I read further: You introduced your article by expounding the accuracy of the term 'impedance mismatch' but then proceeded to build a performance test with a completely benign example with a trivial mapping of 'Player' class to your existing 'Master' table. It's hard to demonstrate any impedance mismatch with such a direct mapping. The mismatches come when you introduce inheritance and polymorphism - two essential concepts in OO which RDBMSes are completely oblivious to.

If you mention impedance mismatch then one would assume your performance test would involve at least a non-trivial domain model with some depth in the object graph. After all it's the performance of persistence frameworks (ORMs or otherwise) in 'real life' object oriented applications that we're most interested in. Any old ORM can generate the simple SQL to find object/rows via a column value. The really interesting stuff is how they perform when there are sophisticated object graphs with inheritance and polymorphism and lots of navigation along object relationships - all the concepts that are at play in the software implementations of the 'real life' scenarios of modern, object oriented applications.

I'm also disappointed that you left out DataNucleus. It's a perfectly valid and popular JPA/ORM implementation which takes a completely different approach to what goes on underneath the hood, which has some real efficiency benefits when it comes to testing for 'object dirtiness' at transaction commit time and very fast 'persistence by reachability' - things that are extremely important when you have domain models that are not as trivial as your example.

The old Pole Position performance benchmarks were great because they had a broad spectrum of tests from trivial single object, direct table mappings like your example to more complex, deep object graphs that typically appear in real world object oriented applications. It's when you get into these real world examples that the performance of the different persistence frameworks really start to disperse and where things start to get really exciting.

Warning: if you're committed to RDBMS as the datastore underlying your persistence framework (JPA/JDO) for the next few years (which I probably am) then don't look at the results of performance tests on real world object graph examples for any of the object oriented databases ... it makes you want to cry, they are just so damn fast in comparison because they don't suffer the impedance mismatch which majorly impedes performance. At the moment we can get by on the slower RDBMS performance but we're using DataNucleus JDO so that as demand for performance increases we can switch to an OODBMS at some point in the future by simply switching JDO's underlying datastore from RDBMS to an OODBMS.

Andy Leung replied on Tue, 2010/08/17 - 10:10am in response to: chris colman

I cannot agree more with Chris!!!

Look at that DataNucleus used in GAE, man they are just super fast and powerful that you CANNOT query with sub-query and CANNOT run joint query on complex data model without hardwired simulation of creating another dummy object. They are super fast that one CANNOT even query 100k objects in pagination manner without using the failed fake pagination cheat that doesn't even work!!! Have you seen business applications that showing business records in pages force the users to go through each page by clicking "Next" like GMail???

Honestly, Object Graph and RDBMS are two different things and are for different people's needs. Direct comparison of pure speed of simple data retrieve is meaningless in this reality. I suggest you to go back and study the problem domain before making such a criticism.

Kevin Sutter replied on Wed, 2010/11/10 - 3:56pm

Daniel, Thanks for the good, informative article. It looks like you attempted to be un-biased in your evaluation of the various technologies. You focused on one area that contributes to performance -- caching. But, one other area that we have found that greatly contributes to better performance is connection pooling. I believe both EclipseLink and Hibernate ship with enabled connection pooling. Until recently, OpenJPA did not provide that connection pooling function "out of the box".

As of svn revision #1023114 of Apache OpenJPA trunk (openjpa-2.1.0-SNAPSHOT), OpenJPA now ships with Apache DBCP and pre-configures its use in a JSE, non-managed environment. In a managed environment, it is assumed that connection pooling will be provided via the application server.

It looks like you might be dependent on Spring to provide the datasource connection management (from context.xml). I'm not sure if Spring provides connection pooling in this case or not. I am also not sure if the use of an external connection pooling mechanism would apply in this case. Just something to be aware of as you continue this benchmarking exercise.

More information on OpenJPA's use of DBCP for connection pooling can be found here: http://openjpa.apache.org/builds/latest/docs/manual/manual.html#ref_guide_dbsetup_auto

Thanks, Kevin Sutter Apache OpenJPA

Andy Jefferson replied on Tue, 2010/11/30 - 3:00pm in response to: Pascal Thivent

I don't find PolePos to be a "totally useless" benchmark. The object graph models presented in it are representative of real-life situations and valid for comparison purposes. When we ran JPOX on it some time back we noticed some inconsistencies in its handling for the different persistence solutions and contributed them back to the polepos "group", and yes, we tuned the JPOX config to run on it. How you configure a persistence solution to run on them is up to the individual. Just because the people involved maybe aren't as experienced as you with a tool doesn't make the benchmark "useless". The benchmark is the object graph models plus the persistence code; and that is basically sound IMHO.

Andy Jefferson replied on Wed, 2010/12/01 - 3:36am in response to: Andy Leung

Please get your facts right about DataNucleus. GAE/J uses a plugin written by Google, and it is this that makes impositions on the user in terms of what can and can't be queried. This post is about persistence to RDBMS, and for that you would use DataNucleus's own provided plugins, not third party ones.
