

Optimizing JPA Performance: An EclipseLink, Hibernate, and OpenJPA Comparison

07.20.2010

'Impedance mismatch'. No two words better capture the troubles, headaches and quirks most developers face when linking applications to relational databases (RDBMS). But let's face it: object-oriented designs aren't going away from mainstream languages anytime soon, and neither are the relational storage systems used by most applications. One side works with objects, the other with tables. Resolving these differences (technically referred to as the 'object/relational impedance mismatch') can result in substantial overhead, which in turn can materialize as poor application performance.

In Java, the Java Persistence API (JPA) is one of the most popular mechanisms for bridging the gap between objects (i.e. the Java language) and tables (i.e. relational databases). Although there are other mechanisms that allow Java applications to interact with relational databases, such as JDBC and JDO, JPA has gained wider adoption thanks to its underpinnings: Object Relational Mapping (ORM). ORM has gained popularity precisely because it is specifically designed to address the interaction between objects and tables.

JPA's course is set by a standards body, a process that has given rise to several JPA implementations. Among the three most popular are EclipseLink (which evolved from TopLink), Hibernate and OpenJPA. But even though all three are based on the same standard, ORM is such a deep and complex topic that, beyond core functionality, each implementation differs in areas ranging from configuration to optimization techniques.

Next, I will explain a series of topics related to optimizing an application's use of JPA, using and comparing each of these three implementations. Although JPA can automatically create relational tables and works with a range of relational database vendors, I will start from pre-existing data deployed on a MySQL database and rely on the Spring framework to facilitate the use of JPA. This not only makes the comparison fairer, but also makes the techniques described appealing to a wider audience: performance becomes a serious concern once you have a large volume of data, and MySQL and Spring are common choices thanks to their community-driven (i.e. open-source) roots. See the source code/application section at the end for instructions on setting up the application code discussed in the remaining sections.

Download the Source Code associated with this article (~45 MB)

 

The basics: Metrics 

In order to establish JPA performance levels in an application, it's vital to first obtain a series of metrics related to a JPA implementation's inner workings. These include things like:

  • What are the actual queries being performed against an RDBMS?
  • How long does each query take?
  • Are queries being performed constantly against the RDBMS or is a cache being used?

These metrics will be critical to our performance analysis, since they will shed light on the underlying operations performed by a JPA implementation and in the process show the effectiveness or ineffectiveness of certain techniques. 

This is where the first differences among implementations appear, and I'm not talking about metric results, but about how to obtain these metrics in the first place. To kick things off, I will address logging. By default, all three JPA implementations discussed here (EclipseLink, Hibernate and OpenJPA) log the queries performed against an RDBMS, which helps in determining whether the queries generated by an ORM are optimal for a particular relational data model. Nevertheless, tweaking a JPA implementation's logging level can be helpful for one of two things: getting even more detail about the underlying operations performed by a JPA implementation, some of which may not be logged by default (e.g. database connection details), or suppressing logging altogether, which can benefit a production system's performance.

Logging in JPA implementations is managed through one of several logging frameworks, such as Apache Commons Logging or Log4J. This requires the presence of such libraries in an application. Logging configuration of a JPA implementation is mostly done through a <property> value in an application's persistence.xml file or in some cases, directly in a logging framework's configuration files. The following table describes JPA logging configuration parameters:  

 

Table 1: JPA logging configuration parameters (image not reproduced)
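As a rough sketch of what these logging properties look like for each provider, the following class collects a verbose setting (SQL statements logged) and a quiet setting for each implementation in a plain Java map. Such a map could be passed as the second argument of JPA's `Persistence.createEntityManagerFactory(String, Map)` to override persistence.xml. The property names (`eclipselink.logging.level`, `hibernate.show_sql`, `hibernate.format_sql`, `openjpa.Log`) are the providers' standard ones; the class, the enum and the particular levels chosen are illustrative assumptions and may differ from those listed in the table.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: provider-specific logging properties, as they would appear in persistence.xml. */
final class JpaLoggingProperties {

    enum Provider { ECLIPSELINK, HIBERNATE, OPENJPA }

    private JpaLoggingProperties() { }

    /**
     * verbose = true asks the provider to log generated SQL;
     * verbose = false keeps logging to a minimum, as you might on a production system.
     */
    static Map<String, String> loggingProperties(Provider provider, boolean verbose) {
        Map<String, String> props = new HashMap<>();
        switch (provider) {
            case ECLIPSELINK:
                // FINE includes the SQL statements; SEVERE reports errors only.
                props.put("eclipselink.logging.level", verbose ? "FINE" : "SEVERE");
                break;
            case HIBERNATE:
                // show_sql prints SQL to standard output rather than through a logging framework.
                props.put("hibernate.show_sql", String.valueOf(verbose));
                props.put("hibernate.format_sql", String.valueOf(verbose));
                break;
            case OPENJPA:
                // TRACE on the SQL channel logs each statement; WARN elsewhere keeps other output quiet.
                props.put("openjpa.Log", verbose ? "DefaultLevel=WARN, SQL=TRACE" : "DefaultLevel=WARN");
                break;
            default:
                throw new IllegalArgumentException("Unknown provider: " + provider);
        }
        return Collections.unmodifiableMap(props);
    }
}
```

For instance, `Persistence.createEntityManagerFactory("playersPU", JpaLoggingProperties.loggingProperties(JpaLoggingProperties.Provider.ECLIPSELINK, false))` would start a hypothetical "playersPU" persistence unit with EclipseLink logging only severe errors.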


In addition to the information obtained through logging, there is another set of JPA performance metrics that must be obtained by other means. One of these is the time it takes to perform a query. Some JPA implementations can report this information when configured to do so, but others cannot. Instead of relying on provider-specific output, I opted to use a single approach and apply it to all three JPA implementations. After all, time metrics measured in milliseconds can be skewed depending on where the start and end times are taken. So to measure query times, I will use Aspects with the aid of the Spring framework.

Aspects allow us to measure the time it takes to execute a method containing a query without mixing the timing logic with the query logic itself, which is the whole purpose of using Aspects. Discussing Aspects further would go beyond the scope of performance, so I will concentrate on the Aspect itself. I advise you to look over the accompanying source code, as well as the AspectJ and Spring AOP documentation, for more details on these topics and their configuration. The following Aspect is used to measure the execution times of query methods.

package com.webforefront.aop;

import org.apache.commons.lang.time.StopWatch;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class DAOInterceptor {

    private static final Log log = LogFactory.getLog(DAOInterceptor.class);

    // Runs around every method of every class in com.webforefront.jpa.service and its subpackages
    @Around("execution(* com.webforefront.jpa.service..*.*(..))")
    public Object logQueryTimes(ProceedingJoinPoint pjp) throws Throwable {
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        try {
            // Execute the intercepted query method
            return pjp.proceed();
        } finally {
            // Stop timing and log even if the query method throws
            stopWatch.stop();
            String className = pjp.getTarget().getClass().getSimpleName();
            log.info(className + " - " + pjp.getSignature().getName() + ": " + stopWatch.getTime() + "ms");
        }
    }
}

 
The main part of the Aspect is the @Around annotation. Its value is a pointcut expression that runs the aspect method, logQueryTimes, every time a method of a class in the com.webforefront.jpa.service package (or any of its subpackages) is executed; this package is where all of the application's JPA query methods reside. The logQueryTimes method calculates the execution time and outputs it as logging information using Apache Commons Logging.
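Outside of Spring, the same around-advice idea can be sketched with a plain JDK dynamic proxy (`java.lang.reflect.Proxy`). This is a swapped-in technique, not the Spring/AspectJ setup used in this article, and `PlayerService` and `TimingProxy` are hypothetical names. The sketch uses `System.nanoTime()`, a monotonic clock intended for elapsed-time measurement, and `String.format` instead of manual string concatenation. It assumes Java 8 or later (for lambdas), unlike the Java 1.6 environment used for the tests.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical service interface standing in for a query class in com.webforefront.jpa.service. */
interface PlayerService {
    List<String> findAllNames();
}

/** Hypothetical helper: wraps an interface implementation and logs how long each call takes. */
final class TimingProxy {

    private TimingProxy() { }

    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface, Consumer<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
            try {
                return method.invoke(target, args); // "proceed" to the real method
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the real method threw, not the reflection wrapper
            } finally {
                long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                log.accept(String.format("%s - %s: %dms",
                        target.getClass().getSimpleName(), method.getName(), elapsedMs));
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

A caller would obtain `TimingProxy.wrap(realService, PlayerService.class, System.out::println)` and use it exactly like the real service; Spring AOP does the equivalent wiring automatically based on the pointcut expression.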

Another set of important JPA metrics goes beyond what standard logging provides: statistics related to caches, sessions and transactions. Since the JPA standard doesn't dictate any particular approach to statistics, each JPA implementation varies in what statistics it collects and how. Both Hibernate and OpenJPA have their own statistics class, whereas EclipseLink relies on a Profiler to gather similar metrics.


Since I'm already relying on Aspects, I will also use an Aspect to obtain statistics before and after the execution of a JPA query method. The following Aspect obtains statistics for an application relying on Hibernate.

package com.webforefront.aop;

import javax.persistence.EntityManagerFactory;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.hibernate.SessionFactory;
import org.hibernate.ejb.HibernateEntityManagerFactory;
import org.hibernate.stat.Statistics;
import org.springframework.beans.factory.annotation.Autowired;

@Aspect
public class CacheHibernateInterceptor {

    // Log under this aspect's own class name
    private static final Log log = LogFactory.getLog(CacheHibernateInterceptor.class);

    @Autowired
    private EntityManagerFactory entityManagerFactory;

    @Around("execution(* com.webforefront.jpa.service..*.*(..))")
    public Object log(ProceedingJoinPoint pjp) throws Throwable {
        // Unwrap the Hibernate SessionFactory behind the JPA EntityManagerFactory
        HibernateEntityManagerFactory hbManagerFactory = (HibernateEntityManagerFactory) entityManagerFactory;
        SessionFactory sessionFactory = hbManagerFactory.getSessionFactory();
        Statistics statistics = sessionFactory.getStatistics();
        // Hibernate statistics are off by default; switch them on before reading them
        statistics.setStatisticsEnabled(true);

        String prefix = pjp.getTarget().getClass().getSimpleName() + " - " + pjp.getSignature().getName();
        log.info(prefix + ": (Before call) " + statistics);
        Object result = pjp.proceed();
        log.info(prefix + ": (After call) " + statistics);
        return result;
    }
}

 

Notice the structure is similar to the prior timing Aspect, except that in this case the logging output contains the values of Hibernate's Statistics class, obtained via the application's EntityManagerFactory. The next Aspect is used to obtain statistics for an application relying on OpenJPA.

package com.webforefront.aop;

import javax.persistence.EntityManagerFactory;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.openjpa.datacache.CacheStatistics;
import org.apache.openjpa.persistence.OpenJPAEntityManagerFactory;
import org.apache.openjpa.persistence.OpenJPAPersistence;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.beans.factory.annotation.Autowired;

@Aspect
public class CacheOpenJPAInterceptor {

    // Log under this aspect's own class name
    private static final Log log = LogFactory.getLog(CacheOpenJPAInterceptor.class);

    @Autowired
    private EntityManagerFactory entityManagerFactory;

    @Around("execution(* com.webforefront.jpa.service..*.*(..))")
    public Object log(ProceedingJoinPoint pjp) throws Throwable {
        // Data cache statistics (requires openjpa.DataCache with EnableStatistics=true)
        OpenJPAEntityManagerFactory ojpaManagerFactory = OpenJPAPersistence.cast(entityManagerFactory);
        CacheStatistics statistics = ojpaManagerFactory.getStoreCache().getStatistics();

        String prefix = pjp.getTarget().getClass().getSimpleName() + " - " + pjp.getSignature().getName();
        log.info(prefix + ": (Before call) " + describe(statistics));
        Object result = pjp.proceed();
        log.info(prefix + ": (After call) " + describe(statistics));
        return result;
    }

    // Formats the counters in one place instead of repeating the concatenation for each log call
    private static String describe(CacheStatistics s) {
        return "Statistics [start time=" + s.start()
                + ", read count=" + s.getReadCount()
                + ", hit count=" + s.getHitCount()
                + ", write count=" + s.getWriteCount()
                + ", total read count=" + s.getTotalReadCount()
                + ", total hit count=" + s.getTotalHitCount()
                + ", total write count=" + s.getTotalWriteCount() + "]";
    }
}

Once again, notice that the Aspect's structure is similar to the previous one and relies on the application's EntityManagerFactory. In this case, the logging output contains the values of OpenJPA's CacheStatistics class. Since OpenJPA does not enable statistics by default, you will need to add the following two properties to an application's persistence.xml file:

<property name="openjpa.DataCache" value="true(EnableStatistics=true)"/>
<property name="openjpa.RemoteCommitProvider" value="sjvm"/>

 

The first property enables OpenJPA's data cache and ensures its statistics are gathered, while the second, sjvm, tells OpenJPA that the commit notifications used to keep the cache up to date only need to be propagated within a single JVM. NOTE: Because of the value "true(EnableStatistics=true)", caching is enabled in addition to statistics.
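Both statistics Aspects log cumulative counters before and after each call, so the most telling number is usually the difference between the two readings. The sketch below assumes you copy the relevant counters (for instance OpenJPA's `getReadCount()`, `getHitCount()` and `getWriteCount()`) into a small immutable value object before and after `pjp.proceed()`; `CacheSnapshot` and its methods are hypothetical and not part of either provider's API.

```java
import java.util.Locale;

/** Hypothetical immutable snapshot of cache counters taken around one query method. */
final class CacheSnapshot {
    final long reads;  // read requests sent to the cache
    final long hits;   // read requests answered from the cache
    final long writes; // write requests sent to the cache

    CacheSnapshot(long reads, long hits, long writes) {
        this.reads = reads;
        this.hits = hits;
        this.writes = writes;
    }

    /** Counter changes between an earlier snapshot and this one. */
    CacheSnapshot minus(CacheSnapshot before) {
        return new CacheSnapshot(reads - before.reads, hits - before.hits, writes - before.writes);
    }

    /** Fraction of read requests served by the cache, or 0 when there were none. */
    double hitRatio() {
        return reads == 0 ? 0.0 : (double) hits / reads;
    }

    @Override
    public String toString() {
        // Locale.ROOT keeps the decimal separator stable regardless of the server's locale
        return String.format(Locale.ROOT, "reads=%d, hits=%d, writes=%d, hitRatio=%.2f",
                reads, hits, writes, hitRatio());
    }
}
```

Inside the OpenJPA Aspect you would build one snapshot before `pjp.proceed()` and one after it, then log `after.minus(before)`, which reports only the cache activity caused by that particular query method.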

Since EclipseLink doesn't have a dedicated statistics class and relies on a Profiler to determine advanced metrics, the simplest way to obtain statistics similar to those of Hibernate and OpenJPA is through the Profiler itself. To activate EclipseLink's Profiler, you just need to add the following property to an application's persistence.xml file: <property name="eclipselink.profiler" value="PerformanceProfiler"/>. Once it is enabled, the EclipseLink Profiler outputs several metrics as logging information on each JPA query method execution.


Now that you know how to obtain several metrics from all three JPA implementations and understand they will be obtained as fairly as possible for all three providers, it's time to put each JPA implementation to the test along with several performance techniques.  

JPQL queries, weaving and class transformations

Let's start by making a query that retrieves data from a pre-existing RDBMS table named "Master". The "Master" table contains over 17,000 records belonging to baseball players. To simplify matters, I will create a Java class named "Player" and map it to the "Master" table in order to retrieve the records as objects. Next, relying on the Spring framework's JpaTemplate functionality, I will set up a query to retrieve all "Player" objects, with the query taking the following form:

getJpaTemplate().find("select e from Player e"); 

See the accompanying source code for more details on this last process.

Next, I deploy the application on Apache Tomcat with each of the three JPA implementations separately, starting and stopping the server for each deployment to ensure fair results. These are the results on a 64-bit Ubuntu box with 4 GB of RAM, using Java 1.6:

All Player objects (17,468 records)

Implementation | Time | Query
Hibernate | 3558 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_
EclipseLink (run-time weaving, Spring ReflectiveLoadTimeWeaver) | 3215 ms | SELECT lahmanID, nameLast, nameFirst FROM Master
EclipseLink (build-time weaving) | 3571 ms | SELECT lahmanID, nameLast, nameFirst FROM Master
EclipseLink (no weaving) | 3996 ms | SELECT lahmanID, nameLast, nameFirst FROM Master
OpenJPA (build-time enhanced classes) | 5998 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0
OpenJPA (run-time enhanced classes, OpenJPA enhancer) | 6136 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0
OpenJPA (non-enhanced classes) | 7677 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0


As you can observe, the queries performed by each JPA implementation are fairly similar, with two of them using table aliases (t0 and player0_ for the table named 'Master'). This syntax variation has minimal impact on performance, since querying the RDBMS directly with any of these variations shows identical results. However, the query times obtained through the different JPA implementations and configurations vary considerably. One important factor behind this difference is how each implementation handles JPA entities.

Let's start with the OpenJPA implementation, which had the poorest times. OpenJPA can run an enhancement process on Java entities (in this case the 'Player' class). This enhancement can be performed when the entities are built, at run-time, or foregone altogether. As you can observe, foregoing entity enhancement in OpenJPA produced the longest query times, whereas enhancing entities at either build-time or run-time produced better results, with the former beating out the latter. By default, OpenJPA expects entities to be enhanced. This means you will either need to explicitly configure an application to support unenhanced classes by adding the following property to its persistence.xml file:

<property name="openjpa.RuntimeUnenhancedClasses" value="supported"/> 

...or enhance classes at build-time or at run-time using the OpenJPA enhancer; otherwise an application relying on OpenJPA will throw an error.


Given these OpenJPA results, the remaining OpenJPA tests will be based on build-time enhanced entity classes. For more on the topic of OpenJPA enhancement, refer to the OpenJPA documentation in addition to consulting the accompanying source code for this article.  

You may be wondering what exactly constitutes OpenJPA enhancement. It is a processing step applied to the bytecode generated by the Java compiler, which adds JPA-specific instructions to provide optimal run-time performance; these instructions include things like flexible lazy loading and dirty tracking. So why don't Hibernate or EclipseLink enhance entities? In short, Hibernate and EclipseLink also enhance JPA entities; they just don't call it 'enhancement'.
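To make "dirty tracking" more concrete, the following hand-written class imitates the kind of bookkeeping an enhancer adds: each setter records which field actually changed, so at commit time only the modified fields need to be considered instead of comparing every field against a saved copy. This is a conceptual sketch only; OpenJPA's generated bytecode looks different, and `TrackedPlayer` is a hypothetical class modeled on the article's Player columns.

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical stand-in for an "enhanced" Player entity that tracks its own modifications. */
final class TrackedPlayer {
    private static final String[] FIELD_NAMES = {"nameFirst", "nameLast"};
    private static final int NAME_FIRST = 0;
    private static final int NAME_LAST = 1;

    private final int lahmanId;
    private String nameFirst;
    private String nameLast;
    private final BitSet dirty = new BitSet(FIELD_NAMES.length); // one bit per mutable field

    TrackedPlayer(int lahmanId, String nameFirst, String nameLast) {
        this.lahmanId = lahmanId;
        this.nameFirst = nameFirst;
        this.nameLast = nameLast;
    }

    int getLahmanId() { return lahmanId; }
    String getNameFirst() { return nameFirst; }
    String getNameLast() { return nameLast; }

    // Enhancement-style setters: assign the value and flag the field only if it really changed.
    void setNameFirst(String value) {
        if (!Objects.equals(nameFirst, value)) {
            nameFirst = value;
            dirty.set(NAME_FIRST);
        }
    }

    void setNameLast(String value) {
        if (!Objects.equals(nameLast, value)) {
            nameLast = value;
            dirty.set(NAME_LAST);
        }
    }

    /** Names of fields modified since the last clearDirty(), in declaration order. */
    Set<String> dirtyFields() {
        Set<String> names = new LinkedHashSet<>();
        for (int i = dirty.nextSetBit(0); i >= 0; i = dirty.nextSetBit(i + 1)) {
            names.add(FIELD_NAMES[i]);
        }
        return names;
    }

    /** Called once the changes have been written, e.g. at transaction commit. */
    void clearDirty() {
        dirty.clear();
    }
}
```

An update generated from `dirtyFields()` would only touch the changed columns, and an unmodified entity can be skipped entirely at commit time.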


EclipseLink calls this 'enhancement' process by a more technical term: weaving. Similar to OpenJPA's enhancement, weaving in EclipseLink can take place at build-time (a.k.a. static weaving), at run-time, or be foregone altogether. As you can observe in the results, EclipseLink's tests show smaller variations than OpenJPA's. The slowest EclipseLink run was the one without weaving. If you think about it, this is rather logical, given that weaving alters Java bytecode to add optimized JPA instructions that include lazy loading, change tracking, fetch groups and internal optimizations.
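Lazy loading is the easiest of these optimizations to picture. The sketch below shows the value-holder idea in plain Java: a reference is wrapped in a holder that runs its loader (standing in for a database query) only on first access and caches the result afterwards. It is a conceptual illustration that assumes a single-threaded caller; `LazyValue` is a hypothetical class, and EclipseLink's woven code relies on its own internal classes rather than anything shown here.

```java
import java.util.function.Supplier;

/** Hypothetical value holder: defers an expensive load until the value is first requested. */
final class LazyValue<T> {
    private Supplier<? extends T> loader; // released after loading so it can be garbage collected
    private T value;
    private boolean loaded;

    LazyValue(Supplier<? extends T> loader) {
        this.loader = loader;
    }

    /** Runs the loader on the first call only; later calls return the cached value. Not thread-safe. */
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null;
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }
}
```

If an entity holds a related object in such a holder, loading the entity itself never triggers the second query; the query runs only if and when application code actually navigates to the related object.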


For the EclipseLink tests, both build-time and run-time weaving produce better results than no weaving. For build-time weaving, I used EclipseLink's library along with an Apache Ant task, whereas for run-time weaving I used the Spring framework's ReflectiveLoadTimeWeaver. I can only assume the slightly better performance of run-time weaving over build-time weaving in EclipseLink was due to using a weaver integrated with the Spring framework, which in turn could result in better JPA optimizations for Spring applications. Nevertheless, considering the result of forgoing weaving altogether, weaving does not appear to have a major performance impact when using EclipseLink, ceteris paribus.

By default, EclipseLink expects run-time weaving to be enabled, otherwise you will receive an error in the form 'Cannot apply class transformer without LoadTimeWeaver specified'. This means that for cases using build-time weaving or no weaving at all, you will need to explicitly indicate this behavior. In order to disable EclipseLink weaving you will need to either configure an application's EntityManagerFactory Spring bean with:

<property name="jpaPropertyMap">
<map>
<entry key="eclipselink.weaving" value="false"/>
</map>
</property>

... or add the following ...

<property name="eclipselink.weaving" value="false"/> 

...property to an application's persistence.xml file. To indicate an application's entities are built using build-time weaving, substitute the previous property's "false" value with "static". To configure the default run-time weaver expected by EclipseLink, add the following:

<property name="loadTimeWeaver">
<bean class="org.springframework.instrument.classloading.ReflectiveLoadTimeWeaver"/>
</property>

 ...property to an application's EntityManagerFactory Spring bean.
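The three EclipseLink setups described above correspond to the values of a single `eclipselink.weaving` property. The sketch below captures that mapping so the choice can be made in code, for example when populating the Spring bean's `jpaPropertyMap`; the `WeavingMode` enum is a hypothetical name, and run-time weaving still needs a load-time weaver such as Spring's `ReflectiveLoadTimeWeaver` to be configured separately.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper mapping the weaving strategies discussed above to EclipseLink's property. */
enum WeavingMode {
    RUN_TIME("true"),     // default: classes are woven as they load (needs a LoadTimeWeaver)
    BUILD_TIME("static"), // classes were already woven at build time, e.g. by an Ant task
    NONE("false");        // no weaving at all

    private final String propertyValue;

    WeavingMode(String propertyValue) {
        this.propertyValue = propertyValue;
    }

    /** Single property entry, usable in persistence.xml terms or in a Spring jpaPropertyMap. */
    Map<String, String> toProperties() {
        return Collections.singletonMap("eclipselink.weaving", propertyValue);
    }
}
```

Keeping the mapping in one place makes it harder to enable build-time weaving in the build while leaving the runtime configuration expecting a load-time weaver, which is what produces the 'Cannot apply class transformer without LoadTimeWeaver specified' error.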


Given these EclipseLink results, the remaining EclipseLink tests will be based on run-time weaving provided by the Spring framework. For more on the topic of EclipseLink weaving, refer to the EclipseLink documentation at http://wiki.eclipse.org/Introduction_to_EclipseLink_Application_Development_(ELUG)#Using_Weaving, in addition to consulting the accompanying source code for this article. 


Hibernate requires neither entity enhancement nor weaving. For this reason, there is only one test result. This not only makes Hibernate simpler to set up, but judging by its only test result (which clocks in at second place among all the tests) Hibernate's performance ranks high compared to its counterparts. However, what I would consider Hibernate's equivalent to OpenJPA's enhancement process or EclipseLink's weaving is a series of Hibernate properties. For example, Hibernate has properties such as hibernate.default_batch_fetch_size, designed to optimize lazy loading. As you might recall, optimizing lazy loading is among the purposes of both OpenJPA's enhancement process and EclipseLink's weaving. So whereas OpenJPA and EclipseLink require a separate, monolithic step (at build-time or run-time) to achieve JPA optimizations, Hibernate relies on granular properties specified in an application's persistence.xml file. Nevertheless, given that Hibernate's default behavior proved to be on par with the best query times, I didn't feel a need to explore these Hibernate properties further.
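To see why a property such as hibernate.default_batch_fetch_size helps lazy loading, consider N entities whose lazy association is accessed one after another. Without batching, each access triggers its own SELECT (the classic "N+1 selects" pattern); with a batch size of B, the provider can initialize up to B pending proxies with a single SELECT ... WHERE id IN (...), so roughly ceil(N / B) extra queries are needed. The sketch below only does this idealized arithmetic; Hibernate's actual batch loader may split batches differently, and `BatchFetchMath` is a hypothetical name.

```java
/** Hypothetical helper: idealized count of the extra SELECTs needed to initialize lazy associations. */
final class BatchFetchMath {
    private BatchFetchMath() { }

    /**
     * @param lazyEntities number of uninitialized proxies that end up being accessed (N)
     * @param batchSize    batch fetch size (B); 1 means no batching
     * @return ceil(N / B), the number of additional queries in the ideal case
     */
    static int extraQueries(int lazyEntities, int batchSize) {
        if (lazyEntities < 0 || batchSize < 1) {
            throw new IllegalArgumentException("need N >= 0 and B >= 1");
        }
        // Integer ceiling division written so that it cannot overflow
        return lazyEntities / batchSize + (lazyEntities % batchSize == 0 ? 0 : 1);
    }
}
```

For example, if Player had a lazily loaded association (the Player entity used here maps only three columns), touching it on each of the 472 players named John would cost 472 extra SELECTs without batching, but about 30 with a batch size of 16.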

To get another sense of the times and mapping procedures of each JPA implementation, I will make more selective queries based on a Player object's first name and last name. These are the results of a query for all Player objects whose first name is John and a query for all Player objects whose last name is Smith.

All Player objects whose first name is John (472 records)

Implementation | Time | Query
EclipseLink | 1265 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameFirst = ?)
Hibernate | 613 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameFirst=?
OpenJPA | 1643 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameFirst = ?) [params=?]

All Player objects whose last name is Smith (146 records)

Implementation | Time | Query
EclipseLink | 986 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (nameLast = ?)
Hibernate | 537 ms | select player0_.lahmanID as lahmanID0_, player0_.nameFirst as nameFirst0_, player0_.nameLast as nameLast0_ from Master player0_ where player0_.nameLast=?
OpenJPA | 1452 ms | SELECT t0.lahmanID, t0.nameFirst, t0.nameLast FROM Master t0 WHERE (t0.nameLast = ?) [params=?]


These test results tell a slightly different story, with all three JPA implementations showing substantial time differences among one another. At a lower record count, Hibernate's out-of-the-box configuration produced queries roughly twice as fast as its closest competitor (EclipseLink) and nearly three times as fast as its other competitor (OpenJPA).
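The ratios behind this comparison can be checked directly from the two tables. The sketch below simply divides each competitor's time by Hibernate's time for the same query; the inputs are the measurements reported above, and `QueryTimeRatios` is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical helper: how many times slower each provider was than Hibernate for one query. */
final class QueryTimeRatios {
    private QueryTimeRatios() { }

    static double ratio(long otherMs, long baselineMs) {
        return (double) otherMs / baselineMs;
    }

    static String describe(String query, long hibernateMs, long eclipseLinkMs, long openJpaMs) {
        // Locale.ROOT keeps '.' as the decimal separator
        return String.format(Locale.ROOT, "%s: EclipseLink %.2fx, OpenJPA %.2fx slower than Hibernate",
                query, ratio(eclipseLinkMs, hibernateMs), ratio(openJpaMs, hibernateMs));
    }
}
```

With 613/1265/1643 ms (first name John) this yields ratios of 2.06 and 2.68, and with 537/986/1452 ms (last name Smith) it yields 1.84 and 2.70.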

To get an even broader sense of the times and mapping procedures of each JPA implementation, I will make a query on a single Player object based on its id. These are the results of performing such a query.

Single Player object whose ID is 777 (1 record)

Implementation | Time | Query
EclipseLink | 521 ms | SELECT lahmanID, nameLast, nameFirst FROM Master WHERE (lahmanID = ?)
Hibernate | 157 ms | select player0_.lahmanID as lahmanID0_0_, player0_.nameFirst as nameFirst0_0_, player0_.nameLast as nameLast0_0_ from Master player0_ where player0_.lahmanID=?
OpenJPA | 1052 ms | SELECT t0.nameFirst, t0.nameLast FROM Master t0 WHERE t0.lahmanID = ? [params=?]


Apart from the faster query times (this being a query for a single Player object), the ranking between JPA implementations is the same as for the first and last name queries, although the gaps are wider: Hibernate is over three times as fast as EclipseLink and over six times as fast as OpenJPA.

That does it as far as test queries are concerned. However, a word of caution is in order when discussing optimization, enhancement and weaving. Even though the previous tests involved querying over 17,000 records and show clear advantages of one provider and technique over another, they are still one-dimensional, since they're based on read operations performed on a single object type and a single RDBMS table. JPA can perform a large array of operations, including updating, writing and deleting RDBMS records, not to mention more elaborate queries that can span multiple objects and tables. In addition, the RDBMS itself can influence JPA query times (e.g. through indexes). All this said, it's not too far-fetched to think that OpenJPA entity enhancement, EclipseLink weaving or Hibernate properties could have effects of varying degree, either beneficial or detrimental, depending on the queries (i.e. multi-table, multi-object) and type of JPA operation (i.e. read, write, update, delete) involved.

Next, I will describe one of the most popular techniques used to boost performance in JPA applications.   

Published at DZone with permission of its author, Daniel Rubio.


Comments

Jörg Buchberger replied on Wed, 2010/07/21 - 4:10am

Thanks for the detailed writeup! Will download your setup, in order to put Data Nucleus into equation as well, as soon as I have some spare time. Perhaps, you are faster ;-)

Cheers

Slava Lo replied on Wed, 2010/07/21 - 5:44am

One thing to keep in mind when introducing performance monitoring probes is to minimize the impact of the probe itself.

In the current implementation of the aspects, the aspect itself may introduce a delay by virtue of using string concatenation instead of StringBuilder.

Matthew Adams replied on Wed, 2010/07/21 - 9:57am in response to: Jörg Buchberger

Disclosure:  I've been a member of the JDO & JPA expert groups.

Yes, once again, DataNucleus gets left out.  Please add this great implementation to your study.  After all, there's a reason that Google chose DataNucleus for its AppEngine for Java.

I'd also be interested to see the code migrated to have a JDO version as well, so as to compare DataNucleus/JDO against, say, Kodo and other JDO implementations.

-matthew

Gérald Quintana replied on Wed, 2010/07/21 - 10:05am

About Hibernate weaving, I don't expect a huge difference, but you could try

  • Buildtime Bytecode instrumentation (see Ant Task in http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html)?
  • Javassist vs CGLib Bytecode provider: hibernate.bytecode.provider=cglib|javassist
  • Replacing Reflection by Bytecode Generation: hibernate.bytecode.use_reflection_optimizer=true|false

It would be interesting, to compare relation management: lazy loading, join fetching, batch fetching, etc.

 

christiaan_se replied on Wed, 2010/07/21 - 11:05am

Nice article! Great to see that you took the time to actually do the comparison. Maybe this could be the start of a more official JPA performance benchmark, similar to what we had when many JDO implementations were available (http://www.torpedo-group.org or http://www.polepos.org/). Performance is one of the major issues different implementations of the same spec can compete on.

Btw, since you are measuring the performance of an ORM I think the most interesting performance is the overall time to go from a JPA query to the actual translation from table record to object, not the time it takes to execute the underlying generated sql query (which as you noted is most likely the same for different implementations anyway). What is the overhead the JPA implementation adds to this process (parsing query, mapping meta data, filling object fields, etc.)? Moreover, what is actually being done with the query? Are all objects completely instantiated with all their fields loaded? Probably at least 2 scenarios (no fields loaded and all fields loaded) you want to know and compare.

Catalin Strimbei replied on Thu, 2010/07/22 - 8:57am

JPA is a mature and well-founded solution. But, as I also see in the introduction of this paper, the confusion between objects and tables still "persists". Relational tables are not equivalent to objects. IMHO tables could at most be likened to object collections ... This confusion may be caused by the fact that when a new table is created in an RDBMS, two effects appear: a new ROW type is created and also a new storage location (a logical storage location, but there are even physical locations associated). Modern RDBMSs reveal the existence of the ROW type because they support defining types (even ROW types) and then creating tables on these types, and the reverse: one can define a variable (as in PL/SQL) borrowing the ROW type of an existing table ... So I think it is time to stop the confusion between objects and tables ....

Raphael Miranda replied on Sun, 2010/07/25 - 6:49pm


Object-oriented design isn't going away, but RDBMSs are, at an ever-increasing pace.

People are 20 years too late to realise RDBMS aren't the only way to persist data. 

 

 

 

Pascal Thivent replied on Sat, 2010/08/28 - 3:44pm

I find http://polepos.org/ to be a totally useless benchmark. Sure, it's open, but it's not independent at all (created by the db4o guys to, well, guess what?) and some persistence providers are misconfigured and their API misused (check the Hibernate part), making them under optimized. I don't want to think it's intentional, but it's not serious. So at the end, I find it totally biased and there is nothing to conclude from this bench, even if your application is "similar" to the polepos database.

Doug Clarke replied on Thu, 2010/07/29 - 11:58am

Object-Relational benchmarks are very challenging and are typically biased in one way or another. I help lead the EclipseLink project and Oracle TopLink and struggle to publish our internal performance comparisons where we see superior results against leading competitors.

We need to review this benchmark (thanks for the source) and compare its configuration and approach to measurement with our internal performance efforts to identify why we get such different results. We'll publish what we find or suggestions to improve this comparison back to this thread.

 Doug Clarke

EclipseLink Project co-lead

chris colman replied on Sat, 2010/08/07 - 3:52pm

I was at first excited by your article but became disappointed as I read further: you introduced your article by expounding the accuracy of the term 'impedance mismatch' but then proceeded to build a performance test with a completely benign example with a trivial mapping of the 'Player' class to your existing 'Master' table. It's hard to demonstrate any impedance mismatch with such a direct mapping. The mismatches come when you introduce inheritance and polymorphism, two essential concepts in OO which RDBMSes are completely oblivious to.

If you mention impedance mismatch then one would assume your performance test would involve at least a non-trivial domain model with some depth in the object graph. After all, it's the performance of persistence frameworks (ORMs or otherwise) in 'real life' object-oriented applications that we're most interested in. Any old ORM can generate the simple SQL to find objects/rows via a column value. The really interesting stuff is how they perform when there are sophisticated object graphs with inheritance and polymorphism and lots of navigation along object relationships: all the concepts that are at play in the software implementations of the 'real life' scenarios of modern, object-oriented applications.

I'm also disappointed that you left out DataNucleus. It's a perfectly valid and popular JPA/ORM implementation that takes a completely different approach under the hood, which has some real efficiency benefits when it comes to testing for 'object dirtiness' at transaction commit time and very fast 'persistence by reachability': things that are extremely important when your domain model is not as trivial as your example.

The old Pole Position performance benchmarks were great because they had a broad spectrum of tests, from trivial single-object, direct table mappings like your example to more complex, deep object graphs that typically appear in real world object oriented applications. It's when you get into these real world examples that the performance of the different persistence frameworks really starts to diverge and things start to get really exciting.

Warning: if you're committed to an RDBMS as the datastore underlying your persistence framework (JPA/JDO) for the next few years (which I probably am), then don't look at the results of performance tests on real world object graph examples for any of the object oriented databases. It makes you want to cry: they are just so damn fast in comparison because they don't suffer the impedance mismatch, which majorly impedes performance. At the moment we can get by on the slower RDBMS performance, but we're using DataNucleus JDO so that, as demand for performance increases, we can switch to an OODBMS at some point in the future simply by switching JDO's underlying datastore from an RDBMS to an OODBMS.

Andy Leung replied on Tue, 2010/08/17 - 10:10am in response to: chris colman

I cannot agree more with Chris!!!

Look at that DataNucleus used in GAE, man, it's just so super fast and powerful that you CANNOT query with a sub-query and CANNOT run a join query on a complex data model without a hardwired simulation that creates another dummy object. It's so super fast that one CANNOT even query 100k objects in a paginated manner without using the failed fake pagination cheat that doesn't even work!!! Have you ever seen a business application that shows business records in pages and forces users to go through each page by clicking "Next" like Gmail???

Honestly, object graphs and RDBMSes are two different things and serve different people's needs. A direct comparison of the raw speed of simple data retrieval is meaningless in practice. I suggest you go back and study the problem domain before making such a criticism.

Kevin Sutter replied on Wed, 2010/11/10 - 3:56pm

Daniel, thanks for the good, informative article. It looks like you attempted to be unbiased in your evaluation of the various technologies. You focused on one area that contributes to performance: caching. But one other area that we have found greatly contributes to better performance is connection pooling. I believe both EclipseLink and Hibernate ship with connection pooling enabled. Until recently, OpenJPA did not provide connection pooling "out of the box".

As of svn revision #1023114 of Apache OpenJPA trunk (openjpa-2.1.0-SNAPSHOT), OpenJPA now ships with Apache DBCP and pre-configures its use in a JSE, non-managed environment. In a managed environment, it is assumed that connection pooling will be provided via the application server.

It looks like you might be dependent on Spring to provide the datasource connection management (from context.xml). I'm not sure whether Spring provides connection pooling in this case. I am also not sure whether an external connection pooling mechanism would apply here. Just something to be aware of as you continue this benchmarking exercise.

More information on OpenJPA's use of DBCP for connection pooling can be found here: http://openjpa.apache.org/builds/latest/docs/manual/manual.html#ref_guide_dbsetup_auto

Thanks, Kevin Sutter, Apache OpenJPA

Andy Jefferson replied on Tue, 2010/11/30 - 3:00pm in response to: Pascal Thivent

I don't find PolePos to be a "totally useless" benchmark. The object graph models presented in it are representative of real-life situations and valid for comparison purposes. When we ran JPOX on it some time back, we noticed some inconsistencies in its handling of the different persistence solutions and contributed fixes back to the polepos "group", and yes, we tuned the JPOX config to run on it. How you configure a persistence solution to run on it is up to the individual. Just because the people involved may not be as experienced with a tool as you are doesn't make the benchmark "useless". The benchmark is the object graph models plus the persistence code, and that is basically sound IMHO.

Andy Jefferson replied on Wed, 2010/12/01 - 3:36am in response to: Andy Leung

Please get your facts right about DataNucleus. GAE/J uses a plugin written by Google, and it is this plugin that imposes restrictions on the user in terms of what can and can't be queried. This post is about persistence to an RDBMS, and for that you would use DataNucleus's own plugins, not a third-party one.
