Performance Zone is brought to you in partnership with:

As a Java Architect at Sun Microsystems, Carol McDonald has spoken at various conferences including JavaOne, Java University, Sun Tech Days, Sun Network, Sun Code Camps, JDJEdge, and JUGs including Machester, Boston, Maine, Cologne, FAA, Richmond, Memphis, D.C... Carol blogs about the latest technologies that she is speaking about at http://weblogs.java.net/blog/caroljmcdonald/. Before returning to Sun in 2007, Carol worked 2 1/2 yrs as an Architect on massive OLTP Spring/hibernate application to manage > 10 mill loans for the consumer credit division of a leading automoblile manufacturer and a leading bank. Before joining Sun the first time in 1999 Carol worked on Pharmaceutical Intranet applications for Roche in Switzerland, a Telecom Network Management Application for Digital (now HP) in France, a X.400 Email Server for IBM in Germany, and as a student intern for the National Security Agency. Carol holds a M.S. in Computer Science from the University of Tennessee, a B.S. in Geology from Vanderbilt University, and is a Sun Certified Java Architect and Java Language Programmer. Carol is also Fluent in French and German. Carol has posted 11 posts at DZone. View Full User Profile

JPA Performance, Don't Ignore the Database

08.31.2009
| 17953 views |
  • submit to reddit

Good Database schema design is important for performance. One of the most basic optimizations is to design your tables to take as little space on the disk as possible , this makes disk reads faster and uses less memory for query processing.

Data Types

You should use the smallest data types possible, especially for indexed fields. The smaller your data types, the more indexes (and data) can fit into a block of memory, the faster your queries will be.

Normalization

Database Normalization eliminates redundant data, which usually makes updates faster since there is less data to change. However a Normalized schema causes joins for queries, which makes queries slower, denormalization speeds retrieval. More normalized schemas are better for applications involving many transactions, less normalized are better for reporting types of applications.  You should normalize your schema first, then de-normalize later.  Applications often need to mix the approaches, for example use a partially normalized schema, and duplicate, or cache, selected columns from one table in another table. With JPA O/R mapping you can use the @Embedded annotation for denormalized columns to specify a persistent field whose @Embeddable type can be stored as an intrinsic part of the owning entity and share the identity of the entity.



Database Normalization and Mapping Inheritance Hiearchies

The Class Inheritance hierarchy shown below will be used as an example of JPA O/R mapping.


In the Single table per class mapping shown below, all classes in the hierarchy are mapped to a single table in the database. This table has a discriminator column (mapped by @DiscriminatorColumn), which identifies the subclass.  Advantages: This is fast for querying, no joins are required. Disadvantages:  wastage of space since all inherited fields are in every row, a deep inheritance hierarchy will result in wide tables with many, some empty columns.


In the Joined Subclass mapping shown below, the root of the class hierarchy is represented by a single table, and each subclass has a separate table that only contains those fields specific to that subclass. This is normalized (eliminates redundant data) which is better for storage and updates. However queries cause joins which makes queries slower especially for deep hierachies, polymorphic queries and relationships.


In the Table per Class mapping (in JPA 2.0, optional in JPA 1.0),  every concrete class is mapped to a table in the database and all the inherited state is repeated in that table. This is not normlalized, inherited data is repeated which wastes space.  Queries for Entities of the same type are fast, however  polymorphic queries cause unions which are slower.


Know what SQL is executed

You need to understand the SQL queries your application makes and evaluate their performance. Its a good idea to enable SQL logging, then go through a use case scenario to check the executed SQL.  Logging is not part of the JPA specification, With EclipseLink you can enable logging of SQL by setting the following property in the persistence.xml file:

<properties>
<property name="eclipselink.logging.level" value="FINE"/>
</properties>
With Hibernate you set the following property in the persistence.xml file:

 

<properties>
<property name="hibernate.show_sql" value="true" />
</properties>

Basically you want to make your queries access less data, is your application retrieving more data than it needs, are queries accessing too many rows or columns? Is the database query analyzing more rows than it needs? Watch out for the following:

  • queries which execute too often to retrieve needed data
  • retrieving more data than needed
  • queries which are too slow
    • you can use EXPLAIN to see where you should add indexes

With MySQL you can use the slow query log to see which queries are executing slowly, or you can use the MySQL query analyzer to see slow queries, query execution counts, and results of EXPLAIN statements.

Understanding EXPLAIN

For slow queries, you can precede a SELECT statement with the keyword EXPLAIN  to get information about the query execution plan, which explains how it would process the SELECT,  including information about how tables are joined and in which order. This helps find missing indexes early in the development process.



You should index columns that are frequently used in Query WHERE, GROUP BY clauses, and columns frequently used in joins, but be aware that indexes can slow down inserts and updates.

Lazy Loading and JPA

With JPA many-to-one and many-to-many relationships lazy load by default, meaning they will be loaded when the entity in the relationship is accessed. Lazy loading is usually good, but if you need to access all of the "many" objects in a relationship, it will cause n+1 selects where n is the number of "many" objects.



You can change the relationship to be loaded eagerly as follows :


However you should be careful with eager loading which could cause SELECT statements that fetch too much data. It can cause a Cartesian product if you eagerly load entities with several related collections.

If you want to override the LAZY fetch type for specific use cases, you can use Fetch Join. For example this query would eagerly load the employee addresses:


In General you should lazily load relationships, test your use case scenarios, check the SQL log, and use @NameQueries with JOIN FETCH to eagerly load when needed.

Partitioning

The main goal of partitioning is to reduce the amount of data read for particular SQL operations so that the overall response time is reduced

Vertical Partitioning  splits tables with many columns into multiple tables with fewer columns, so that only certain columns are included in a particular dataset, with each partition including all rows.

Horizontal Partitioning segments table rows so that distinct groups of physical row-based datasets are formed. All columns defined to a table are found in each set of partitions. An example of horizontal partitioning might be a table that contains historical data being partitioned by date.


Vertical Partitioning

In the example of vertical partitioning below a table that contains a number of very wide text or BLOB columns that aren't referenced often is split into two tables with the most referenced columns in one table and the seldom-referenced text or BLOB columns in another.

By removing the large data columns from the table, you get a faster query response time for the more frequently accessed Customer data. Wide tables can slow down queries, so you should always ensure that all columns defined to a table are actually needed.


The example below shows the JPA mapping for the tables above. The Customer data table with the more frequently accessed and smaller data types  is mapped to the Customer Entity, the CustomerInfo table with the less frequently accessed and larger data types is mapped to the CustomerInfo Entity with a lazily loaded one to one relationship to the Customer.



Horizontal Partitioning

The major forms of horizontal partitioning are by Range, Hash, Hash Key, List, and Composite.

Horizontal partitioning can make queries faster because the query optimizer knows what partitions contain the data that will satisfy a particular query and will access only those necessary partitions during query execution. Horizontal Partitioning works best for large database Applications that contain a lot of query activity that targets specific ranges of database tables.



Hibernate Shards

Partitioning data horizontally into "Shards" is used by google, linkedin, and others to give extreme scalability for very large amounts of data. eBay "shards" data horizontally along its primary access path.

Hibernate Shards is a framework that is designed to encapsulate support for horizontal partitioning into the Hibernate Core.



Caching

JPA Level 2 caching avoids database access for already loaded entities, this makes reading frequently accessed unmodified entities faster, however it can give bad scalability for frequent or concurrently updated entities.

You should configure L2 caching for entities that are:

  • read often
  • modified infrequently
  • Not critical if stale

You should also configure L2 (vendor specific) caching for maxElements, time to expire, refresh...

References and More Information:

JPA Best Practices presentation
MySQL for Developers Article
MySQL for developers presentation
MySQL for developers screencast
Keeping a Relational Perspective for Optimizing Java Persistence
Java Persistence with Hibernate
Pro EJB 3: Java Persistence API
Java Persistence API 2.0: What's New ?
High Performance MySQL book
Pro MySQL, Chapter 6: Benchmarking and Profiling
EJB 3 in Action
sharding the hibernate way
JPA Caching
Best Practices for Large-Scale Web Sites: Lessons from eBay



Published at DZone with permission of its author, Carol Mcdonald.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Erik Post replied on Mon, 2009/08/31 - 5:44am

Great article, thanks!

Cody A_ replied on Mon, 2009/08/31 - 8:00am

 Great article++;

Mario León replied on Mon, 2009/08/31 - 10:35am

No only great, it's an excellent article regarding JPA strategies to achieve distinct performance goal according the business problem.

 

I really liked the "hibernate-shards" section.

Muhammad Khojaye replied on Mon, 2009/08/31 - 12:44pm

Nice Article. Thanks for sharing this.

Robin Bygrave replied on Tue, 2009/09/01 - 2:36am

Can you expand on how you relate @Embedded to DB denormalisation?  Perhaps with an example? I can't see how the 2 relate as @Embedded is essentially a compound type in JPA1 and I don't see how it relates to denormalised/derived columns?

In regards the Vertical partitioning you don't mention the costs/downside related to that approach. For example, inserts would be negatively effected depending on how the ID is generated/provided but it also effects the way JDBC batching can be used. It only works if the columns can be separately by their use in the application which implies you know at DB design time all the application use scenarios.

Now if JPA had "Partial Object" support in its query language you would not need to artifically adjust both your DB and ORM models purely for performance reasons.  What is more there are certain queries and joins that get really significant performance improvements as the DB can resolve the query/join entirely via index(s) and not have to read data blocks.  For example fetching the customer id and email, or joining order lines to products and only selecting the product id and name etc - the DB cost savings via "Partial Object" queries can be really significant.

I think some JPA implementations have proprietory extensions to support this type of query and certainly the ORM I use does.

Erik Post replied on Tue, 2009/09/01 - 7:53am in response to: Robin Bygrave

@Robin: wrt to the @Embedded annotation: if you embed, say, an Address (street, city, ...) into both your Person and Company entities, you end up duplicating the columns defined by Address in the tables for both Person and Company, thus denormalizing your db.

Carol Mcdonald replied on Tue, 2009/09/01 - 10:19am in response to: Robin Bygrave

with JPA or hibernate if you only want to retrieve some of the object attributes and put in new object you can

instantiate a new  DTO object within the query  like this:

@NamedQuery(name="generateReport", 
query=" SELECT NEW com.ssg.article.ReportDTO(p.name, SUM(pen.amount))
FROM Party p LEFT JOIN p.penalties pen GROUP BY p.name""),

http://www.oracle.com/technology/pub/articles/gupta-jpa.html?_template=/ocom/print

Carol Mcdonald replied on Tue, 2009/09/01 - 10:21am in response to: Robin Bygrave

there is a @Embedded example in this JavaOne presentation Keeping a Relational Perspective for Optimizing Java Persistence

Carol Mcdonald replied on Tue, 2009/09/01 - 10:34am in response to: Robin Bygrave

yes vertical partitioning also has disadvantages. But in some circumstances, it can be beneficial to split into two a table that is scanned very often,  smaller tables normally require less main memory while their contents are being actively processed during query execution

German Viscuso replied on Tue, 2009/09/01 - 11:05am

Or you can just work with an oodbms and forget all about this...

Robin Bygrave replied on Tue, 2009/09/01 - 8:11pm

@eriksensei -  well that depends on your definition of Denormalisation. 

From Wikipedia - "Denormalization is the process of attempting to optimize the performance of a database by adding redundant data or by grouping data."

In this definition I'd argue duplicating columns is not denormalisation as there is no redundant or derived data. The identity of the address has changed (it doesn't have its own identity now) but the data itself is not redundant or derived.

In this same sense changing a OneToOne to an Embedded is not denormalisation in my book (that is the example I think Carol is referring to).

So, I'm guessing you are using another definition of Denormalised?

As an Address example, I'd go more with something like you have a history of addresses (effective dates) and derive/maintain the 'current' address via DB triggers. Most often the user/application wants the current address but occasionally they want the address as at a certain date.


Robin Bygrave replied on Tue, 2009/09/01 - 8:32pm in response to: Carol Mcdonald

@Carol - Yes, the "generateReport" query is close but your example is not really a "partial object" so it has obvious downsides. By "partial object" I mean an entity bean that has only some of its properties populated rather than a separate DTO (which is useful for other things)... but "partial object" support the the query language means you don't HAVE to do the vertical partitioning mentioned above.

Using a ReportDTO means ...

- Is not Updatable (out of the box anyway)

- Can not be further navigated via lazy loading

- Can not easily be used in fetch joins 

- Is itself a query with a fixed set of columns (you always get all the columns in the ReportDTO no matter what your use case is).

 

Example: Pseudo query language with "partial object" support..

 

find customer (id, name)  -- partially populated customer entity bean

 

find order (*)

fetch join orderDetails (*)

fetch join orderDetails.product (name, sku)    -- partially populated product used in join

 

That is, the query language gives you the ability to specify which properties in the entity bean are fetched (rather than the fixed Eager/Lazy annotations).  The beauty here is that your MODEL doesn't change... you control on a use case basis how full/populated your object graphs are.

 I'd love to see this support in JPA's query language but perhaps I'm the only one :(

 

Richard Langlois replied on Wed, 2009/09/02 - 11:59am

Excellent article Carol.

Robin Bygrave replied on Wed, 2009/09/02 - 5:29pm in response to: Robin Bygrave

BTW: Some ORM's have a "fetch group" feature... as I see it, this could be used in a similar way to a "partial object query"... and again means you could get more performant queries (only fetching what you need) without the vertical partitioning.

Obviously it is not in JPA 1 standard... but an option some might be interested in... and might become part of the standard in the future.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.