SQL Zone is brought to you in partnership with:

Eyal Lupu is a Java/JEE software architect, consultant and trainer. Working as a software professional since 1993 Eyal is specialized in architecting, designing and building infrastructures and frameworks for software development teams. Eyal is also specialized in technology leadership of globally distributed development teams. He blogs at http://blog.eyallupu.com Eyal is a DZone MVB and is not an employee of DZone and has posted 6 posts at DZone. You can read more from them at their website. View Full User Profile

Hibernate - Tuning Queries Using Paging, Batch Size, and Fetch Joins

06.09.2008
| 149582 views |
  • submit to reddit

This article covers queries - in particular a tuning test case and the relations between simple queries, join fetch queries, paging query results, and batch size.
 

Paging the Query Results

I will start with a short introduction about paging in EJB3:

To support paging the EJB3 Query interface defines the following two methods:
  • setMaxResults - sets the number of maximum rows to retrieve from the database
  • setFirstResult - sets the first row to retrieve

For example if our GUI displays a list of customers and we have 500,000 customers (database rows) in out database we wouldn't like to display all 500,000 records is one view (even if we put performance considerations aside - nobody can do anything with a list of 500,000 rows). The GUI design would usually include paging - we break the list of records to display into logical pages (for example 100 records per page) and the user can navigate between pages (same as Google's results navigator down the search page).

When using the paging support it is important to remember that the query has to be sorted otherwise we can't be sure that when fetching the "next page" it will really be the next page (since in the absence of the 'order by' clause form a SQL query the order in which rows are fetch is unpredictable).

Here is a sample use, for fetching the first tow pages of 100 rows each:

        Query q = entityManager.createQuery("select c from Customer c order by c.id");
        q.setFirstResult(0).setMaxResults(100);

        .... next page ...

        Query q = entityManager.createQuery("select c from Customer c order by c.id");
        q.setFirstResult(100).setMaxResults(100);


This is a simple API and it's important (for performance) to remember using it when we need to fetch only parts of the results.

Test Case Description

This test cased is based on a real tuning I did for an application, I just changed the class names to Customer and Order. Let's assume that I have a Customer entity with a set of orders (lazily fetched - but it happens in eager fetch as well) and we need to:

  1. Fetch customers and their orders
  2. Do it in a "paging mode" - 100 customers per page

Tuning Requirement #1 - Fetch Customers and Their Orders


There are two possibilities to perform this kind of fetch:

  • Simple select: select c from customer c order by c.id
  • Join fetch: select distinct c from Customer c left outer join fetch c.orders order by c.id

The simple select is as simple as it can be, we load a list of customers with a proxy collection in their orders field. The orders collection will be filled with data once I access it (for example c.getOrders().getSize() ). The 'join fetch' means that we want to fetch an association as an integral part of the query execution. The joined fetched entities (in the example above: c.orders) must be part of an association that is referenced by an entity returned from the query (in the example above: c). The 'join fetch' is one of the tools used for improving queries performance (see more in here). The Hibernate core documentations explains that "a 'fetch' join allows associations or collections of values to be initialized along with their parent objects, using a single select" (see here).


I have in my database 18,998 customer records, each with few orders. Let's compare execution time for the two queries. My code looks the same for both queries (except of the query itself), I execute the query, then I iterate the results checking the size of of each customer orders collection and print the execution time and number of records fetch (as a sanity for the query syntax):

        Query q = entityManager.createQuery(queryStr);

        long a = System.currentTimeMillis();
        List<Customer> l = q.getResultList();
        for (Customer c : l) {
            c.getOrders().size();
        }
        long b = System.currentTimeMillis();

        System.out.println("Execution time: " + (b - a)+ "; Number of records fetch: " + l.size() );


And to the numbers (avg. 3 executions):
  • Simple select: 24,984 millis
  • Join fetch: 1,219 millis
The join fetch query execution time was 20 times faster(!) than the simple query. The reason is obvious, using the join fetch select I had only one round trip to the database. While using a simple select I had to fetch the customers (1 round trip to the database) and each time I accessed a collection I had another round trip (that's 18,998 additional round trips!).

The winner is 'join fetch'. But does it? wait for the next one - the paging...

Published at DZone with permission of Eyal Lupu, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Koen VT replied on Mon, 2008/06/09 - 9:42am

I had a similar problem once. Since our database has millions of rows and we needed optimal performance, I had no choice but to use a native query. This is an example for Oracle, where the first 100 customers are retrieved, together with their orders (if any):

SELECT
A1.*,
ORD.*
FROM (
SELECT
ROWNUM RNUM,
A2.*
FROM(
SELECT
CUST.ID,
CUST.NAME
FROM
CUSTOMER CUST
ORDER BY CUST.NAME ASC
) A2
WHERE ROWNUM < 100
) A1
LEFT OUTER JOIN
ORDER ORD
ON
ORD.IDCUST = A1.ID
WHERE
RNUM >= 0;

In fact, this query returns the 100 customers in 100 records or more (if there are customers with more than one order)!

I'm not a JPA expert, but I know JPA has support for native queries. Sometimes, for optimal performance, you have no other choice.

I agree that a native query (in this case for Oracle) is not portable to other databases, but the same is true for the @BatchSize annotation at a certain level. It only works with Hibernate and you have to rewrite the code when you want to use OpenJPA, Toplink or whatever as a JPA implementation.

S. Radojcin replied on Sun, 2009/08/16 - 4:18am

"I have in my database 18,998 customer records, each with few orders."

What if you have a _lot of orders for each customers, and want to use pagination for those orders (for example, in order to display orders in a separate data table, with dataScroller, as ussual) ? How to determine total number of orders for a particular customer ? I suppose that 

c.getOrders().size();

will be too expensive, or I am wrong ?

 

Further, how to load just , let say, 50 orders for a particular customer ? Iterating throughoutc.getOrders(), or execute separate Query, with setFirstResult()/setMaxResult() ?

- regards, Cvele

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.