
Eyal Lupu is a Java/JEE software architect, consultant, and trainer. Working as a software professional since 1993, Eyal specializes in architecting, designing, and building infrastructures and frameworks for software development teams, as well as in the technology leadership of globally distributed development teams. He blogs at http://blog.eyallupu.com. Eyal is a DZone MVB (not a DZone employee) and has posted 6 posts at DZone.

Hibernate - Tuning Queries Using Paging, Batch Size, and Fetch Joins

06.09.2008

Tuning Requirement #2 - Use Paging

The second requirement was to add paging: each page holds 100 customers, so the 18,998 records yield 18,998/100, rounded up, which is 190 pages, the last one with 98 customers. So let's change the code above a little bit:
        Query q = entityManager.createQuery(queryStr);
        // Limit the result to a single page of up to 100 records
        q.setFirstResult(pageNum * 100).setMaxResults(100);

        long a = System.currentTimeMillis();
        List<Customer> l = q.getResultList();
        for (Customer c : l) {
            c.getOrders().size(); // touch the collection to force initialization
        }
        long b = System.currentTimeMillis();

        System.out.println("Execution time: " + (b - a)
                + "; Number of records fetched: " + l.size());

I added the second line, which limits the query result to a specific page of up to 100 records. And the numbers are (average of 3 executions):

  • Simple select: 328 millis
  • Join fetch: 1,660 millis
The tables have turned. Why? First, a quote from the EJB3 Persistence specification:

"The effect of applying setMaxResults or setFirstResult to a query involving fetch joins over collections is undefined" (section 3.6.1 - Query Interface)

We could have stopped here, but it is interesting to understand the issue and to see what Hibernate actually does.

To implement the paging feature Hibernate delegates the work to the database, using the database's own syntax to limit the number of records fetched by the query. Each database has its own proprietary syntax for limiting the number of fetched records; some examples:
  • PostgreSQL uses LIMIT and OFFSET
  • Oracle has ROWNUM
  • MySQL uses its own variant of LIMIT and OFFSET
  • MSSQL has the TOP keyword in the SELECT
  • and so on
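As a rough illustration of the syntax differences, here is a plain-Java sketch that appends a row limit to a base query per dialect. The applyLimit helper is hypothetical and greatly simplified; Hibernate's real Dialect classes handle many more cases:

```java
// Hypothetical sketch: per-database row limiting, NOT Hibernate's actual API.
public class LimitSketch {

    static String applyLimit(String sql, String dialect, int offset, int max) {
        switch (dialect) {
            case "postgresql":
            case "mysql":
                return sql + " LIMIT " + max + " OFFSET " + offset;
            case "oracle":
                // Classic ROWNUM wrapping (pre-12c syntax)
                return "SELECT * FROM (SELECT q.*, ROWNUM rnum FROM (" + sql
                        + ") q WHERE ROWNUM <= " + (offset + max)
                        + ") WHERE rnum > " + offset;
            case "mssql":
                // TOP alone cannot express an offset
                return sql.replaceFirst("(?i)^SELECT", "SELECT TOP " + max);
            default:
                throw new IllegalArgumentException("Unknown dialect: " + dialect);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyLimit("SELECT * FROM customer", "postgresql", 200, 100));
        System.out.println(applyLimit("SELECT * FROM customer", "mssql", 0, 100));
    }
}
```

The point is only that the limit is expressed at the SQL level, so the database counts rows, not entities.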


The important thing to remember here is what such a limit means: the database returns a subset of the query result. So if we ask for the first 100 customers whose names contain 'Eyal', the outcome is logically the same as building an in-memory table of all customers matching the criteria and taking its first 100 rows.

And here is the catch: if the limited query includes a join clause over a collection, then the first 100 rows of that "logical table" are not necessarily the first 100 customers. The join may duplicate customers across rows, but the database is not aware of that and does not care: it performs operations on tables, not on objects! Think of the extreme case: the customer 'Eyal' has 100 orders. The query returns 100 rows, Hibernate identifies that they all belong to the same customer, and the query result contains a single Customer. This is not what we asked for.
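The effect is easy to reproduce without a database. The sketch below (plain Java, hypothetical data) builds the "logical table" of a customer/orders join, takes the first 100 rows as a database-side LIMIT would, and then collapses them into distinct customers the way Hibernate does:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Plain-Java simulation of a row-limited fetch join (hypothetical data).
public class JoinLimitDemo {

    // Collapse the first `limit` joined rows into distinct customer ids,
    // mimicking what Hibernate does with a fetch-join result set.
    static Set<String> distinctCustomers(List<String[]> joinedRows, int limit) {
        return joinedRows.subList(0, Math.min(limit, joinedRows.size())).stream()
                .map(row -> row[0]) // row = [customerId, orderId]
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<String[]> joined = new ArrayList<>();
        // Customer C1 has 100 orders, so it occupies the first 100 joined rows...
        for (int i = 0; i < 100; i++) joined.add(new String[]{"C1", "O" + i});
        joined.add(new String[]{"C2", "O100"});
        joined.add(new String[]{"C3", "O101"});

        // ...and a row-level LIMIT 100 therefore yields a single customer.
        System.out.println(distinctCustomers(joined, 100)); // prints [C1]
    }
}
```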

This also works the other way around, of course. If a customer had more than 100 orders and the result set size were limited to 100 rows, the orders collection would not contain all of the customer's orders.

To deal with this limitation Hibernate doesn't actually issue an SQL statement with a limit clause. Instead it fetches all of the records and performs the paging in memory. This explains why the 'join fetch' query with paging took longer than the one without: the delta is the in-memory paging done by Hibernate. If you look at the Hibernate logs you will find the following warning:

 

WARNING: firstResult/maxResults specified with collection fetch; applying in memory!
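What Hibernate does in this fallback can be sketched in plain Java (a simplified, hypothetical model): read every joined row, deduplicate into entities, and only then apply firstResult/maxResults to the object list:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Simplified sketch of the in-memory fallback: paging happens on the
// already-assembled object list, not in the SQL.
public class InMemoryPaging {

    static <T> List<T> page(List<T> all, int firstResult, int maxResults) {
        int from = Math.min(firstResult, all.size());
        int to = Math.min(from + maxResults, all.size());
        return all.subList(from, to);
    }

    public static void main(String[] args) {
        // 18,998 hypothetical customer ids, already deduplicated from the join.
        List<Integer> customers = IntStream.range(0, 18_998).boxed()
                .collect(Collectors.toList());

        System.out.println(page(customers, 0, 100).size());         // prints 100
        System.out.println(page(customers, 189 * 100, 100).size()); // last page: prints 98
    }
}
```

This is why the cost grows with the total result size rather than the page size: every row crosses the wire before the page is cut.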

Published at DZone with permission of Eyal Lupu, author and DZone MVB.


Comments

Koen VT replied on Mon, 2008/06/09 - 9:42am

I had a similar problem once. Since our database has millions of rows and we needed optimal performance, I had no choice but to use a native query. This is an example for Oracle, where the first 100 customers are retrieved, together with their orders (if any):

SELECT
    A1.*,
    ORD.*
FROM (
    SELECT
        ROWNUM RNUM,
        A2.*
    FROM (
        SELECT
            CUST.ID,
            CUST.NAME
        FROM
            CUSTOMER CUST
        ORDER BY CUST.NAME ASC
    ) A2
    WHERE ROWNUM <= 100
) A1
LEFT OUTER JOIN
    "ORDER" ORD
ON
    ORD.IDCUST = A1.ID
WHERE
    RNUM >= 1;

In fact, this query returns the 100 customers in 100 records or more (if there are customers with more than one order)!

I'm not a JPA expert, but I know JPA has support for native queries. Sometimes, for optimal performance, you have no other choice.

I agree that a native query (in this case for Oracle) is not portable to other databases, but the same is true for the @BatchSize annotation at a certain level. It only works with Hibernate and you have to rewrite the code when you want to use OpenJPA, Toplink or whatever as a JPA implementation.

S. Radojcin replied on Sun, 2009/08/16 - 4:18am

"I have in my database 18,998 customer records, each with few orders."

What if you have a _lot_ of orders for each customer, and want to use pagination for those orders (for example, in order to display orders in a separate data table, with a dataScroller, as usual)? How do you determine the total number of orders for a particular customer? I suppose that

c.getOrders().size();

will be too expensive, or am I wrong?

 

Further, how do you load just, let's say, 50 orders for a particular customer? By iterating through c.getOrders(), or by executing a separate Query with setFirstResult()/setMaxResults()?

- regards, Cvele
