Big Data/Analytics Zone is brought to you in partnership with:

Koen Serneels is a Senior JAVA software engineer. He holds a keen interest in system architecture and integration, relational databases, security and networks. He is also an addict of clean and testable code. He is currently employed by Hewlett-Packard and in the evenings he enjoys teaching a graduate course on "Software Development with Java" at the GroepT Engineering School in Leuven, Belgium. Koen is also the co-author of the book on Spring MVC with Web Flow. Koen is a DZone MVB and is not an employee of DZone and has posted 5 posts at DZone. You can read more from them at their website. View Full User Profile

Bulk Fetching with Hibernate

01.06.2014
| 11498 views |
  • submit to reddit

If you need to process large database result sets from Java, you can opt for JDBC to give you the low level control required. On the other hand, if you are already using an ORM in your application, falling back to JDBC might imply some extra pain. You would be losing features such as optimistic locking, caching, automatic fetching when navigating the domain model and so forth. Fortunately most ORMs, like Hibernate, have some options to help you with that. While these techniques are not new, there are a couple of possibilities to choose from.

A simplified example; let's assume we have a table (mapped to class "DemoEntity") with 100.000 records. Each record consists of a single column (mapped to the property "property" in DemoEntity) holding some random alphanumerical data of about ~2KB. The JVM is ran with -Xmx250m. Let's assume that 250MB is the overall maximum memory that can be assigned to the JVM on our system. Your job is to read all records currently in the table, doing some not further specified processing, and finally store the result. We'll assume that the entities resulting from our bulk operation are not modified. To start we'll try the obvious first, performing a query to simply retrieve all data:

new TransactionTemplate(txManager).execute(new TransactionCallback<Void>() {
 @Override
 public Void doInTransaction(TransactionStatus status) {
   Session session = sessionFactory.getCurrentSession();
   List<DemoEntity> demoEntitities = (List<DemoEntity>) session.createQuery("from DemoEntity").list();
   for(DemoEntity demoEntity : demoEntitities){
    //Process and write result
   }
   return null;
 }
});

After a couple of seconds:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

Clearly this won't cut it. To fix this we will be switching to Hibernate scrollable result sets as probably most developers are aware of. The above example instructs hibernate to execute the query, map the entire results to entities and return them. When using scrollable result sets records are transformed to entities one at a time:

new TransactionTemplate(txManager).execute(new TransactionCallback<Void>() {
 @Override
 public Void doInTransaction(TransactionStatus status) {
  Session session = sessionFactory.getCurrentSession();
  ScrollableResults scrollableResults = session.createQuery("from DemoEntity").scroll(ScrollMode.FORWARD_ONLY);
 
  int count = 0;
  while (scrollableResults.next()) {
   if (++count > 0 && count % 100 == 0) {
    System.out.println("Fetched " + count + " entities");
   }
   DemoEntity demoEntity = (DemoEntity) scrollableResults.get()[0];
   //Process and write result
  }
 return null;
 }
});

After running this we get:

...
Fetched 49800 entities
Fetched 49900 entities
Fetched 50000 entities
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

Although we are using a scrollable result set, every returned object is an attached object and becomes part of the persistence context (aka session). The result is actually the same as our first example in which we used "session.createQuery("from DemoEntity").list()". However, with that approach we had no control; everything happens behind the scenes and you get a list back with all the data if hibernate has done its job. using a scrollable result set on the other hand gives us a hook into the retrieval process and allows us to free memory up when needed. As we have seen it does not free up memory automatically, you have to instruct Hibernate to actually do it. Following options exist:

  • Evicting the object from the persistent context after processing it
  • Clearing the entire session every now and then

We will opt for the first. In the above example under line 13 (//Process and write result) we'll add:

session.evict(demoEntity);
Important:
  • If you were to perform any modification to the entity (or entities it has associations with that are cascade evicted alongside), make sure to flush the session PRIOR evicting or clearing, otherwise queries hold back because of Hibernate's write behind will not be sent to the database
  • Evicting or clearing does not remove the entities from second level cache. If you enabled second level cache and are using it and you want to remove them as well use the desired sessionFactory.getCache().evictXxx() method
  • From the moment you evict an entity it will be no longer attached (no longer associated with a session). Any modification done to the entity at that stage will no longer be reflected to the database automatically. If you are using lazy loading, accessing any property that was not loaded prior the eviction will yield the famous org.hibernate.LazyInitializationException. So basically, make sure the processing for that entity is done (or it is at least initialized for further needs) before you evict or clear

After we run the application again, we see that it now successfully executes:

...
Fetched 99800 entities
Fetched 99900 entities
Fetched 100000 entities

Btw; you can also set the query read-only allowing hibernate to perform some extra optimizations:

ScrollableResults scrollableResults = session.createQuery("from DemoEntity").setReadOnly(true).scroll(ScrollMode.FORWARD_ONLY);
Doing this only gives a very marginal difference in memory usage, in this specific test setup it enabled us to read about 300 entities extra with the given amount of memory. Personally I would not use this feature merely for memory optimizations alone but only if it suits in your overall immutability strategy. With hibernate you have different options to make entities read-only: on the entity itself, the overall session read-only and so forth. Setting read only false on the query individually is probably the least preferred approach. (eg. entities loaded in the session before will remain unaffected, possibly modifiable. Lazy associations will be loaded modifiable even if the root objects returned by the query are read only).

Published at DZone with permission of Koen Serneels, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Muralidhar Nayani replied on Mon, 2014/01/06 - 1:26pm

 Nice explanation in detail...thanks. I have written a small program using Hibernate : inserting data into table.  using mysql.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.