
Jakub has been a Java EE developer since 2005 and is occasionally a project manager, currently working with Iterate AS. He is highly interested in developer productivity (and tools like Maven and AOP/AspectJ), web frameworks, Java portals, testing and performance, and works a lot with IBM technologies. A native of the Czech Republic, he now lives in Oslo, Norway. Jakub is a DZone MVB and is not an employee of DZone; he has posted 149 posts at DZone.

The Power of Batching or Speeding JDBC by 100


We all know that one coarse-grained operation is more efficient than many fine-grained ones when communicating across a network boundary, but until recently I hadn’t realized how big that difference can be. Performing a simple query individually for each input record proceeded at about 11k records per hour; when I grouped every 100 queries together (with “… WHERE id IN (value1, .., value100)”), all 200k records were processed in 13 minutes. In other words, a batch size of 100 yielded a speed-up of nearly two orders of magnitude!

The moral: It really pays off to spend a little more time writing the more complex batch-enabled JDBC code whenever you are dealing with larger amounts of data. (And it wasn’t that much extra effort, thanks to Groovy SQL.)
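The article doesn’t show the actual code (the author used Groovy SQL), but the batched-lookup idiom can be sketched in plain JDBC. This is a minimal illustration, not the author’s implementation; the table name `records`, the column names `id` and `name`, and the helper names are assumptions:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchedLookup {

    /** Build "SELECT ... WHERE id IN (?, ?, ..., ?)" with n bind placeholders. */
    static String buildInQuery(String table, int n) {
        String placeholders = String.join(", ", Collections.nCopies(n, "?"));
        return "SELECT id, name FROM " + table + " WHERE id IN (" + placeholders + ")";
    }

    /** Split the id list into chunks of at most 'size' elements — one chunk per round trip. */
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return chunks;
    }

    /** Look up all ids in batches of 100 instead of issuing one query per id. */
    static void lookupAll(Connection conn, List<Long> ids) throws SQLException {
        for (List<Long> batch : chunk(ids, 100)) {
            try (PreparedStatement ps =
                    conn.prepareStatement(buildInQuery("records", batch.size()))) {
                for (int i = 0; i < batch.size(); i++) {
                    ps.setLong(i + 1, batch.get(i));   // JDBC parameters are 1-based
                }
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process rs.getLong("id"), rs.getString("name") ...
                    }
                }
            }
        }
    }
}
```

Note that each batch needs its own statement text here, because the number of `?` placeholders in the `IN` clause must match the chunk size (the last chunk may be smaller than 100).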

From http://theholyjava.wordpress.com/2010/09/20/the-power-of-batching-or-speeding-jdbc-by-100/

Published at DZone with permission of Jakub Holý, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Claude Lalyre replied on Tue, 2010/09/21 - 3:40am

As I understand it, the total time for the whole set of operations includes both pure processing time and the overhead of SQL requests travelling over the network. I assume the network transfer time is far more expensive (milliseconds) than the pure processing time (microseconds). So sending many separate requests over the network considerably increases the time needed to execute all your SQL requests.

Liam Knox replied on Tue, 2010/09/21 - 6:18am in response to: Claude Lalyre

Depends on a vast number of things.

Obviously, straight prepared inserts will perform better if you use transactions around batches, compared to vanilla SQL with autocommit on. And as you mention, the batch idiom improves things further by reducing network I/O.

Obviously you could also parallelize batches and, depending on your data, tweak connection isolation levels; this would further improve performance. Or, if you want real grunt, use a native bcp approach.
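The combination the comment describes — prepared inserts, batching, and a transaction instead of per-row autocommit — can be sketched with the standard JDBC `addBatch()`/`executeBatch()` calls. This is an illustrative sketch, not code from the article; the `events` table and its columns are hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {
    // Hypothetical schema: events(id BIGINT, payload VARCHAR) — adjust to your own.
    static final String INSERT_SQL = "INSERT INTO events (id, payload) VALUES (?, ?)";

    /** Insert all rows in batches of batchSize, inside a single transaction. */
    static void insertAll(Connection conn, List<Object[]> rows, int batchSize)
            throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);               // one transaction, not one commit per row
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (Object[] row : rows) {
                ps.setLong(1, (Long) row[0]);
                ps.setString(2, (String) row[1]);
                ps.addBatch();                   // queued locally, no round trip yet
                if (++pending == batchSize) {
                    ps.executeBatch();           // one round trip for the whole batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();               // flush the final, partial batch
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }

    /** Number of executeBatch round trips needed for n rows at the given batch size. */
    static int roundTrips(int n, int batchSize) {
        return (n + batchSize - 1) / batchSize;  // ceiling division
    }
}
```

At the article’s scale, 200k rows with a batch size of 100 means 2,000 round trips instead of 200,000 — which is where most of the near-100x speed-up comes from.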
