
Daan is a software developer with a strong focus on interaction design, mainly developing in Java. He likes Agile methodologies and has worked for many years with Scrum. Besides work, he absolutely loves experimenting with new (or old!) technology.

Why Many Java Performance Tests are Wrong


A lot of ‘performance tests’ have been posted online lately. Many of these tests are implemented and executed in a way that completely ignores the inner workings of the Java VM. In this post you can find some basic knowledge to improve your performance testing.

An example

For example, some days ago a ‘performance test’ comparing while loops, iterators and for loops was posted. That test is flawed and inaccurate. I will use it as an example here, but many other tests suffer from the same problems.

So, let’s execute this test for the first time. It measures the relative performance of some loop constructs on the Java VM. The first results:

Iterator - Elapsed time in milliseconds: 78
For - Elapsed time in milliseconds: 28
While - Elapsed time in milliseconds: 30

All right, looks interesting. Let’s change the test a bit. When I reshuffle the code, putting the Iterator test at the end, I get:

For - Elapsed time in milliseconds: 37
While - Elapsed time in milliseconds: 28
Iterator - Elapsed time in milliseconds: 30

Hey, suddenly the For loop is the slowest! That’s weird!

So, when I run the test again, the results should be the same, right?

For - Elapsed time in milliseconds: 37
While - Elapsed time in milliseconds: 32
Iterator - Elapsed time in milliseconds: 33

And now the While loop is slower! Why is that?
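For reference, the benchmark under discussion has roughly the following shape. This is a reconstruction with names I made up, not the original author's code; the point is the structural flaw: all three measurements run back to back in one JVM, with no warm-up, so whichever test runs first pays the class-loading and JIT-compilation cost.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LoopBenchmark {

    static long sumWithIterator(List<Integer> list) {
        long sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            sum += it.next();
        }
        return sum;
    }

    static long sumWithFor(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    static long sumWithWhile(List<Integer> list) {
        long sum = 0;
        int i = 0;
        while (i < list.size()) {
            sum += list.get(i);
            i++;
        }
        return sum;
    }

    // Flawed by design: no warm-up, one JVM, tests run back to back.
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
        }

        long start = System.currentTimeMillis();
        sumWithIterator(list);
        System.out.println("Iterator - Elapsed time in milliseconds: "
                + (System.currentTimeMillis() - start));

        start = System.currentTimeMillis();
        sumWithFor(list);
        System.out.println("For - Elapsed time in milliseconds: "
                + (System.currentTimeMillis() - start));

        start = System.currentTimeMillis();
        sumWithWhile(list);
        System.out.println("While - Elapsed time in milliseconds: "
                + (System.currentTimeMillis() - start));
    }
}
```

Reordering the three blocks moves the first-run penalty to a different loop construct, which explains the shifting results above.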

Getting valid test results is not that easy!

The example above shows that obtaining valid test results can be hard. You have to know something about the Java VM to get more accurate numbers, and you have to prepare a good test environment.

Some tips and tricks

  • Quit all other applications. It is a no-brainer, but many people test with their systems loaded with music players, RSS readers and word processors still active. Background processes can reduce the resources available to your program in an unpredictable way. For example, when you have a limited amount of memory available, your system may start swapping memory to disk. This will not only have a negative effect on your test results, it will also make those results non-reproducible.
  • Use a dedicated system. Even better than testing on your developer system is to use a dedicated test system. Do a clean install of the operating system and the minimum set of tools needed, and make sure the system stays as clean as possible. If you make an image of the system, you can restore it to a known previous state.
  • Repeat your tests. A single test result is worthless without knowing whether it is accurate (as you have seen in the example above). Therefore, to draw any conclusions from a test, repeat it and use the average result. When the numbers vary too much from run to run, your test is wrong: something in it is not predictable or consistent. Fix your test first.
  • Investigate memory usage. If your code under test is memory intensive, the amount of available memory will have a large impact on your test results. Increase the amount of available memory, buy more memory, or reduce the memory footprint of the program under test.
  • Investigate CPU usage. If your code under test is CPU intensive, try to determine which part of your test uses the most CPU time. If the CPU graphs fluctuate a lot, try to find the root cause: for example, garbage collection, thread locking or dependencies on external systems can have a big impact.
  • Investigate dependencies on external systems. If your application does not seem to be CPU-bound or memory intensive, look into thread locking or dependencies on external systems (network connections, database servers, etc.).
  • Watch out for thread locking. It can have such a big impact that running your test on multiple cores actually decreases performance. Threads that wait on each other are really bad for performance.
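The ‘repeat your tests’ tip can be sketched as a small timing harness. The class and method names here are my own, not from any real library:

```java
public class RepeatedTimer {

    // Runs the given task `runs` times and returns the average elapsed
    // time in milliseconds. A single measurement is meaningless; the
    // average over many runs is much more stable.
    static double averageMillis(Runnable task, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return total / (runs * 1_000_000.0);
    }

    public static void main(String[] args) {
        double avg = averageMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        }, 50);
        System.out.println("Average elapsed time in milliseconds: " + avg);
    }
}
```

If individual runs differ wildly from this average, that is the signal mentioned above: something in the test is not predictable or consistent.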

The Java HotSpot compiler

The Java HotSpot compiler kicks in when it sees a ‘hot spot’ in your code. It is therefore quite common for your code to run faster over time! So you should adapt your testing methods accordingly.

The HotSpot compiler compiles in the background, eating away CPU cycles. So while the compiler is busy, your program is temporarily slower. But after some hot spots have been compiled, your program will suddenly run faster!

When you make a graph of the throughput of your application over time, you can see when the HotSpot compiler is active:

[Figure: throughput of a running application over time]

The warm-up period is the time the HotSpot compiler needs to get your application up to speed.

Do not draw conclusions from the performance statistics during the warm-up period!

  • Execute your test and measure the throughput until it stabilizes. Discard the statistics gathered during the warm-up period.
  • Make sure you know how long the warm-up period is for your test scenario. We use a warm-up period of 10-15 minutes, which is enough for our needs, but test this yourself! It takes time for the JVM to detect the hot spots and compile the running code.
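A minimal sketch of discarding the warm-up phase looks like this. The warm-up count of 20 is a placeholder I chose for illustration; measure the point where your own scenario actually stabilizes:

```java
public class WarmupHarness {

    // One full run of the code under test; returns the elapsed nanoseconds.
    static long runOnce() {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Warm-up phase: run the full test several times so HotSpot can
        // detect and compile the hot code paths. Discard these numbers.
        for (int i = 0; i < 20; i++) {
            runOnce();
        }
        // Only measurements taken after the warm-up phase count.
        long measured = runOnce();
        System.out.println("Measured (post-warm-up) nanoseconds: " + measured);
    }
}
```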

Remember, I am not a professional performance tester, so put your tips in the comments!


Published at DZone with permission of its author, Daan van Etten.



Dapeng Liu replied on Thu, 2009/01/29 - 2:58am

You may even set the JVM process priority to ‘real time’ (on Windows).

Artur Biesiadowski replied on Thu, 2009/01/29 - 6:21am

This is all true, but at least as far as most microbenchmarks are concerned, you don't have to worry that much. Just make sure that whatever you are testing takes longer than one second, and add a warm-up phase by running the FULL test a few times before reporting any values.

The trick I'm using (especially when correcting somebody else's microbenchmarks) is to rename their main method to mainx and write my own main:

public static void main(String[] args) {
    for (int i = 0; i < 10; i++) {
        mainx(args);
    }
}

Very simple and crude, but it solves 95% of the problems. I have yet to see a microbenchmark which would not stabilize after the third iteration of such a loop (again, assuming you are not running sub-second tests).

Outside of the microbenchmark world, things are a lot more complicated. Things like logging to NFS versus local disk can make an order-of-magnitude difference, a database index can become more or less balanced, etc. At this point performance testing becomes more of an art than a science. But in the microbenchmark world you can get repeatable, meaningful results, as long as you know what you are measuring. Unfortunately, results can later be misinterpreted (by making blanket statements like "X is faster than Y in Java", when it is really an AMD versus Intel speed difference on a given version of the JVM).


Thomas Mueller replied on Fri, 2009/01/30 - 3:39am

Garbage collection is an issue. If your test looks like this:

loop {
    test1();
    test2();
}

Then the garbage created in test1 could be collected while test2 is running, slowing test2 down. Consider running the tests separately.
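If the tests must stay in one JVM, a crude mitigation is to give the collector a chance to run between them. This is only a sketch with made-up test bodies; note that System.gc() is merely a request to the JVM, not a guarantee, and running the tests in separate processes remains the cleaner option:

```java
public class SeparatedTests {

    // Stand-in for the first benchmark: allocates a lot of garbage.
    static void test1() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append(i);
        }
    }

    // Stand-in for the second benchmark: pure computation.
    static long test2() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        return sum;
    }

    // Ask the collector to clean up test1's garbage before timing test2.
    // System.gc() is only a hint; the pause gives the collector a moment.
    static void gcBreather() {
        System.gc();
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        test1();
        gcBreather();
        long start = System.nanoTime();
        test2();
        System.out.println("test2 nanoseconds: " + (System.nanoTime() - start));
    }
}
```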

Mike P(Okidoky) replied on Fri, 2009/01/30 - 1:44pm

Yes, Java has been misrepresented on countless benchmark websites, blogs, and articles, exactly for the reasons above. I often felt like I was talking to a brick wall, because some people refused to acknowledge warm-up time, which should never be included in any meaningful benchmark.

Thank you for this article. I'll be sure to bookmark it.

Torbjörn Gannholm replied on Mon, 2009/02/02 - 5:20pm

I believe you should also run each different test (e.g. for, while, iterator) in a separate process; otherwise the tests affect each other.

Michał Jankowski replied on Tue, 2009/02/03 - 4:53am in response to: Torbjörn Gannholm

I agree. Write a different program for each test, run them independently 100 times, and then calculate the average.
