Experienced Java developer focused on performance problems, JVM tuning and algorithm efficiency optimization. Recently devoted to low-level aspects of Java application performance and the newest trends in concurrency and non-blocking algorithms.

Java Threads on Steroids


If you're into concurrency, you have probably come across the Disruptor concurrency framework engineered and open-sourced by LMAX.

Its performance was compared with that of ArrayBlockingQueue, which is considered one of the most effective queue implementations, if not the most effective. The numbers are indeed pretty impressive.

I recommend downloading the source code and running the tests for yourself.


Martin Fowler recently published a nice insight into the benefits the Disruptor approach brings. There is also a great series of blog posts by Trisha Gee on the Disruptor internals, which is very helpful in understanding not only how this pattern works but also why it is so insanely fast.
But does having this new wonder-weapon in hand mean we have reached the limits of concurrent processing in Java?
Well, not necessarily. The beauty of the Disruptor's approach lies in its non-blocking behaviour: the only sections involving concurrent reads and writes are handled by memory barriers (using volatile access). This is much better than locking, but it does not eliminate the problems connected with context switching.
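To make the "memory barriers instead of locks" idea more concrete, here is a minimal sketch of a single-producer/single-consumer ring buffer whose only synchronization points are volatile reads and writes of two sequence counters. This is not the Disruptor's actual code; the class name SpscRingSketch and its methods are made up for illustration, and it assumes Java 9 or later (for Thread.onSpinWait) and exactly one producer thread and one consumer thread.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative single-producer/single-consumer ring buffer (hypothetical, not the Disruptor's code). */
final class SpscRingSketch {
    private final long[] slots;
    private final int mask;
    // AtomicLong get()/set() have volatile semantics; they are the only synchronization points.
    private final AtomicLong published = new AtomicLong(-1); // last sequence written by the producer
    private final AtomicLong consumed = new AtomicLong(-1);  // last sequence read by the consumer

    SpscRingSketch(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    /** Producer thread only. Returns false instead of blocking when the buffer is full. */
    boolean offer(long value) {
        long next = published.get() + 1;
        if (next - consumed.get() > slots.length) {
            return false; // the consumer has not freed this slot yet
        }
        slots[(int) (next & mask)] = value;
        published.set(next); // volatile write: publishes the slot to the consumer
        return true;
    }

    /** Consumer thread only. Returns null instead of blocking when the buffer is empty. */
    Long poll() {
        long next = consumed.get() + 1;
        if (next > published.get()) { // volatile read: has the producer published this slot?
            return null;
        }
        long value = slots[(int) (next & mask)];
        consumed.set(next); // volatile write: releases the slot back to the producer
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        SpscRingSketch ring = new SpscRingSketch(1024);
        int count = 1_000_000;
        Thread producer = new Thread(() -> {
            for (long i = 0; i < count; i++) {
                while (!ring.offer(i)) {
                    Thread.onSpinWait(); // busy-spin rather than block
                }
            }
        });
        producer.start();
        long sum = 0;
        for (int received = 0; received < count; ) {
            Long v = ring.poll();
            if (v == null) {
                Thread.onSpinWait();
            } else {
                sum += v;
                received++;
            }
        }
        producer.join();
        System.out.println("sum = " + sum);
    }
}
```

Neither side ever takes a lock, but both threads still depend on the scheduler keeping them on a CPU, which is exactly where context switching comes back into the picture.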

In order to eliminate the cost of context switching, we would have to eliminate the switching itself. You can force a thread or a process to run only on a specified set of CPUs, thus reducing the probability of the kernel migrating it across all the cores available to the system. This is called processor affinity. Several tools let you set processor affinity in a very simple manner, e.g. Linux control groups or the taskset utility. But what if you want to control CPU affinity for individual Java threads?
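For the simple, process-wide case, a rough sketch of driving taskset from within the JVM could look like the code below. It only restricts the whole JVM process, not individual threads, and it rests on a few assumptions: a Linux system with the util-linux taskset binary on the PATH, Java 9 or later (for ProcessHandle), and the made-up helper names TasksetSketch, tasksetCommand and pinCurrentJvm.

```java
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper that shells out to taskset; pins the whole JVM, not single threads. */
final class TasksetSketch {

    /** Builds the command line, e.g. pid 4242 and CPUs 2,3 give: taskset -a -c -p 2,3 4242 */
    static List<String> tasksetCommand(long pid, int... cpus) {
        if (cpus.length == 0) {
            throw new IllegalArgumentException("at least one CPU is required");
        }
        for (int cpu : cpus) {
            if (cpu < 0) {
                throw new IllegalArgumentException("negative CPU index: " + cpu);
            }
        }
        String cpuList = IntStream.of(cpus)
                .mapToObj(cpu -> Integer.toString(cpu))
                .collect(Collectors.joining(","));
        // -a: apply to all existing threads of the process, -c: CPU list syntax, -p: existing pid
        return List.of("taskset", "-a", "-c", "-p", cpuList, Long.toString(pid));
    }

    /** Runs taskset against the current JVM and returns its exit code. */
    static int pinCurrentJvm(int... cpus) throws IOException, InterruptedException {
        long pid = ProcessHandle.current().pid();
        Process process = new ProcessBuilder(tasksetCommand(pid, cpus)).inheritIO().start();
        return process.waitFor();
    }
}
```

Calling TasksetSketch.pinCurrentJvm(2, 3) early in main would confine the JVM's threads to CPUs 2 and 3, but every thread would still share the same CPU set.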
One way would be to use the RealtimeThread class from the Real-Time Specification for Java, but that would imply using a non-standard JVM implementation. A poor man's solution could involve using JNI to make native calls to the kernel's sched_setaffinity, or to pthread_setaffinity_np when working with the pthreads API. To cut the theoretical considerations short and learn the practical implications of this approach, let's take a look at the results.

This screenshot shows the load on all CPUs when the tests were running with default processor affinity. You can see frequent changes in individual CPU loads. This is due to the workload being dynamically distributed among the CPUs by the system scheduler.

This one, in turn, shows how the load was distributed when the worker threads were pinned to their dedicated CPUs with fixed processor affinity.

And to illustrate the difference in terms of performance, the following shows the number of operations per second achieved with each approach:

The results not only show a significant throughput benefit from applying fixed processor affinity, but also exhibit virtually real-time characteristics: they are extremely stable and predictable, which is what all real-time systems require.

Some details:
  • The test being executed was UniCast1P1CPerfTest from the Disruptor performance test suite
  • There were 60 runs with 50,000,000 iterations each
  • CPUs were additionally occupied by handling IRQs, so reconfiguring IRQ load balancing by using IRQBALANCE_BANNED_CPUS could render slightly better results
  • The exact number of context switches can be measured using SystemTap or by examining the ctxt value in /proc/stat
  • You can achieve better results by employing Linux cgroups to separate the application workload from system tasks, assigning a separate resource pool to each of the two groups
  • These results should not be considered a magic trick to speed up your application in every possible scenario. This will be effective only for CPU-intensive use cases
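As a rough illustration of the /proc/stat approach from the list above, the sketch below reads the ctxt counter before and after a workload and prints the difference. Keep in mind that ctxt counts context switches for the whole system since boot, not just for your JVM, so the delta also includes everything else running on the machine. The class and method names are made up for this example, and it only works on Linux.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper that samples the system-wide context switch counter on Linux. */
final class ContextSwitchCounter {

    /** Extracts the value of the "ctxt" line from /proc/stat-style content. */
    static long parseCtxt(List<String> lines) {
        for (String line : lines) {
            if (line.startsWith("ctxt ")) {
                return Long.parseLong(line.substring(5).trim());
            }
        }
        throw new IllegalArgumentException("no ctxt line found");
    }

    /** Reads the current counter value from /proc/stat. */
    static long readCtxt() throws IOException {
        return parseCtxt(Files.readAllLines(Paths.get("/proc/stat")));
    }

    public static void main(String[] args) throws Exception {
        long before = readCtxt();
        Thread.sleep(1000); // placeholder: run the workload under test here
        long after = readCtxt();
        System.out.println("system-wide context switches: " + (after - before));
    }
}
```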

  • Published at DZone with permission of its author, Wojciech Kudla.



    Marek Stanulewicz replied on Sat, 2011/08/13 - 3:07pm

    Great article! Thanks :)

    Jonathan Fisher replied on Fri, 2011/09/02 - 9:08am

    Wow... I would have thought thread schedulers would have been better than that in this day and age. Seems like a sore spot in OS/JVMs; I'm very surprised by the results...

    Wojciech Kudla replied on Tue, 2011/09/06 - 6:16am in response to: Jonathan Fisher

    I got the very same impression and was completely surprised to see the results. It turns out there's a lot of room for improvement in the Linux kernel's scheduling area.
    I'm curious if the difference is as noticeable on Windows platforms as it is on Linux...
    Gotta give it a try and maybe post an update.

    Maciej Gorączka replied on Wed, 2011/10/05 - 2:16am

    You mentioned disruptor and LMAX. But context switching is a more generic problem, isn't it? As far as I understand fixed processor affinity may make performance better in any multithreading environment. Right?

    Wojciech Kudla replied on Mon, 2011/12/19 - 4:59am in response to: Maciej Gorączka


    Of course, you are right; context switching is a programming-language-agnostic problem. In this case Disruptor serves as an example illustrating the impact of context switching on overall latency.

    "As far as I understand fixed processor affinity may make performance better in any multithreading environment. Right?"

    Not necessarily. If you tamper with CPU affinity within a fairly balanced system, you may actually degrade performance. For example, chances are it wouldn't do any good for some applications of Azul + Zing (in this case, to be 100% sure, we'd probably have to ask Dr. Cliff Click :)).

