Cloud Zone is brought to you in partnership with:

Krystian Lider is co-owner of the SolDevelo - software development company with strong focus on software quality. Krystian has posted 4 posts at DZone. View Full User Profile

Comparison of the Grid/Cloud Computing Frameworks (Hadoop, GridGain, Hazelcast, DAC) - Part II

  • submit to reddit

What would happen, if 60% of your cloud suddenly goes down? Can you rely on the 'fail-over' capabilities of the framework of your choice? What about consistency of your data? How big would be the performance impact of node failures? Continuing our experiments from the previous article, we have decided to give a try the following frameworks:

As always, we describe all the methods and results and give you access to all the sources. You should be able to repeat all the tests and receive the similar results. If not, please contact us, so we can revise our report. Moreover, taking into consideration comments from the previous article, we have added detailed cpu/memory/network usages from all the machines in our test environment.

You can find all sources used during tests in our code repository. Moreover, full report with detailed results is available on our website.

This is part II of our comparison, where we concentrate on the fail-over capabilities. There were serious node failures (up to 60% nodes went down) and multiple transactions rollbacks.

Test environment

Our test environment consisted with 5 machines (named intel1 - intel5), each one with dual Quad-Core Xeon E5410 2.33GHz, 4GB RAM on board, which gave us 40 processing units. The only difference between current test environment and the one used in part I of our comparison is the JVM version, which has been updated to 1.6.0_18.


We based our benchmark on the same mathematical problem as in part I of our comparison. Because of that, we can easily compare results from both tests, which gives us more wider view on the given frameworks.

In this 'fail-over' comparison we used only one test scenario:

  • compute problem divided into 2705 tasks (CMBF with arguments: n = 4, level = 10000)


We have simulated node failures during our tests in the following order:

  • intel4 went down after 60 seconds from the beginning of computations
  • intel3 went down after 180 seconds from the beginning of computations
  • intel2 went down after 300 seconds from the beginning of computations


All tests were repeated ten times in order to avoid measuring error.

Results - overview

We have compared the following aspects:

  • average time of computations



  • cumulative cost (time of computation multiplied by the amount of available processing units)



  • cumulative cost – difference with CMBF (difference with the optimal solution: single-threaded version of CMBF)



  • total CPU usage




  • maximum memory usage



  • total network usage




You will find the detailed methodology (sources, test environment description) and results (all performed test cases with std deviation and average values) on our website.




Average CPU usage (%user) gathered on all machines:



Average CPU usage (%system) gathered on all machines:




Average memory usage gathered on all machines:




Average network usage (received bytes/s) gathered on all machines:



Average network usage (transmitted bytes/s) gathered on all machines:



The above part II concentrates on the fail-over capabilities. All frameworks properly handle node failures, but we had to slightly modify our code for Hazelcast to catch new exceptions (other frameworks resubmit invalid tasks by default). Taking the above results into consideration, we can infer the following conclusions:

  • Hazelcast and GridGain are the best choice for an easily-parallelized, low-data, CPU-intensive tasks. Moreover, they are even better choice, when some unexpected node failures can happen.
  • Hazelcast consumes the smallest amount of CPU and network bandwidth
  • GridGain consumes the smallest amount of memory
  • Hadoop was designed to manipulate large data sets, so the above not the best results are totally understandable
  • DAC with its default settings do not handle node failures efficiently


Published at DZone with permission of its author, Krystian Lider.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Serge Bornow replied on Fri, 2010/03/19 - 12:54pm

I've worked with GridGain and definitely enjoy it, obviously its all Java.

Expect to see an even more simplified framework in 3rd version of GridGain expected to be out there in Q2 this year.


Nikita Ivanov replied on Fri, 2010/03/19 - 3:26pm

GridGain network overhead is explain by the fact that GridGain auto-loads all necessary code to remote nodes on demand - something that is not supported by other frameworks. It is done with proper distributed semantics, multiple class loading modes, firewalls, etc. If you disable this feature (and manually distribute the code like with other frameworks) - the network load on GridGain will likely win over any other frameworks in the test.

Nikita Ivanov.

Oliver Plohmann replied on Thu, 2012/08/09 - 7:40am

This was a very intersting post. I can imagine how much work was involved to get all these figures. Thank you.

Something else that draw my attention is that DAC is based on some agent approach that is more general than map reduce. For many things map reduce is fine and the simplicity it provides helps in keeping things understandable, but for other things it may be too narrow a harness. Would be interesting to see how the other systems compare in this aspect. But that would be the purpose of some other article I believe ...

Then it strikes my eye that a framework like Hazelcast that is so much simpler than Hadoop and I don't know what else scores that well compared to the other systems with a much longer learning period. Well, I believe that Hadoop may scale better when you need 100 servers and much more. But nevertheless...

Besides that GigSpaces now also supports map reduce and JBoss Infinispan as well.  Choices are plenty and the choice gets more and more difficult. I spent some time on the GridGain homepage to find out where to download the non-commercial free version. That was kind of hard. I stopped looking into Terracotta after they withdrew their non-commercial version when being acquired by Software AG. I think GigaSpaces did things right here by offering a limited free version. Some limitation is still a lot better than no non-commercial version or one that is hard to get which makes people abandon just the whole thing.

Laurent Cohen replied on Sat, 2013/06/15 - 1:41am

 Hi Krystian,

Do you intend to include JPPF in the benchmark at some point in the future? My own tests with only 3 machines show that it's already 25% faster than the best results you collected with 5 machines. You'll also be surprised with the low network traffic level and how well it handles failover and fault tolerance. Took me 30mn to adapt the benchmark code and tune the framework to get these results. If you want to give it a try, I'll be happy to assist.


Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.