
Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2573 posts at DZone.

Netflix Benchmarks on AWS Show Cassandra NoSQL Still Has the Goods


A little more than a year ago, Apache Cassandra's reputation was untouchable. It was blowing other NoSQL data stores out of the water in benchmarks and in our very own DZone popularity poll. What else would you expect from a data store that was originally designed to handle data at Facebook? How could it not be the top solution out there?

But last year, Cassandra's reputation was tarnished a little by stories about its instability and steep learning curve. Then came the subsequent migrations, driven by the emerging and growing popularity of MongoDB. What really hurt Cassandra was Twitter's announcement that it would hold off on migrating its tweet storage to the NoSQL store. Twitter still used Cassandra to store geolocation data and the data mining results that feed into things like local trends and @toptweets, but the damage was done.

Fast-forward to last month, and we see the stability issues fading away as Apache Cassandra reaches a major milestone with version 1.0. And just this week, Netflix published benchmarks that vindicate its 6-month migration to Cassandra.

Adrian Cockcroft authored an incredibly detailed blog post about the stress tests on Amazon EC2 instances:

To measure scalability, the same test was run with 48, 96, 144 and 288 instances, with 10, 20, 30 and 60 clients respectively. The load on each instance was very similar in all cases, and the throughput scaled linearly as we increased the number of instances. Our previous benchmarks and production roll-out had resulted in many application specific Cassandra clusters from 6 to 48 instances, so we were very happy to see linear scale to six times the size of our current largest deployment. This benchmark went from concept to final result in five days as a spare time activity alongside other work, using our standard production configuration of Cassandra 0.8.6, running in our test account. The time taken by EC2 to create 288 new instances was about 15 minutes out of our total of 66 minutes. The rest of the time was taken to boot Linux, start the Apache Tomcat JVM that runs our automation tooling, start the Cassandra JVM and join the "ring" that makes up the Cassandra data store. For a more typical 12 instance Cassandra cluster the same sequence takes 8 minutes.  --Adrian Cockcroft
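
For readers who like to check the arithmetic, here is a minimal Python sketch that uses only the figures quoted above. It does not reproduce Netflix's throughput numbers (the excerpt does not give them); it just confirms that the client count grew in step with the instance count, that 288 instances is six times the 48-instance cluster, and how the 66 minutes of startup time split up. The function and variable names are my own, chosen for illustration.

```python
# Figures taken from the Cockcroft excerpt above; nothing here is newly measured.
INSTANCES = [48, 96, 144, 288]   # Cassandra instances per test run
CLIENTS = [10, 20, 30, 60]       # load-generating clients per test run
LARGEST_EXISTING_CLUSTER = 48    # largest cluster size mentioned in the excerpt
TOTAL_STARTUP_MIN = 66           # total minutes to bring up 288 instances
EC2_CREATE_MIN = 15              # minutes EC2 spent creating the instances


def instances_per_client(instances, clients):
    """Return the instance-to-client ratio for each test run."""
    return [i / c for i, c in zip(instances, clients)]


def scale_factor(instances, baseline):
    """Return how many times larger the biggest run is than the baseline."""
    return max(instances) / baseline


ratios = instances_per_client(INSTANCES, CLIENTS)
factor = scale_factor(INSTANCES, LARGEST_EXISTING_CLUSTER)
# Time left for booting Linux, starting both JVMs, and joining the ring.
remaining_min = TOTAL_STARTUP_MIN - EC2_CREATE_MIN

print("instances per client:", ratios)
print("scale factor vs. largest existing cluster:", factor)
print("minutes spent after EC2 instance creation:", remaining_min)
```

The constant instances-per-client ratio is what makes the runs comparable: each instance sees a similar load, so any change in total throughput can be attributed to cluster size.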

The blog post is helpfully divided into an overview and a TL;DR section for performance nerds who want the details behind these feats of cloud and database performance. Cockcroft even closes with a pitch to anyone who wants to run tests like this on Netflix's infrastructure... seriously: "If you are the kind of performance geek that has read this far and wishes your current employer would let you spin up huge tests in minutes and open source the tools you build, perhaps you should give us a call..."



Amara Amjad replied on Sun, 2012/03/25 - 2:23am

This is extremely valuable information and I love that you guys share it openly. You've saved us all lots of time by being open about the data. Very very cool. Thanks!
