Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2576 posts at DZone. You can read more from them at their website.

Migrating from Cassandra to MongoDB

07.26.2010
I'll start by saying that this article is not intended to be a Cassandra-bashing session. Instead, it offers an interesting look at one development company's case study, which shows that Cassandra, although fantastic for some, is not for everyone.

The company is Nodeta and the application is Flowdock, a free tool (currently in beta) that works as a web-based team messenger in place of Campfire, Skype chats, IRC, and the like. Otto Hilska, who described the migration, says that "All software developers should be using it… because it better supports their actual workflow."

About a week ago, the team finished its transition from the Apache Cassandra NoSQL database to another NoSQL database, MongoDB. The switch was made because of stability issues the developers were having with Cassandra. Hilska explained his company's experience with Cassandra in detail:

"All nodes would go into an infinite loop, running GC and trying to compact the data files – occasionally falling off the cluster. We were unable to solve the problem, except that restarting and then compacting a node usually settled it down for a while. Other people had reported similar problems. Last couple of weeks our Cassandra nodes always ate all the resources they were given, slowing down Flowdock.

This was not the first time we had run into problems because of our bleeding edge database choice. When upgrading from 0.4 to 0.5, we had to shut down the cluster, only to find out that it hadn’t flushed everything to the disk (even though we explicitly flushed it, as instructed). Thus we ended up having a couple of minutes of discussions lost, and our custom-built indices were miserably out of date and needed to be rebuilt. I think it was 4 AM when we finally got to leave the office."--Otto Hilska


Flowdock's developers were drawn to MongoDB, a newer NoSQL store, because of its recent addition of auto-sharding and replica sets. Hilska wrote the conversion script in a day, and it took a week to get Flowdock running purely on MongoDB. Nodeta then tested it internally for a few weeks before deploying it to production.

However, MongoDB has flaws of its own. Dots are not allowed in BSON document keys, and the document size is limited to 4MB. Adding new nodes is also not as easy as it is with Cassandra. On the other hand, Hilska says that the smart (multikey) indices, the ability to run complex queries directly from the console, MapReduce, GridFS, and the absence of stability problems make up for these minor flaws.
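The key restriction is easy to run into when documents use arbitrary strings, such as domain names, as keys. MongoDB's query language uses dots to address fields inside embedded documents, so a literal dot in a key would be ambiguous. One common workaround is to escape dots before writing a document and unescape them after reading it. The following is a minimal Python sketch of that idea under stated assumptions, not Flowdock's code: the helper names `escape_keys` and `unescape_keys`, the sample document, and the choice of the full-width full stop (U+FF0E) as a substitute character are all hypothetical, and the round trip is only lossless if the original keys never contain U+FF0E.

```python
# Minimal sketch (hypothetical helpers, not Flowdock's code): escape dots in
# document keys before storing, since dots were not allowed in BSON keys.
from typing import Any

DOT = "."
ESCAPED_DOT = "\uff0e"  # assumption: full-width full stop as a stand-in for "."


def _rewrite_keys(value: Any, old: str, new: str) -> Any:
    """Recursively replace `old` with `new` in every dict key.

    Assumes all dict keys are strings; values (including strings) are kept as-is.
    """
    if isinstance(value, dict):
        return {key.replace(old, new): _rewrite_keys(item, old, new)
                for key, item in value.items()}
    if isinstance(value, list):
        return [_rewrite_keys(item, old, new) for item in value]
    return value


def escape_keys(doc: Any) -> Any:
    """Return a copy of `doc` whose keys contain no '.' characters."""
    return _rewrite_keys(doc, DOT, ESCAPED_DOT)


def unescape_keys(doc: Any) -> Any:
    """Reverse escape_keys (lossless only if keys never contained U+FF0E)."""
    return _rewrite_keys(doc, ESCAPED_DOT, DOT)


if __name__ == "__main__":
    # Hypothetical message document that uses a domain name as a key.
    message = {"user": "otto", "link_counts": {"flowdock.com": 2}}
    stored = escape_keys(message)
    print(stored)
    assert unescape_keys(stored) == message
```

Only keys are rewritten; values such as URLs keep their dots, because MongoDB places no such restriction on string values.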

I recently posted a guide for determining the right database solution (Relational or NoSQL) for various use cases.  The article has some very good resources and includes situations where MongoDB or Cassandra might be the best choice.  

Some criticisms of Cassandra emerged a few weeks ago when Twitter announced that it would hold off on migrating its tweet storage to Cassandra. By no means has Twitter stopped using Cassandra: it has stated that Cassandra currently stores geolocation data and data-mining results that feed features such as local trends and @toptweets. Cassandra's creators at Facebook are also still using it.

Production deployments of MongoDB exist at Foursquare, SourceForge, The New York Times, BoxedIce, GitHub, and SugarCRM.