Alec is a Content Curator at DZone and lives in Raleigh, North Carolina. He is interested in Java and Android programming, and databases of all types. When he's not writing for the NoSQL and IoT Zones, you might find him playing bass guitar, writing short stories where nothing happens, or making stuff in Java. Alec is a DZone Zone Leader and has posted 579 posts at DZone.

Benchmarking Cassandra: The Right & Wrong Way to Do it

06.24.2014

Everybody loves comparing databases. Not everybody agrees on how to do it, though. If you ask Jonathan Ellis at the DataStax Developer Blog, one prime example of how not to do it is Thumbtack Technology's benchmarks comparing Cassandra, Couchbase, MongoDB, and Aerospike. The problem, Ellis says, is that the benchmarks give Cassandra a raw deal.

According to Ellis, the benchmarks were basically set up correctly, but ignored some major factors when it comes to benchmark hygiene:

Our problems start with benchmark hygiene: the read runs were run one after the other rather than properly isolating them by dropping the page cache and warming up each workload separately.  It also looks like no effort was made to isolate the effects of Cassandra compaction; compaction from the read/write workload could have continued into the read-heavy section. 
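The isolation steps Ellis describes can be sketched as a small harness. This is only an illustration under stated assumptions, not the procedure either benchmark actually used: the function names (`run_isolated`, `drop_page_cache`, `wait_for_compaction`) are hypothetical, the cache drop assumes a Linux host with root access, and the compaction check assumes that `nodetool compactionstats` prints a `pending tasks: 0` line when the node is idle.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable


def drop_page_cache() -> None:
    """Flush dirty pages, then ask the Linux kernel to drop its caches (needs root)."""
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def wait_for_compaction(poll_seconds: float = 10.0) -> None:
    """Poll `nodetool compactionstats` until no compactions are pending.

    Assumption: the command's output contains "pending tasks: 0" when idle.
    """
    while True:
        out = subprocess.run(
            ["nodetool", "compactionstats"],
            check=True, capture_output=True, text=True,
        ).stdout
        if "pending tasks: 0" in out:
            return
        time.sleep(poll_seconds)


def run_isolated(
    workloads: Iterable[str],
    warmup: Callable[[str], None],
    measure: Callable[[str], float],
    settle: Callable[[], None] = wait_for_compaction,
    reset: Callable[[], None] = drop_page_cache,
) -> Dict[str, float]:
    """Run each workload from the same cold, quiet starting state."""
    results: Dict[str, float] = {}
    for name in workloads:
        settle()          # let compaction from the previous run finish
        reset()           # start every workload with a cold page cache
        warmup(name)      # warm up this workload on its own
        results[name] = measure(name)
    return results
```

The `warmup` and `measure` callables stand in for whatever load generator is in use; passing them in keeps the ordering (settle, reset, warm up, measure) explicit and the same for every workload.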

And those aren't even the biggest problems with the benchmarks, Ellis says. By Thumbtack Technology's numbers, Aerospike comes out on top or on par with Couchbase, Cassandra trails behind, and MongoDB trails further still. Ellis goes through each aspect of the benchmark in detail to explain which features of Cassandra were misunderstood or ignored.

To really nail down the argument, though, Ellis runs his own benchmarks. Due to changes in Aerospike's API, he couldn't include Aerospike in his new benchmarks, but instead substituted HBase as another representative of the top NoSQL solutions. His results came out like this:

(Source: Jonathan Ellis at DataStax)

It's an interesting look at the various factors one must consider when making performance comparisons, or any comparisons, given the complexity of these technologies.

The cynical might observe that benchmarks coming at the request of Aerospike (as Ellis notes) show Aerospike's excellent performance, while benchmarks coming from DataStax show Cassandra's excellent performance. The even-more-cynical might observe that both show MongoDB far below all the others - but hey, MongoDB's always being mistreated.

Check out the full article from Jonathan Ellis for all the details.