
I've been a zone leader with DZone since 2008, and I'm crazy about community. Every day I get to work with the best that JavaScript, HTML5, Android and iOS have to offer, creating apps that truly make a difference, as principal front-end architect at Avego. James is a DZone Zone Leader and has posted 639 posts at DZone. You can read more from them at their website.

An Introduction To Cassandra: The Data Model

09.14.2010

I'm fairly new to the whole NoSQL game, and one thing I keep hearing is how great Cassandra is. Built by Facebook and open sourced in 2008, Cassandra is probably the most popular NoSQL implementation: "A massively scalable, decentralized, structured data store". Cassandra takes its distribution features from Amazon's Dynamo and its data model from Google's BigTable.

Before we look at using Cassandra, we first need to understand the data model. For developers who are new to Cassandra and come from a relational database background, the data model can be a bit confusing. Here's a summary of how the Cassandra data model is composed:

Column

A Column is the most basic element in Cassandra: a simple tuple that contains a name, a value and a timestamp. All three are set by the client. That's an important consideration for the timestamp, as it means the clocks on your clients need to be synchronized.
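To see why the client-supplied timestamp matters, here is a minimal Java sketch. It assumes that when two versions of the same column meet, the one with the higher timestamp is kept. The `Column` class, the `newest` helper and the sample values are hypothetical names for illustration, not Cassandra's own API.

```java
import java.util.Objects;

// Illustrative model of a Column: a (name, value, timestamp) tuple.
// A sketch for explanation only, not Cassandra's own class.
class Column {
    final String name;
    final String value;
    final long timestamp; // supplied by the client, not by the server

    Column(String name, String value, long timestamp) {
        this.name = Objects.requireNonNull(name);
        this.value = Objects.requireNonNull(value);
        this.timestamp = timestamp;
    }

    // Hypothetical helper: of two versions of the same column,
    // keep the one with the higher timestamp (ties keep the first).
    static Column newest(Column a, Column b) {
        return b.timestamp > a.timestamp ? b : a;
    }
}

class ColumnExample {
    static String winningValue() {
        // Client B writes later in real time, but its clock runs behind,
        // so its timestamp is lower than client A's.
        Column fromClientA = new Column("email", "old@example.com", 2_000L);
        Column fromClientB = new Column("email", "new@example.com", 1_500L);
        return Column.newest(fromClientA, fromClientB).value;
    }
}
```

In this sketch the later write from client B loses, because its skewed clock produced the lower timestamp. That is the kind of surprise clock synchronization is meant to prevent.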



SuperColumn

A SuperColumn is a column that stores an associative array of columns. You could think of it as similar to a HashMap in Java, with an identifying name and a value that holds a map of columns. The key difference between a Column and a SuperColumn is that the value of a Column is a string, whereas the value of a SuperColumn is a map of Columns. Note that SuperColumns have no timestamp, just a name and a value.
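The HashMap analogy can be made concrete with a small Java sketch. The `SuperColumn` class and the address data below are hypothetical and only model the shape described above. For brevity the sub-columns are stored as plain name/value strings, leaving out their timestamps.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative model of a SuperColumn: a name plus a map of sub-columns.
// Note there is no timestamp field at this level.
class SuperColumn {
    final String name;
    // sub-column name -> sub-column value (timestamps omitted for brevity)
    final SortedMap<String, String> columns = new TreeMap<>();

    SuperColumn(String name) {
        this.name = name;
    }

    // Adds or overwrites a sub-column and returns this for chaining.
    SuperColumn put(String columnName, String value) {
        columns.put(columnName, value);
        return this;
    }
}

class SuperColumnExample {
    static SuperColumn homeAddress() {
        return new SuperColumn("homeAddress")
                .put("street", "1 Main Street")
                .put("city", "Cork")
                .put("country", "Ireland");
    }
}
```

A `TreeMap` is used here rather than a `HashMap` so that the sub-columns come back sorted by name, which is an assumption of this sketch.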



ColumnFamily

A ColumnFamily holds a number of rows, and is similar to the table concept from relational databases. Each row is identified by a key and contains a sorted map that matches column names to column values. Because the columns in a row are kept in order, you can reference them by column name.

The ColumnFamily can be one of two types, Standard or Super. Standard ColumnFamilies contain a map of normal columns, while Super ColumnFamilies contain rows of SuperColumns.
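As a rough mental model, the two ColumnFamily types can be pictured as nested maps in Java. This is only a sketch: the `ColumnFamilyExample` class, the row keys and the sample data are made up for illustration, and timestamps are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class ColumnFamilyExample {
    // Standard ColumnFamily: row key -> (column name -> value)
    static Map<String, SortedMap<String, String>> users() {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("name", "J. Smith");
        row.put("email", "jsmith@example.com");

        Map<String, SortedMap<String, String>> users = new HashMap<>();
        users.put("jsmith", row);
        return users;
    }

    // Super ColumnFamily:
    // row key -> (SuperColumn name -> (column name -> value))
    static Map<String, SortedMap<String, SortedMap<String, String>>> addressBook() {
        SortedMap<String, String> home = new TreeMap<>();
        home.put("street", "1 Main Street");
        home.put("city", "Cork");

        SortedMap<String, SortedMap<String, String>> row = new TreeMap<>();
        row.put("home", home);

        Map<String, SortedMap<String, SortedMap<String, String>>> book = new HashMap<>();
        book.put("jsmith", row);
        return book;
    }
}
```

The only difference between the two shapes is the extra level of nesting: a Super ColumnFamily adds the SuperColumn name between the row key and the column name.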



KeySpaces

KeySpaces are the largest container, holding an ordered list of ColumnFamilies, similar to a database in an RDBMS. The KeySpace is normally named after the application.

Multiple KeySpaces reside in a cluster: the set of machines (nodes) that make up a Cassandra instance.
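Putting the levels together, a value in a Standard ColumnFamily is reached by walking from KeySpace to ColumnFamily to row key to column name. The Java sketch below models that path with nested maps. `ClusterSketch`, `insert` and `get` are hypothetical names, and the sketch ignores distribution across nodes entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative only: one in-memory map standing in for a whole cluster.
class ClusterSketch {
    // keyspace -> column family -> row key -> (column name -> value)
    private final Map<String, Map<String, Map<String, SortedMap<String, String>>>> keyspaces =
            new HashMap<>();

    // Creates any missing level on the way down, then stores the value.
    void insert(String keyspace, String columnFamily, String rowKey,
                String column, String value) {
        keyspaces.computeIfAbsent(keyspace, k -> new HashMap<>())
                 .computeIfAbsent(columnFamily, k -> new HashMap<>())
                 .computeIfAbsent(rowKey, k -> new TreeMap<>())
                 .put(column, value);
    }

    // Returns the stored value, or null if any level of the path is missing.
    String get(String keyspace, String columnFamily, String rowKey, String column) {
        Map<String, Map<String, SortedMap<String, String>>> ks = keyspaces.get(keyspace);
        if (ks == null) return null;
        Map<String, SortedMap<String, String>> cf = ks.get(columnFamily);
        if (cf == null) return null;
        SortedMap<String, String> row = cf.get(rowKey);
        if (row == null) return null;
        return row.get(column);
    }
}
```

For example, after `insert("MyApp", "Users", "jsmith", "email", "jsmith@example.com")`, calling `get("MyApp", "Users", "jsmith", "email")` returns the stored address, while the same lookup against a different KeySpace returns `null`.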

 

For another summary of the Cassandra data model, check out the (nicely titled) "WTF is a SuperColumn".

In the next article in this introduction series, we'll move on to the good stuff: using Cassandra in Java.