Jumping from MySQL to Cassandra: A Success Story

01.18.2012

Today I’m going to share my experience of getting started with Apache Cassandra. One of the hardest steps in learning any NoSQL technology is letting go of normalization principles and relational DB structures. Relational databases are designed to persist normalized data without duplication. One of the main changes here is that you need to design for your queries: work out what your reports or finder methods need, and build the persistent structure to match.

Hundreds of web pages, books, and papers explain what Cassandra is, what Hazelcast is, what Hadoop, MemcacheDB, MongoDB, etc. are. But none of them explain HOW TO migrate your data from a relational DB to one of them.

We wanted to migrate the persistent data of two of our modules, Turmeric SOA Monitoring and Turmeric SOA Rate Limiting. In Turmeric we use MySQL as the relational database. After a week of reading about and analyzing several NoSQL options, we decided on Cassandra. I hope to write another post about the whys. By the way, I highly recommend this book: Cassandra: The Definitive Guide

From Relational tables to Keyspaces


The big question now is how to migrate them. Well, this is what we did:
Following an Agile best practice, if something is too hard or complex, just break it into small challenges. After all, we still had a good gap for an MMF (“Minimal Marketable Feature”, see Software by Numbers). So:

Step 1: Move our relational DB tables to Cassandra Column Families
Step 2: Customize the new Column Families so they hold all the needed data without JOIN-like operations
Step 3: Split those Column Families according to what the finder and query methods need. Typically a finder or query method should use 1 Column Family
Step 4: Customize the creator and updater methods according to the previous changes. Don’t be scared if you are saving duplicated data. Keep in mind: “think for your queries, forget the normalization rules.”
Step 5: while (!pleased) -> do steps 3 and 4
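
Steps 2 to 4 can be illustrated without Cassandra at all. The following is a minimal in-memory sketch, not the Turmeric code: each Map stands in for one column family, and the names MetricStore, saveMetric, findByService and findByOperation are hypothetical, made up for this example. The same value is written twice on purpose so that each finder reads exactly one structure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: each map plays the role of one column family.
class MetricStore {
    // "Column family" keyed by service name, serving findByService.
    private final Map<String, List<String>> metricsByService = new HashMap<>();
    // "Column family" keyed by operation name, serving findByOperation.
    private final Map<String, List<String>> metricsByOperation = new HashMap<>();

    // The creator writes the same value into every structure a finder needs.
    void saveMetric(String service, String operation, String value) {
        metricsByService.computeIfAbsent(service, k -> new ArrayList<>()).add(value);
        metricsByOperation.computeIfAbsent(operation, k -> new ArrayList<>()).add(value);
    }

    // Each finder reads exactly one structure; no JOIN-like step is needed.
    List<String> findByService(String service) {
        return metricsByService.getOrDefault(service, new ArrayList<>());
    }

    List<String> findByOperation(String operation) {
        return metricsByOperation.getOrDefault(operation, new ArrayList<>());
    }
}
```

The duplication is the price paid for cheap reads: adding a new finder usually means adding one more structure and one more write in the creator, which is the loop described in step 5.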

A Cassandra DAO


Now, the hardest step is #1. Don’t panic: we developed a generic (in fact, it uses Java Generics) Cassandra DAO for your migration. Since this work was needed for the project I’m currently working on, you will find it as a submodule of TurmericSOA, but under the Apache License you can use it by adding it to your Maven dependency file.

<dependency>
<groupId>org.ebayopensource.turmeric.utils</groupId>
<artifactId>turmeric-utils-cassandra</artifactId>
<version>1.2.0.0-SNAPSHOT</version>
<type>jar</type>
</dependency>


Features

  • 100% Java code
  • It can run an embedded Cassandra service or just talk to your external Cassandra service
  • Uses the Hector library as Java Cassandra client
  • Dynamic [Super] Column Family creation
  • Key Types and Data Types defined at runtime with the use of Generics
  • Main CRUD methods supported:

 

boolean containsKey(KeyType key);

void delete(KeyType key);

T find(KeyType key);

Map> findItems(final List keys, final Long rangeFrom, final Long rangeTo);

Set findItems(final List keys, final String rangeFrom, final String rangeTo);

Set getKeys();

void save(KeyType key, T model);
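
To make the contract of the simpler methods concrete, here is a minimal in-memory stand-in. It is an assumption-laden sketch, not the Hector-backed class from the library: InMemoryDao is a hypothetical name, and details such as find returning null for a missing key or save overwriting an existing row are choices made for this example only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for illustration only; NOT the library's DAO.
class InMemoryDao<KeyType, T> {
    private final Map<KeyType, T> rows = new HashMap<>();

    boolean containsKey(KeyType key) {
        return rows.containsKey(key);
    }

    void delete(KeyType key) {
        rows.remove(key);
    }

    // Returns null when the key is absent (an assumption of this sketch).
    T find(KeyType key) {
        return rows.get(key);
    }

    // Returns a copy so callers cannot modify the stored rows through it.
    Set<KeyType> getKeys() {
        return new HashSet<>(rows.keySet());
    }

    // Overwrites any row already stored under the same key (also an assumption).
    void save(KeyType key, T model) {
        rows.put(key, model);
    }
}
```

Coding against a small contract like this lets finder logic be unit tested before a real or embedded Cassandra service is involved.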

 

Main Classes
This util package contains the following packages and classes:

org.ebayopensource.turmeric.utils.cassandra.service

  • CassandraManager: initializes a static EmbeddedCassandraService instance based on a YAML configuration file

org.ebayopensource.turmeric.utils.cassandra.hector

  • HectorManager: Manages keyspace and column family creation and reading. It uses the Hector API
  • HectorHelper: Includes some utility methods based on Java Reflection and Java Generics, e.g. retrieving the field names from a POJO, which are used as column names in Cassandra keyspaces
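
The idea of deriving column names from POJO fields can be sketched with plain Java Reflection. This is not the HectorHelper source: FieldNames and SamplePojo are hypothetical names, and skipping static and synthetic fields is an assumption about what a sensible mapping would do.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical POJO used only for this illustration.
class SamplePojo {
    private String name;
    private long counter;
    static final String IGNORED = "static fields are skipped";
}

class FieldNames {
    // Returns the names of the non-static, non-synthetic fields declared on the class.
    // getDeclaredFields() does not guarantee any particular order.
    static List<String> of(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            names.add(field.getName());
        }
        return names;
    }
}
```

Calling FieldNames.of(SamplePojo.class) yields the names "name" and "counter", which could then serve as column names. Inherited fields are not included, because getDeclaredFields() only looks at the class itself.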

org.ebayopensource.turmeric.utils.cassandra.dao

  • AbstractColumnFamilyDao: As the name suggests, this is the base class that every DAO should extend. It defines and implements the basic DAO operations using the Hector API.

Configuration files

Here is the directory structure of the configuration files:

META-INF/
         security/
                  config/
                         cassandra/
                                   cassandra.properties


An example of this property file:

cassandra-cluster-name=TurmericCluster
cassandra-host-ip=127.0.0.1
cassandra-rpc-port=9160
cassandra-my-keyspace=My-keyspace

#column families
cassandra-foo-column-family=foo
cassandra-bar-column-family=bar
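
If you want to read these values in your own code, a minimal sketch with java.util.Properties could look like the one below. The property keys come from the example file above; the CassandraSettings class, its parse and fromClasspath methods, and the fallback to port 9160 are assumptions made for this illustration and are not part of turmeric-utils-cassandra.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the connection settings; not part of the library.
class CassandraSettings {
    final String clusterName;
    final String host;
    final int rpcPort;
    final String keyspace;

    private CassandraSettings(Properties props) {
        clusterName = props.getProperty("cassandra-cluster-name");
        host = props.getProperty("cassandra-host-ip");
        // Falling back to 9160 when the key is missing is an assumption of this sketch.
        rpcPort = Integer.parseInt(props.getProperty("cassandra-rpc-port", "9160").trim());
        keyspace = props.getProperty("cassandra-my-keyspace");
    }

    // Parses settings from text in .properties format.
    static CassandraSettings parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new CassandraSettings(props);
    }

    // Loads the same settings from a classpath resource.
    static CassandraSettings fromClasspath(String resource) {
        Properties props = new Properties();
        try (InputStream in = CassandraSettings.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + resource);
            }
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new CassandraSettings(props);
    }
}
```

With the directory layout shown earlier, the call would be CassandraSettings.fromClasspath("META-INF/security/config/cassandra/cassandra.properties"), and the resulting values can be passed on to a DAO constructor.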


How to use it….


It is very intuitive. Let’s suppose we have a Foo table in our relational DB, e.g. MySQL.
So:

Create the BaseDao interface

import java.util.Set;

// Basic operations shared by every DAO; keys are Strings in this example.
public interface BaseDao {
    void delete(String key);
    Set<String> getKeys();
    boolean containsKey(String key);
    void save(String key, FooPojo fooPojo);
    FooPojo find(String key);
}


Create the FooDao interface

public interface FooDao extends BaseDao  {
}

 

Create the FooDao implementation

// Binds the generic base DAO to the Foo column family.
public class FooDaoImpl extends AbstractColumnFamilyDao
        implements FooDao {
    public FooDaoImpl(final String clusterName, final String host, final String keySpace,
            final String cf, final Class kTypeClass) {
        super(clusterName, host, keySpace, kTypeClass, FooPojo.class, cf);
    }
}


… in your code

//initiates an embedded Cassandra Service
CassandraManager.initialize();

//creates our Foo Column Family
FooDao fooDao = new FooDaoImpl("myCluster", "127.0.0.1", "myKeyspace",
				"myColumnFamilyName", String.class);


and voilà, you have your relational table migrated as a Cassandra column family!!!

You can also browse the unit test classes to see how everything is implemented.

enjoy it!!!


Source: http://itsecrets.wordpress.com/2012/01/12/jumping-from-mysql-to-cassandra-a-success-story/

Published at DZone with permission of its author, Jose Alvarez Muguerza.
