NoSQL Zone is brought to you in partnership with:

Dustin Marx is a software developer who enjoys identifying and using the correct tool for the job. In addition to writing software and writing a blog on software development, Dustin occasionally presents at conferences and writes articles. Dustin is a DZone MVB and is not an employee of DZone and has posted 236 posts at DZone. You can read more from them at their website. View Full User Profile

Custom Cassandra Data Types

07.09.2014
| 1896 views |
  • submit to reddit

In the blog post Connecting to Cassandra from Java, I mentioned that one advantage for Java developers ofCassandra being implemented in Java is the ability to create custom Cassandra data types. In this post, I outline how to do this in greater detail.

Cassandra has numerous built-in data types, but there are situations in which one may want to add a custom type. Cassandra custom data types are implemented in Java by extending theorg.apache.cassandra.db.marshal.AbstractType class. The class that extends this must ultimately implement three methods with the following signatures:

public ByteBuffer fromString(final String) throws MarshalException  
public TypeSerializer getSerializer()  
public int compare(Object, Object)  

This post's example implementation of AbstractType is shown in the next code listing.

UnitedStatesState.java - Extends AbstractType

package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.serializers.MarshalException;
import org.apache.cassandra.serializers.TypeSerializer;

import java.nio.ByteBuffer;

/**
 * Representation of a state in the United States that
 * can be persisted to Cassandra database.
 */
public class UnitedStatesState extends AbstractType
{
   public static final UnitedStatesState instance = new UnitedStatesState();

   @Override
   public ByteBuffer fromString(final String stateName) throws MarshalException
   {
      return getStateAbbreviationAsByteBuffer(stateName);
   }

   @Override
   public TypeSerializer getSerializer()
   {
      return UnitedStatesStateSerializer.instance;
   }

   @Override
   public int compare(Object o1, Object o2)
   {
      if (o1 == null && o2 == null)
      {
         return 0;
      }
      else if (o1 == null)
      {
         return 1;
      }
      else if (o2 == null)
      {
         return -1;
      }
      else
      {
         return o1.toString().compareTo(o2.toString());
      }
   }

   /**
    * Provide standard two-letter abbreviation for United States
    * state whose state name is provided.
    *
    * @param stateName Name of state whose abbreviation is desired.
    * @return State's abbreviation as a ByteBuffer; will return "UK"
    *    if provided state name is unexpected value.
    */
   private ByteBuffer getStateAbbreviationAsByteBuffer(final String stateName)
   {
      final String upperCaseStateName = stateName != null ? stateName.toUpperCase().replace(" ", "_") : "UNKNOWN";
      String abbreviation;
      try
      {
         abbreviation =  upperCaseStateName.length() == 2
                       ? State.fromAbbreviation(upperCaseStateName).getStateAbbreviation()
                       : State.valueOf(upperCaseStateName).getStateAbbreviation();
      }
      catch (Exception exception)
      {
         abbreviation = State.UNKNOWN.getStateAbbreviation();
      }
      return ByteBuffer.wrap(abbreviation.getBytes());
   }
}

The above class listing references the State enum, which is shown next.

State.java

package dustin.examples.cassandra.cqltypes;

/**
 * Representation of state in the United States.
 */
public enum State
{
   ALABAMA("Alabama", "AL"),
   ALASKA("Alaska", "AK"),
   ARIZONA("Arizona", "AZ"),
   ARKANSAS("Arkansas", "AR"),
   CALIFORNIA("California", "CA"),
   COLORADO("Colorado", "CO"),
   CONNECTICUT("Connecticut", "CT"),
   DELAWARE("Delaware", "DE"),
   DISTRICT_OF_COLUMBIA("District of Columbia", "DC"),
   FLORIDA("Florida", "FL"),
   GEORGIA("Georgia", "GA"),
   HAWAII("Hawaii", "HI"),
   IDAHO("Idaho", "ID"),
   ILLINOIS("Illinois", "IL"),
   INDIANA("Indiana", "IN"),
   IOWA("Iowa", "IA"),
   KANSAS("Kansas", "KS"),
   LOUISIANA("Louisiana", "LA"),
   MAINE("Maine", "ME"),
   MARYLAND("Maryland", "MD"),
   MASSACHUSETTS("Massachusetts", "MA"),
   MICHIGAN("Michigan", "MI"),
   MINNESOTA("Minnesota", "MN"),
   MISSISSIPPI("Mississippi", "MS"),
   MISSOURI("Missouri", "MO"),
   MONTANA("Montana", "MT"),
   NEBRASKA("Nebraska", "NE"),
   NEVADA("Nevada", "NV"),
   NEW_HAMPSHIRE("New Hampshire", "NH"),
   NEW_JERSEY("New Jersey", "NJ"),
   NEW_MEXICO("New Mexico", "NM"),
   NORTH_CAROLINA("North Carolina", "NC"),
   NORTH_DAKOTA("North Dakota", "ND"),
   NEW_YORK("New York", "NY"),
   OHIO("Ohio", "OH"),
   OKLAHOMA("Oklahoma", "OK"),
   OREGON("Oregon", "OR"),
   PENNSYLVANIA("Pennsylvania", "PA"),
   RHODE_ISLAND("Rhode Island", "RI"),
   SOUTH_CAROLINA("South Carolina", "SC"),
   SOUTH_DAKOTA("South Dakota", "SD"),
   TENNESSEE("Tennessee", "TN"),
   TEXAS("Texas", "TX"),
   UTAH("Utah", "UT"),
   VERMONT("Vermont", "VT"),
   VIRGINIA("Virginia", "VA"),
   WASHINGTON("Washington", "WA"),
   WEST_VIRGINIA("West Virginia", "WV"),
   WISCONSIN("Wisconsin", "WI"),
   WYOMING("Wyoming", "WY"),
   UNKNOWN("Unknown", "UK");

   private String stateName;

   private String stateAbbreviation;

   State(final String newStateName, final String newStateAbbreviation)
   {
      this.stateName = newStateName;
      this.stateAbbreviation = newStateAbbreviation;
   }

   public String getStateName()
   {
      return this.stateName;
   }

   public String getStateAbbreviation()
   {
      return this.stateAbbreviation;
   }

   public static State fromAbbreviation(final String candidateAbbreviation)
   {
      State match = UNKNOWN;
      if (candidateAbbreviation != null && candidateAbbreviation.length() == 2)
      {
         final String upperAbbreviation = candidateAbbreviation.toUpperCase();
         for (final State state : State.values())
         {
            if (state.stateAbbreviation.equals(upperAbbreviation))
            {
               match = state;
            }
         }
      }
      return match;
   }
}

We can also provide an implementation of the TypeSerializer interface returned by thegetSerializer() method shown above. That class implementing TypeSerializer is typically most easily written by extending one of the numerous existing implementations of TypeSerializer that Cassandra provides in the org.apache.cassandra.serializers package. In my example, my custom Serializer extends AbstractTextSerializer and the only method I need to add has the signature public void validate(final ByteBuffer bytes) throws MarshalException. Both of my custom classes need to provide a reference to an instance of themselves via static access. Here is the class that implementsTypeSerializer via extension of AbstractTypeSerializer:

UnitedStatesStateSerializer.java - Implements TypeSerializer

package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.serializers.AbstractTextSerializer;
import org.apache.cassandra.serializers.MarshalException;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Serializer for UnitedStatesState.
 */
public class UnitedStatesStateSerializer extends AbstractTextSerializer
{
   public static final UnitedStatesStateSerializer instance = new UnitedStatesStateSerializer();

   private UnitedStatesStateSerializer()
   {
      super(StandardCharsets.UTF_8);
   }

   /**
    * Validates provided ByteBuffer contents to ensure they can
    * be modeled in the UnitedStatesState Cassandra/CQL data type.
    * This allows for a full state name to be specified or for its
    * two-digit abbreviation to be specified and either is considered
    * valid.
    *
    * @param bytes ByteBuffer whose contents are to be validated.
    * @throws MarshalException Thrown if provided data is invalid.
    */
   @Override
   public void validate(final ByteBuffer bytes) throws MarshalException
   {
      try
      {
         final String stringFormat = new String(bytes.array()).toUpperCase();
         final State state =  stringFormat.length() == 2
                            ? State.fromAbbreviation(stringFormat)
                            : State.valueOf(stringFormat);
      }
      catch (Exception exception)
      {
         throw new MarshalException("Invalid model cannot be marshaled as UnitedStatesState.");
      }
   }
}

With the classes for creating a custom CQL data type written, they need to be compiled into .class files and archived in a JAR file. This process (compiling with javac -cp "C:\Program Files\DataStax Community\apache-cassandra\lib\*" -sourcepath src -d classes src\dustin\examples\cassandra\cqltypes\*.java and archiving the generated .class files into a JAR named CustomCqlTypes.jar with jar cvf CustomCqlTypes.jar *) is shown in the following screen snapshot.

The JAR with the class definitions of the custom CQL type classes needs to be placed in the Cassandra installation's lib directory as demonstrated in the next screen snapshot.

With the JAR containing the custom CQL data type classes implementations in the Cassandra installation's libdirectory, Cassandra should be restarted so that it will be able to "see" these custom data type definitions.

The next code listing shows a Cassandra Query Language (CQL) statement for creating a table using the new custom type dustin.examples.cassandra.cqltypes.UnitedStatesState.

createAddress.cql

CREATE TABLE us_address
(
   id uuid,
   street1 text,
   street2 text,
   city text,
   state 'dustin.examples.cassandra.cqltypes.UnitedStatesState',
   zipcode text,
   PRIMARY KEY(id)
);

The next screen snapshot demonstrates the results of running the createAddress.cql code above by describing the created table in cqlsh.

The above screen snapshot demonstrates that the custom typedustin.examples.cassandra.cqltypes.UnitedStatesState is the type for the state column of theus_address table.

A new row can be added to the US_ADDRESS table with a normal INSERT. For example, the following screen snapshot demonstrates inserting an address with the command INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'New York', '10118');:

Note that while the INSERT statement inserted "New York" for the state, it is stored as "NY".

If I run an INSERT statement in cqlsh using an abbreviation to start with (INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'NY', '10118');), it still works as shown in the output shown below.

In my example, an invalid state does not prevent an INSERT from occurring, but instead persists the state as "UK" (for unknown) [see the implementation of this inUnitedStatesState.getStateAbbreviationAsByteBuffer(String)].

One of the first advantages that comes to mind justifying why one might want to implement a custom CQL datatype in Java is the ability to employ behavior similar to that provided by check constraints in relational databases. For example, in this post, my sample ensured that any state column entered for a new row was either one of the fifty states of the United States, the District of Columbia, or "UK" for unknown. No other values can be inserted into that column's value.

Another advantage of the custom data type is the ability to massage the data into a preferred form. In this example, I changed every state name to an uppercase two-digit abbreviation. In other cases, I might want to always store in uppercase or always store in lowercase or map finite sets of strings to numeric values. The custom CQL datatype allows for customized validation and representation of values in the Cassandra database.

Conclusion

This post has been an introductory look at implementing custom CQL datatypes in Cassandra. As I play with this concept more and try different things out, I hope to write another blog post on some more subtle observations that I make. As this post shows, it is fairly easy to write and use a custom CQL datatype, especially for Java developers.

Published at DZone with permission of Dustin Marx, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)