

A Look at Riak CS from Basho

10.03.2013

This post comes from Sebastien Goasguen at the Citrix blog.

Playing with Basho Riak CS Object Store

CloudStack deals with the computing side of an IaaS; for most of us these days, the storage side, a scalable, fault-tolerant object store, is left to other software. Ceph, led by Inktank, and Riak CS from Basho are the two most talked-about object stores at the moment. In this post we look at Riak CS and take it for a quick whirl. CloudStack integrates with Riak CS for secondary storage, and together they can offer an EC2 and a true S3 interface backed by a scalable object store. So here it is.

While Riak CS (Cloud Storage) can be seen as an S3 back-end implementation, it is based on Riak, a highly available, distributed NoSQL database. Riak uses a consistent hashing algorithm to re-balance data when a node disappears (e.g., fails) or when nodes appear (e.g., capacity is added). It also manages replication of data with the eventual-consistency principle typical of large-scale distributed storage systems that favor availability over consistency.
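To make the re-balancing idea concrete, here is a toy sketch of consistent hashing in Python. This is not Riak's actual ring implementation; the node names, the number of virtual nodes and the SHA-1 hash are all made up for illustration. The key property is that adding a node only re-maps the keys that fall near its points on the ring, so most data stays put.

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Toy consistent-hash ring: a key belongs to the first node clockwise."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns several virtual points on the ring,
        # which evens out the distribution of keys across nodes.
        self.ring = sorted(
            (self._hash("%s:%d" % (node, i)), node)
            for node in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise from the key's hash to the next ring point.
        points = [p for p, _ in self.ring]
        idx = bisect(points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["dev1", "dev2", "dev3"])
owner = ring.node_for("images/1.jpg")

# Adding dev4 only re-maps keys whose ring segment dev4 now owns;
# the rest keep their current owner, so re-balancing moves little data.
bigger = ConsistentHashRing(["dev1", "dev2", "dev3", "dev4"])
moved = sum(
    1 for i in range(1000)
    if ring.node_for("key%d" % i) != bigger.node_for("key%d" % i)
)
```

With four nodes, roughly a quarter of the keys move to the new node and the rest are untouched, which is exactly why the membership tables below show the ring shifting gradually toward an even split.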

To get a functioning Riak CS store, we need Riak, Riak CS and Stanchion. Stanchion is an interface that serializes the HTTP requests made to Riak CS that involve globally unique entities, such as buckets and users.

A Taste of Riak

To get started, let's play with Riak and build a cluster on our local machine. Basho has some great documentation; the toughest thing will be to install Erlang (and by tough I mean a 2-minute deal), but again the docs are very helpful and give step-by-step instructions for almost any OS.

There is no need for me to recreate step-by-step instructions since the docs are so great, but the gist is that with the quickstart guide we can create a Riak cluster on “localhost”. We are going to start five Riak nodes (we could start more) and join them into a cluster. This is as simple as:

    bin/riak start
    bin/riak-admin cluster join dev1@127.0.0.1   # repeat for each node joining dev1
    bin/riak-admin cluster plan                  # review the proposed membership changes
    bin/riak-admin cluster commit                # apply them

Here "dev1" is the first Riak node started. Creating this cluster will re-balance the ring:

    ================================= Membership ==================================
    Status     Ring    Pending    Node
    -------------------------------------------------------------------------------
    valid     100.0%     20.3%    'dev1@127.0.0.1'
    valid       0.0%     20.3%    'dev2@127.0.0.1'
    valid       0.0%     20.3%    'dev3@127.0.0.1'
    valid       0.0%     20.3%    'dev4@127.0.0.1'
    valid       0.0%     18.8%    'dev5@127.0.0.1'

The "riak-admin" command is a nice CLI to manage the cluster. We can check the membership of the cluster we just created; after some time the ring will have re-balanced to the expected state.

    dev1/bin/riak-admin member-status
    ================================= Membership ==================================
    Status     Ring    Pending    Node
    -------------------------------------------------------------------------------
    valid      62.5%     20.3%    'dev1@127.0.0.1'
    valid       9.4%     20.3%    'dev2@127.0.0.1'
    valid       9.4%     20.3%    'dev3@127.0.0.1'
    valid       9.4%     20.3%    'dev4@127.0.0.1'
    valid       9.4%     18.8%    'dev5@127.0.0.1'
    -------------------------------------------------------------------------------
    Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
   
    dev1/bin/riak-admin member-status
    ================================= Membership ==================================
    Status     Ring    Pending    Node
    -------------------------------------------------------------------------------
    valid      20.3%      --      'dev1@127.0.0.1'
    valid      20.3%      --      'dev2@127.0.0.1'
    valid      20.3%      --      'dev3@127.0.0.1'
    valid      20.3%      --      'dev4@127.0.0.1'
    valid      18.8%      --      'dev5@127.0.0.1'
    -------------------------------------------------------------------------------

You can then test your cluster by putting an image as explained in the docs and retrieving it in a browser (an HTTP GET):

    curl -XPUT http://127.0.0.1:10018/riak/images/1.jpg \
         -H "Content-type: image/jpeg" \
         --data-binary @image_name_.jpg

Open the browser to “http://127.0.0.1:10018/riak/images/1.jpg”. It's as easy as 1, 2, 3.

Installing Everything on Ubuntu 12.04

To move forward and build a complete S3-compatible object store, let's set everything up on an Ubuntu 12.04 machine. Back to installing "riak": get the repo key and set up a "basho.list" repository:

    curl http://apt.basho.com/gpg/basho.apt.key | sudo apt-key add -
    sudo bash -c "echo deb http://apt.basho.com $(lsb_release -sc) main > /etc/apt/sources.list.d/basho.list"
    sudo apt-get update

And grab “riak”, “riak-cs” and “stanchion”. I am not sure why but their great docs make you download the .debs separately and use “dpkg”.

    sudo apt-get install riak riak-cs stanchion

Check that the binaries are in your path with "which riak", "which riak-cs" and "which stanchion"; you should find everything in "/usr/sbin". All configuration will be in "/etc/riak", "/etc/riak-cs" and "/etc/stanchion"; inspect especially the "app.config" files, which we are going to modify before starting everything. Note that all binaries have a nice usage description; it includes a console, a ping method and a restart, among others:

    Usage: riak {start | stop| restart | reboot | ping | console | attach | 
                        attach-direct | ertspath | chkconfig | escript | version | 
                        getpid | top [-interval N] [-sort reductions|memory|msg_q] [-lines N] }

Configuration

Before starting anything we are going to configure each component, which means editing the "app.config" file in each respective directory. For "riak-cs" I only made sure to set "{anonymous_user_creation, true}". I did nothing to configure "stanchion", as I used the default ports and ran everything on "localhost" without "ssl". Just make sure that you are not running any other application on port "8080", as "riak-cs" will use this port by default. To configure "riak", see the documentation; it sets a different back end than what we used in the "tasting" phase. :) With all this configuration done, you should be able to start all three components:

    riak start
    riak-cs start
    stanchion start
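For reference, the "riak-cs" setting mentioned above lives in the "riak_cs" section of "/etc/riak-cs/app.config". A fragment might look roughly like this; the surrounding entries are elided and the exact layout varies by version:

```erlang
{riak_cs, [
           %% allow the first (admin) user to be created over HTTP
           {anonymous_user_creation, true},
           %% ... other settings left at their defaults ...
          ]}
```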

You can "ping" every component and check the console with "riak ping", "riak-cs ping" and "stanchion ping"; I'll let you figure out the console access. Create an admin user for "riak-cs":

    curl -H 'Content-Type: application/json' -X POST http://localhost:8080/riak-cs/user \
         --data '{"email":"foobar@example.com", "name":"admin user"}'

If this returns successfully, it is a good indication that your setup is working properly. In the response we find the API key and secret:

    {"email":"foobar@example.com",
     "display_name":"foobar",
     "name":"admin user",
     "key_id":"KVTTBDQSQ1-DY83YQYID",
     "key_secret":"2mNGCBRoqjab1guiI3rtQmV3j2NNVFyXdUAR3A==",
     "id":"1f8c3a88c1b58d4b4369c1bd155c9cb895589d24a5674be789f02d3b94b22e7c",
     "status":"enabled"}

Let's take those and put them in our "riak-cs" configuration file; there are "admin_key" and "admin_secret" variables to set. Then restart with "riak-cs restart". Don't forget to also add them in the "stanchion" configuration file, "/etc/stanchion/app.config", and restart it with "stanchion restart".
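Using the keys from the example response above, the relevant lines would look roughly like this (a fragment; surrounding entries elided):

```erlang
%% in both /etc/riak-cs/app.config and /etc/stanchion/app.config
{admin_key, "KVTTBDQSQ1-DY83YQYID"},
{admin_secret, "2mNGCBRoqjab1guiI3rtQmV3j2NNVFyXdUAR3A=="},
```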

Using Our New Cloud Storage with Boto

Since Riak CS is an S3-compatible cloud storage solution, we should be able to use an S3 client like Python Boto to create buckets and store data. Let's try. You will need Boto, of course ("apt-get install python-boto"), and then an interactive shell ("python"). Import the modules and create a connection to "riak-cs":

    >>> from boto.s3.key import Key
    >>> from boto.s3.connection import S3Connection
    >>> from boto.s3.connection import OrdinaryCallingFormat
    >>> apikey='KVTTBDQSQ1-DY83YQYID'
    >>> secretkey='2mNGCBRoqjab1guiI3rtQmV3j2NNVFyXdUAR3A=='
    >>> cf=OrdinaryCallingFormat()
    >>> conn=S3Connection(aws_access_key_id=apikey,aws_secret_access_key=secretkey,
                          is_secure=False,host='localhost',port=8080,calling_format=cf)
 

Now you can list the buckets, which will return an empty list at first. Then create a bucket and store content in it under various keys:

    >>> conn.get_all_buckets()
    []
    >>> bucket=conn.create_bucket('riakbucket')
    >>> k=Key(bucket)
    >>> k.key='firstkey'
    >>> k.set_contents_from_string('Object from first key')
    >>> k.key='secondkey'
    >>> k.set_contents_from_string('Object from second key')
    >>> b=conn.get_all_buckets()[0]
    >>> k=Key(b)
    >>> k.key='secondkey'
    >>> k.get_contents_as_string()
    'Object from second key'
    >>> k.key='firstkey'
    >>> k.get_contents_as_string()
    'Object from first key'

And that's it: an S3-compatible object store backed by a NoSQL distributed database that uses consistent hashing, all of it in Erlang. Automate all of it with a Chef recipe. Hook that up to your CloudStack EC2-compatible cloud, use it as secondary storage to hold templates, or make it a public-facing offering, and you have the second leg of the cloud: storage. Sweet... In my next post I will show you how to use it with CloudStack.



Published at DZone with permission of Mark Hinkle, author and DZone MVB. (source)
