
Hi, my name is Animesh Kumar. I am a technologist and work in the research lab of a medium-scale, innovative product engineering/outsourcing company headquartered in India. My primary areas of interest are search, high-performance computing, distributed computing, the web and collaborative machine learning systems. I am also excited about product building and engineering. In my previous avatar, I was a Software Architect and dealt with language solutions: I built the world’s first Hindi portal, the world’s first Hindi TTS, the world’s first lingual WAP solution and the world’s first lightweight embeddable lingual input-output control. My most important work focused on technologies that connect people with low or no English literacy to the mainstream internet. Mywebdunia was one such medium; it let people come forward, speak and share with each other without any language barrier. My life will be spent using the smartness of engineering to affect and empower lives. Good software is a piece of art. It doesn’t intrude, it assists. It empowers. It helps us do more, better. It is beautiful and makes us feel confident about human intellect and its endless possibilities. I am passionate about ideas, people, behavior and change. I love art, literature, poetry, technology and development. I watch TED. I read Dr. Dobb’s Journal and The New Yorker. I love to read literature, especially by Coetzee, Kundera, Rushdie, Pamuk and Enright. I’m a big fan of a crazy theoretician, freaky thinker, sane poetess, and the most admired product manager. Whenever I get time and inspiration, I write poetry, haiku and short fiction. You might want to read some facts about me. http://anismiles.wordpress.com/me/

ZooKeeper Primer

06.14.2010

Distributed collaborative applications involve a set of processes or agents interacting with one another to accomplish a common goal. They often execute in wide-area environments with little or no knowledge of the infrastructure and almost no control over the available resources. They also need to sequence and order events and ensure the atomicity of actions. Above all, the application must protect itself from nightmarish bugs such as race conditions, deadlocks and partial failures.

ZooKeeper helps you build distributed applications by acting as a coordination service.

It’s reliable and highly available. It exposes a simple set of primitives upon which distributed applications can build higher-level services for:

  • Synchronization,
  • Configuration maintenance,
  • Groups,
  • Naming,
  • Leader election and other niche needs.

What lies beneath?

ZooKeeper maintains a shared hierarchical namespace modeled after standard file systems. The namespace consists of data registers, called znodes. They are similar to files and directories.

Note: ZooKeeper keeps znode data primarily in memory, with a transaction log and snapshots on disk for reliability. This means that whatever data znodes hold must fit into memory, so it must be small: at most 1 MB per znode. On the other hand, keeping data in memory gives high throughput and low latency.

Znodes are identified by unique absolute paths, which are “/”-delimited Unicode strings. To help achieve uniqueness, ZooKeeper provides sequential znodes: ZooKeeper appends a sequence number, maintained by the parent znode, to the requested path. For example, the path “/zoo-1/tiger/white-” might be assigned the sequence number 5 and become “/zoo-1/tiger/white-0000000005”, since the counter is zero-padded to 10 digits.
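To make the naming rule concrete, here is a toy model in Java. It is not the ZooKeeper client API: the class SequentialNamer and its create method are hypothetical names for illustration. It assumes each parent keeps its own counter and formats it as a 10-digit, zero-padded number. In the real server the number comes from the parent's count of child changes, so creating or deleting other children also advances it; this sketch only counts sequential creates.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of sequential znode naming (hypothetical class, not the ZooKeeper API).
class SequentialNamer {
    // One counter per parent path, e.g. "/zoo-1/tiger" for "/zoo-1/tiger/white-".
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns the final name a sequential create of requestedPath would get.
    String create(String requestedPath) {
        int slash = requestedPath.lastIndexOf('/');
        String parent = (slash == 0) ? "/" : requestedPath.substring(0, slash);
        int seq = counters.getOrDefault(parent, 0);
        counters.put(parent, seq + 1);
        // Zero-pad to 10 digits; Locale.ROOT keeps the digits ASCII everywhere.
        return requestedPath + String.format(Locale.ROOT, "%010d", seq);
    }
}
```

Calling create("/zoo-1/tiger/white-") three times on a fresh model yields white-0000000000, white-0000000001 and white-0000000002, while a different parent starts its own count at zero.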

  1. A client can create a znode, store up to 1 MB of data in it and associate as many child znodes with it as it wants.
  2. Data access to and from a znode is always atomic: the data is read and/or written in its entirety, or the operation fails.
  3. There is no rename operation and no append semantics.
  4. Each znode has an Access Control List (ACL) that restricts who can do what.
  5. Each znode maintains version numbers for data changes and ACL changes, along with timestamps, to allow cache validation and coordinated updates.
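Items 1, 2, 3 and 5 can be illustrated with a small in-memory model of one znode's data. This is a hypothetical sketch, not ZooKeeper code: ZnodeData, read and write are made-up names, the 1 MB cap is taken from the text above, and a version mismatch simply returns false where the real client would throw an exception. The conditional write shows why version numbers enable coordinated updates: a client writes back only if nobody has changed the znode since it last read it. As with ZooKeeper's setData, a version of -1 matches any version.

```java
// Toy model of one znode's data (hypothetical class, not the ZooKeeper API).
class ZnodeData {
    static final int MAX_BYTES = 1024 * 1024; // assumed 1 MB cap, as in the text

    private byte[] data;
    private int version; // data version: starts at 0, grows with each write

    ZnodeData(byte[] initial) {
        checkSize(initial);
        data = initial.clone();
    }

    private static void checkSize(byte[] bytes) {
        if (bytes.length > MAX_BYTES) {
            throw new IllegalArgumentException("data exceeds " + MAX_BYTES + " bytes");
        }
    }

    // Returns a copy of the whole value: readers never observe a partial write.
    synchronized byte[] read() {
        return data.clone();
    }

    synchronized int version() {
        return version;
    }

    // Replaces the whole value (there is no append) if expectedVersion matches.
    // -1 matches any version; returns false when another writer got there first.
    synchronized boolean write(byte[] newData, int expectedVersion) {
        checkSize(newData);
        if (expectedVersion != -1 && expectedVersion != version) {
            return false;
        }
        data = newData.clone();
        version++;
        return true;
    }
}
```

A read-modify-write loop would read the data and version(), compute a new value, call write with that version, and start over from the read whenever it returns false.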

Znodes can be one of two types: ephemeral and persistent. Once set, the type can’t be changed.

  1. Ephemeral znodes are deleted by ZooKeeper when the session of the client that created them ends, while persistent znodes remain until they are explicitly deleted.
  2. Ephemeral znodes can’t have children.
  3. Both types of znodes are visible to all clients permitted by their ACLs.
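The ephemeral/persistent rules above can be sketched as a registry keyed by session. EphemeralRegistry, its Mode enum and the numeric session ids are hypothetical, for illustration only; in a real deployment the ZooKeeper server tracks and expires sessions itself. The sketch also assumes that a parent must exist before a child is created (the root is implicit) and rejects duplicate paths.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of ephemeral vs. persistent znodes (hypothetical, not the ZooKeeper API).
class EphemeralRegistry {
    enum Mode { PERSISTENT, EPHEMERAL }

    // What the registry remembers per znode: its type and the creating session.
    private static final class Node {
        final Mode mode;
        final long ownerSession;

        Node(Mode mode, long ownerSession) {
            this.mode = mode;
            this.ownerSession = ownerSession;
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    void create(String path, Mode mode, long sessionId) {
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("node already exists: " + path);
        }
        int slash = path.lastIndexOf('/');
        if (slash > 0) { // slash == 0 means the parent is the implicit root
            Node parent = nodes.get(path.substring(0, slash));
            if (parent == null) {
                throw new IllegalStateException("parent does not exist: " + path);
            }
            if (parent.mode == Mode.EPHEMERAL) {
                throw new IllegalStateException("ephemeral znodes can't have children");
            }
        }
        nodes.put(path, new Node(mode, sessionId));
    }

    // Called when a session ends: every ephemeral znode it created disappears.
    void closeSession(long sessionId) {
        nodes.values().removeIf(n -> n.mode == Mode.EPHEMERAL && n.ownerSession == sessionId);
    }

    boolean exists(String path) {
        return nodes.containsKey(path);
    }
}
```

This is the shape of the usual group-membership pattern: each worker creates an ephemeral znode under a persistent parent, and its entry vanishes automatically when the worker's session ends.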

Up and Running

There is already plenty of literature on installing ZooKeeper on Linux machines, so I am going to focus on how to install ZooKeeper on Windows machines.

  1. Download and install Cygwin. http://www.cygwin.com/
  2. Download a stable release of ZooKeeper. http://hadoop.apache.org/zookeeper/releases.html
  3. Unzip ZooKeeper to some directory, say, D:/iLabs/zookeeper-3.3.1
  4. Add a new environment variable ZOOKEEPER_INSTALL and point it to D:/iLabs/zookeeper-3.3.1
  5. Edit the PATH variable and append $ZOOKEEPER_INSTALL/bin to it.
  6. Now start Cygwin.

Now, start the ZooKeeper server.

$ zkServer.sh start

Ouch! It threw an error:

ZooKeeper exited abnormally because it could not find its configuration file, zoo.cfg, which it expects in the $ZOOKEEPER_INSTALL/conf directory. zoo.cfg is a standard Java properties file.

Go ahead and create a zoo.cfg file in the conf directory. Open it and add the following properties:

# The number of milliseconds of each tick
tickTime=2000

# The directory where the snapshot is stored.
dataDir=D:/iLabs/zoo-data/

# The port at which the clients will connect
clientPort=2181
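Because zoo.cfg is a standard Java properties file, java.util.Properties can read it. The sketch below parses the three settings above; the ZooCfg class, its parse method and the fallback values for missing keys are my own assumptions, not ZooKeeper's configuration code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: reading zoo.cfg settings with java.util.Properties (ZooCfg is hypothetical).
class ZooCfg {
    final int tickTime;   // milliseconds per tick
    final String dataDir; // where snapshots are stored
    final int clientPort; // port clients connect to

    private ZooCfg(Properties p) {
        // Fallback values are this sketch's own choices, not ZooKeeper defaults.
        tickTime = Integer.parseInt(p.getProperty("tickTime", "2000").trim());
        clientPort = Integer.parseInt(p.getProperty("clientPort", "2181").trim());
        String dir = p.getProperty("dataDir");
        if (dir == null || dir.trim().isEmpty()) {
            throw new IllegalArgumentException("dataDir is required");
        }
        dataDir = dir.trim();
    }

    static ZooCfg parse(String text) {
        Properties p = new Properties();
        try {
            // Lines starting with '#' are comments; '\' acts as an escape character.
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ZooCfg(p);
    }
}
```

One consequence of the properties format matters on Windows: Properties treats a backslash as an escape character and silently drops it before ordinary letters, so dataDir=D:\iLabs\zoo-data would be read as D:iLabszoo-data. Forward slashes, as in the example above, avoid the problem, and so does doubling each backslash.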

Go back to Cygwin, and issue the same command again. This time ZooKeeper should load properly.

Now, connect to ZooKeeper. Open a new Cygwin window and issue the following command:

$ zkCli.sh

By default, this connects to the ZooKeeper server running at localhost:2181 and opens the zk console.

Let’s create a znode, say /zoo-1

[zk: localhost:2181(CONNECTED) 1] create /zoo-1 HelloWorld

With no flags, create makes a persistent znode; the -s flag would make it sequential and -e ephemeral. HelloWorld is the data you assign to the znode (/zoo-1). Since no ACL argument is given, the console applies its default open ACL.

To see all znodes,

[zk: localhost:2181(CONNECTED) 2] ls /
[zoo-1, zookeeper]

This means there are two znodes at the root level, /zoo-1 and /zookeeper. ZooKeeper uses the /zookeeper subtree to store management information, such as quota information.
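The create and ls behaviour in this session can be mimicked with a toy namespace. ToyNamespace is a hypothetical class for illustration only: it pre-creates /zookeeper, requires a znode's parent to exist before the znode is created (ZooKeeper does not create missing parents for you), and lists direct children in sorted order. ZooKeeper itself does not promise any particular order for children; sorting here just makes the result predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy namespace mirroring the console session (hypothetical, not the ZooKeeper API).
class ToyNamespace {
    private final TreeSet<String> paths = new TreeSet<>();

    ToyNamespace() {
        paths.add("/");
        paths.add("/zookeeper"); // reserved subtree for management data such as quotas
    }

    void create(String path) {
        String parent = parentOf(path);
        if (!paths.contains(parent)) {
            throw new IllegalStateException("parent does not exist: " + parent);
        }
        if (!paths.add(path)) {
            throw new IllegalStateException("node already exists: " + path);
        }
    }

    // Lists the direct children of path, like `ls path` in the console.
    List<String> ls(String path) {
        if (!paths.contains(path)) {
            throw new IllegalStateException("no node: " + path);
        }
        String prefix = path.equals("/") ? "/" : path + "/";
        List<String> children = new ArrayList<>();
        // Paths sharing the prefix are contiguous in sorted order.
        for (String p : paths.tailSet(prefix, false)) {
            if (!p.startsWith(prefix)) {
                break;
            }
            String rest = p.substring(prefix.length());
            if (!rest.contains("/")) { // skip grandchildren
                children.add(rest);
            }
        }
        return children;
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }
}
```

After new ToyNamespace() and create("/zoo-1"), ls("/") returns [zoo-1, zookeeper], matching the console output above.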

For more commands, type help. To explore the command-line tools further, refer to: http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html


Published at DZone with permission of its author, Animesh Kumar.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)