Bela has posted 1 posts at DZone. View Full User Profile

A Simple Clustered Task Distribution System

10.06.2008
| 56357 views |
  • submit to reddit

This article will introduce the basic concepts of JGroups and then implement a task distribution system (on top of JGroups), where tasks can be placed into the cluster and are executed by worker nodes.

I'll show that worker nodes can be added at run time to add more processing power, or taken down when we don't have much load. Plus, tasks assigned to workers who subsequently crash are automatically reassigned to live nodes.

We could have implemented this with JMS queues. However, when we have a lot of load, the JMS server tends to become the bottleneck. In our decentralized solution, every node in the cluster can be both a master (who submits tasks) and a slave (who executes the tasks and returns the results).

Overview

JGroups is a clustering library. Applications can use JGroups to join a cluster, send messages to the cluster nodes, get notifications when other nodes join or leave (including crashes), and leave a cluster. Its task is the reliable sending of messages within a cluster. Its scope is much smaller than JMS; JGroups doesn't know about queues, topics and transactions, but only about message sending.

The main feature of JGroups is the protocol stack and the resulting configuration flexibility.
Applications can pick the properties they would like in a cluster by simply editing an XML file.
For example, an application can add compression by simply adding the COMPRESS protocol to the
configuration.

Or it can remove fragmentation because its messages will always be smaller than 65K (over UDP), or because it uses TCP as transport.
Another application might add encryption and authentication, so messages are encrypted and only
nodes which present a valid X.509 certificate can join the cluster.
Applications are even free to write their own protocols (or extend an existing one), and add them to the
configuration. It might be useful for example, to add a protocol which keeps track of all messages sent
and received over a cluster, for auditing or statistics purposes. The architecture of JGroups is shown in fig. 1.


The main API for clients is a Channel (see below) which is used to send and receive messages. When a
message is sent, it is passed down the protocol stack. The stack is a list of protocols, and each protocol
gets a chance to do something with the message.
For example, a fragmentation protocol might check the size of the message. If the message is greater
than the configured size, it might fragment it into multiple smaller messages and send those down the
stack.

On the receive side, the fragmentation protocol would queue the fragments, until all have been
received, then assemble them into the original message and pass it up.
The protocols shipped with JGroups can be divided into the following categories:

● Transport: sending and receiving of messages. UDP uses IP multicasting and/or UDP
datagrams. TCP uses TCP connections.
● Discovery: initial discovery of nodes
● Merging: after a network partition heals, this merges the sub-clusters back into one
● Failure detection: monitoring of cluster nodes and notifications of potential crashes or hangs
● Reliable delivery: makes sure a message is not lost, received only once, and received in the
order in which a sender sent it. This is done through assigning sequence numbers to each
message and through retransmission in case of a missing message.
● Stability: nodes have to buffer all messages (for potential retransmission). The stability protocol
makes sure that periodically (or based on accumulated size), messages that have been received
by all cluster nodes are purged so they can be garbage collected.
● Group membership: keeps track of the nodes in a cluster, and notifies the application of node
joins and leaves (including crashes)
● Flow control: makes sure that a sender cannot send messages faster than the receivers can
process them, over a longer time. This is necessary to prevent out-of-memory situations. Flow
control is a counter part to stability.
● Fragmentation: fragments large messages into smaller ones and re-assembles them at the
receivers
● State transfer: makes sure that the shared state of a cluster (e.g. all HTTP sessions) is transferred
correctly to a new node
● Compression: compresses messages and uncompresses them at the receivers
● Encryption: encrypts messages
● Authentication: prevents unauthorized node from joining a cluster

Published at DZone with permission of its author, Bela Ban.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)