
A Simple Clustered Task Distribution System

10.06.2008

Demo

Let's see whether this thing works! The demo JAR can be downloaded here. Let's start some instances and submit some tasks. To start an instance, we run:

[linux]/home/bela/JGroupsArticles/dist$ java -Djgroups.bind_addr=192.168.1.5 -jar demo.jar
-------------------------------------------------------
GMS: address is 192.168.1.5:33795
-------------------------------------------------------
view: [192.168.1.5:33795|0] [192.168.1.5:33795]
my rank is 0
[1] Submit [2] Submit long running task [3] Info [q] Quit

Replace the IP address passed to -Djgroups.bind_addr with the address of a valid NIC on your machine. If you don't set this property, JGroups picks a random NIC.

We can see that we're the first node in the cluster, our local address is 192.168.1.5:33795 and our rank
is 0. When we submit a task, we see that it is executed locally, since we're the only node in the cluster:

[1] Submit [2] Submit long running task [3] Info [q] Quit
1
==> submitting 192.168.1.5:33795::1
executing 192.168.1.5:33795::1
<== result = Tue Sep 09 12:38:55 CEST 2008
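
The rank shown above comes straight from the JGroups view: a node's rank is simply its position in the membership list, and since JGroups delivers the same view to every member, all nodes agree on each other's ranks without any extra coordination. Here is a minimal sketch of that computation (hypothetical class name; written against the JGroups 3.x/4.x API, where JChannel.getAddress() returns the local address and a ReceiverAdapter gets viewAccepted() callbacks; the 2008-era 2.x API calls the channel method getLocalAddress()):

import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class RankTracker extends ReceiverAdapter {
    private final JChannel channel;
    private volatile int rank = -1;

    public RankTracker(JChannel channel) {
        this.channel = channel;
    }

    @Override
    public void viewAccepted(View view) {
        // Our rank is our position in the totally ordered membership list
        rank = view.getMembers().indexOf(channel.getAddress());
        System.out.println("view: " + view + "\nmy rank is " + rank);
    }
}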

Let's start a second instance:

[linux]/home/bela/JGroupsArticles/dist$ java -Djgroups.bind_addr=192.168.1.5 -jar demo.jar
-------------------------------------------------------
GMS: address is 192.168.1.5:41232
-------------------------------------------------------
view: [192.168.1.5:33795|1] [192.168.1.5:33795, 192.168.1.5:41232]
my rank is 1
[1] Submit [2] Submit long running task [3] Info [q] Quit

We can see that the view now has 2 members: 192.168.1.5:33795 with rank=0 and 192.168.1.5:41232
(the second instance we just started) with rank=1. In the view header [192.168.1.5:33795|1], the address is the view's creator (the coordinator) and the number after the bar is the view ID, which increments on every membership change. Note that for this demo we start all instances as separate
processes on the same host, but in real life we would of course place those processes on different hosts.

If we now go back to the first instance and submit 2 tasks, we can see that they are assigned to both
instances:

1
==> submitting 192.168.1.5:33795::2
executing 192.168.1.5:33795::2
<== result = Tue Sep 09 12:43:48 CEST 2008
[1] Submit [2] Submit long running task [3] Info [q] Quit
1
==> submitting 192.168.1.5:33795::3
<== result = Tue Sep 09 12:43:49 CEST 2008


Task #2 was executed locally, but task #3 was executed by the second instance (this can be verified
by looking at the output of the second instance).
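
One task-to-node mapping consistent with this output is to take the task's sequence number modulo the cluster size. This is only a sketch of the idea; the demo's actual code may well hash the full task ID instead:

import org.jgroups.View;

public class TaskMapper {
    // Selects the rank of the member that should execute a given task
    static int executingRank(long taskSeqNo, View view) {
        return (int) (taskSeqNo % view.size());
    }
}

With 2 members, task #2 maps to 2 % 2 = 0 (rank 0, the submitter itself) and task #3 maps to 3 % 2 = 1 (rank 1, the second instance), which is exactly what we observed.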

Let's now start a third instance:

[linux]/home/bela/JGroupsArticles/dist$ java -Djgroups.bind_addr=192.168.1.5 -jar demo.jar
-------------------------------------------------------
GMS: address is 192.168.1.5:45532
-------------------------------------------------------
view: [192.168.1.5:33795|2] [192.168.1.5:33795, 192.168.1.5:41232, 192.168.1.5:45532]
my rank is 2
[1] Submit [2] Submit long running task [3] Info [q] Quit



We see that the cluster now has 3 nodes, and the rank of the freshly started instance is 2.
Now we'll submit a long-running task T and, before T completes, kill the node which is processing T.
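
The long-running task itself might look like the following sketch (hypothetical class name; the demo's real task type is defined in the accompanying source code): it announces itself, sleeps for 15 seconds, and returns the current date, matching the output we will see shortly. It has to be Serializable because it is shipped across the cluster to whichever node ends up executing it.

import java.io.Serializable;
import java.util.Date;
import java.util.concurrent.Callable;

public class LongRunningTask implements Callable<Date>, Serializable {
    public Date call() throws Exception {
        System.out.println("sleeping for 15 secs...");
        Thread.sleep(15000);   // simulate a long computation
        System.out.println("done");
        return new Date();     // the result returned to the submitter
    }
}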

Let's submit that task on the third instance:

[1] Submit [2] Submit long running task [3] Info [q] Quit
2
==> submitting 192.168.1.5:45532::1

Because the second instance has rank=1, task #1 from 192.168.1.5:45532 is executed on that instance.
Before the 15 seconds elapse, let's kill the second instance. After a few seconds, the output of the third
instance shows the following:

view: [192.168.1.5:33795|3] [192.168.1.5:33795, 192.168.1.5:45532]
my rank is 1
**** taking over task 192.168.1.5:45532::1 from 192.168.1.5:41232 (submitted by
192.168.1.5:45532)
executing 192.168.1.5:45532::1
sleeping for 15 secs...
done
<== result = Tue Sep 09 12:55:10 CEST 2008


This might be somewhat surprising, but it is correct. Let's see what's happening.

First we get a view change; the new view is [192.168.1.5:33795, 192.168.1.5:45532]. This means that the
third instance now has rank=1, which is exactly the rank the killed instance had. Therefore, when task
#1 is reassigned, it is the third instance, 192.168.1.5:45532, which executes it.

This happens to be the same node as the submitter, but that's okay: since we only have 2 nodes left in
our cluster, there is a 50% chance of the submitter processing its own task. If we had more nodes in the
cluster, the likelihood of a submitter processing its own task would decrease.
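
The takeover itself falls out of the same rank mapping. Below is a minimal sketch of the failover step, under the same assumptions as the earlier snippets (hypothetical names; the key point is that every node caches all in-flight tasks, so any survivor can re-execute an orphaned one):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class FailoverHandler extends ReceiverAdapter {
    private final JChannel channel;
    // Tasks submitted but not yet completed, keyed by sequence number;
    // every node maintains a copy of this table
    private final Map<Long, Runnable> pendingTasks = new ConcurrentHashMap<Long, Runnable>();

    FailoverHandler(JChannel channel) {
        this.channel = channel;
    }

    @Override
    public void viewAccepted(View view) {
        int myRank = view.getMembers().indexOf(channel.getAddress());
        for (Map.Entry<Long, Runnable> e : pendingTasks.entrySet()) {
            // Re-evaluate the mapping against the new view: if a pending
            // task's rank now points at us, we take it over. (A fuller
            // version would skip tasks this node is already executing.)
            if (e.getKey() % view.size() == myRank) {
                System.out.println("**** taking over task " + e.getKey());
                e.getValue().run();
            }
        }
    }
}

Since every node sees the same view change and holds the same task table, exactly one surviving node concludes that a given orphaned task is now its responsibility.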

Published at DZone with permission of its author, Bela Ban.
