We implemented a simple, highly decentralized, clustered task distribution system in roughly 500 lines
of code and 5 classes. The system is failure resilient, because all nodes are peers and there's no central
All peers are equal (every peer can act as both master and slave) and tasks are grabbed by a node based
on an ID assigned by the submitter (master).
Crashes or graceful termination of nodes doesn't lead to missing tasks, as the system re-balances itself
and assigns orphaned tasks to another node in the cluster.
The system is so small because it runs on top of JGroups. Had we written it without JGroups, we would
have had to implement the following functionality ourselves:
- Cluster membership and failure detection: we need to know when the membership changes, and
all nodes in a cluster need to get these views in exactly the same order
- Simulcasting (with UDP datagrams): fragmentation (if a task or result is greater than 65K) and
retransmission (UDP datagrams are lost), plus suppression of duplicate messages (we cannot receive the same task multiple times !).
- Simple switching between transports (UDP, TCP), and configuration/customization of the
transport: e.g. adding compression or encryption
The current task distribution is far from complete (after all, this is just a demo of what can be done with JGroups !); possible further improvements include:
- Implementation java.util.concurrent.ExecutorService. This would extend the in-VM thread pool
to become a clustered thread pool, where tasks are executed not only by threads in the same
JVM, but also by threads on different hosts. This would allow masters to submit tasks (for
example a collection of tasks) and to wait for their completion later. In our current solution, the
thread of the caller of submit() is blocked until a timeout occurs or the result becomes available.
- Not all nodes store the task, but only a subset of the nodes. When a node X crashes, we ask
everyone for the tasks assigned to X, and these are returned by the nodes who stored them.
- Use random numbers to create ClusterIDs rather than monotonically increasing ints. Currently,
we use a round robin ID allocation scheme. While this is pretty good at distributing all tasks
evenly, it might be better in some cases to assign weights to individual cluster nodes, according
to number of cores, memory etc. Thus, tasks could be assigned more optimally, whereas in the
current solution we assign all tasks evenly, which means slower hosts get the same number of
tasks as faster hosts. We should probably externalize the policy which creates the IDs and/or
picks the nodes, so it can be replaced.
The full demo code can be downloaded here