Dmitriy Setrakyan manages daily operations of GridGain Systems and brings over 12 years of experience spanning all areas of application software development, from design and architecture to team management and quality assurance. His experience includes architecture and leadership in development of distributed middleware platforms, financial trading systems, CRM applications, and more. Dmitriy is a DZone MVB, is not an employee of DZone, and has posted 57 posts at DZone.

GridGain - Why Runtime Metrics Are Important


Let me ask you - how many grid computing products do you know that provide runtime statistics of the grid via a regular API? I assume not many (if any). Some grid products I know don't even expose their grid topology - they treat the cluster as one black box.

Well, in GridGain we have a concept of Node Metrics, which provide almost real-time information about activity on every grid node. These metrics include current and average values for CPU utilization, heap, thread stats, job execution time, number of running/rejected/cancelled jobs, size of the waiting queue, job wait time, total number of executed jobs, and a lot more useful node runtime information.

So why do we do that, you may ask? The answer is simple - this data is very useful when you want fine-grained control over how your jobs are distributed across grid nodes. For example, what if you want to segment your grid based on CPU utilization and execute your jobs only on nodes with CPU load under 50%? Or what if you need to adapt to average CPU load or job execution time in order to send more jobs to the nodes that can process your computations faster?
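
To make the CPU-based segmentation idea concrete, here is a minimal, self-contained sketch in plain Java. It does not use the GridGain API: `NodeSnapshot`, `nodesUnder`, and the sample node ids are hypothetical names invented for illustration, and the CPU load is assumed to be reported as a fraction between 0 and 1.

```java
import java.util.ArrayList;
import java.util.List;

public class CpuSegmentation {
    /** Hypothetical stand-in for a grid node and its metrics; not a GridGain type. */
    static final class NodeSnapshot {
        final String id;
        final double cpuLoad; // assumed to be a fraction in [0, 1]

        NodeSnapshot(String id, double cpuLoad) {
            this.id = id;
            this.cpuLoad = cpuLoad;
        }
    }

    /** Returns the ids of nodes whose CPU load is strictly below the threshold. */
    static List<String> nodesUnder(List<NodeSnapshot> nodes, double maxCpuLoad) {
        List<String> eligible = new ArrayList<>();
        for (NodeSnapshot n : nodes) {
            if (n.cpuLoad < maxCpuLoad) {
                eligible.add(n.id);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        List<NodeSnapshot> grid = List.of(
            new NodeSnapshot("node-a", 0.20),
            new NodeSnapshot("node-b", 0.75),
            new NodeSnapshot("node-c", 0.49));

        // Only nodes under 50% CPU load remain eligible for new jobs.
        System.out.println(nodesUnder(grid, 0.50)); // prints [node-a, node-c]
    }
}
```

In a real grid the snapshot values would come from the node metrics described above, and the filter would be re-evaluated as the metrics refresh.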

In fact, our Adaptive Load Balancing SPI does just that. On top of providing several out-of-the-box implementations, we allow users to plug in any custom adaptive behavior suitable for their applications. Here is how simple it is to implement a policy that adapts to job processing time on top of GridGain and returns a load score for a node in near-real time (note that we use Node Metrics to detect current and average job processing time):

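public class GridAdaptiveProcessingTimeLoadProbe
    implements GridAdaptiveLoadProbe {
    // Whether to score on average rather than current metrics.
    // (The field declaration is not shown in the original listing.)
    private boolean useAverageMetrics = true;

    /**
     * Returns node's load score
     * based on job processing time.
     */
    public double getLoad(
        GridNode node,
        int jobsSentSinceLastUpdate) {
        // Acquire node metrics.
        GridNodeMetrics metrics = node.getMetrics();

        if (useAverageMetrics) {
            // Return score based on average data.
            // The original listing adds a further term after "+" that is cut off here.
            return metrics.getAverageJobExecuteTime();
        }

        // Return current metrics score.
        // The original listing adds a further term after "+" that is cut off here.
        return metrics.getCurrentJobExecuteTime();
    }
}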
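
A load score is only useful once something turns it into a routing decision. The sketch below shows one plausible way to do that: give each node a selection weight inversely proportional to its score, so faster nodes receive proportionally more jobs. This is an illustrative assumption, not a description of how the Adaptive Load Balancing SPI works internally; `InverseLoadWeights`, `weights`, and the sample scores are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InverseLoadWeights {
    /**
     * Converts per-node load scores into selection weights that sum to 1.
     * A lower score yields a larger weight. All scores must be positive.
     */
    static Map<String, Double> weights(Map<String, Double> loadScores) {
        double inverseSum = 0.0;
        for (double score : loadScores.values()) {
            if (score <= 0) {
                throw new IllegalArgumentException("Load score must be positive: " + score);
            }
            inverseSum += 1.0 / score;
        }

        // Normalize each inverse score by the total so the weights sum to 1.
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : loadScores.entrySet()) {
            result.put(e.getKey(), (1.0 / e.getValue()) / inverseSum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical scores, e.g. job processing time in milliseconds.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("fast-node", 100.0);
        scores.put("slow-node", 300.0);

        // The fast node ends up with roughly three times the weight of the slow one.
        System.out.println(weights(scores));
    }
}
```

With these sample scores the fast node gets a weight of about 0.75 and the slow node about 0.25, so a balancer drawing nodes at random by weight would send roughly three out of four jobs to the fast node.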

You can download GridGain here. Enjoy grid computing!

Published at DZone with permission of Dmitriy Setrakyan, author and DZone MVB.
