
Patrick Debois has been working on closing the gap between development and operations for many years. In 2009 he organized the first devopsdays.org conference, and since then the world has been stuck with the term 'devops'. He is always seeking opportunities to optimize IT globally instead of making local optimizations. Patrick is a DZone MVB, is not an employee of DZone, and has posted 39 posts at DZone.

Monitoring Wonderland Survey - Metrics - API - Gateways

01.13.2012
Patrick Debois gives an overview of some of the more popular open source monitoring tools and shows how to make several of these tools work together efficiently and effectively, so that you're not pulling your hair out because of all the disparate monitoring utilities.

One tool to rule them all? Not.

If you are working within an enterprise, chances are that you have different metric systems in place: you might have some Cacti, Ganglia, Collectd, etc., due to historical reasons, different departments, and so on.

This reminded me of the situation while I was working in Identity Management: you might have an LDAP, Active Directory, local HR database etc. There would be plans and discussions of using one over the other, and gateways would need to be written. I learned a few lessons there:

  1. Have as few sources/stores of information as possible
  2. Don't try to chase the one tool to rule them all, aka don't use a tool for something it's not made for
  3. Make it self-service for users and automate processes



1 to 1 gateways

Take Graphite, the new metrics hotness, as an example: it has some nice graphing advantages over other tools. So people wonder: should I migrate my Ganglia or Collectd setup to Graphite? Graphite doesn't come with elaborate collection scripts for memory, disk, etc., so we have to rely on other tools like Cacti, Munin, Collectd or Ganglia to collect the data first.

So we start writing gateways to get data into Graphite:



But what happens if we also use Opentsdb for storing long-term data? We have to re-implement those gateways:



Issue 1: Effort duplication

It seems like a waste of energy to implement the protocol in every tool. This sure isn't the first time this has happened: the same thing happened with the Collectd -> Ganglia plugin.

If you look at the data that is transmitted it is actually pretty much the same:

a metric name, value, timestamp, optionally hostname, some metadata tags

So we could easily envision a 'universal' format that would be used to translate from and to.

Ganglia  <-> Intermediate format <-> Graphite
Collectd <-> Intermediate format <-> Opentsdb

With this intermediate format, we would only have to write each end of the equation once.
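
To make the idea concrete, here is a minimal sketch of what such an intermediate format could look like in Ruby. The Metric struct and its field and method names are my own invention for illustration, not an existing standard. The two converters only rely on the documented line formats of the Graphite plaintext protocol and the Opentsdb put command.

```ruby
require 'json'

# Hypothetical intermediate representation; the field names are illustrative.
Metric = Struct.new(:name, :value, :timestamp, :host, :tags) do
  # Serialize to a JSON string that any tool could parse.
  def to_json_message
    JSON.generate(to_h)
  end

  # Rebuild a Metric from a string produced by to_json_message.
  def self.from_json_message(text)
    data = JSON.parse(text)
    new(data['name'], data['value'], data['timestamp'], data['host'], data['tags'] || {})
  end

  # Graphite plaintext protocol: "<path> <value> <timestamp>\n".
  # Prefixing the path with the host is a convention chosen for this sketch.
  def to_graphite_line
    path = [host, name].compact.join('.')
    "#{path} #{value} #{timestamp}\n"
  end

  # Opentsdb put command: "put <metric> <timestamp> <value> <tag=value ...>\n".
  # Opentsdb expects at least one tag; here the host is added as a tag when present.
  def to_opentsdb_line
    all_tags = (tags || {}).dup
    all_tags['host'] = host if host
    tag_text = all_tags.map { |key, val| "#{key}=#{val}" }.join(' ')
    "put #{name} #{timestamp} #{value} #{tag_text}".strip + "\n"
  end
end
```

With something like this, a Ganglia or Collectd reader only has to produce Metric objects, and every writer only has to consume them.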

I started thinking of this like an ffmpeg for monitoring

Issue 2: Difficult to hook in additional listeners

Let's add another system that wants to listen in on the metrics, something like Esper, Nagios alerting, some data warehouse tools, etc. We could reuse the libraries from one end to the other, but we'd have to add more gateways and put them in place every time.

A better approach would be to use a message bus: every tool publishes to and listens on a bus and gets the data it needs. RI Pienaar has written about this approach extensively in his series on Common Messaging Patterns. Also, John Bergmans has a great post on using AMQP and WebSockets to get realtime graphics.

Some of the tools already have message queue integrations, but a common intermediate format seems to be missing.
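
The pattern itself is small. The sketch below is an in-process stand-in for a real bus such as AMQP or ZeroMQ, just to show the shape: publishers encode a metric once, and any number of listeners receive the same message. The MetricBus class and the topic name are hypothetical.

```ruby
require 'json'

# In-process stand-in for a real message bus (AMQP, ZeroMQ, ...).
class MetricBus
  def initialize
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  # Register a listener block for a topic.
  def subscribe(topic, &listener)
    @subscribers[topic] << listener
  end

  # Encode the metric once as JSON and hand the same message to every listener.
  def publish(topic, metric)
    message = JSON.generate(metric)
    @subscribers[topic].each { |listener| listener.call(message) }
    message
  end
end

bus = MetricBus.new
graphite_lines = []
high_values = []

# One listener formats lines for Graphite, another one watches for high values.
bus.subscribe('metrics') do |message|
  metric = JSON.parse(message)
  graphite_lines << "#{metric['name']} #{metric['value']} #{metric['timestamp']}"
end
bus.subscribe('metrics') do |message|
  metric = JSON.parse(message)
  high_values << metric['name'] if metric['value'] > 0.9
end

bus.publish('metrics', { 'name' => 'web01.cpu.load', 'value' => 0.95, 'timestamp' => 1326412800 })
```

Adding another listener is one more subscribe call; nothing on the publishing side has to change.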



As a proof of concept I've created:



Building blocks

In this section I'll look at APIs (Ruby oriented) to get data in and out of the different metrics systems:

Graphite - IN

Sending metrics from ruby to Graphite:



These both implement the Simple Protocol, but for high performance we'd like to use the batching facility through the Pickle Format. I could not find a Pickle gem for Ruby, but this could work through a Ruby-Python gateway (http://rubypython.rubyforge.org/).
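
For reference, the Simple Protocol is just one line of text per metric over a TCP socket. A minimal sketch without any gem could look like this. The function name is mine, and the host and port are assumptions (2003 is Carbon's default plaintext port).

```ruby
require 'socket'

# Send metrics to Carbon using the plaintext ("simple") protocol.
# Each metric is a [path, value, timestamp] triple.
# Returns the payload that was written, which is handy for logging.
def send_to_graphite(metrics, host: 'localhost', port: 2003)
  payload = metrics.map { |path, value, timestamp| "#{path} #{value} #{timestamp}\n" }.join
  socket = TCPSocket.new(host, port)
  begin
    socket.write(payload)
  ensure
    socket.close
  end
  payload
end
```

Sending several metrics over one connection already saves the connection setup per metric; the Pickle Format goes a step further on the encoding side.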

Faster - a Java Netty based graphite relay takes the same approach https://github.com/markchadwick/graphite-relay

Another way to get your data into graphite is using Etsy's Logster https://github.com/etsy/logster

Mike Brittain explains its use very well in "Take my logs... Please!", a Velocity Online Conference session (video, PDF).


Graphite - OUT

Getting all the data out of Graphite is not possible through the standard API. You can get a single graph out as raw data, but that hardly counts.

The best option seems to be to listen in on the traffic to Graphite's UDP receiver and duplicate the information onto a message bus.
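
A rough sketch of such a listener: it sits in front of the real UDP receiver, hands every metric it sees to a block (which could publish to a bus), and forwards the datagram untouched. The function name and the hash keys are hypothetical, and the sketch assumes the datagrams contain plaintext protocol lines.

```ruby
require 'socket'

# Receive one datagram of plaintext Graphite lines, yield every parsed
# metric to the given block, then forward the original datagram unchanged
# to the real receiver.
def tee_graphite_datagram(listen_socket, forward_host, forward_port)
  data, _sender = listen_socket.recvfrom(65_535)
  data.each_line do |line|
    path, value, timestamp = line.split
    next unless path && value && timestamp # skip incomplete lines
    yield({ 'name' => path, 'value' => value.to_f, 'timestamp' => timestamp.to_i })
  end
  forwarder = UDPSocket.new
  begin
    forwarder.send(data, 0, forward_host, forward_port)
  ensure
    forwarder.close
  end
  data
end
```

In a real setup you would call this in a loop on a socket bound to the port your senders already use, and move the real receiver to another port.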

An alternative might be to directly read from the Whisper storage, inspiration for that can be found in:



Opentsdb - IN

I could not find any Ruby gem that implements the Opentsdb protocol for sending data, but creating one should be trivial: Opentsdb just uses a plain TCP socket to get the data in.
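
To show how little is needed, here is a minimal sketch of such a writer. The class name is hypothetical; the only thing it relies on is the line-based put command, and the default port 4242 is an assumption about your setup.

```ruby
require 'socket'

# Minimal writer for Opentsdb's line-based "put" command.
class OpenTSDBWriter
  def initialize(host: 'localhost', port: 4242)
    @socket = TCPSocket.new(host, port)
  end

  # Writes "put <metric> <timestamp> <value> <tag=value ...>\n" and returns the line.
  def put(metric, value, timestamp: Time.now.to_i, tags: {})
    raise ArgumentError, 'Opentsdb expects at least one tag' if tags.empty?

    tag_text = tags.map { |key, val| "#{key}=#{val}" }.join(' ')
    line = "put #{metric} #{timestamp} #{value} #{tag_text}\n"
    @socket.write(line)
    line
  end

  def close
    @socket.close
  end
end
```

Usage would be along the lines of OpenTSDBWriter.new(host: 'tsdb.example.com').put('sys.cpu.user', 42, tags: { 'host' => 'web01' }), where the hostname is made up.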

Opentsdb - OUT

Getting data out of Opentsdb suffers the same problem as Graphite: you can do queries on specific graph data



But you can't get all of it out, except maybe by interfacing directly with the HBase/Java API. So again the best bet is to create a listener/proxy for the simple TCP protocol.


Ganglia - IN

Sending metrics to Ganglia is easy using the gmetric shell command. Early days code describing this can still be found at http://code.google.com/p/embeddedgmetric/

Igrigorik has a nice write-up on how to use the Gmetric Ruby gem to send metrics:



If you want to feed log files into Ganglia, Logtailer might be your thing: https://bitbucket.org/maplebed/ganglia-logtailer


Ganglia - OUT

Vladimir describes the options while explaining how to get Ganglia data into Graphite.

Option 1 is to poll Gmond over TCP and get its current data as XML:
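
A minimal sketch of the polling approach, assuming Gmond listens on its default TCP port 8649 and dumps its XML to any client that connects. The function names and the keys of the resulting hashes are mine; the HOST and METRIC attributes are the ones found in Gmond's XML. The parsing uses REXML, which ships with Ruby (as a bundled gem in recent versions).

```ruby
require 'socket'
require 'rexml/document'

# Read the full XML dump that gmond writes to a connecting client.
def fetch_gmond_xml(host = 'localhost', port = 8649)
  socket = TCPSocket.new(host, port)
  begin
    socket.read
  ensure
    socket.close
  end
end

# Turn the XML dump into an array of metric hashes.
# Values are kept as strings because gmond metrics can be strings too.
def parse_gmond_xml(xml)
  metrics = []
  REXML::Document.new(xml).elements.each('//HOST') do |host|
    host.elements.each('METRIC') do |metric|
      metrics << {
        'host' => host.attributes['NAME'],
        'name' => metric.attributes['NAME'],
        'value' => metric.attributes['VAL'],
        'units' => metric.attributes['UNITS'],
        'timestamp' => host.attributes['REPORTED'].to_i
      }
    end
  end
  metrics
end
```

Calling parse_gmond_xml(fetch_gmond_xml) on a schedule gives you a snapshot per poll; the timestamp is the moment the host last reported, not the moment each metric changed.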



Option 2 is to listen in on the UDP protocol as an additional receiver.

I implemented both approaches in https://github.com/jedi4ever/gmond-zmq

Note: As a side effect I found that the metrics sent over UDP are actually more accurate than the values you get when you query the XML.

Collectd - IN

To send metrics to Collectd, you can use the Ruby gem from Astro that implements most of the UDP protocol:



Collectd - OUT

I give Collectd the prize for best output.

It currently implements different writers:

  • Network plugin
  • UnixSock plugin
  • Carbon plugin
  • CSV
  • RRDCacheD
  • RRDtool
  • Write HTTP plugin



And there is the ZeroMQ writer by deactivated - https://github.com/deactivated/collectd-write-zmq

The Binary Protocol http://collectd.org/wiki/index.php/Binary_protocol is pretty simple to listen into.
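
To give an idea of how simple, here is a sketch of a parser for one packet. A packet is a sequence of parts, each with a 2-byte type and a 2-byte length (big-endian, header included), and a values part inherits the host, plugin, type and time parts that came before it. The function name and the hash keys are hypothetical, and the sketch skips notifications, signing and encryption.

```ruby
# Part type codes from the collectd binary protocol.
COLLECTD_STRING_PARTS = { 0x0000 => 'host', 0x0002 => 'plugin', 0x0003 => 'plugin_instance',
                          0x0004 => 'type', 0x0005 => 'type_instance' }.freeze
COLLECTD_NUMERIC_PARTS = { 0x0001 => 'time', 0x0007 => 'interval' }.freeze
# High resolution variants count in units of 2^-30 seconds.
COLLECTD_HIGH_RES_PARTS = { 0x0008 => 'time', 0x0009 => 'interval' }.freeze
COLLECTD_VALUES_PART = 0x0006

# Parse one packet into an array of hashes, one per values part.
def parse_collectd_packet(packet)
  state = {}
  results = []
  offset = 0
  while offset + 4 <= packet.bytesize
    type, length = packet.byteslice(offset, 4).unpack('nn')
    break if length < 4 || offset + length > packet.bytesize # malformed part

    body = packet.byteslice(offset + 4, length - 4)
    if COLLECTD_STRING_PARTS.key?(type)
      state[COLLECTD_STRING_PARTS[type]] = body.unpack1('Z*') # null-terminated string
    elsif COLLECTD_NUMERIC_PARTS.key?(type)
      state[COLLECTD_NUMERIC_PARTS[type]] = body.unpack1('Q>')
    elsif COLLECTD_HIGH_RES_PARTS.key?(type)
      state[COLLECTD_HIGH_RES_PARTS[type]] = body.unpack1('Q>') >> 30 # whole seconds
    elsif type == COLLECTD_VALUES_PART
      count = body.unpack1('n')
      kinds = body.byteslice(2, count).unpack('C*')
      values = kinds.each_with_index.map do |kind, index|
        raw = body.byteslice(2 + count + 8 * index, 8)
        case kind
        when 1 then raw.unpack1('E')  # gauge: little-endian double
        when 2 then raw.unpack1('q>') # derive: signed 64-bit
        else raw.unpack1('Q>')        # counter (0) and absolute (3): unsigned 64-bit
        end
      end
      results << state.merge('values' => values)
    end
    offset += length
  end
  results
end
```

Feed it the datagrams you receive on a UDP socket that Collectd's Network plugin sends to, and you get hashes that are easy to map onto an intermediate format.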



Munin

If you happen to use Munin, here's some inspiration, but I haven't researched it much



Circonus

If you happen to use Circonus, here's some inspiration, but I haven't researched it much



RRD interaction from ruby

For those who want to read and write RRDs directly from Ruby, please have fun:



Alert on metrics:

With all the tools in and out, and a unified intermediate format, it will be trivial to rewrite the traditional alert check tools to listen on the bus for values. This means your Nagios, your ticket system, your pager system, etc. can all listen to the same source.
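
As an illustration, a threshold check that consumes a message from the bus could be as small as the sketch below. It assumes the message is JSON with name and value keys (my assumption for the intermediate format) and maps the result onto the usual Nagios plugin exit codes. The function name is hypothetical.

```ruby
require 'json'

# Nagios plugin exit codes.
NAGIOS_OK = 0
NAGIOS_WARNING = 1
NAGIOS_CRITICAL = 2

# Compare the value in a JSON metric message against two thresholds.
def check_metric(message, warning:, critical:)
  metric = JSON.parse(message)
  value = metric['value']
  status, label = if value >= critical
                    [NAGIOS_CRITICAL, 'CRITICAL']
                  elsif value >= warning
                    [NAGIOS_WARNING, 'WARNING']
                  else
                    [NAGIOS_OK, 'OK']
                  end
  { 'status' => status, 'output' => "#{label} - #{metric['name']} is #{value}" }
end
```

The same result hash could be handed to Nagios as a passive check result, to a ticket system or to a pager, without any of them polling the metrics store themselves.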

Graphite



Opentsdb



Ganglia



New Relic

https://github.com/kogent/check_newrelic


Conclusion

It should be feasible to create an intermediate format and reuse some of these libraries to implement both IN and OUT functionality. Why not create a Fog for monitoring information? Something that implements metric receive, send, and so on.

Next stop: Nagios, because it deserves a blog post of its own ...

Published at DZone with permission of Patrick Debois, author and DZone MVB.
