
Developer Advocate at MongoLab. Previously interned at VMware and NetApp. UCSD CSE 2013 Alum. Chris is a DZone MVB and is not an employee of DZone and has posted 20 posts at DZone.

Using Fluentd and MongoDB serverStatus for Real-Time Metrics

07.08.2014

As developers, we often look for tools to make our work and processes more efficient. Sometimes we have to search for what we’re looking for and sometimes we’re lucky enough that it finds us! When our friends over at Treasure Data wrote to me about Fluentd, an open-source logging daemon written in Ruby that they created and maintain, I immediately saw value for MongoDB users looking for a quick way to collect data streams and store information in MongoDB.

Intro to Fluentd

Fluentd is an open source data collector designed to simplify and scale log management. Open-sourced in October 2011, it has gained traction steadily over the last 2.5 years: today, Fluentd has a thriving community of ~50 contributors and 1,900+ stargazers on GitHub with companies like Slideshare and Nintendo deploying it across hundreds of machines in production.

Fluentd has broad use cases: Slideshare integrates it into their company-wide infrastructure monitoring system, and Change.org uses it to route their log streams into various backends.

Most relevant to MongoDB developers, many folks use Fluentd to aggregate logs into MongoDB. The MongoDB community was one of the first to take notice of Fluentd, and the MongoDB plugin is one of the most downloaded Fluentd plugins to date.

Tutorial: Using MongoDB serverStatus for real-time & historical metrics

Today we’ll provide a tutorial on using Fluentd with MongoDB. To make things interesting, we decided to get a bit meta: we’ll show you how to store MongoDB serverStatus output in a MongoDB database. The serverStatus command returns a document that provides an overview of the database process’s state.

With this data you can easily create real-time and/or historical metrics that you’re interested in. These metrics may be particularly useful for benchmarking, testing in development or monitoring your MongoDB’s overall health.

Installing Fluentd

If you need to install Fluentd, you can find detailed installation instructions on the project site. Fluentd is written in Ruby for flexibility, with performance-sensitive parts in C. However, since not all developers use Ruby, a stable distribution of Fluentd called td-agent was created. This allows developers unfamiliar with Ruby to quickly get up and running with Fluentd and avoid having to install the “fluentd” gem. The differences between td-agent and the fluentd gem can be found here.

For the purposes of this tutorial, we’ll assume you’ve installed td-agent; I’ll be using the Mac OSX distribution. However, if you’re using fluentd, just replace all instances of “td-agent” with “fluentd” and all the steps will still apply.

Setting up your Fluentd configuration file

First, you’ll need to locate your td-agent.conf file. This is the config file that allows the user to control the input and output behavior of Fluentd by selecting plugins and specifying plugin parameters. If you don’t know where it is, you can run the command “td-agent” from your terminal and the streaming logs will output the config file path location (amongst other information). By default on OSX, the file path is /usr/local/etc/td-agent/td-agent.conf.

Configuring the serverStatus input plugin

Once you’ve found the config file, you can define a data input source to collect from.

First we’ll specify an input plugin – the serverStatus plugin that we’ve written for this tutorial. You’ll want to change your config file to look like the following:

<source>
  type serverstatus
  uri mongodb://dbUser:dbPass@host:port/admin
  # Replica sets use "uris" array param
  # uris ["mongodb://dbUser:dbPass@host1:port1/admin", "mongodb://dbUser:dbPass@host2:port2/admin", ...]
  stats_interval 5s # How frequently you get the server status. Every minute by default
</source>

Next you’ll need to save the serverStatus plugin code so that Fluentd can load and run the plugin. In the same directory as your config file there resides a “plugins” folder. Go ahead and save the serverStatus plugin code in a file named “in_serverstatus.rb” in the “plugins” folder.

module Fluent
  class ServerStatusInput < Input
    Plugin.register_input('serverstatus', self)
 
    config_param :uris, :array, :default => nil
    config_param :uri, :string, :default => "mongodb://localhost:27017"
    config_param :stats_interval, :time, :default => 60 # every minute
    config_param :tag_prefix, :string, :default => "serverstatus"
 
    def initialize
      super
      require 'mongo'
    end
 
    def configure(conf)
      super
 
      unless @uris or @uri
        raise ConfigError, 'uris or uri must be specified'
      end
 
      if @uris.nil?
        @uris = [@uri]
      end
 
      @conns = @uris.map do |uri_str|
        uri_str = "mongodb://#{uri_str}" if not uri_str.start_with?("mongodb://")
        uri = Mongo::URIParser.new(uri_str)
        [Mongo::MongoClient.from_uri(uri_str), uri]
      end
    end
 
    def start
      @loop = Coolio::Loop.new
      tw = TimerWatcher.new(@stats_interval, true, @log, &method(:collect_serverstatus))
      tw.attach(@loop)
      @thread = Thread.new(&method(:run))
    end
    def run
      @loop.run
    rescue
      log.error "unexpected error", :error=>$!.to_s
      log.error_backtrace
    end
 
    def shutdown
      @loop.stop
      @thread.join
    end
 
    def collect_serverstatus
      begin
 
        for conn, conn_uri in @conns
          stats = conn.db('admin').command(:serverStatus => true)
          make_data_msgpack_compatible(stats)
          tag = [@tag_prefix, conn_uri.host.gsub(/[\.-]/, "_"), conn_uri.port].join(".")
          Engine.emit(tag, Engine.now, stats)
        end
 
      rescue => e
        log.error "failed to collect MongoDB stats", :error_class => e.class, :error => e
      end
    end
 
    # MessagePack doesn't like it when a field is of Time class.
    # This is a convenience method that traverses the serverStatus
    # response and converts any field of Time class to an integer timestamp.
    def make_data_msgpack_compatible(data)
      if [Hash, BSON::OrderedHash].include?(data.class)
        data.each {|k, v|
          if v.respond_to?(:each)
            make_data_msgpack_compatible(v)
          elsif v.class == Time
            data[k] = v.to_i
          end
        }
        # serverStatus's "locks" field has "." as a key, which can't be
        # inserted back into MongoDB without wreaking havoc. Replace it
        # with "global".
        data["global"] = data.delete(".") if data["."]
      elsif data.class == Array
        data.each_with_index { |v, i|
          if v.respond_to?(:each)
            make_data_msgpack_compatible(v)
          elsif v.class == Time
            data[i] = v.to_i
          end
        }
      end
    end
 
    class TimerWatcher < Coolio::TimerWatcher
 
      def initialize(interval, repeat, log, &callback)
        @callback = callback
        @log = log
        super(interval, repeat)
      end
      def on_timer
        @callback.call
      rescue
        @log.error $!.to_s
        @log.error_backtrace
      end
    end
  end
end

The serverStatus input plugin runs the serverStatus command once every stats_interval (5 seconds in our config) and applies a tag to the data, in this case serverstatus.hostName.portNumber, with any dots or hyphens in the host name replaced by underscores. The tag is used by the output plugin to easily identify and store tagged data. For more on tags, I recommend checking out these 5 quick slides about the “Life of a Fluentd event”.
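To make the tag format concrete, here is a minimal standalone sketch of the tag-building step from collect_serverstatus. The helper name serverstatus_tag and the sample host are made up for illustration; only the gsub and join logic comes from the plugin above.

```ruby
# Hypothetical helper that mirrors the tag-building line in the plugin.
def serverstatus_tag(prefix, host, port)
  # Dots and hyphens in the host name become underscores, so the host
  # takes up exactly one dot-separated part of the tag.
  [prefix, host.gsub(/[.-]/, "_"), port].join(".")
end

puts serverstatus_tag("serverstatus", "db-1.example.com", 27017)
# prints serverstatus.db_1_example_com.27017
```

Because the host is collapsed into a single tag part, every tag produced by the plugin has the same three-part shape, which keeps matching and prefix removal in the output section predictable.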

Configuring the out_mongo output plugin

Now that we have our input plugin set up, we need to set up an output plugin to store our data in our target destination (a MongoDB database). If you’re using td-agent, it already comes bundled with a MongoDB output plugin called out_mongo or out_mongo_replset. If you’re using fluentd, you can install it by running the command below.

% fluent-gem install fluent-plugin-mongo

With the output plugin installed, we can now add output parameters to our config file such as database location, credentials and other options. We’ll add to our existing config file the following code.

<source>
  type serverstatus
  uri mongodb://dbUser:dbPass@host:port/admin
  stats_interval 5s # How frequently you get the server status. Every minute by default
</source>

<match serverstatus.**>
  type mongo
  user dbUser    
  pass dbPass
  host hostName
  port portNumber
  database dbName
    
  # See https://github.com/fluent/fluent-plugin-mongo#mongotag-mapped-mode for details
  tag_mapped
  remove_tag_prefix serverstatus.
  collection misc
  flush_interval 10s
</match>

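The output section begins with a match pattern (a tag pattern with wildcards, not a regular expression). We’ve set it to serverstatus.** so that it matches every tag starting with “serverstatus”, the prefix applied by the input plugin. Specify where you’d like the output to be stored (your database information) and you’re good to go!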
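If the wildcard behavior is unclear, the following is a rough sketch of how * and ** behave in a match pattern, written from scratch for illustration. It is not Fluentd’s own implementation, the function names are hypothetical, and it ignores other pattern features such as {a,b} alternatives. It assumes that * matches exactly one tag part and ** matches zero or more tag parts.

```ruby
# Illustrative matcher only; Fluentd has its own implementation.
def parts_match?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  if head == "**"
    # "**" may absorb zero or more tag parts.
    (0..tag_parts.length).any? { |n| parts_match?(rest, tag_parts.drop(n)) }
  elsif tag_parts.empty?
    false
  elsif head == "*" || head == tag_parts.first
    # "*" (or a literal part) consumes exactly one tag part.
    parts_match?(rest, tag_parts.drop(1))
  else
    false
  end
end

def tag_match?(pattern, tag)
  parts_match?(pattern.split("."), tag.split("."))
end

puts tag_match?("serverstatus.**", "serverstatus.localhost.27017") # true
puts tag_match?("serverstatus.*", "serverstatus.localhost.27017")  # false
puts tag_match?("serverstatus.**", "apache.access")                # false
```

This is why the config uses ** rather than *: the tags emitted by the input plugin have two parts after the prefix (host and port), which a single * would not cover.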

Using multiple inputs/outputs

If you’d like to monitor multiple MongoDB deployments and/or use multiple outputs, the plugins support this too! To get serverStatus of more than one MongoDB, you can list URIs in the config file using the “uris” array parameter. The output for these MongoDBs will have different tags, making it easy to determine what data came from where.

To configure multiple outputs, you’ll need to use the “copy” output plugin. In the example below, we’ve modified our existing configuration file’s output code to also print the input results to the console.

# Input code goes here

<match serverstatus.**>
  type copy                                                                        
  <store>                                                 
    type stdout                                                                    
  </store>                                                                         
  <store>               
    type mongo
    user dbUser    
    pass dbPass
    host hostName
    port portNumber
    database dbName
    
    # See https://github.com/fluent/fluent-plugin-mongo#mongotag-mapped-mode for details
    tag_mapped
    remove_tag_prefix serverstatus.
    collection misc
    flush_interval 10s
  </store>
</match>

Running Fluentd

Once you have the plugins set up, you can run Fluentd with either the ‘fluentd’ or ‘td-agent’ command from the command line. If you’re using the multiple output configuration from above, you’ll see the serverStatus data printed to your console as each sample is collected (every 5 seconds with the stats_interval above) and flushed to your MongoDB every 10 seconds (the flush_interval).

With access to this data, you can calculate many interesting metrics that can help monitor the health of your MongoDB. However, you may notice that a lot of the metrics reported in serverStatus are growing totals as opposed to rates. For instance, instead of getting a simple updates-per-second number, serverStatus will give you the total number of update queries that have been made against the server since it started.

Creating useful metrics

Luckily it’s very simple to extract rates from multiple serverStatus documents. Since we’ve set the stats interval to every 5 seconds, to get updates-per-second we take the update numbers from 2 sequential serverStatus documents, subtract them and divide by 5. Assuming serverStatusB was recorded after serverStatusA:

(serverStatusB.opcounters.update - serverStatusA.opcounters.update) / 5 seconds

This will give you the average rate of update operations during that 5-second period. Note that the serverStatus field is named opcounters.update (singular).

You can use this same technique with any of the counted metrics in serverStatus, including other opcounters, asserts, network bytesIn and bytesOut, page faults and index hits and misses.
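As a rough sketch of that calculation, the Ruby below computes a per-second rate from two serverStatus-style documents represented as plain hashes. The function name counter_rate and the sample numbers are made up for illustration, and returning nil when a counter goes backwards (for example after a server restart resets the totals) is my own assumption about how you might want to handle that case.

```ruby
# Sketch: per-second rate of a cumulative serverStatus counter.
# `path` is the list of keys leading to the counter, e.g. ["opcounters", "update"].
def counter_rate(older, newer, path, interval_seconds)
  before = older.dig(*path)
  after = newer.dig(*path)
  return nil if before.nil? || after.nil?

  delta = after - before
  # Assumption: a negative delta means the counter was reset, so no rate.
  return nil if delta < 0

  delta.to_f / interval_seconds
end

# Made-up sample documents, recorded 5 seconds apart.
status_a = { "opcounters" => { "insert" => 100, "update" => 1200 } }
status_b = { "opcounters" => { "insert" => 130, "update" => 1250 } }

puts counter_rate(status_a, status_b, ["opcounters", "update"], 5) # 10.0
puts counter_rate(status_a, status_b, ["opcounters", "insert"], 5) # 6.0
```

Passing the counter as a key path keeps the same function usable for nested fields such as the network byte counters, as long as both documents contain the field.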

Happy hacking!

We hope you found this tutorial helpful and informative… the possibilities with Fluentd and MongoDB are endless! Be sure to reach out to support@mongolab.com if you have any questions about MongoDB and check out the Fluentd mailing list for any questions about Fluentd!

-Chris@MongoLab


Published at DZone with permission of Chris Chang, author and DZone MVB. (source)
