DevOps Zone is brought to you in partnership with:

Geoff Papilion has made a living running infrastructure for the past 15 years. He is currently employeed at Wikia.com, scaling the infrastructure to 1.5 billion request per day. Geoffrey is a DZone MVB and is not an employee of DZone and has posted 26 posts at DZone. You can read more from them at their website. View Full User Profile

The Challenge of Small Ops: Part 2 - Monitoring

07.10.2012
| 4224 views |
  • submit to reddit

So your building or have built a web service, you’ve got a lot of challenges a head. You’ve got to scale software and keep customers happy. Not surprisingly that likely involves keeping your web service up, and that typically starts by setting up some form of monitoring when something goes wrong. In a large organization this may be the job of an entire team, but in a small organization its likely everyone’s job. So, given that everyone will tell you how much monitoring sucks, how do you make it better and hopefully manageable.

Curators Resource: Some downloads on deployment maturity models.

 


A Subject Matter Expert Election

Elect someone as your monitoring expert. I’d suggest someone with an Ops background, since they probably used several tools, and are familiar with configuration and options for this type of software. Its important that this person is in the loop with regard product decisions, and need to be notified when applications are goring to be deployed.

What Gets Monitored?

The simplistic answer you’ll hear over and over again is “everything”, but you’re a small organization with limited headcount. Your monitoring expert should have an idea since he’s in the product loop. They should be making suggestions as to what metrics the application should expose, and getting the basic information from the development team for alerting (process names, ports, etc..). They should prioritize the list of tests and metrics, starting with what you need to know that the application is up, and finishing with nice to haves.

Who Builds It?

Probably everyone. If that’s all you want you monitoring expert to do, then have them do it, but its far more efficient to spread the load. If people are having difficulty integrating with particular tools like Nagios or Ganglia, that’s where you expert steps in. They should know the in’s and out’s of the tools your using, and be able to train someone to extend a test, or get a metric collected within a few minutes.

These metrics and tests should be treated as part of the product, and should follow the same process you use to get code in production. You should consider making them goals for the project, since if you wait till after shipping there will be some lag between pushing code and getting it monitored.

Wrapping It Up

Its really not that hard; you just need someone to take a breath and plan. The part of monitoring that sucks is the human part, and since your app will not tell people when its up and down, you need a person to think about that for you.

Take a look at Part 1 if you liked this article.

Published at DZone with permission of Geoffrey Papilion, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)