

Measuring Your IT Ops - Part 1

06.22.2012

In my previous article I briefly explained the importance of measuring IT Ops to lay the foundations for Continuous Improvement (CI). I then listed what I think are a few indispensable IT Ops measurements that form the basis for a CI environment. The first of these is FALT (Feature Average Lead Time). What kind of measure is this, and why is it important?

FALT gives us, at a glance, the average lead time between the moment a feature was requested by the business (or entered the backlog, in Agile terms, or the processing queue, in Lean terms) and the moment it was delivered into production. If a deliverable requested by the business is more valuable today than tomorrow, because earlier delivery maximises the Return On Investment (ROI), then a shorter lead time means greater business value. Additionally, the longer it takes for a feature to be deployed into production, the higher the cost (think of a simple multiplication: man-days x average resource cost) and, for the reason just given, the lower the ROI. For certain types of companies, such as startups, small differences in lead times can make the difference between survival and failure.

This second aspect is important. In recent months Continuous Delivery has grown in popularity; in their book, Humble and Farley present this new way of looking at software delivery, observing that a software deliverable is all the more valuable the shorter the time it takes to hit production and deliver value to the stakeholders who asked for it. Whereas in Agile a story is considered "Done" when it passes the demo to the Product Owner's satisfaction (I like the definition of done in Agile given by Mayank Gupta in this article), with Continuous Delivery a story is done only when the functionality has been deployed to production and is ready to be used. To the novice Agile practitioner this might seem a small difference, but imagine you had 10+ features which had passed UAT and were ready to go into production; could you say that you are done? You'd be surprised how many Agile practitioners would answer "yes", yet the reality is that none of those 10+ features is delivering any business value, because the target audience can't make use of them.

Therefore, to measure the Average Lead Time of a Feature we consider two dates: the date the requirement found its place in some queue (a backlog is just another queue, one without limits) and the date the feature hit production.
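As a quick illustration of that calculation, here is a minimal sketch in Python (not part of the original spreadsheet approach; the dates are made up):

```python
from datetime import date

# Hypothetical example: the two dates FALT cares about for one feature.
# "requested" is when the feature entered the queue/backlog;
# "deployed" is when it hit production.
requested = date(2012, 1, 9)
deployed = date(2012, 3, 23)

lead_time_days = (deployed - requested).days
print(lead_time_days)  # 74 days for this hypothetical feature
```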

One way of measuring FALT could be the following: 

  • Define your Classes Of Service (COS), a concept well known in Kanban. A Class Of Service is just a type of production deliverable; each organisation has its own types, but to name a few we could consider: business deliverables, production bug fixes, evergreening, maintenance of legacy systems, etc.
  • Define a spreadsheet with three sections: one to collect detailed data per COS; one to calculate averages per COS; and one to define validation lists (in our case the only one is the list of COS). The averaging step is also sketched in code just after this list.
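As promised above, here is what the averaging section effectively computes, sketched in Python; the COS names and lead times below are purely illustrative, not data from the article:

```python
from collections import defaultdict

# Hypothetical raw data: (Class Of Service, lead time in days) per delivered feature.
records = [
    ("Business deliverable", 45), ("Business deliverable", 60),
    ("Production bug fix", 5), ("Production bug fix", 12),
    ("Evergreening", 250), ("Evergreening", 320),
]

# Group lead times by COS, then average each group: the FALT per COS.
by_cos = defaultdict(list)
for cos, days in records:
    by_cos[cos].append(days)

for cos, days in sorted(by_cos.items()):
    print(f"{cos}: {sum(days) / len(days):.0f} days")
```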

An example of such a spreadsheet can be found below:

[Figure: FALT-histogram, the first worksheet with detailed data per COS]

The first worksheet contains the detailed data per COS, plus a graph built from the average lead times per COS calculated in the second worksheet, shown below:

[Figure: FALT-histogram-ws2, the second worksheet with average lead times per COS]
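If you wanted to reproduce that graph outside the spreadsheet, a minimal matplotlib sketch could look like the following; the averages here are invented for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical average lead times per COS (the second worksheet's output).
averages = {
    "Business deliverable": 52,
    "Production bug fix": 9,
    "Evergreening": 285,
}

plt.bar(list(averages.keys()), list(averages.values()))
plt.ylabel("Average lead time (days)")
plt.title("FALT per Class Of Service")
plt.show()
```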

For this example, I also created a third worksheet containing a validation list for COS as shown below:

[Figure: FALT-COS-validation-list, the third worksheet with the COS validation list]

The validation list could then be used to constrain the values in the COS column.
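In code, the same constraint would be a simple membership check against the validation list; again, a sketch with assumed names rather than anything from the original spreadsheet:

```python
# Hypothetical validation list mirroring the third worksheet.
VALID_COS = {
    "Business deliverable",
    "Production bug fix",
    "Evergreening",
    "Maintenance of legacy systems",
}

def validate_cos(value: str) -> str:
    """Reject any COS value not present in the validation list."""
    if value not in VALID_COS:
        raise ValueError(f"Unknown Class Of Service: {value!r}")
    return value
```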

Looking at the graph (and at worksheet 2 if you like), it becomes pretty obvious, and helpful, to see where attention needs to be focused; in this case it would appear that Evergreening projects (which represent pure cost) are the major bottleneck, with an average of 288 days from when a project entered the work queue to when it was finally deployed to production.

Figures don't necessarily need to be good or bad; that's the whole point. What matters is simply having them, so that IT organisations can make up their own minds as to whether the figures look healthy or there are bottlenecks that need to be resolved. For instance, further investigation might reveal that the very nature of Evergreening projects in this particular organisation requires long lead times, because of the coordination needed with downstream systems that use the system being upgraded. If this IT organisation didn't measure IT Ops and, say, delivered all the required features within a budget year, the risk of rushing into false positives would likely have been very high, and to the question "How is your IT doing?" the (false positive) answer would have been: "Great! We delivered everything that was asked of us this year!". The real question is: could you have done better?

I hope you found this article useful. In my next one I'll talk about a proposed way of measuring the Development Cost for a Deployed Feature (DECODEF), as mentioned in my previous article.
