David Green is a developer and aspiring software craftsman. He has been programming for 20 years but only getting paid to do it for the last 10; in that time he has worked for a variety of companies from small start-ups to global enterprises. David co-founded the London Software Craftsmanship Community (http://www.londonswcraft.com/) - a group of professional programmers who meet regularly to exchange ideas and improve their craft. David is a DZone MVB and is not an employee of DZone.

Measuring Code

How good is your code? If you’re like the other 80% of above average developers, then I bet your code is pretty awesome. But are you sure? How can you tell? Or perhaps you’re working on a legacy code base – just how bad is the code? And is it getting better? Code metrics provide a way of measuring your code – some people love ‘em, but some hate ‘em.

The Good

Personally I’ve always found metrics very useful. For example – code coverage tools like Emma can give you a great insight into where you do and don’t have test coverage. Before embarking on an epic refactor of a particular package, just how much coverage is there? Maybe I should increase the test coverage before I start tearing the code apart.

Another interesting metric can be lines of code. While working in a legacy code base (and who isn’t?), if you can keep velocity consistent (so you’re still delivering features) but keep the volume of inventory the same or less, then you’re making the code less crappy while still delivering value. Any idiot can implement a feature by writing bucket loads of new code, but it takes real craftsmanship to deliver new features and reduce the size of the code base.

The Bad

The problem with any metric is who consumes it. The last thing you want is for an overeager manager to start monitoring it.

You can’t control what you can’t measure – Tom DeMarco

Before you know it, there’s a bonus attached to the number of defects raised. Or there’s a code coverage target everyone is “encouraged” to meet.

As soon as there’s management pressure on a metric, smart people will game the system. I’ve lost count of the number of times I’ve seen people gaming code coverage metrics. In an effort to please a well-meaning but fundamentally misguided manager, developers end up writing tests with no assertions. Sure, the code ran and didn’t blow up. But did it do the right thing? Who knows! And if you introduce bugs, will your tests catch them? Hell, no! So your coverage is useless.

The target was met but the underlying goal – improving software quality – has not only been missed, it’s now harder to meet in future.
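To make the gaming concrete, here is a minimal sketch in Python (the tools named in this article are Java-based; the language is chosen only for brevity). The function `apply_discount` and both checks are hypothetical names invented for illustration. Both checks execute every line of the function, so a coverage tool would report the same figure for each, but only the second one is able to fail.

```python
def apply_discount(price, percent):
    """Return price reduced by percent.

    Deliberately buggy for this illustration: it adds instead of subtracting.
    """
    return price + price * percent / 100


def coverage_only_check():
    # Executes every line of apply_discount, so coverage looks perfect,
    # but nothing about the result is verified.
    apply_discount(200, 10)


def asserting_check():
    # Executes the same lines and also states what the answer should be.
    assert apply_discount(200, 10) == 180


def passes(check):
    """Run a check function and report whether it passed."""
    try:
        check()
        return True
    except AssertionError:
        return False


if __name__ == "__main__":
    print("no assertions passes:", passes(coverage_only_check))
    print("with assertion passes:", passes(asserting_check))
```

The assertion-free check passes despite the bug, while the asserting check fails, even though the two are indistinguishable to a coverage report.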

The Ugly

The goal of any metric is to measure something useful about the code base. Take code coverage, for example – really what we’re interested in is defect coverage. That is, out of the universe of all possible defects in our code, how many would cause a failure in at least one test? That’s what we want to know – how protected are we against regressions in the code base.

The trouble is, how can I measure “the universe of all possible defects” in a system? It’s basically unknowable. Instead, we use code coverage as an approximation. Given that tests assert the code did the right thing, the percentage of code that has been executed is a good estimate of the likelihood of bugs being caught by them. If my tests execute 50% of the code, at best I can catch bugs in 50% of the code. If there are bugs in the other 50%, there’s zero chance my tests will find them. Code coverage is an upper bound on defect coverage. But if your tests are shoddy, defect coverage can be much lower, to the point where tests with no assertions are basically useless.
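One rough way to see the gap between code coverage and defect coverage is to seed a few defects by hand and count how many of them each test suite notices. This is essentially a hand-rolled form of mutation testing, which the article itself does not mention; it is used here only as an illustration. Everything in the sketch (`clamp`, the three seeded defects, the three suites) is hypothetical, and a handful of hand-picked defects is of course nowhere near “the universe of all possible defects”.

```python
def clamp(value, low, high):
    """Reference implementation: restrict value to the range [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value


# Hand-seeded defects: each variant differs from clamp by one plausible bug.
def ignores_lower_bound(value, low, high):
    return high if value > high else value


def ignores_upper_bound(value, low, high):
    return low if value < low else value


def swaps_bounds(value, low, high):
    if value < low:
        return high
    if value > high:
        return low
    return value


SEEDED_DEFECTS = [ignores_lower_bound, ignores_upper_bound, swaps_bounds]


def suite_without_assertions(fn):
    # Executes every line of clamp, but checks nothing.
    fn(-5, 0, 10)
    fn(15, 0, 10)
    fn(5, 0, 10)


def suite_lower_bound_only(fn):
    # Has an assertion, but only exercises the lower-bound path.
    assert fn(-5, 0, 10) == 0


def suite_with_assertions(fn):
    # Executes every line of clamp and checks each result.
    assert fn(-5, 0, 10) == 0
    assert fn(15, 0, 10) == 10
    assert fn(5, 0, 10) == 5


def defect_coverage(suite, defects):
    """Fraction of the seeded defects that make the suite fail."""
    caught = 0
    for defect in defects:
        try:
            suite(defect)
        except AssertionError:
            caught += 1
    return caught / len(defects)


if __name__ == "__main__":
    for suite in (suite_without_assertions, suite_lower_bound_only, suite_with_assertions):
        print(suite.__name__, defect_coverage(suite, SEEDED_DEFECTS))
```

With these particular seeded defects, the assertion-free suite catches none of them despite executing every line, the partial suite catches two of the three, and the full asserting suite catches all three.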

And this is the difficulty with metrics: measuring what really matters – the quality of our software – is hard, if not impossible. So instead we have to measure what we can, but it isn’t always clear how that relates to our underlying goal.

But what does it mean?

There are some excellent tools out there like Sonar that give you a great overview of your code using a variety of common metrics. The trouble often is that developers don’t know (or care) what they mean. Is a complexity of 17.0 / class good or bad? I’m 5.6% tangled – but maybe there’s a good reason for that. What’s a reasonable target for this code base? And is LCOM4 a good thing or a bad thing? It sounds like a cancer treatment, to be honest.

Sure, if I’m motivated enough I can dig in and figure out what each metric means and we can try and agree reasonable targets and blah blah blah. C’mon, I’m busy delivering business value. I don’t have time for that crap. It’s all just too subtle so it gets ignored. Except by management.

A Better Way

Surely there’s got to be a better way to measure “code quality”?

1. Agree

Whatever you measure, it’s important the team agree and understand what it means. If there’s a measure half the team don’t agree with, then it’s unlikely it will get better. Some people will work towards improving it; others won’t, so will let it get worse. The net effect is likely to be heartache and grief all round.

2. Measure What’s Important

You don’t have to measure the “standard” things – like code coverage or cyclomatic complexity. As long as the team agree it’s a useful thing to measure, everyone agrees it needs improving and can commit to improving it – then it’s a useful measure.

A colleague of mine at youDevise spent his 10% time building a tool to track and graph various measures of our code base. But, rather unusually, these weren’t the usual metrics that the big static analysis tools gather – these were much more tightly focused, much more specific to the issues we face. So what kind of things can you measure easily yourself?

  • If you have a god class, why not count the number of lines in the file? Fewer is better.
  • If you have a 3rd party library you’re trying to get rid of, why not count the number of references to it?
  • If you have a class you’re trying to eliminate, why not count the number of times it’s imported?

These simple measures represent real technical debt we want to remove – by removing technical debt we will be improving the quality of our code base. They can also be incredibly easy to gather: the most naive approach only needs grep & wc.
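The three measures listed above can be gathered with very little code. The sketch below is a Python equivalent of the grep & wc approach, not the tool built at youDevise; the function names, the `*.java` default and the example paths and patterns in the note that follows are all assumptions made for illustration.

```python
import re
from pathlib import Path


def count_lines(path):
    """Number of lines in one file, e.g. a god class (roughly `wc -l`)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def count_matching_lines(root, pattern, glob="*.java"):
    """Number of source lines under root that match pattern.

    Roughly `grep -r pattern root | wc -l`, restricted to files matching glob.
    """
    regex = re.compile(pattern)
    total = 0
    for source_file in Path(root).rglob(glob):
        with open(source_file, encoding="utf-8", errors="replace") as handle:
            total += sum(1 for line in handle if regex.search(line))
    return total
```

For a hypothetical god class `OrderManager` and a hypothetical library package `org.legacylib`, the three measures would be `count_lines("src/com/example/OrderManager.java")`, `count_matching_lines("src", r"org\.legacylib\.")` and `count_matching_lines("src", r"^import\s+com\.example\.OrderManager;")`. Like grep, this counts matching lines rather than true references, so it is naive but cheap.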

It doesn’t matter what you measure, as long as the team believe it should be improved; then it gives you an insight into the quality of your code base, using a measure you care about.

3. Make It Visible

Finally, put this on a screen somewhere – next to your build status is good. That way everyone can see how you’re doing and gets a constant reminder that quality is important. This feedback is vital – you can see when things are getting better and, just as importantly, when things start to slip and the graph veers ominously upwards.
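A graph needs a history. As a minimal sketch (the CSV layout, the file name in the note that follows and both function names are assumptions, not a description of any particular tool), each build could append its measurements to a file and report whether a lower-is-better measure is improving or slipping:

```python
import csv
from datetime import date


def record(history_file, metric, value, day=None):
    """Append one measurement (day, metric, value) to a CSV history file."""
    day = day or date.today().isoformat()
    with open(history_file, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow([day, metric, value])


def trend(history_file, metric):
    """Compare the two most recent values of a lower-is-better metric."""
    with open(history_file, newline="", encoding="utf-8") as handle:
        values = [int(row[2]) for row in csv.reader(handle) if row[1] == metric]
    if len(values) < 2:
        return "not enough data"
    change = values[-1] - values[-2]
    if change < 0:
        return f"improving ({change})"
    if change > 0:
        return f"slipping (+{change})"
    return "unchanged"
```

Calling `record("metrics.csv", "god_class_lines", 4200)` after each build and showing `trend("metrics.csv", "god_class_lines")` next to the build status is about the simplest possible version; the same file can feed whatever graphing tool the team already uses.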

Keep It Simple, Stupid

Code quality is such an abstract concept that it’s impossible to measure. Instead, focus on specific things you can measure easily. The simpler the metric is to understand, the easier it is to improve. If you have to explain what a metric means, you’re doing it wrong. Try and focus on just a few things at any one time – if you’re tracking 100 different metrics, it’s going to be sheer luck if on average they’re all getting better. If we instead focus on half a dozen, I can remember them – the very least I’ll do is not let them get worse; and if I can, they’ll be clear in my mind so I can improve them.

Do you use metrics? If so, what do you measure? If not, do you think there’s something you could measure?


From http://blog.activelylazy.co.uk/2011/06/25/measuring-code/

Published at DZone with permission of David Green, author and DZone MVB.


Vinod Kumaar Ra... replied on Mon, 2012/01/02 - 8:58am

It is true that if we concentrate on only a few metrics that have a direct impact on quality, people do monitor them and take steps to keep them in order. I had an experience where only three metrics were monitored closely, and it provided good benefits. I have collected my thoughts here: http://vinodkumaar.wordpress.com/2011/02/13/whistle-blowers/

In summary, we closely monitored these and took corrective action immediately:

  • More than 12 lines in a method
  • NPath complexity more than 4 in a method
  • More than 4 lines of code copy pasted across any file

