Mr. Lott has been involved in over 70 software development projects in a career that spans 30 years. He has worked as an internet strategist, software architect, project leader, DBA, and programmer. Since 1993 he has focused on data warehousing and the associated e-business architectures that make the right data available to the right people to support their business decision-making. Steven is a DZone MVB (not an employee of DZone) and has posted 140 posts at DZone.

Numerosity - More Metrics Without Meaning

03.08.2010

Common Complaint: "This was the nth time that someone was up in arms that [X] was broken ... PL/SQL that ... has one function that is over 1,500 lines of [code]."

Not a good solution: "Find some way to measure 'yucky code'."

Continuing down a path of relatively low value, the question included this reference: "Using Metrics to Find Out if Your Code Base Will Stand the Test of Time," Aaron Erickson, Feb 18, 2010. The article is quite nice, but the question abuses it terribly.

For example: "It mentions cyclomatic complexity, efferent and afferent coupling. The article mentions some tools." Mentions? I believe the article defines cyclomatic complexity and gives examples of it's use.

Red Alert. There's no easy way to "measure" code smell. Stop trying.

How is this a path of low value? How can I say that proven metrics like cyclomatic complexity are of low value? How dare I?

Excessive Measurement

Here's why the question devolves into numerosity. The initial problem is that a piece of code is actually breaking. Code that breaks repeatedly is costly: disrupted production, time to repair, etc.

What further metric do you need? It breaks. It costs. That's all you need to know. You can judge the cost in dollars. Everything else is numerosity.

A good quote from the article: "By providing visibility into the maintainability of your code base—and being proactive about reducing these risks—companies can significantly reduce spend on maintenance". The article is trying to help identify possible future maintenance costs.

The code in question is already known to be bad. What more information is needed?

What level of Cyclomatic Complexity is too high? Clearly, that piece of code was already past any sensible threshold. Do you need a Cyclomatic Complexity number to know it's broken? No, you have a simple, direct labor cost that tells you it's broken. Everyone already agrees it's broken. What more is required?

First things first: It's already broken. Stop trying to measure. When the brakes have already failed, you don't need to measure hydraulic pressure in the brake lines. They've failed. Fix them.

The Magical Number

The best part is this. Here's a question that provides much insight into the practical use of Cyclomatic Complexity: http://stackoverflow.com/questions/20702/whats-your-a-good-limit-for-cyclomatic-complexity. Some say 5, some say 10.

What does that mean? Clearly code with a cyclomatic complexity of 10 is twice as bad as a cyclomatic complexity of 5. Right? Or is the cost function relatively flat, and 10 is only 5% worse than 5? Or is the cost function exponential and 10 is 10 times worse than 5? Who knows? How do we interpret these numbers? What does each point of Cyclomatic complexity map to? (Other than if-statements.)

Somehow both 5 and 10 are "acceptable" thresholds.
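
For concreteness, here's a minimal sketch of what the number counts (mine, not the article's; the function and its rules are invented). Cyclomatic complexity is essentially the count of decision points plus one, so the score mostly restates the branches you can already see by reading:

    def shipping_rate(weight, expedited, international):
        # Cyclomatic complexity 5: four decision points, plus one for the
        # function itself. Illustrative only; the rules here are made up.
        if weight > 50:            # decision 1
            rate = 20.0
        elif weight > 10:          # decision 2
            rate = 10.0
        else:
            rate = 5.0
        if expedited:              # decision 3
            rate *= 2
        if international:          # decision 4
            rate += 15.0
        return rate

A 1,500-line PL/SQL function scores in the hundreds by the same counting. The number doesn't tell you anything a reader of the code doesn't already know.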

When folks ask how to use this to measure code smell, it means they're trying to replace thinking with counting. Always a bad policy.

Second Principle: If you want to find code smells, you have to read the code. When the brakes are mushy and ineffective, you don't need to start measuring hydraulic pressure in every car in the parking lot. You need to fix the brakes on the car that's already obviously in need of maintenance.

Management Initiative

Imagine this scenario. Someone decides that the CC threshold is 10. That means they now have to run some metrics tool and gather the CC for every piece of code. Now what? Seriously. What will happen?

Some code will have a CC score of 11. Clearly unacceptable. Some will have a CC score of 300. Also unacceptable. You can't just randomly start reworking everything with CC > 10. What will happen?

You prioritize. The modules with CC scores of 300 will be reworked first.
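
What that prioritization amounts to is a sort. Here's a minimal sketch, assuming you already have per-module scores from some metrics tool; the module names and numbers are invented:

    # Hypothetical per-module cyclomatic complexity scores.
    cc_scores = {
        "billing_batch": 300,
        "order_entry": 11,
        "audit_log": 4,
        "pricing_rules": 47,
    }

    THRESHOLD = 10

    # "Prioritize": everything over the threshold, worst first.
    worklist = sorted(
        (module for module, cc in cc_scores.items() if cc > THRESHOLD),
        key=cc_scores.get,
        reverse=True,
    )
    print(worklist)  # ['billing_batch', 'pricing_rules', 'order_entry']

The output is the same list any developer on the team would have recited from memory.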

Guess what? You already knew they stank. You don't need a CC score to find the truly egregious modules. You already know. Ask anyone which modules are the worst. Everyone who reads the code on a regular basis knows exactly where the actual problems are.

Indeed, ask a manager. They know which modules are trouble. "Don't touch module [Y], it's a nightmare to get working again."

Third Principle: You already know everything you need to know. The hard part is taking action. Rework of existing code is something that managers are punished for. Rework is a failure mode. Ask any manager about fixing something that's rotten to the core but not actually failing in production. What do they say? Everyone -- absolutely everyone -- will say "if it ain't broke, don't fix it."

Failure to find and fix code smells is entirely a management problem. Metrics don't help.

Dream World

The numerosity dream is that there's some function that maps cyclomatic complexity to maintenance cost. In dollars. Does that mean this formula magically includes organization overheads, time lost in meetings, and process dumbness?

Okay. The sensible numerosity dream is that there's some function between cyclomatic complexity and effort to maintain in applied labor hours. That means the formula magically includes personal learning time, skill level of the developer, etc.

Okay. A more sensible numerosity dream is that there's some function between cyclomatic complexity and effort to maintain in standardized labor hours. Book hours. These have to be adjusted for the person and the organization. That means the formula magically includes factors for technology choices like language and IDE.

Why is it so hard to find any sensible prediction from specific cyclomatic complexity?

Look at previous attempts to measure software development. For example, COCOMO. Basic COCOMO has a nice R×T=D kind of formula. Actually it's E = a·K^b, but the idea is that you have a simple function with one independent variable (lines of code, K), one dependent variable (effort, E), and some constants (a, b). A nice Newtonian and Einsteinian model.
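
For reference, here's the Basic COCOMO shape worked through in a few lines of Python. The "organic mode" constants are the commonly published ones; treat them as an assumption used purely for illustration:

    # Basic COCOMO: effort E in person-months as a function of size K in KLOC.
    # E = a * K**b, with a = 2.4 and b = 1.05 (the usual "organic mode" values,
    # assumed here only to show the shape of the formula).
    def basic_cocomo_effort(kloc, a=2.4, b=1.05):
        return a * kloc ** b

    print(round(basic_cocomo_effort(32), 1))  # 91.3 person-months for 32 KLOC

One independent variable, one dependent variable, two constants. That's the whole model.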

Move on to intermediate COCOMO and COCOMO II. At least 15 additional independent variables have shown up. And in COCOMO II, the number of independent variables is yet larger with yet more complex relationships.

Fourth Principle: Software development is a human endeavor. We're talking about human behavior. Measuring hydraulic pressure in the brake lines will never find the idiot mechanic who forgot to fill the reservoir.

Boehm called his book Software Engineering Economics. Note the parallel. Software engineering -- like economics -- is a dismal science. It has lots of things you can measure. Sadly, the human behavior factors create an unlimited number of independent variables.

Relative Values

Here's a sensible approach: "Code Review and Complexity". They used a relative jump in Cyclomatic Complexity to trigger an in-depth review. Note that this happens at development time.
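
Here's a sketch of that kind of trigger; the 25% jump rule is my own assumed policy, not something from the linked post:

    def needs_review(cc_before, cc_after, relative_jump=0.25):
        """Flag a change for in-depth review when a function's cyclomatic
        complexity jumps by more than relative_jump (an assumed 25% policy)."""
        if cc_before == 0:
            return cc_after > 0
        return (cc_after - cc_before) / cc_before > relative_jump

    print(needs_review(8, 9))    # False: ordinary drift
    print(needs_review(8, 14))   # True: a 75% jump means a human reads the code

The important part is that the trigger fires at development time, while someone can still read and rework the code.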

Once it's in production, no matter how smelly, it's unfixable. After all, if it got to production, "it ain't broke".

Bottom Lines

  1. You already know it's broken. The brakes failed. Stop measuring what you already know.
  2. You can only find smell by reading the code. Don't measure hydraulic pressure in every car: find cars with mushy brakes. Any measurement will be debated down to a subjective judgement. A CC threshold of 10 will have exceptions. Don't waste time creating a rule and then creating a lot of exceptions. Stop trying to use metrics as a way to avoid thinking about the code.
  3. You already know what else smells. The hard part is taking action. You don't need more metrics to tell you where the costs and risks already are. It's in production -- you have all the history you need. A review of trouble tickets is enough.
  4. It's a human enterprise. There are too many independent variables; stop trying to measure things you can't actually control. You need to find the idiot who didn't fill the brake fluid reservoir.

From http://slott-softwarearchitect.blogspot.com

Published at DZone with permission of Steven Lott, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Jilles Van Gurp replied on Mon, 2010/03/08 - 3:19am

I think you are right. In my view the value of metrics is primarily pointing out where the problem hotspots are in the code, not how big they are. In my experience, finding a hotspot tends to be like opening a can of worms. It's very likely that you will find multiple problems affecting the same bits of code. It's obvious with a few glances at the code. The main problem therefore is people not seeing the problem.

By far the hardest part of problematic code is dealing with the people who created it. As long as you don't convince them there is a problem, they'll do exactly nothing to fix it. I see this as an educational challenge (especially with junior engineers) and an opportunity to teach people how to do things properly. But not all engineers are open to being educated. Sometimes you have to step in and substantiate things. That's where metrics can come in handy. Though frankly, they are rarely of use in the quite heated debates that tend to result.

In the end, however, software quality is an economic problem. Maintenance cost is tied to simple metrics like size in SLOC, complexity, and a few other things like cohesiveness and coupling. A class with 2000 LOC and dozens of dependencies stands out like a sore thumb. All I need is the size in kilobytes to find offending classes. If I see a complex system of 50-100 KLOC and ten people working on it, I know that the team is about 2-3 times larger than would be necessary with more competent developers and a clean code base. That metric alone tells me there is a problem, and I know I will find out what the problem is if needed. Metrics are great for black-box problem spotting.

Ultimately, I'm a craftsman and not a statistics guy. I use heuristics like the above to guide me, and sometimes to convince others but never to force them to do things they don't want to. 

Andrew Thompson replied on Mon, 2010/03/08 - 3:33pm

Not sure I completely agree. The level of effort required to gather these statistics is trivial and generally just involves looking at the most recent Maven build. And there is a strong correlation between bad code and "cyclomatic complexity, efferent and afferent coupling". It won't catch everything, but it will catch some things.

Yes, developers usually know which modules are problem children. And yes, peer reviews are invaluable for identifying problems. But sometimes it's not feasible to peer review all code. And sometimes it's necessary to have a number to communicate to nontechnical people. Generally they understand the idea of minimum acceptable code quality even if not the meaning of the specific metric.

On a slightly related note, http://www.youtube.com/watch?v=JIASiZQR3nI is one of my favorite IT videos ever. It's from the crap4j team.

Stephane Vaucher replied on Sun, 2010/03/14 - 9:03pm

"And there is a strong correlation between bad code and 'cyclomatic complexity, efferent and afferent coupling'." Oddly enough, the correlation between CC and bug count exists, but it is pretty weak. On publicly available NASA metrics data, CC has a correlation of the order of 20-30% (multiple projects, different languages). For fan out, the correlation exists, but it is less useful than LOCs for bug count prediction. Actually, what is useful in fan out is that it is another size measure. When size is controlled, on about 4 systems I analysed, fan-out has <10% correlation. For fan in, the correlation exists, but is generally negative. Code that is used a lot tends to contain fewer bugs and change less. Not a big surprise, considering that what is used a lot gets tested implicitly in every version. I do agree that the use of metrics can help locate risks in a code base. This is particularly useful when you do not have an insider's view (controlling an outsourced project) and cannot afford a manual inspection.
