Ashod Nakashian is well into his second decade leading and developing professional software solutions. Throughout the years, Ashod has participated in projects covering a wide gamut of the software spectrum from web-based services, state-of-the-art audio/video processing and conversion solutions, multi-tier data and email migration and archival solutions to mission-critical real-time electronic trading servers. When not designing, coding or debugging, he enjoys reading, writing and photography. Ashod is a DZone MVB and is not an employee of DZone and has posted 18 posts at DZone.

Why Code Readability Matters

06.08.2011

Lately my colleagues have been commenting on my relentless notes on code styling in our internal code-reviews. Evidently I’ve been paying more and more attention to what could rightly be called cosmetics. Sure, there are bad conventions and, well, better ones. Admittedly, coding conventions are just that: conventions. No matter how much we debate tabs vs. spaces and how much we search for truth, they remain subjective and very much a personal preference. Yet one can’t deny that some conventions are better than others and do in fact improve the overall quality of the code.

But code quality is not what worries me, at least not as much as eye strain and accidental misreadings, both of which are also factors in overall code quality, no doubt. It’s not rare that we find ourselves agonizing over a mistake that, if it weren’t for haste and misunderstanding, could’ve been avoided completely.

[Image: Function Parameter Duplication. Image via Wikipedia]

Developers don’t just write code. We spend the majority of our time adding, removing and editing small pieces of code. In fact, most of the time we make small changes in an otherwise large code-base. Rarely do we write new independent pieces of code, unless of course we are starting a project from scratch. Most projects are legacy projects in the sense that they are passed from one generation of developers to the next. We add new functions to implement new features, rewrite some and edit some others to meet our needs. All of which means reading a great deal of code to understand which statement is doing what and to decide on the best change-set to get the output we plan. But even when we work on a completely independent module or class, once we have the smallest chunk of code written, we already need to go back and forth to interweave the remaining code.

Oddly, as I grew in experience I also spent increasingly more time reading code before making additions or changes. This seems reasonable in hindsight. It’s much easier to write new code, and much more difficult to understand existing code and find ways to adapt it. Experience tells us that the easiest route is not necessarily the best in the long run. Sure, we might get faster results if we write that case-insensitive string comparison function for the 30th time whenever we need one, or so we might think. But chances are that we’ll get a few incorrect results as well, which we might not even notice until it’s too late. With a little effort, we should find a good implementation that we can reuse and learn how to use it. Indeed, that implementation might very well be in our code-base. My example of string comparison is not all that artificial, considering that some languages don’t come with standard string comparison functions that offer Unicode, case and locale options. Some languages do have standardized (or at least canonical) functions, but you need to explicitly include the library reference in your projects, and that might involve SCM and build server issues. Not fun when the deadline is looming, and when exactly was the last time we didn’t have an imminent deadline? (Notice that I do not mean to imply that DIY coding is a deadly sin. It probably is in most cases, but surely there are good reasons to write your own. Like early optimization, the first two rules are don’t do it. But that’s another topic for another day.)
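
To make the string comparison example concrete, here is a minimal sketch in Java of leaning on the standard library instead of hand-rolling the comparison. The class name TextCompare and the method equalsIgnoringCase are made up for illustration; java.text.Collator is the standard JDK class. The sketch assumes that case differences should be ignored while accent differences still count, which is what secondary strength gives for English. Other locales may assign strengths differently, so treat this as a starting point rather than a drop-in answer.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper: the names are illustrative, only Collator is a real JDK API.
final class TextCompare {
    private TextCompare() {
    }

    // Compares two strings under the given locale's collation rules,
    // ignoring case but (for English) still distinguishing accents.
    static boolean equalsIgnoringCase(String left, String right, Locale locale) {
        Collator collator = Collator.getInstance(locale);

        // SECONDARY strength ignores tertiary differences such as letter case.
        collator.setStrength(Collator.SECONDARY);

        // Decompose canonical variants so accented characters collate correctly.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);

        return collator.compare(left, right) == 0;
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoringCase("Hello", "hELLO", Locale.ENGLISH));
    }
}
```

The point is less the specific strength setting than the fact that the locale and Unicode decisions are made once, in one named place, by code that has already been exercised by many other people.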

Another great side-effect to reading code is that one will eventually find bugs and at least report them, if not fix them on the spot (and send for review). So instead of creating new sources of bugs, reuse reduces code duplication and possibly resolves old bugs.

When we anticipate and estimate, with reasonable accuracy, the overhead of possible bugs in our home-grown functions scattered around the code-base, and compare it to the overhead of learning and incorporating a higher quality library function, then and only then do we stand a chance of making reasonable decisions. Unfortunately, understanding the difficulty of such comparisons, and especially grasping the perils of code duplication combined with the Dunning–Kruger effect when estimating the expected bugs in one’s own code, paradoxically requires experience to become obvious.

The whole issue of maintaining legacy code vs. writing from scratch is a different battle altogether. However, I will only say that improving on legacy code at any pace is almost always superior to the demolition approach. It’s less risky, you still have working bits and pieces, and you have reference code right on your plate. And here is my point: to avoid rewriting, one must learn to read existing code, and read it well.

To make things more concrete, here are some of my top readability points:

  1. Consistency.
    I can read code with virtually any convention, no problem. I’ve grown to learn to adapt. If I think I need to read code, then I’d better accept that I may very well read code not written in my favorite style. One thing that is nearly impossible to adapt to, however, is when the code looks like it’s written by a five-year-old who just switched from a QWERTY keyboard to Dvorak.
    Whatever convention you choose or get forced to use, just stick to it. Make sure it’s what you use everywhere in that project. 
  2. Consistency.
    There is no way to overstress how important this is. People experiment with braces-on-next-line and braces-on-same-line and forget that their experiments end up in the source repository, to the frustration of everyone else on the team. Seriously, experimentation is fine. Finding your preferred style is perfectly OK. Just don’t commit the experiments. Not to the trunk at least. 
  3. Space out the code.
    Readable code is like readable text. As much as everyone hates reading text-walls, so do developers who have to read sandwiched code. Put empty lines between all scopes. An end-brace is a great place to add that extra line to make it easier to read and find the start of the next scope. Need to comment? Great. Just avoid newspaper-style coding, where the code is turned into a two-column format, code on the left, comments on the right. There are very few cases that one needs to do that. Comments go above the code block or statement, not next to it. Comments are also more readable if they are preceded by a blank line. There is no shortage of disk space. Break long functions, break mega classes and even break long files. Make your functions reasonable in size. Use cyclomatic complexity metrics to know when your code is too complex and hence hard to read and understand. Refactor. 
  4. Stop banner-commenting.
    If you need to add 5 lines of boxed comments within functions, then it’s a good indication you need to start a new function. The banner can go above the new function. As much as I’d rather not have it there either, it’s much better than having 1000+ lines in a single function with half a dozen banners. After all, if you need to signal the start of a new logical block, functions do that better. The same goes for breaking long functions, classes and files. (The image above is just an image of some code; whatever it may imply, I do not endorse the style it adopts.) 
  5. Don’t use magic numbers.
    Constants are a great tool to give names to what otherwise is a seemingly magic number that makes the code work. Default values, container limits, timeouts and user-messages are just some examples of good candidates for constants. Just avoid converting every integer into a constant. That’s not the purpose of constants and it reduces code readability. 
  6. Choose identifier names wisely.
    There has been no shortage of literature on naming conventions and the perils of choosing names that lack significant differences. Yet, some still insist on using very ambiguous names and reuse them across logical boundaries. The identifiers not only need to be meaningful and unambiguous, but they also need to be consistent across functions, classes, modules and even the complete project. At least an effort to make them consistent should be made. If you need to use acronyms, then make sure you use the same case convention. If you have database or network connections, use the same identifier names for them everywhere. It’s much, much more readable to see an identifier and immediately know what the type is. Incidentally, this was the original purpose of Hungarian notation (original paper). The same goes for function names. ‘Get’ can be used for read-only functions that have no side-effects. Avoid using verbs and nouns in an identifier and then implementing logic that contradicts the natural meaning of those words. It’s downright inhumane. If a function claims to get some value, then that function should never modify any other value (with the possible exception of updating caches, if any, which should only boost performance without affecting correctness). Prefer using proper names in identifiers. Meaningful names, nouns and verbs can be a great help; use them to their limit. 
  7. Don’t chain declarations.
    Chaining variable declarations is a very common practice in C. This is partially due to the fact that ANSI C (C89) didn’t allow declaring variables after an executable statement within a block. Developers typically declared everything up front, at the top of their functions. This meant that all the variables of a given type were declared in a chain of commas. Declare variables in the innermost scope possible and as close to the usage statement as possible. Avoid recycling variables, unless it’s for the very same purpose. Never re-purpose variables; instead, declare new ones.
    A similar practice is sometimes used to chain calls, where the arguments of a function call are themselves the return values of other function calls nested inside it. These two types of chaining make reading the code a painful process. Some people nest 3 or 4 function calls within one another, thinking it’s more compact and therefore better, somehow. It’s not. Not when someone needs to scan the code for side-effects and to find a hard bug. A possible exception to this is chaining calls one after another. Some classes are designed to chain calls such that each call returns a reference to the original instance, on which other members can then be called. This is often done on string and I/O classes. This type of chaining, if not used excessively and not too long, can be helpful and improve readability. 
  8. Limit line and file lengths.
    Scrolling horizontally is a huge pain. No one can be reasonably expected to read anything while scrolling left and right for each line. But some people have larger screens than others, and some have weaker eyesight and so use larger fonts. I’ve seen code apparently aligned to 120 characters per line, with occasional 150+ characters per line, back when 17″ screens were the norm. Some don’t even care or keep track of line length. This is very unfortunate as it makes other people’s lives very hard, especially those with smaller screens, lower resolutions and large fonts (read: bad eyesight). Code can’t be word-wrapped like article text (without the help of advanced editors).
    A good starting point is the legacy standard of 80 characters-per-line (cpl). If we can stick to that, great. We should make an effort to minimize the line-length, and 80 characters is not so narrow as to make things garbled. For teams who can all work with 100 or 120 cpl, the extra line width can give quite a bit of flexibility and improve readability. But again be careful, because with too long a line readability suffers yet again. Think how newspapers and magazines have narrow columns and how readable they are. Our wide screens with large footprints are like newspapers, and moving our heads to read across the line, as opposed to moving just our eyes, is an extra physical strain and certainly a speed bump that we can do without. 
  9. Make scopes explicit.
    Languages often have implicit scopes. The first statement after a conditional or a loop in C-style languages implicitly falls under the conditional or loop, but subsequent statements do not. It’s much harder to parse the end of statements using one’s eyeballs to figure out which statement belongs to the implicit scope. Making these cases explicit makes it much easier to see them without much effort. Adding braces, and an empty line after the closing brace, even around single-statement scopes, does improve readability quite a bit. 
  10. Comment.
    Code is very well understood by machines, even when obscure and mangled. Humans on the other hand need to understand purpose and reason, beyond seeing what a piece of code would do. Without understanding the reasons behind particular constructs, operations and values, no developer can maintain the code in any sensible way. Comments are a great tool to communicate the less obvious aspects of the code. Explain the business needs, implicit assumptions, workarounds, special-requests and the temporary solutions. Talk to your fellow developers through comments. But, as every rule comes with exceptions, avoid over-commenting at all costs. Resist the temptation to declare that an error is about to occur right before throwing an exception. The throw statement is self-commenting. Don’t repeat yourself. Rather, explain why the case is considered unrecoverable at that point, if it’s not obvious. Prefer to write code that speaks for itself. Self-descriptive code is the most readable code.
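
A few of the points above can be sketched together. The following is a minimal illustration in Java, not a prescription. The class OrderTotals, its methods and the shipping rule are all hypothetical and exist only to show named constants (point 5), side-effect-free ‘get’ functions (point 6), one declaration per statement close to its use and no nested calls (point 7), explicit braces with a blank line after each scope (point 9), and a comment that explains why rather than what (point 10).

```java
import java.util.List;

// Hypothetical example class; none of these names come from a real code-base.
final class OrderTotals {
    // Named constants instead of magic numbers scattered through the logic.
    private static final int FREE_SHIPPING_THRESHOLD_CENTS = 5_000;
    private static final int FLAT_SHIPPING_FEE_CENTS = 499;

    private OrderTotals() {
    }

    // A 'get' that only reads: it sums the prices and modifies nothing.
    static int getSubtotalCents(List<Integer> itemPricesCents) {
        int subtotalCents = 0;

        for (int priceCents : itemPricesCents) {
            subtotalCents += priceCents;
        }

        return subtotalCents;
    }

    static int getTotalCents(List<Integer> itemPricesCents) {
        // Each value gets its own declaration next to its first use,
        // instead of nesting the calls inside one another.
        int subtotalCents = getSubtotalCents(itemPricesCents);

        // Hypothetical business rule: marketing promises free shipping
        // "from 50.00 upward", so the fee is waived at exactly the threshold too.
        int shippingCents = FLAT_SHIPPING_FEE_CENTS;

        if (subtotalCents >= FREE_SHIPPING_THRESHOLD_CENTS) {
            shippingCents = 0;
        }

        return subtotalCents + shippingCents;
    }
}
```

Calling OrderTotals.getTotalCents with prices of 1000 and 2000 adds the flat fee, while a single item priced at 5000 ships free. The comment above the threshold check carries the one thing the code cannot say by itself: why the comparison is inclusive.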

As a corollary, some code should be made hard to read. You read that right. Code that depends on hacks, bad practices or temporary solutions should be made harder to read. Perhaps my best example is casting. It’s not rare that we need to cast instances of objects to specific types, and while this is a standard practice, it isn’t exactly highly recommended either. Making the casting a little less readable is a good way to force the reader to squint and make sure it’s not a bogus cast. This isn’t unlike italic text, which makes reading the text harder and therefore makes it stand out. In the same vein, temporary solutions and hacks should be made to stand out. Adding TODO and FIXME markers is a good start. BUGBUG is another marker for known issues or limitations. Actually, if there is any place for that banner-comment, this would be it. Make known issues stand out like a sore thumb.
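
As a sketch of what a cast that stands out might look like in Java, consider the following. The names ShapeAreas, Shape, Circle and radiusOf are hypothetical, and the FIXME describes an imagined situation; only instanceof and Class.cast are standard Java. The wordier Circle.class.cast(...) form is one possible way to make the downcast harder to skim past than a bare (Circle) prefix, assuming the team agrees on that convention.

```java
// Hypothetical types, used only to illustrate a deliberately conspicuous cast.
final class ShapeAreas {
    interface Shape {
    }

    static final class Circle implements Shape {
        final double radius;

        Circle(double radius) {
            this.radius = radius;
        }
    }

    private ShapeAreas() {
    }

    static double radiusOf(Shape shape) {
        // FIXME: downcast. Imagined workaround until Shape exposes what
        // callers need; remove once the interface is extended.
        if (!(shape instanceof Circle)) {
            throw new IllegalArgumentException("Expected a Circle, got: " + shape);
        }

        // The verbose form is intentional: it should make the reader pause.
        Circle circle = Circle.class.cast(shape);

        return circle.radius;
    }
}
```

The check before the cast turns a bogus cast into an explicit, explained failure, and the marker makes the workaround easy to find with a plain text search.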

As with most things, the above are mostly guidelines, rules of thumb if you will. They are not set in stone and they aren’t complete by any stretch of the imagination. There are many obviously important cases left out. Languages, ecosystems, frameworks and libraries come with their own nuances. What’s clear in one language might be very ambiguous in another, and therefore frowned upon. The important thing is to write with the reader in mind, to avoid confusing styles, and especially to be mindful of the context: when an explanatory comment is due, and when refactoring is a better choice.

Conclusion

Good developers write code fast and with a low bug rate. Great developers read existing code, reuse and improve it, generating higher quality code with no duplication and a reduced bug rate for both new and existing code. We should write readable code, written for humans and not just for machines. And we should read code and improve it, rather than rewriting.

From http://blog.ashodnakashian.com/2011/03/code-readability/

Published at DZone with permission of Ashod Nakashian, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Jonathan Fisher replied on Wed, 2011/06/08 - 11:54pm

I agree with everything but your last point. I encourage my developers to "Not Comment" because if they've followed your previous 9 steps, chances are comments aren't needed. Too often comments are used to clarify crap code.

Damien Lepage replied on Thu, 2011/06/09 - 12:07am

There's an easy way to ensure consistency: just set up auto-formatting on save in your IDE and put this config into your source repository. The limitation is that all the developers need to use the same IDE. I reviewed code from at least 20 different developers on the same application and I really enjoyed this consistency.

Ashod Nakashian replied on Thu, 2011/06/09 - 12:21am in response to: Jonathan Fisher

Perhaps I haven't emphasized the main point enough. I agree with your sentiment; however, I think there is a very good purpose to comments, that is, to communicate what's beyond the algorithm. Sure we can read some code and understand the algorithm (in most cases at least); however, why that particular algorithm is used might not be obvious. Perhaps there is some business req. that says "results should also include the total number of dongs." Without commenting this fact, the maintainer might feel at a loss as to why we're computing the total dongs.
Another good case is when a certain code is added/modified to fix a particular bug or add a special customer request. The bug-id and summary is a good marker above the code to direct developers to the right place. Granted, such pointers may become superfluous at some point and may be removed.

In short, where over-commenting is a horrible way of polluting the code, under-commenting is not very friendly either. It expects too much from the -human- reader. The code is not the full story, comments help fill in the gaps and give pointers to more information in reqs, bugs, tickets and docs. We can and should learn to strike the right balance. Comment what's not obvious or can't be inferred.

Christof Schoell replied on Thu, 2011/06/09 - 12:35am in response to: Damien Lepage

I agree auto formatting is good and helps to have a consistent layout. But please don't force your developers to use the same IDE, because that makes them less productive depending on their preference. There are other options, like automatic formatting by the source control system on check-in or having well-defined guidelines, which are checked via checkstyle or a similar tool.

Robert Csala replied on Thu, 2011/06/09 - 5:47am

@Ashod: Thanks, this article is very good. I completely agree.

Liam Knox replied on Thu, 2011/06/09 - 9:25pm

I agree, but isn't this article just 'Code Readability Really, Really Matters. Please read Clean Code by Bob Martin' ? :-)

Liam Knox replied on Thu, 2011/06/09 - 9:41pm in response to: Christof Schoell

A team should use a common developer set up.

A team should use the same IDE. Don't muck around here. Eclipse is great

Developers are not less productive in a common IDE. Maybe 10 years ago an Emacs developer could argue this. Now... no chance. If you can't write code in Eclipse you shouldn't be writing code anyway.

I really understand the sentiment in developer preferences but we aren't in the evolution stage now. We work in big teams and big projects, and we need to have standards. The skills and originality are in the solution space. It's not about being a maverick vi expert.

On auto formatting. It is great, though sadly only 95% of the time. If I have an Enum or static Array for example, I may want one entry per line for readability. Auto formatting seems not to grasp this.

I petition IDE developers to understand something @NoAutoFormat, to address this. Perhaps there is something already but I would be glad to hear about it.

Mladen Girazovski replied on Fri, 2011/06/10 - 9:05am in response to: Liam Knox

Agree but isn't this article just 'Code Readability Really, Really Matters. Please read Clean Code by Bob Martin' ? :-) 

 No, it isn't.

The author apparently hasn't read "Clean Code", otherwise he probably wouldn't recommend practices that go against Bob Martin's advice.

6.

..

It’s much, much more readable to see an identifier and immediately know what the type is. Incidentally, this was the original purpose of Hungarian notation (original paper).

..

Nope, it isn't. Use a modern IDE and then there is no need anymore for Hungarian notation, which btw. works against the code completion function of a modern IDE.

Also, Hungarian notation carries redundancy that might mislead the reader. It happened to the Win32 API, when the datatypes changed but the names didn't.

10.

This should rather be called "Document a public API", not "Comment". One can also use unit tests to document the code; however, just writing lots of comments is usually a sign that the code is not clean.

I do agree that code readability matters, but I disagree with some of the author's points about what makes code readable.

Ashod Nakashian replied on Fri, 2011/06/10 - 9:55am in response to: Mladen Girazovski

I should apologize beforehand for what turned out to be a longish response. Thanks for your input and forgive my uncouth response.

The author apparently hasn't read "Clean Code", otherwise he probably wouldn't recommend practices that go against Bob Martin's advice.
Indeed I haven't.

Nope, it isn't. Use a modern IDE and then there is no need anymore for Hungarian notation, which btw. works against the code completion function of a modern IDE.
Perhaps I should've stressed the word "readable" more. I'm not concerned with what our modern tools can do. I am concerned with code that is, well, readable! I'm using the word literally. Tools are great, but if you need to navigate and browse for every single identifier, then it's not much readable. To see my point, go the other extreme: let's randomly generate all identifier names and rely on modern IDE features. See my point?

Also, Hungarian notation carries redundancy that might mislead the reader. It happened to the Win32 API, when the datatypes changed but the names didn't.
This is why I took the time to both say "original purpose" and link to the original paper. Because Win32 botched Hungarian notation. My statement that the reader should be able to say which identifier is network related and which is database related is a better representative of what I meant by "type" in the statement that you quoted. Neither I, nor the Hungarian notation for that matter, suggest codifying the data-types in identifiers; rather, the suggestion is to make clear what purpose the identifier serves. (If we have network and database code in the same scope, "connection" becomes a bad choice for an identifier. Similarly, I'm suggesting we don't use "connection" in one place, "net" in another and "socket" in yet another. Whatever we choose should be unambiguous and consistent across the project.) See Joel's article on this topic. He has some good examples, albeit his examples are concrete and here I'm only concerned with the concept of making the code speak for itself. That is, not confuse the reader and not depend on tools to figure things out.

This should rather be called "Document a public API", not "Comment". One can also use unit tests to document the code; however, just writing lots of comments is usually a sign that the code is not clean.
Not only "document a public API." Commenting should be inline as well, where necessary. I did spend some good few lines to make it clear that over commenting is horrible. But if the business decisions aren't obvious from the algorithms, then comments should be used. Yes, unit-tests can be used to infer what a piece of code is supposed to do, but please don't make it the only way to figure things out. Also, they don't document the "why" question.

I know there is a school of thought that would ban comments from languages if they could. I don't know what could result in such a position, but I'm sure there are some very good use-cases for comments and, save for the few abusive novices, we should all learn to maximize their utility.

Again, I'm concerned with readability. I want to be able to read the code and figure out as much as humanly possible. I want to encourage my colleagues to do the same. To make the code less dependent on developers to explain what's what. To save us time and energy when we need to (re)visit some code for the first time or after we'd moved onto other projects. I don't mind depending on tools, but I can't give that as an excuse to make my identifiers similar, unreadable and/or confusing and expect to be taken seriously (not that anyone suggested anything like that). Tools are a bonus, not a replacement for writing quality code.

Lund Wolfe replied on Sat, 2011/06/11 - 4:16pm

I agree with most of your advice, especially the importance of comments. There are some idealists here who apparently only work on beautiful code, which is simple, elegant, and self documenting. The reality is that code spends a large percentage of its life in maintenance/enhancement. Code has a long life usually because it has important benefit to the company - not because it is well designed and well coded.

I am currently maintaining a project which is at least eight years old and has had many developers before me. There are almost no comments and the vast majority of those include only a developer name, bug # and fix date or less. Because the code base is poor quality in terms of use and readability, comments are that much more important. I can't believe that none of the previous developers, no matter how inexperienced, ever added comments and I have been told that the comments may have been stripped out by one or more of the previous developers. It is very sad that this was not caught by a more mature developer if this is the case.

We are currently interviewing candidates for a junior Java developer position and none (only two so far, including a BSCS who gave great answers to all the other questions) have been able to give a decent answer to why and when comments are useful. I'm beginning to think this is really a mid-level or senior level interview question. After many years of maintaining Java code, I suspect comments are still a mystery to most developers.

Totally agree that a standard formatting should be used across the project code by all developers (to facilitate file differencing, which is bad at ignoring whitespace, for one thing). I also agree that developers should be able to use their IDE of choice. Either developers agree that final formatting should be done in a specific IDE or formatting needs to be externalized to a separate tool or the version control system.

Loren Kratzke replied on Sat, 2011/06/11 - 9:27pm in response to: Liam Knox

I see. Developers should use the same IDE as long as it is the IDE that you use. Netbeans is a great IDE too, but I don't force it upon my fellow Eclipse developers. In fact our favorite joke is blaming the "other" IDE for bugs, "must be an Eclipse bug, must be a Netbeans bug...". Belly laugh every time.

First of all, we develop in Maven. There is nothing IDE specific about Maven unless you make it that way on purpose.

Second, if somebody has used an IDE for 8 years, why would you want to mess with that developer by jamming some other IDE down their throat (and I refer you back to point #1).

There is no point in everybody using one IDE unless your projects are IDE specific. That is a major antipattern that is to be avoided. If everybody has to use the same IDE, or your project will only compile under one IDE then there is something very wrong.

Marc Stock replied on Wed, 2011/06/15 - 9:57am in response to: Liam Knox

@Liam Knox Your comment is ridiculous. You only say this because Eclipse is your favorite IDE. If another IDE, (say NetBeans) was forced upon you, you'd probably throw a hissy fit and complain how unfair it is. Instead, you throw this tyranny of the masses crap out there and tell everyone to suck it up. The problem isn't different IDEs. The problem is you built your project around a specific IDE rather than the other way around. And one more thing...Eclipse sucks. That is all.
