John Sonmez is a Pluralsight author of over 25 courses spanning a wide range of topics from mobile development to IoC containers. He is a frequent guest on podcasts such as DotNetRocks and Hanselminutes. John has created applications for iOS, Android and Windows Phone 7 using native tools, HTML5, and just about every cross platform solution available today. He has a passion for Agile development and is engaged in a personal crusade to make the complex simple. John is a DZone MVB and is not an employee of DZone and has posted 86 posts at DZone.

What Makes Code Readable: Not What You Think

04.17.2013

You often hear about how important it is to write “readable code.”

Developers have pretty strong opinions about what makes code more readable. The more senior the developer, the stronger the opinion.

But, have you ever stopped to think about what really makes code readable?

The standard answer

You would probably agree that the following things, regardless of programming language, contribute to the readability of code:

  • Good variable, method and class names
  • Variables, classes and methods that have a single purpose
  • Consistent indentation and formatting style
  • Reduction of the nesting level in code

There are many more standard answers and pretty widely held beliefs about what makes code readable, and I am not disagreeing with any of these.

(By the way, two excellent resources for this kind of information about “good code” are Robert Martin’s book, Clean Code, and Steve McConnell’s book that all developers should read, Code Complete. Both of these are affiliate links; thanks for your support.)

Instead, I want to point you to a deeper insight about readability…

The vocabulary and experience of the reader

I can look at code and in 2 seconds tell you if it is well written and highly readable or not.  (At least in my opinion.)

At the same time, I can take a sample of my best, well written, highly readable code and give it to a novice or beginner programmer, and they don’t spot how it is different from any other code they are looking at.

Even though my code has nice descriptive variable names, short well named methods with few parameters that do one thing and one thing only, and is structured in a way that clearly groups the sections of functionality together, they don’t find it any easier to read than they do code that has had no thought put into its structure whatsoever.

In fact, the complaint I get most often is that my code has too many methods, which makes it hard to follow, and the variable names are too long, which is confusing.

There is a fundamental difference in the way an experienced coder reads code versus how a beginner does

An experienced developer reading code doesn’t pay attention to the vocabulary of the programming language itself.  An experienced developer is more focused on the actual concept being expressed by the code—what the purpose of the code is, not how it is doing it.

A beginner or less experienced developer reads code much differently.

When a less experienced developer reads code, they are trying to understand the actual structure of the code.  A beginner is more focused on the actual vocabulary of the language than what the expression of that language is trying to convey.

To them, a long variable name isn’t descriptive, it’s deceptive: by personifying the variable as something more than just an integer, the name NumberOfCoins hides the fact that it represents an integer value.  They’d rather see the variable named X or Number, because it’s confusing enough to remember what an integer is.

An experienced developer doesn’t care about integers versus strings and other variable types.  An experienced developer wants to know what the variable represents in the logical context of the method or system, not what type the variable is or how it works.
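
As a rough illustration of that difference, here is a small Java sketch. The class and method names are hypothetical and invented only for this comparison; the two methods behave identically and differ only in how much the names say.

```java
// Hypothetical illustration: two behaviorally identical methods.
// Neither comes from a real code base; only the names differ.
class CoinNaming {

    // What a beginner may prefer: the names say nothing beyond the types.
    static int calc(int x, int n) {
        return x * n;
    }

    // What an experienced reader may prefer: the names carry the intent.
    static int totalValueInCents(int numberOfCoins, int centsPerCoin) {
        return numberOfCoins * centsPerCoin;
    }
}
```

Calling `CoinNaming.calc(4, 25)` and `CoinNaming.totalValueInCents(4, 25)` gives the same number. Only the second call tells a reader what that number means without opening the method.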

Example: learning to read

Think about what it is like to learn to read.

When kids are learning to read, they start off by learning the phonetic sounds of letters.

When young kids are reading books for the first time, they start out by sounding out each word.  When they are reading, they are not focusing on the grammar or the thought being conveyed by the writing, so much as they are focusing on the very structure of the words themselves.

Imagine if this blog post was written in the form of an early reader.

Imagine if I constrained my vocabulary and sentence structure to that of a “See Spot Run” book.

Would you find my blog to be highly “readable?”  Probably not, but kindergarteners would probably find it much more digestible.  (Although they would most likely still snub the content.)

You’d find the same scenario with experienced musicians, who can read sheet music easily, versus beginners, who would probably much prefer tablature.

An experienced musician would find sheet music much easier to read and understand than a musical description that said what keys on a piano to press or what strings on a guitar to pluck.

Readability constraints

Just as the vocabulary and structure of an early reader book limit the elegance with which you can express thoughts and ideas, you are limited in the same way by both the programming language in which you program and the context in which you program it.

This is better seen in an example though.  Let’s look at some assembly language.


.model small
.stack 100h
 
.data
msg     db      'Hello world!$'
 
.code
start:
        mov     ax, @data ; Point DS at the data segment
        mov     ds, ax
        mov     ah, 09h   ; Display the message
        lea     dx, msg
        int     21h
        mov     ax, 4C00h  ; Terminate the executable
        int     21h
 
end start

This assembly code will print “Hello world!” to the screen in DOS.

With x86 assembly language, the vocabulary and grammar of the language are quite limited.  It isn’t easy to express complex code in the language and make it readable.

There is an upper limit on the readability of x86 assembly language, no matter how good a programmer you are.

Now let’s look at Hello World in C#.

public class Hello1
{
   public static void Main()
   {
      System.Console.WriteLine("Hello, World!");
   }
}

It’s not a straight across-the-board comparison, because this version is using the .NET Framework in addition to the C# language, but for the purposes of this post we’ll consider C# to include the base class libraries as well.

The point, though, is that with C#’s much larger vocabulary and more complicated grammar comes the ability to express more complex ideas in a more succinct and readable way.

Want to know why Ruby got so popular for a while?  Here is Hello World in Ruby.


puts "Hello, world"

That’s it, pretty small.

I’m not a huge fan of Ruby myself, but if you understand the large vocabulary and grammar structure of the Ruby language, you’ll find that you can express things very clearly in the language.

Now, I realize I am not comparing apples to apples here and that Hello World is hardly a good representation of a programming language’s vocabulary or grammar.

My point is, the larger the vocabulary you have, the more succinctly ideas can be expressed, thus making them more readable, BUT only to those who have a mastery of that vocabulary and grammar.
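
A less trivial comparison than Hello World may help here. The following is an illustrative Java sketch, assuming Java 8 or later; the class and method names are hypothetical. Both methods compute the sum of the squares of the even numbers in a list, one with a small vocabulary and one with a larger one.

```java
import java.util.List;

// Hypothetical illustration of the same idea in two "vocabularies".
class VocabularyDemo {

    // Small vocabulary: only a loop, a conditional and assignment.
    static int sumOfEvenSquaresWithLoop(List<Integer> numbers) {
        int total = 0;
        for (int i = 0; i < numbers.size(); i++) {
            int n = numbers.get(i);
            if (n % 2 == 0) {
                total = total + n * n;
            }
        }
        return total;
    }

    // Larger vocabulary: streams and lambdas.
    static int sumOfEvenSquaresWithStreams(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .mapToInt(n -> n * n)
                .sum();
    }
}
```

A reader who knows `filter` and `mapToInt` can likely take in the second method at a glance. A reader who does not has to learn those words before the method says anything at all, while the loop version only requires the basics.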

What can we draw from all this?

So, you might be thinking “oh ok, that’s interesting… I’m not sure if I totally agree with you, but I kind of get what you’re saying, so what’s the point?”

Fair question.

There is quite a bit we can draw from understanding how vocabulary and experience affect readability.

First of all, we can target our code for our audience.

We have to think about who is going to be reading our code and what their vocabulary and experience level is.

In C#, it is commonly argued whether or not the conditional operator should be used.

Should we write code like this:

var nextAction = dogIsHungry ? Actions.Feed : Actions.Walk;

Or should we write code like this:


var nextAction = Actions.None;
if (dogIsHungry)
{
   nextAction = Actions.Feed;
}
else
{
   nextAction = Actions.Walk;
}

I used to be in the camp that said the second way was better, but now I find myself writing the first way more often.  And if someone asks me which is better, my answer will be “it depends.”

It depends because if your audience isn’t used to the conditional operator, they’ll probably find code that uses it confusing.  (They’ll have to parse the vocabulary rather than focusing on the story.)  But if your audience is familiar with the conditional operator, the long version with an if statement will seem drawn out and like a complete waste of space.

The other piece of information to gather from this observation is the value of having a large vocabulary in a programming language and having a solid understanding of that vocabulary and grammar.

The English language is a large language with a very large vocabulary and a ridiculous number of grammatical rules.  Some people say that it should be easier and have a reduced vocabulary and grammar.

If we made the English language smaller, and reduced the complex rules of grammar to a much simpler structure, we’d make it much easier to learn, but we’d make it harder to convey information.

What we’d gain in reduction of time to mastery, we’d lose in its power of expressiveness.

One language to rule them all?

It’s hard to think of programming languages in the same way, because we typically don’t want to invest in a single programming language and framework with the same fervor as we do a spoken and written language, but as repugnant as it may be, the larger we make programming languages, and the more complex we make their grammars, the more expressive they become and ultimately—for those who achieve mastery of the vocabulary and grammar—the more readable they become. (At least the potential for higher readability is greater.)

Don’t worry though, I’m not advocating the creation of a huge complex programming language that we should all learn… at least not yet.

This type of thing has to evolve with the general knowledge of the population.

What we really need to focus on now is programming languages with small vocabularies that can be easily understood and learned, even though they might not be as expressive as more complicated languages.

Eventually when a larger base of the population understands how to code and programming concepts, I do believe there will be a need for a language as expressive to computers and humans alike, as English and other written languages of the world are.

What do you think?  Should we have more complicated programming languages that take longer to learn and master in order to get the benefit of an increased power of expression, or is it better to keep the language simple and have more complicated and longer code?



Published at DZone with permission of John Sonmez, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Sebastián Open replied on Wed, 2013/04/17 - 9:00pm

Sorry, but I think your reasoning is fundamentally wrong.

It's even more clear with the example you gave.

Just like when you write, you should think about an average ideal reader who has more or less the same knowledge as you. That is, you should write code you feel anybody with a fair knowledge can be comfortable with.

Otherwise, we should all write code (and letters, songs, and even speak) as if everybody were a two-year-old dyslexic child, just in case.

We should aim for beginners to learn to read code by reading well-written, clear code.

Anybody knowing an if-then-else structure can grasp the meaning of the ternary operator in no more than 20 seconds, max. After that, the rest of the lines are plain boilerplate and noise.

I would only use the second form if I want to add a comment about what I'm trying to achieve.

On the other hand, I would face the problem from a communicative approach. I'm not just talking to a compiler, I'm trying to communicate my intentions to another human being (probably myself in a couple of months!).

The problem is that communicating your ideas to other human beings is a whole different skill often neglected by IT people.

Just my two cents, but it's an interesting subject you brought here...

Hikari Shidou replied on Sat, 2013/04/20 - 3:51pm

 As you said... ok, I got you, but what's the point?


I'm not a teacher. I develop software that must have all sorts of quality attributes.

I'm not gonna write it as a retard, or as if the VM executing it is a retard, just because some noob may come touch on the code.


I mean, I'm not in school. I'm in the real world. If a noob is unable to handle real world software, he should go to school and man up first, and only then come touch real world software.

Lund Wolfe replied on Sun, 2013/04/21 - 4:29am

I definitely prefer a simple language, like Java.  Complexity will come from the requirements or a bad design.  It doesn't need any help from the language and language level constructs/patterns.

A simple design is the most important part of readability.  Unreadable code can be rewritten chunk by chunk, and if it isn't buggy you may not care (or only care that the class and method names are readable).

The high level design and low level code should read like a children's story in the language of the business domain.  It is rare, but you'll know it when you see it.  It shouldn't take a genius programmer (junior or senior) to maintain an application without breaking it.

Leslie Jarrett replied on Wed, 2013/04/24 - 8:08am

As a software designer with over 35 years in the Computer Industry, I would like to add my 'take' on software readability.

Firstly the coder may fall into different mindsets perhaps depending on where they work.

Do they code in a style that is only easily understood by themselves, thus making their presence in the organisation seem more important?  Do managers and users end up hearing the well used phrases such as 'I am the best person to look after this software as I am the only one who fully understands it'?

Do they code in a style that is very well documented and readable and opens up the software to a far wider audience of software developers who all would be able to look after the code?  The issue they may have with that is that by coding this way they become easy to replace within an organisation.

These are the two extremes of course and probably most code falls somewhere in between, depending on what the software is.  Here you can take into account factors such as the lifespan of the software; maybe it's a one-time fix or a conversion piece of code that is done just to resolve an issue.  Other times, where the software development is high on the end users' radar, the developers will ensure that they get the kudos etc. for bringing the software to life, and thus a sense of self-preservation steps in.

Another aspect of readable code is what I call the 'forward engineering trade-off'.

Let me explain: software has to change because the end user and the business have to change from time to time.  If you coded a routine to evaluate 10 different case scenarios in such a way that it would be harder to add the eleventh case, then it might be short and quick to read but less maintainable.  Also, software developments in the same market area, such as Finance, will possibly be similar across various departments, products and functions of that business, so again coding may be done in a style that reflects portability of code (cloning) with a small change to create a new function.  For example, if you had a date calculation routine that banded date ranges by weeks, years and months and then you wanted specific periods, you could clone the first one to make the second, provided the original code was structured in a way that allows this type of software engineering.
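
One possible way to structure such a date-banding routine so that a new case is cheap to add is sketched below in Java. This is only an illustration under assumptions: the `DateBanding` class, the `Band` enum and the label formats are hypothetical and are not taken from the commenter's software.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.function.Function;

// Hypothetical sketch: each band is one enum constant that maps a date to a label.
class DateBanding {

    enum Band {
        YEAR(d -> String.valueOf(d.getYear())),
        MONTH(d -> d.getYear() + "-" + String.format("%02d", d.getMonthValue())),
        // ISO week numbering; the week-based year can differ from the calendar year.
        WEEK(d -> d.get(IsoFields.WEEK_BASED_YEAR) + "-W"
                + String.format("%02d", d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)));

        private final Function<LocalDate, String> labeller;

        Band(Function<LocalDate, String> labeller) {
            this.labeller = labeller;
        }

        // Returns the label of the band that the given date falls into.
        String labelFor(LocalDate date) {
            return labeller.apply(date);
        }
    }
}
```

With this shape, supporting another period (a quarter, say) would mean adding one more constant rather than cloning and editing a whole routine. The cost is a slightly larger vocabulary (enums with lambdas) for the reader.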


Gary Berosik replied on Wed, 2013/04/24 - 10:16am

Overall, this seems like an argument for the development and use of well-designed Domain Specific Languages!

John Sonmez replied on Wed, 2013/04/24 - 10:33am in response to: Gary Berosik

Yes, I agree.  That is definitely one good solution.

Kurt Schultz replied on Wed, 2013/04/24 - 11:31am

I work with multiple languages and often need to look up operators (like the shorthand if-then-else in your example) and syntax.  I have no problem doing that and could easily work with either of your examples because the intent is clear.

What I do have a problem with is trying to figure out what someone was thinking when they wrote the code.  Here is an example based on source code in a real project (I shortened the actual message down to "xyz" for this example):

if (logContents.Contains("xyz"))
{
   log.Warn("xyz");
}
else
{
   log.Fatal("xyz");
}

It is fairly easy for anyone to read this and understand the logic, but what was the reason for doing it in the first place?  In this case, the simple comment below could have explained the reason and given others insight as to whether the code was doing what was needed or perhaps a clue of how to re-write it to accomplish the intended goal another way.

// If this message has already been logged then just log again as a warning; otherwise log as fatal in order to generate e-mail. (The logging framework is configured to send e-mail for fatal events, but not for warnings.  We only want one e-mail for this event.)
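
Putting the comment and the branch together might look like the following Java sketch. The `DuplicateAlertLogger` class and its `emitted` list are hypothetical stand-ins for a real logging framework, used here only so the example is self-contained; the e-mail behaviour exists only as an assumption stated in the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a logging framework; it just records what it was asked to log.
class DuplicateAlertLogger {

    final List<String> emitted = new ArrayList<>();

    void report(String logContents, String message) {
        // Why: the (assumed) logging setup sends e-mail for FATAL events but not for
        // warnings, and we want only one e-mail per event. If the message is already
        // in the log, repeat it as a warning; otherwise log it as fatal.
        if (logContents.contains(message)) {
            emitted.add("WARN " + message);
        } else {
            emitted.add("FATAL " + message);
        }
    }
}
```

The branch itself is unchanged. What the comment adds is the reason, which lets a maintainer judge whether the code still does what was intended.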

Regarding your theory about vocabulary and grammar, I think that communication should be the primary objective.  Write to communicate - not to impress someone with your vocabulary. For a good example, see http://sheffieldischoolresearchers.blogspot.com/2011/08/notes-for-discussion-on-academic.html

Charlie Hubbard replied on Wed, 2013/04/24 - 1:04pm

I felt like your article made some valid points about language constructs and beginners' ability to understand them.  I particularly like the following:

My point is, the larger the vocabulary you have, the more succinctly ideas can be expressed, thus making them more readable, BUT only to those who have a mastery of that vocabulary and grammar.

If I can give another example, I find formalized math equations very hard to penetrate.  But I'm better than your average developer and I have no problem understanding math in code.  I find learning math through code much simpler than reading formalized papers because I understand the language it's phrased in much better.  Plus I don't have to read 4 pages of English to grok it.  Formalized math is a very succinct language, but it's really hard for me to read because it's so expressive, and I'm not comfortable with the language.  (Wikipedia Math editors I'm looking at you!  Wikipedia Math used to be comprehensible, but now it's just as bad as my math books sitting on my shelf.)

But, over the last 5-7 years we've been moving faster and faster to more complex constructs because it's so tedious to express, say, the power of closures or mixins in languages with smaller grammars.  It's really hard to put the functional genie back in the bottle because it's addressing the tedium for the journeyman developer who gets tired of writing the same block of code over and over.

I don't suspect the move to functional languages has lost any steam due to this readability problem either.  However, the last time we went through a simplification was moving away from multiparadigm languages like C++ and Perl to Java.  Java most certainly limited the grammar of the language for this exact reason: to increase readability.  And rightly so!  Perl was incomprehensible and C++ programmers went nuts with operator overloading, creating bizarre uses for it.  But at the same time Java addressed a lot of the headaches the journeymen developers were having, so they had a reason to switch.  Ruby swung the pendulum back towards more expressive languages when it showed how closures, mixins, and OO combine to reduce the repeated code in your program.

We can't just say "Hey let's create smaller vocab languages because larger vocabularies are too much for the beginner developer." and expect it to change. Us masters like these functional languages because they reduce our workload and really change how much we can get done quickly.  Which leads me to believe you'd need to find a new paradigm for expressing code that yields both expressiveness and readability in order to affect the status quo.  

From the 1970s till now we really haven't introduced any new programming constructs.  (I credit Smalltalk for OO; others credit Simula, so if you're the latter, 1960s; and LISP freaks will say 1950s - whatever.)  Most of the advancement in commercial settings has taken the form of demonstrating how those constructs change expressiveness, not introducing new constructs.  We're running out of language constructs to use in the future to reduce complexity.  Crap, that turned out longer than I'd have liked.

Ryan Dew replied on Wed, 2013/04/24 - 1:35pm

Eventually when a larger base of the population understands how to code and programming concepts, I do believe there will be a need for a language as expressive to computers and humans alike, as English and other written languages of the world are.

I think this is a cool idea, but think it would be completely unsustainable. Look at the problem politicians, lawyers and judges are facing today. In many cases they can't come together to agree on the precise meaning of English phrases. Who would be appointed to maintain and standardize such a complex language? 

On the other hand, I think we'd get much more benefit creating many simple context based Domain Specific Languages. Perhaps we could have more standardization around DSLs so moving from one to another would not be so drastic.

Christine Dorothy replied on Wed, 2013/04/24 - 9:32pm

I agree with what you say about it depending on our target audience. However, I think I missed the objective of your article / argument. It intrigues me because I've been thinking about good and readable code for a long time (not to mention commenting on code that I've seen). After reading your article, I'm now left wondering what kind of audience I'm targeting. And I think I can't answer that question.

Having said that, all this while I have always resorted to clear comments, because I find they may just help other developers understand.

Knut W. Hansson replied on Thu, 2013/04/25 - 1:47pm

Excellent point about the reader's abilities! As a professor teaching new students programming, I constantly find myself torn between two goals:

  1. Writing code that the students understand
  2. Writing code as a role model

A small example can illustrate the choices:

  public boolean chkName_A() {
    String text;
    boolean returnTrue;
    text = txtName.getText();
    if (text.equals("") == true) {
      returnTrue = true;
    }
    else {
      returnTrue = false;
    }
    return returnTrue;
  }

  // Me: We don't need the variable 'text', and we never write '== true' as proven by
  // truth table earlier

  public boolean chkName_B() {
    boolean returnTrue;
    returnTrue = txtName.getText().equals("");
    if (returnTrue) {
      return true;
    }
    else {
      return false;
    }
  }

  // Me: We don't need the variable 'returnTrue'

  public boolean chkName_C() {
    if (txtName.getText().equals("")) {
      return true;
    }
    else {
      return false;
    }
  }

  // Me: We don't need 'else' since 'return false' can only be reached if
  // the statement 'return true' hasn't been executed
  // (We don't need the brackets for if either, but 'good programming style
  // guidelines' insist that we should put them in to facilitate later additions.)

  public boolean chkName_D() {
    if (txtName.getText().equals("")) {
      return true;
    }
    return false;
  }

  // Me: Some have seen the ternary operator on the Internet and will write

  public boolean chkName_E() {
    return (txtName.getText().equals("") ? true : false);
  }

  // Me: When the expression is true we return true, otherwise false
  // so let's simply return the result of the expression
  // This is how I would like to write it myself. This is so simple that the question arises
  // as to why we would need a special function for this. Possibly we are thinking of
  // further name controls to be added later?

  public boolean chkName_F() {
    return txtName.getText().equals("");
  }

The first - chkName_A() is a typical beginner's code. The comments shown as we move to chkName_B() are examples of what I could explain.

The point is that in suggested solutions to assignments, I have to choose between the above. My solution to this problem is that I write the last one in my solution. In class I show them the first, then I walk the students through the explanations in the hope that this will fulfill both my goals: understanding and role model.

What do you practitioners think of this strategy?

~Knut W. Hansson
Associate Professor IT
Buskerud College, Norway


John Sonmez replied on Thu, 2013/04/25 - 2:41pm in response to: Knut W. Hansson

Awesome!  That is a fantastic example.  You have identified the problem I am trying to point out in my post.

Philippe Lhoste replied on Fri, 2013/04/26 - 3:59am

Knut, being an active member of the Processing forum, I very often see code like your first example! Actually, I think it can be seen in some of the examples of this graphical framework used by many teachers to teach the basics of programming...

I find myself often preaching to ditch the if (a == true) style, first because it is redundant (and, as a seasoned programmer, less readable!), and second because many beginners make the mistake of typing a single =, and with a boolean that is the one case Java tolerates, so the error is harder to find!

Same for if (a) b = true; else b = false; which is so painful to my eyes, but probably more explicit for a beginner.

I like this article because it highlights the gap between beginners and seasoned practitioners. You can see something similar with some languages: when I started to learn Scala, I was confused by the syntax some people used because it was so terse, using constructions unusual to my eye. On the other hand, experienced Scala programmers would find them very readable, showing the logical flow of data processing.
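
The single-= pitfall mentioned above can be shown with a short Java sketch. The class and method names are hypothetical; the point is only that the buggy version compiles, because an assignment to a boolean is itself a boolean expression.

```java
// Hypothetical illustration of the '=' versus '==' pitfall with booleans.
class AssignmentInCondition {

    // Buggy: '=' assigns true to the parameter, so the condition always holds.
    // This compiles only because the variable is a boolean.
    static String describeBuggy(boolean ready) {
        if (ready = true) {
            return "ready";
        }
        return "not ready";
    }

    // Preferred: test the boolean directly, so there is no operator to mistype.
    static String describe(boolean ready) {
        if (ready) {
            return "ready";
        }
        return "not ready";
    }
}
```

Calling `describeBuggy(false)` still takes the "ready" branch, while `describe(false)` does not. Writing `if (ready)` removes the opportunity for the typo altogether.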

Now, of course, IMO, production code should be written toward seasoned programmers: beginners are here to learn, to progress, and to eventually find this code very readable. They might stumble upon some constructs, like a 5 year old reader can stumble upon some difficult words, but with experience they will fly over the code.

There will be always beginners, but fortunately, that's a temporary state (or they switch to another kind of job!).

Ricky Clarkson replied on Fri, 2013/04/26 - 9:15am

"An experienced developer is more focused on the actual concept being expressed by the code—what the purpose of the code is, not how it is doing it."

Well, no.  He's trying to fix something in it most likely, so he cares how it works.

Edmund Stephen ... replied on Tue, 2013/04/30 - 10:31pm in response to: Knut W. Hansson

You've both glided over what I consider to be the real issue: Readable code has documentation, both in comments and in the code itself, that matches the level of comprehension of the reader. 

The code is the lowest level, but higher levels are usually required.  You can write a program where there are no comments, and claim that the class, method and variable names are self-documenting.  I won't believe you.  You can write a program where every line is commented, and claim that's perfect.  I won't believe you there, either (unless it's assembly, in which case it might be justifiable, because the meaning of ax changes with time).

John's assembly language example contains two levels: comments at the functional level, and instructions that implement it.  The C#/Ruby examples contain enough appropriate-enough semantic information that no extra layer is needed.

Knut's final example -- as presented -- contains two layers too:

// This is how I would like to write it myself. This is so simple that the question arises
// as to why we would need a special function for this. Possibly we are thinking of
// further name controls to be added later?
public boolean chkName_F() {
      return txtName.getText().equals("");
}

The lowest level is the code; the higher level is the meta-comment "we would need a special function for this" which is a worthwhile *actual* comment.  Remove it, and we get:

public boolean chkName_F() {
      return txtName.getText().equals("");
}

... and many readers will ask that very question: why are we doing it that way?

Ma Bass replied on Sat, 2013/05/11 - 7:47pm

Interesting article.  I agree about good names, doing only one thing, not repeating yourself, etc.

Understanding your audience is important.  But don't we expect our less experienced programmers to mature and progress in their skills?  Is writing to the lowest common denominator the plan?

One of my issues with reading code is that my brain needs to understand the problem and the solution in layers.  First I need to understand the problem and the big components and how the problem is solved generally.  Then I am ready to start looking at each piece to see how that portion of the problem was addressed.  But when reading code, what I am usually confronted with is a mess of trees.  I see hundreds of files in packages (which are usually not structured in a way that assists understanding) and no good way to get a tool to give me that big picture with drill down capabilities.

So, for me,  the problem isn't at the individual file level. It's at that functional subsystem/packaging level.
