Mr. Lott has been involved in over 70 software development projects in a career that spans 30 years. He has worked in the capacity of internet strategist, software architect, project leader, DBA, programmer. Since 1993 he has been focused on data warehousing and the associated e-business architectures that make the right data available to the right people to support their business decision-making. Steven is a DZone MVB and is not an employee of DZone and has posted 139 posts at DZone. You can read more from them at their website. View Full User Profile

Comments, Assertions and Unit Tests

09.24.2010
| 5355 views |
  • submit to reddit

See "Commenting the Code". This posting tickled my fancy because it addressed the central issue of "what requires comments outside Python docstrings". All functions, classes, modules and packages require docstrings. That's clear. But which lines of code require additional documentation?

We use Sphinx, so we make extensive use of docstrings. This posting forced me to think about non-docstring commentary. The post makes things a bit more complex than necessary. It enumerated some cases, which is helpful, but didn't see the commonality between them.
The posting lists five cases for comments in the code.
  1. Summarizing the code blocks. Semi-agree. However, many code blocks indicates too few functions or methods. I rarely write a function long enough to have "code blocks". And the few times I did, it became regrettable. We're unwinding a terrible mistake I made regarding an actuarial calculation. It seemed so logical to make it four steps. It's untestable as a 4-step calculation.
  2. Describe every "non-trivial" operation. Hmmm... Hard t0 discern what's trivial and what's non-trivial. The examples on the original post seems to be a repeat of #1. However, it seems more like this is a repeat of #5.
  3. TODO's. I don't use comments for these. These have to be official ".. todo::" notations that will be picked up by Sphinx. So these have to be in docstrings, not comments.
  4. Structures with more than a couple of elements. The example is a tuple of tuples. I'd prefer to use a namedtuple, since that includes documentation.
  5. Any "doubtful" code. This is -- actually -- pretty clear. When in doubt, write it out. This seems to repeat #2.
One of the other cases in the the post was really just a suggestion that comments be "clear as well as short". That's helpful, but not a separate use case for code comments.
So, of the five situations for comments described in the post, I can't distinguish two of them and don't agree with two more.
This leaves me with two use cases for Python code commentary (distinct from docstrings).
  • A "summary" of the blocks in a long-ish method (or function)
  • Any doubtful or "non-trivial" code. I think this is code where the semantics aren't obvious; or code that requires some kind of review of explanation of what the semantics are.
The other situations are better handled through docstrings or named tuples.
Assertions
Comments are interesting and useful, but they aren't real quality assurance.
A slightly stronger form of commentary is the assert statement. Including an assertion formalizes the code into a clear predicate that's actually executable. If the predicate fails, the program was mis-designed or mis-constructed.
Some folks argue that assertions are a lot of overhead. While they are overhead, they aren't a lot of overhead. Assertions in the body of the inner-most, inner-most loops may be expensive. But must of the really important assertions are in the edge and corner cases which (a) occur rarely and (b) are difficult to design and (c) difficult to test.
Since the obscure, oddball cases are rare, cover these with the assert statement in addition to a comment.
That's Fine, But My Colleagues are Imbeciles
There are numerous questions on Stack Overflow that amount to "comments don't work". Look at at the hundreds of question that include the keywords public, protected and private. Here's a particularly bad question with a very common answer.
Because you might not be the only developer in your project and the other developers might not know that they shouldn't change it. ...
This seems silly. "other developers might not know" sounds like "other developers won't read the comments" or "other developers will ignore the comments." In short "comments don't work."
I disagree in general. Comments can work. They work particularly well in languages like Python where the source is always available.
For languages like C++ and Java, where the source can be separated and kept secret, comments don't work. In this case, you have to resort to something even stronger.
Unit Tests
Unit tests are perhaps the best form of documentation. If someone refuses to read the comments, abuses a variable that's supposed to be private, and breaks things, then tests will fail. Done.
Further, the unit test source must be given to all the other developers so they can see how the API is supposed to work. A unit test is a living, breathing document that describes how a class, method or function behaves.
Explanatory Power
Docstrings are essential. Tools can process these.
Comments are important for describing what's supposed to happen. There seem to be two situations that call for comments outside docstrings.
Assertions can be comments which are executable. They aren't always as descriptive and English prose, but they are formal and precise.
Unit tests are important for confirming what actually happens. There's really no alternative to unit testing to supplement the documentation.

From http://slott-softwarearchitect.blogspot.com/2010/09/comments-assertions-and-unit-tests.html

Published at DZone with permission of Steven Lott, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)