I think the article reaches the wrong conclusion. Don't write less documentation on the assumption that it will all suck anyway. Insist on writing better documentation instead.
Having said that, I find it helpful to write documentation before writing the actual code, especially for more complex pieces of code whose behaviour is not immediately obvious.
For me, writing documentation serves as a form of 'rubber duck debugging'[1] before the actual bugs occur. Explicitly writing out the intention of a piece of code in plain English often makes the concept much clearer in my brain and immediately brings out possible problems with my initial design. Problems I can fix before wasting time iterating through code implementations.
This is also the reason I very much enjoy writing thorough READMEs for each library I produce. These explain in abstract concepts what the entire library API is intended to accomplish. Additionally, I try to include actual usage examples. As with code-level documentation, this brings up possible problems before they occur.
The fact that it makes clear what the code does, months after I last worked on it, is entirely a bonus.
The best kind of documentation is that which is checked by the compiler and guaranteed to be correct - the code itself. Code that is well written, in good style using sensible variable names, can be as descriptive as good comments. Using a good static type system allows you to encode properties into your code that are guaranteed to be valid.
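A rough sketch of what I mean, in Python (the names and the "connection" domain are made up for illustration): instead of a comment listing the legal states, the states themselves become a type that a checker like mypy can verify.

```python
from dataclasses import dataclass
from enum import Enum

# The legal states are encoded as a type rather than described in a comment.
class ConnState(Enum):
    DISCONNECTED = "disconnected"
    CONNECTED = "connected"

@dataclass(frozen=True)
class Connection:
    # A type checker rejects Connection(state="conected") outright.
    state: ConnState

def can_send(conn: Connection) -> bool:
    # No comment needed to explain which states are valid: the type is the doc.
    return conn.state is ConnState.CONNECTED
```

The point is that a typo or an invalid state can't even be constructed, whereas a comment describing the valid states can silently rot.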
I certainly think some comments have their uses - but these are generally at the level of how systems and modules work, and the concepts used therein, as discussed elsewhere in these comments. I agree with the article that only 5% (or less) of functions need individual comments attached.
I've been maintaining some code written with this philosophy, and I find it lacking.
Code is very good at answering "How" but often the reader needs to know "Why" or "Why not".
In fact, the maintainer of your code will rarely be reading it to figure out how it is working - almost always the next person looking at your code will want to know why it is not working, or will be attempting to change the behavior.
Comments can guide as to pitfalls that you've avoided in your implementation and can answer the all-too-frequent question, "What were they thinking?!"
Another problem: It seems like developers in the "clean code needs no comments" crowd are also the ones least likely to write clean code in the first place.
Clean code needs no internal comments, but that's very different from software contracts, which are inherently outside the code's operation and therefore do need comments.
I strongly dislike extraneous comments. But even some clear code needs some external communication alongside it.
I used to believe in "self-documenting code," but it takes far longer to read through code than it does to read a comment on the code, and you're far more likely to come away with an intuitive and correct understanding, and with far less effort, by reading a comment than code. When reading actual code, your understanding of how the piece of code works might be incorrect, which could domino into bigger errors down the line. Also, it's easy to overlook some critical line in the function which significantly affects its behavior.
Perhaps most importantly, most of the time you don't need to know how something works, only what it does. Unless you have a reason to doubt that the function does, in fact, do what it says it does, it's much nicer to have a few lines of comments -- written in human language -- to describe the inputs and outputs of some function, or the reason we're invoking some function here, than to have to actually go and read through the code of a function. In fact, reading through the actual code can sometimes impede understanding, because of the reasons I stated above. Now with some functions, this purpose can be entirely expressed in the function signature, but with more complicated functions, that's unlikely.
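A minimal sketch of the kind of comment I mean, in Python (the function and its behavior are invented for illustration): a short docstring tells you what goes in and what comes out, so you never have to read the body.

```python
def frobnicate(items, strength=5):
    """Scale each numeric item by `strength`.

    Args:
        items: iterable of numbers to transform.
        strength: multiplier applied to every item; expected range 1-10.

    Returns:
        A new list; the input is not modified.
    """
    return [x * strength for x in items]
```

For a one-liner like this the signature nearly suffices, but the docstring still records the expected range of `strength` and the fact that the input is left untouched, neither of which the signature can express.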
Exactly. This "code is the documentation" mentality is simply laziness or an excuse for poor engineering.
How about "the bridge is the documentation" for civil engineers? Or "the house is the documentation", so who needs building plans?
Documentation also has to cover larger scale interactions, that is how objects interact with each other and how they fit into the design.
All that said, in a large software project you need to pick your battles. Maintaining the same level of documentation across the board and throughout the life of the software is very difficult. Make sure though that your core is well documented and you keep that documentation up to date. Libraries and APIs used externally also need to be well documented.
The equivalent of source code in civil engineering (and engineering of physical things in general) is the drawings/plans, not the actual bridge. In practice it turns out that drawings for physical objects are often even more poorly commented than software, in part because leaving comments on CAD models is so much less convenient than in code.
Not exactly. The drawings I've worked with had lots of annotations on them (e.g. dimensions and manufacturing instructions). I agree it's not a perfect analogy, but perhaps uncommented/undocumented source code lies somewhere between the physical thing and the drawings (or CAD file). My point, though, is that while in theory every bit of information can be observed from the physical object, it is a poor way of storing that information.
This is only valid in narrow situations, for example when the person reading the documentation is reading them in the code, and when the code is part of a monolithic (or other single-tech) application.
The code you look at to find the solution might be one 20 or 30 line chunk of Ruby that performs a service for a chunk of 10 year old VB or 20 year old Perl or 30 year old C, or some chain of several languages. A support guy, or apps-level documenter, or maintenance programmer adding a feature, or architect integrating with another system, or business integration consultant helping to decide where the business needs to invest, or some other decision maker really, really doesn't have time to read through 40,000 lines of code in several languages to find out how a feature works.
For example: one of my first jobs was with an established big brand with many years of legacy data and organic "enterprise" systems, integrating data produced by an AS400 green-screen application into VB (on a Windows box) by copying (via FTP on a SCO box) a fixed-width text file produced by a shell script on the AS400, and parsing it so we could put it into Oracle for processing by a C++ application with API hooks into a Nortel Meridian coms system. When an outbound call goes to the wrong number, where's the bug?
Even if I'm reading _my own_ code 6 or 36 months later, I'm much happier if I've logged the checkins correctly so that it narrows it down to which dozen or so of many thousands of commits touched a feature. Whenever I've had to track down someone else's bug, or tried to explain the technical justification for some business decision the system makes, or tried to write high level progress documentation (think changelog for senior managers), the commit messages make the difference between it taking two weeks and taking two years (i.e. never happening).
It's easy to think, in the post-codial glow when you're fresh from the zone, that there's no way this code isn't absolutely obvious. I've been that guy. I've also been the guy that cursed that guy for making it hard to find the needle in the haystack. I've even been both guys separated by 18 months. Commit messages can make the difference between getting it done in 20 minutes, and looking at "code that is well written, in good style using sensible variable names" for a solution for two days.
Who is assuming that? The comments let you write your code quickly, and then of course you test to make sure it works. Comments do not obviate the need for testing, they just mean you don't have to (if up to date) read tons of source code.
Who are you writing the comments for? If yourself as pseudocode, fine, but get rid of them when you are finished. If you are writing them for another developer down the road, that developer will either ignore them because he really doesn't know for sure the comments can be trusted, or will blindly trust the comments and now and then will get burned.
> The best kind of documentation is that which is checked by the compiler and guaranteed to be correct - the code itself.
Which leaves your documentation hard to read for a certain percentage of developers, varying depending on your project.
This strategy, while easy for developers experienced in the target language and familiar with the code, can have real negative consequences if the people involved in the project are of a different fluency level with the language or even CS in general, or new to the project.
Q: What I just coded is obviously quicksort, so why should I label it?
A1: Because not everyone is used to seeing quicksort implemented manually in C, assembly, python, etc.
A2: Because without knowing at a high level what you are trying to accomplish, it's much harder to ascertain whether that bug you just found is really a bug or an interesting feature which will come into play a page later.
A3: Because knowing immediately that this chunk of code does NOT pertain to some specialized code to select items after sorting saves time and cognitive load.
I don't think I'm advocating that in any way, just, at a minimum, sanely peppering the code with markers. To keep with the quicksort example, a simple single-line comment at the top would suffice:
/* standard quicksort on foo before we make our choice below */
Maybe not insanity, but I would find it a little noisy reading that code, like having to code at a desk near the receptionist.
I wouldn't mind if the comment said "using quicksort because n is expected to be large". But only if the choice of quicksort over some other algorithm was deemed significant.
My point is really just to label the block as containing quicksort when the function it's in does other things as well. Refactoring it into its own function with a useful name would be just as (or probably more) useful. It's really the lack of either which I see as insufficient.
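To make the two options concrete, here's a Python sketch (function names and the "pick a winner" scenario are invented): either a one-line marker comment sits above the sorting block, or the block is extracted into a function whose name does the labeling.

```python
# Option 2: extract the block into a function whose name documents it.
def quicksort(xs):
    """Plain recursive quicksort; a named function tells readers what it is."""
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    return (quicksort([x for x in rest if x < pivot])
            + [pivot]
            + quicksort([x for x in rest if x >= pivot]))

def pick_winner(scores):
    # Option 1 style: if the sort were inlined here, a single marker
    # comment would replace the function name:
    #   standard quicksort on scores before we make our choice below
    ranked = quicksort(scores)
    return ranked[-1]  # highest score wins
```

Either way, a reader skimming `pick_winner` immediately knows the block is an ordinary sort and not some specialized selection logic.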
Code is not self documenting. I worked with a bunch of Rubyists that thought this (I love ruby, btw) and I wanted to strangle every single one of them.
The parent comment nails this. These guys who thought "my code is clear, therefore self-documenting" were some of the worst system designers and most myopic thinkers in the company. What's worse, this attitude usually extends to "my code is simple and therefore doesn't need to be tested."
Ruby is dynamically typed, so it doesn't have the power of a type system to encode invariants. It's very hard to write self-documenting code in a dynamically typed language.
That kind of code doesn't explain all the different approaches you tried for accomplishing the piece's goal, and why they were lacking.
Some random developer comes at a later time, thinks this code is more complicated than required, refactors it and only then sees why you didn't do it that way. This happened to me many times (both in the "original dev" role and the "random future dev" one).
Comments can easily clarify why this particular piece of work is implemented in this manner and not the others you tried, saving a lot of time to the future devs.
No. The best kind of documentation is the kind that explains the non-obvious decisions taken in the code. Or the kind that outlines the interactions between function x and states 1-3. Or the kind left behind by some poor bastard archaeologist who comes in after the fact to fix the appallingly opaque ball of hair I pooped out under terrible deadline pressure.
Documentation has multiple purposes and multiple audiences, and a good static type checker can't do anything for most of them.
I think you are neglecting the "API" case. If I want to call some code that you wrote that I think might solve my problem, I have to have access to your source, read all the source code, simulate the machine state in my head, and figure out what you are actually doing? No thanks.
Note I don't necessarily mean an externally facing API, which I would assume you'd agree needs good documentation. Even if we are on the same team, I don't want to have to read your code just to figure out which of the frob_XXX functions to call. Maybe that is what you meant by your second paragraph?
A simple example comes from the article itself. Parameter 'strength', the strength of the frognication. What is the range of that? Is it 0 to max int? min int to max int? 1-100?
(A real-life example would be a function that outputs an image in JPEG or PNG and takes a quality parameter. I notice that within the same library often one type of image wants 1-100 where another type wants a different scale, like 0.0 - 1.0. The parameters have the same name.)
This is not something that can be checked by most type systems. Therefore it needs to be in the documentation.
> A simple example comes from the article itself. Parameter 'strength', the strength of the frognication. What is the range of that? Is it 0 to max int? min int to max int? 1-100?
[...]
> This is not something that can be checked by most type systems. Therefore it needs to be in the documentation.
Or, alternatively, we need better type systems. (Of course, it can still be in "documentation", just documentation that can be automatically generated from code that is also given real effect by the compiler, and thus documentation that can't get out of sync with the implementation.)
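A rough Python sketch of pushing the valid range into a type (names are hypothetical; Python can only enforce this at construction time, whereas a language with refinement types could reject out-of-range values at compile time):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strength:
    """A strength value; the 1-100 range lives in the type, not a comment."""
    value: int

    def __post_init__(self):
        # Every Strength in the program is guaranteed to be in range.
        if not 1 <= self.value <= 100:
            raise ValueError(f"strength must be 1-100, got {self.value}")

def frognicate(strength: Strength) -> int:
    # Callers can no longer pass a bare out-of-range int by accident.
    return strength.value * 2
```

The range check now travels with the value, and any generated API docs can state it mechanically rather than relying on a hand-written comment staying accurate.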
This depends entirely on your language and environment. Working on embedded systems, where memory is at a premium, the way an algorithm or data structure is coded up may be entirely non-intuitive, merely to get under the memory limits or to eke out a bit more performance. Good documentation + good code is far better than just good code.
Even good code rarely describes by function and variable names alone what intent and business purpose is being provided. Code is the 'how', documentation of some other form usually provides the 'why' (comments, specs, tests, external documentation, whatever).
Well-documented code is a combination of both comments and good variable/method names. Comments should explain any assumptions made, exceptions, sample input formats, etc. The variable and method details, including logic explanation, should be taken care of by better naming and clean code.
Thank you for saying that, I sometimes feel like I'm taking crazy pills when I hear some of the inane arguments against thorough documentation.
I was hoping this was going to be about the real "documentation fallacy:" 'Documentation tends to be of low quality, therefore it is best to avoid writing much documentation.' One common instantiation of this is "thorough documentation is bad because it will inevitably fall behind the code and be inaccurate."
People fall into the trap of assuming there is something inevitable about bad docs. Yet they never assume there is anything inevitable about bad code, even though most of the code in the world is, objectively, complete shit!
I think this whole thing sits in the same boat with error handling and unit tests. Many folks tend to see these as something separate from programming. Some kind of side effect that isn't a lot of fun and is therefore best ignored. "Hey, I can write me some code.. and oh yea.. there's also this bit of stuff I should do, but I'm busy writing the next big thing."
It helps to start thinking of all this as one and the same. No single part of it is more or less important. If you are writing code, you are writing documentation, you are doing correct and thorough error handling and you are producing consistent and relevant tests. There is no difference.
I won't argue the less vs more, that's an age old debate. However, this part:
"Explicitly writing out the intention of a piece of code in plain English often makes the concept much clearer in my brain"
I think is critical for me. I actually write code by first doing a pseudo-code pass of comments, where I just write the flow of what I think the code should be doing. Then I go back and fill in the actual functionality behind the comments. Naturally, it's not always perfect on the first pass, but you just modify the comment thought process to update your approach, and then refill the functionality. As a programmer, you can then skim down through sections just checking what it's "supposed" to do, whether you're a newbie diving in or the original writer who just needs a refresher.
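A small Python illustration of that workflow (the task and names are invented): the first pass is comments only, and the second pass fills in code beneath each comment, leaving the comments behind as skimmable section markers.

```python
# First pass, intent only:
#
# def dedupe_emails(rows):
#     # normalize each address (lowercase, strip whitespace)
#     # keep the first occurrence of every address
#     # return addresses in original order

# Second pass, each comment line gets its implementation beneath it:
def dedupe_emails(rows):
    seen = set()
    result = []
    for row in rows:
        # normalize each address (lowercase, strip whitespace)
        email = row.strip().lower()
        # keep the first occurrence of every address
        if email not in seen:
            seen.add(email)
            result.append(email)
    # return addresses in original order
    return result
```

Skimming just the comment lines of the finished function recovers the original pseudo-code pass, which is exactly what makes later refreshes quick.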
>Having said that, I find it helpful to write documentation before writing the actual code.
Do you write/have a technical spec? I think that's what you are describing. If I first and only write the comment, there can often be a disconnect between what the code does right now and what it would ideally do once I'm finished. On the other hand, a spec plus an accurate comment keeps everything in order.
The README I mentioned is more or less a lightweight specification of the public parts. Sometimes it warrants more detail and I add an actual SPEC document which goes into great detail.
I have not had much trouble with the comments diverging from the final implementation of a piece of code. But I have forced myself into a habit of re-reading through the documentation regularly once I've committed a chunk of new code. Just to ensure it all still does what it says on the tin. This takes extra time, but together with learning how to write decent commit messages, this has helped me keep things sane and organized.
[1]: http://en.wikipedia.org/wiki/Rubber_duck_debugging