Hacker News
The Code Documentation Fallacy (canonical.com)
187 points by beagle3 on Oct 17, 2013 | 129 comments



I think the article hits on the wrong conclusion. Don't write less documentation because you're going to assume it will all suck anyway. Insist on writing better documentation instead.

Having said that, I find it helpful to write documentation before writing the actual code. Specifically for more complex code pieces for which the behaviour is not immediately obvious.

For me, writing documentation serves as a form of 'rubber duck debugging'[1] before the actual bugs occur. Explicitly writing out the intention of a piece of code in plain English often makes the concept much clearer in my brain and immediately brings out possible problems with my initial design. Problems I can fix before wasting time iterating through code implementations.

This is also the reason I very much enjoy writing thorough READMEs for each library I produce. These explain in abstract concepts what the entire library API is intended to accomplish. Additionally, I try to include actual usage examples. As with code-level documentation, this brings up possible problems before they occur.

The fact that it makes it clear what the code does, months after I last worked on it, is entirely bonus.

[1]: http://en.wikipedia.org/wiki/Rubber_duck_debugging


The best kind of documentation is that which is checked by the compiler and guaranteed to be correct - the code itself. Code that is well written, in good style using sensible variable names, can be as descriptive as good comments. Using a good static type system allows you to encode properties into your code that are guaranteed to be valid.
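A minimal sketch of that idea in Python (using type hints checked by a tool like mypy rather than a compiler; all names here are illustrative): distinct types make a whole class of mix-ups impossible to express.

```python
from typing import NewType

# NewType gives a static checker a distinct type to track,
# even though values are plain ints at runtime.
UserId = NewType("UserId", int)
OrderId = NewType("OrderId", int)

def cancel_order(order: OrderId) -> str:
    return f"cancelled order {order}"

order = OrderId(42)
user = UserId(7)

print(cancel_order(order))   # fine
# cancel_order(user)         # rejected by mypy: UserId is not OrderId
```

The property "this function only takes order ids" now lives in the signature, where the checker keeps it honest, instead of in a comment that can rot.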

I certainly think some comments have their uses - but these are generally at the level of how systems and modules work, and the concepts used therein, as discussed elsewhere in these comments. I agree with the article that only 5% (or less) of functions need individual comments attached.


I've been maintaining some code written with this philosophy, and I find it lacking.

Code is very good at answering "How" but often the reader needs to know "Why" or "Why not".

In fact, the maintainer of your code will rarely be reading it to figure out how it is working - almost always the next person looking at your code will want to know why it is not working, or will be attempting to change the behavior.

Comments can guide as to pitfalls that you've avoided in your implementation and can answer the all-too-frequent question, "What were they thinking?!"


Another problem: It seems like developers in the "clean code needs no comments" crowd are also the ones least likely to write clean code in the first place.


Clean code needs no internal comments, but that's very different from whether software contracts (which are inherently outside the code operation and therefore need comments) need comments.

I strongly dislike extraneous comments. But even some clear code needs some external communication alongside it.


Yes. At this point, I really only write "why" or "why not" comments.


I used to believe in "self-documenting code," but it takes far longer to read through code than to read a comment on it, and you're far more likely to come away with an intuitive and correct understanding, with far less effort, from the comment. When reading the actual code, your understanding of how it works might be incorrect, which can domino into bigger errors down the line. It's also easy to overlook some critical line in a function that significantly affects its behavior.

Perhaps most importantly, most of the time you don't need to know how something works, only what it does. Unless you have a reason to doubt that the function does, in fact, do what it says it does, it's much nicer to have a few lines of comments -- written in human language -- to describe the inputs and outputs of some function, or the reason we're invoking some function here, than to have to actually go and read through the code of a function. In fact, reading through the actual code can sometimes impede understanding, because of the reasons I stated above. Now with some functions, this purpose can be entirely expressed in the function signature, but with more complicated functions, that's unlikely.
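For example, a short docstring can tell a caller everything they need without reading the body (a hypothetical function, not from the article):

```python
def normalize_scores(scores):
    """Scale a list of numeric scores into the range 0.0-1.0.

    An empty list returns an empty list; if all scores are equal,
    every entry maps to 0.0. The input list is not modified.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```

A caller who trusts those three sentences never has to trace the min/max logic or discover the empty-list edge case by experiment.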


Exactly. This "code is the documentation" mentality is simply laziness or an excuse for poor engineering.

How about "the bridge is the documentation" for civil engineers? Or "the house is the documentation," so who needs building plans?

Documentation also has to cover larger scale interactions, that is how objects interact with each other and how they fit into the design.

All that said, in a large software project you need to pick your battles. Maintaining the same level of documentation across the board and throughout the life of the software is very difficult. Make sure though that your core is well documented and you keep that documentation up to date. Libraries and APIs used externally also need to be well documented.


The equivalent of source code in civil engineering (and engineering of physical things in general) is the drawings/plans, not the actual bridge. In practice it turns out that drawings for physical objects are often even worse about comments than software is, in part because leaving comments on CAD models is so much less convenient than in code.


Not exactly. The drawings I've worked with had lots of annotations on them (e.g. dimensions and manufacturing instructions). I agree it's not a perfect analogy, but perhaps uncommented/undocumented source code lies somewhere between the physical thing and the drawings (or CAD file). My point, though, is that while in theory every bit of information can be observed from the physical object, the object is a poor way of storing that information.


This is only valid in narrow situations, for example when the person reading the documentation is reading it in the code, and when the code is part of a monolithic (or other single-tech) application.

The code you look at to find the solution might be one 20 or 30 line chunk of Ruby that performs a service for a chunk of 10 year old VB or 20 year old Perl or 30 year old C, or some chain of several languages. A support guy, or apps-level documenter, or maintenance programmer adding a feature, or architect integrating with another system, or business integration consultant helping to decide where the business needs to invest, or some other decision maker really, really doesn't have time to read through 40,000 lines of code in several languages to find out how a feature works.

For example: one of my first jobs was with an established big brand with many years of legacy data and organic "enterprise" systems, integrating data produced by an AS400 green-screen application into VB (on a Windows box) by copying (via FTP on a SCO box) a fixed-width text file produced by a shell script on the AS400, and parsing it so we could put it into Oracle for processing by a C++ application with API hooks into a Nortel Meridian coms system. When an outbound call goes to the wrong number, where's the bug?

Even if I'm reading _my own_ code 6 or 36 months later, I'm much happier if I've logged the checkins correctly so that it narrows things down to which dozen or so of many thousands of commits touched a feature. Whenever I've had to track down someone else's bug, or tried to explain the technical justification for some business decision the system makes, or tried to write high-level progress documentation (think changelog for senior managers), the commit messages make the difference between it taking two weeks and taking two years (i.e. never happening).

It's easy to think, in the post-codial glow when you're fresh from the zone, that there's no way this code isn't absolutely obvious. I've been that guy. I've also been the guy that cursed that guy for making it hard to find the needle in the haystack. I've even been both guys separated by 18 months. Commit messages can make the difference between getting it done in 20 minutes, and looking at "code that is well written, in good style using sensible variable names" for a solution for two days.


What happens when someone makes the assumption that the comments are up to date, correct, and unambiguous? They get fired.


Who is assuming that? The comments let you write your code quickly, and then of course you test to make sure it works. Comments do not obviate the need for testing, they just mean you don't have to (if up to date) read tons of source code.


Who are you writing the comments for? If yourself as pseudocode, fine, but get rid of them when you are finished. If you are writing them for another developer down the road, that developer will either ignore them because he really doesn't know for sure the comments can be trusted, or will blindly trust the comments and now and then will get burned.


> The best kind of documentation is that which is checked by the compiler and guaranteed to be correct - the code itself.

Which lends your documentation to being hard to read for a certain percentage of developers, varying depending on your project.

This strategy, while easy for developers experienced in the target language and familiar with the code, can have real negative consequences if the people involved in the project are of a different fluency level with the language or even CS in general, or new to the project.

Q: What I just coded is obviously quicksort, so why should I label it?

A1: Because not everyone is used to seeing quicksort implemented manually in C, assembly, python, etc.

A2: Because without knowing at a high level what you are trying to accomplish, it's much harder to ascertain whether that bug you just found is really a bug or an interesting feature which will come into play a page later.

A3: Because knowing immediately that this chunk of code does NOT pertain to some specialized code to select items after sorting saves time and cognitive load.


You can't go around writing comments for the lowest common denominator. Insanity ensues.


I don't think I'm advocating that in any way, just at a minimum sanely peppering the code with markers. To keep with the quicksort example, a simple single-line comment at the top would suffice:

/* standard quicksort on foo before we make our choice below */

That hardly seems like insanity to me.


Maybe not insanity, but I would find it a little noisy reading that code, like having to code at a desk near the receptionist.

I wouldn't mind if the comment said "using quicksort because n is expected to be large". But only if the choice of quicksort over some other algorithm was deemed significant.


My point is really to just mark the block as containing quicksort when the function it's in does other things as well. Refactoring it into its own function with a useful name would be just as (or probably more) useful. It's really the lack of either that I see as insufficient.
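A sketch of that refactoring route in Python (the names are made up): the extracted function's name does the work the marker comment would have done.

```python
def select_candidates(items, limit):
    # The sort used to be an inline loop here that needed a
    # "standard quicksort" marker comment; extracted into a named
    # function, the call site now documents itself.
    ranked = sort_by_score(items)
    return ranked[:limit]

def sort_by_score(items):
    """Return items ordered by descending .score; the input is not modified."""
    return sorted(items, key=lambda item: item.score, reverse=True)
```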


Code is not self documenting. I worked with a bunch of Rubyists that thought this (I love ruby, btw) and I wanted to strangle every single one of them.

The parent comment nails this. These guys who thought "my code is clear, therefore self-documenting" were some of the worst system designers and most myopic thinkers in the company. What's worse is that this attitude usually extends to "my code is simple and therefore doesn't need to be tested."


Ruby is dynamically typed, so it doesn't have the power of a type system to encode invariants. It's very hard to write self-documenting code in a dynamically typed language.


That kind of code doesn't explain all the different ways of accomplishing the piece's goal you tried and why they were lacking.

Some random developer comes at a later time, thinks this code is more complicated than required, refactors it and only then sees why you didn't do it that way. This happened to me many times (both in the "original dev" role and the "random future dev" one).

Comments can easily clarify why this particular piece of work is implemented in this manner and not the others you tried, saving a lot of time to the future devs.


I don't think you disagree with the article here. Most code should be straightforward and no comment of the sort you describe is necessary.


No. The best kind of documentation is the kind that explains the non-obvious decisions taken in the code. Or the kind that outlines the interactions between function x and states 1-3. Or the kind left behind by some poor bastard archaeologist who comes in after the fact to fix the appallingly opaque ball of hair I pooped out under terrible deadline pressure.

Documentation has multiple purposes, and multiple audiences, and a good static type checker can't do anything for most all of them.


I think you are neglecting the "API" case. If I want to call some code that you wrote that I think might solve my problem, I have to have access to your source, read all the source code, simulate the machine state in my head, and figure out what you are actually doing? No thanks.

Note I don't necessarily mean an externally facing API, which I would assume you'd agree needs good documentation. Even if we are on the same team, I don't want to have to read your code just to figure out which of the frob_XXX functions to call. Maybe that is what you meant by your second paragraph?


A simple example comes from the article itself. Parameter 'strength', the strength of the frognication. What is the range of that? Is it 0 to max int? min int to max int? 1-100?

(A real-life example would be a function that outputs an image in JPEG or PNG and takes a quality parameter. I notice that within the same library often one type of image wants 1-100 where another type wants a different scale, like 0.0 - 1.0. The parameters have the same name.)

This is not something that can be checked by most type systems. Therefore it needs to be in the documentation.


> A simple example comes from the article itself. Parameter 'strength', the strength of the frognication. What is the range of that? Is it 0 to max int? min int to max int? 1-100?

[...]

> This is not something that can be checked by most type systems. Therefore it needs to be in the documentation.

Or, alternatively, we need better type systems. (Of course, it can still be in "documentation", just documentation that can be automatically generated from code that is also given real effect by the compiler, and thus documentation that can't get out of sync with the implementation.)
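Most mainstream type systems can't express "1 to 100" statically, but even in Python a thin wrapper type can give the documented range real effect at the API boundary (a sketch; `Quality`, `save_jpeg`, and the range are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quality:
    """JPEG-style quality setting; valid range is 1-100 inclusive."""
    value: int

    def __post_init__(self):
        if not 1 <= self.value <= 100:
            raise ValueError(f"quality must be in 1-100, got {self.value}")

def save_jpeg(path: str, quality: Quality) -> str:
    # A caller can no longer silently pass 0.85 on a 0.0-1.0 scale:
    # Quality(0.85) fails loudly at construction time.
    return f"writing {path} at quality {quality.value}"
```

The docstring still states the range, but now it's also enforced, so the documentation and the behavior can't drift apart.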


This depends entirely on your language and environment. Working on embedded systems, where memory is at a premium, the way an algorithm or data structure is coded up may be entirely non-intuitive, merely to get under the memory limits or to eke out a bit more performance. Good documentation + good code is far better than just good code.


Even good code rarely describes by function and variable names alone what intent and business purpose is being provided. Code is the 'how', documentation of some other form usually provides the 'why' (comments, specs, tests, external documentation, whatever).


Well documented code is a combination of comments and good variable/method names. Comments should explain any assumptions made, exceptions, sample input formats, etc. The variable and method details, including the logic, should be taken care of by better naming and clean code.


Thank you for saying that, I sometimes feel like I'm taking crazy pills when I hear some of the inane arguments against thorough documentation.

I was hoping this was going to be about the real "documentation fallacy:" 'Documentation tends to be of low quality, therefore it is best to avoid writing much documentation.' One common instantiation of this is "thorough documentation is bad because it will inevitably fall behind the code and be inaccurate."

People fall into the trap of assuming there is something inevitable about bad docs. Yet they never assume there is anything inevitable about bad code, even though most of the code in the world is, objectively, complete shit!


I think this whole thing sits in the same boat with error handling and unit tests. Many folks tend to see these as something separate from programming. Some kind of side effect that isn't a lot of fun and is therefore best ignored. "Hey, I can write me some code.. and oh yea.. there's also this bit of stuff I should do, but I'm busy writing the next big thing."

It helps to start thinking of all this as one and the same. No single part of it is more or less important. If you are writing code, you are writing documentation, you are doing correct and thorough error handling and you are producing consistent and relevant tests. There is no difference.


I won't argue the less vs more, that's an age old debate. However, this part:

"Explicitly writing out the intention of a piece of code in plain English often makes the concept much clearer in my brain"

I think is critical for me. I actually write code by first doing a pseudo-code pass of comments, where I just write the flow of what I think the code should be doing, then go back and fill in the actual functionality behind the comments. Naturally, it's not always perfect on the first pass, but you just modify the comment thought process to update your approach, and then refill the functionality. As a programmer, you can then skim down through sections just checking what it's "supposed" to do, whether you're a newbie diving in or the original writer needing a refresh.
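A small illustration of that workflow (the task and names are invented): pass one was only the comments; pass two filled in the line under each.

```python
def load_active_users(records):
    # Drop records missing an email -- we can't contact them anyway.
    usable = [r for r in records if r.get("email")]

    # Keep only users active in the last 30 days.
    # (Cutoff handling simplified; real code would use datetime.)
    active = [r for r in usable if r.get("days_since_login", 999) <= 30]

    # Sort most recent activity first so callers can truncate cheaply.
    return sorted(active, key=lambda r: r["days_since_login"])
```

Skimming just the comments gives you the intended flow; reading under any one of them gives you the mechanics.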


>Having said that, I find it helpful to write documentation before writing the actual code.

Do you write/have a technical spec? I think that's what you are describing. If I first and only write the comment, there can often be a disconnect between what the code does right now and what it would ideally do once I'm finished. On the other hand, a spec plus an accurate comment keeps everything in order.


The README I mentioned is more or less a lightweight specification of the public parts. Sometimes it warrants more detail and I add an actual SPEC document which goes into great detail.

I have not had much trouble with the comments diverging from the final implementation of a piece of code. But I have forced myself into a habit of re-reading through the documentation regularly once I've committed a chunk of new code. Just to ensure it all still does what it says on the tin. This takes extra time, but together with learning how to write decent commit messages, this has helped me keep things sane and organized.


Yup. Literate programming + clear not clever + less is more (doc and code must all serve a purpose.)


This of course assumes you have enough time to do this.


My problem with code documentation is that documentation is done at the function or class level. When I'm looking at new code I would prefer a "concept of operations" describing how the whole thing works together rather than piecemeal function documentation.

This is especially important with open source code. I'm not going to donate my time to working with an existing codebase if it is going to take hours to figure out how it all pieces together. Examples are fine but what I really need to know is the why. At least when I put up with this at work I'm getting paid by the hour.


The old classic literate programming paradigm helps with this. It allows one to write documentation that gives you an overview, provides you with whatever ordering and connections you feel appropriate and, with my implementation of it, even takes care of most of the tool chain. https://npmjs.org/package/literate-programming


this brings a smile to Knuth's face


+1 to this. Whether I'm making a quick fix or intend to do some significant work, if I'm diving into an unfamiliar project I am always overjoyed if I find 'concepts and metaphors' documentation.


I opened the comments with an intention to write basically the same thing.

Higher level documentation - classes, packages, groups of packages - makes the project much more approachable. It answers the question "Here's a 3 levels deep hierarchy - where do I start, how are the pieces connected to each other?"

Documenting classes is fairly common, but only the public API. I'm not only interested in how to use the class, but also in how it works internally and what its inner architecture is.


Same.

Often I find myself wishing that I could see things as a sequence diagram. I've never worked anywhere where functions were commented with useless English descriptions of the parameters and so on, so I don't really know if this is something that people really do. What I do wish I had was more high-level, visual representations of a system when I'm trying to learn how it functions.


Java actually does quite well with this, but I rarely see it used in the wild: package documentation. http://www.oracle.com/technetwork/java/javase/documentation/...


In Go, you can put a comment above the package clause in any of a package's source files (including an otherwise empty one) and the tools will pick it up. e.g.:

http://golang.org/src/pkg/fmt/doc.go -> http://golang.org/pkg/fmt/
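Python has the analogous convention of a module-level docstring at the top of the file, which `help()` and documentation generators pick up (the module name and contents here are illustrative):

```python
"""frobnicate -- helpers for frobnicating widgets.

This module groups the widget-frobnication helpers so that a reader
gets the package-level story in one place before diving into any one
function.
"""

def frobnicate(widget: str) -> str:
    """Return the frobnicated form of `widget`."""
    return widget.upper()
```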


This is one reason why I really like Doxygen, in the areas where it's supported. It's not only very easy to generate easy-to-read documentation for the code while writing it (specialized comments, with operators to call out special meanings and so forth), but it's also very easy to create high-level documentation from exactly the parts that contribute to it. (Section, page, etc. commands.) Not to mention, then generate a nice, easy HTML stack of it all with pages, search-ability, etc.

I've never really agreed with the whole "code should just be obvious when read." The problem with any large set of instructions is that the instructions, their order, their combinations, and other artifacts reflect the experience, background, and environment of the author. Two developers of largely equivalent experience and talent rarely come up with the same set of instructions for the same task.

Consider if I told you how an engine functioned as a means of telling you how to change a head gasket.


The documentation for Flask does this and it is really good.


> When some coder changes the function, it is very easy to forget to update the comments

It isn't.

All public APIs should have documentation, even if you believe it's obvious what they do. This documentation never goes out of date because once you release your API it tells you what you cannot change. If you changed the code so that your documentation is now wrong - this code change is a bug and you should fix it. Because there's other code in the wild that relies on the behavior that you promised.

Of course in practice you do have to change that behavior every once in a while. But this should be a big deal (that usually includes bumping up version numbers, mentioning it in release notes, etc). If you're changing it so often that updating the damn comment is an issue you either document implementation details that don't belong in API documentation or your API is unstable crap and nobody should be using it.

PS. Complaining that API documentation gets out of sync with the code is like complaining that unit tests break when you change the code. Duh - that's what they are there for!


You can change an API without removing functionality. Sometimes you want to add functionality. Or sometimes the context around the API changes. Like for example, a python API might not function the same in Python 2.5, Python 2.7, and Python 3.

All I know is, I help maintain a very large set of public APIs that my team is very resistant to changing, and yet somehow the docs are still out of date.


Is that because you're not updating your docs?


> It isn't.

Well, whether it is or isn't depends on how good the docs are and how well you write them. Yes, in many cases it can be easy to forget if the documentation is not woven in well enough.

> All public APIs should have documentation, even if you believe it's obvious what they do.

As a note, part of the function of such documentation is to establish standards for what is acceptable in terms of expected input and output handling. What this means is that if documentation defines the code contract, then the first thing you look at when debugging is the API's documentation. Then, if it matches what you are doing, you might dig deeper.

What this gives you is not debugging by comments (something K&R rightly hated) but asking which side the violation of code contract is on. If the documentation doesn't match what you are doing with it, then the violation is on your side. If it does, then the violation may be on the API's side. The goal here is to define where changes can most productively be made.

> Because there's other code in the wild that relies on the behavior that you promised.

That's exactly right. More specifically the API documentation is the promise.

> If you're changing it so often that updating the damn comment is an issue you either document implementation details that don't belong in API documentation or your API is unstable crap and nobody should be using it.

The thing is, it took us a long time to get our documentation approach right in LedgerSMB. It was a struggle that I think only reached something I am happy with 5 years into the project. A lot of our public SQL APIs are not documented, actually, because they are dynamically discovered at run-time and are minimalistic (and consequently the developer contracts are far vaguer than the API conventions, so it isn't always clear what belongs in the documentation since it is all dynamically looked up anyway), but our Perl code is very well documented and I am very happy with that.

For the SQL though, it's written with documentation generation scripts in mind and therefore the question is what you can document on top of what is already there in the system catalogs.


> Complaining that API documentation gets out of sync with the code is like complaining that unit tests break when you change the code.

But that can be a valid complaint as well. There is an undeniable cost of maintaining tests, and it shouldn't just be taken for granted that the cost is worth it.


In both cases, you shouldn't be documenting and testing the internals of your functions, but the code contract, i.e. that it does what you have agreed with other developers that the function will do.

The question is not whether to test or whether to document, but what to test and what to document.


I don't disagree that public APIs should be well documented. However, I believe it's very easy to forget to update the comments. Especially in scripting languages without type safety. There is little to remind the programmer that arguments were added, types were changed, argument order was modified, behavior was changed, return value was changed, locking behavior was changed, memory allocations were changed, etc. Obviously unit tests should help catch many of these changes. However, it's still up to the coder to remember to change the comments. When you're behind on a deadline or you're juggling 50 different API changes because you're still alpha I'd say it's pretty easy to forget to update a comment.

See http://api.jquery.com/jQuery.ajax/ if you want an example of an API that could easily have a few typos in it. Who is checking to make sure it's 100% in sync with the code 100% of the time? I'm not trying to say this example has bugs in the docs, but there's a LOT of behavior described that could be out of date.


But the solution is to make sure that the documentation is of a sort that is useful in determining where to fix things. Again, if you have a problem with a function call, the first question should be "does your call to the function match the documentation?" If it doesn't, the first thing you do is change the call to match.

Now sometimes one comes to the conclusion that an API is broken, so you have to modify the comments, and then modify the code, but there is a reason to do it in this order.

When you modify the comments you are modifying a set of promises you have made to other coders. This allows you to think through how this change is going to work and how it will affect other code out in the wild. Then, when you modify your code, it is going to be better.


If you're forgetting to update the comments frequently enough for it to be a problem, you're probably forgetting to update the code that consumes the API too. The fact that quality in general falls by the wayside when you're crunching is no reason to abandon it altogether.


Forgetting to change the code that consumes the API means that things don't work, which is visible right away. Forgetting to change the comments will be visible gradually and cruft accumulates. Also, sometimes comments wind up being non-local and just aren't noticed. I agree that in some senses it's "no excuse" but that doesn't mean it won't happen. Documentation that can break visibly when things change is better than static documentation.
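Python's `doctest` module is one concrete form of documentation that breaks visibly: the usage examples in the docstring are executed, so if behavior drifts the "comment" fails instead of silently rotting. A sketch:

```python
def clamp(value, lo, hi):
    """Clamp value into the inclusive range [lo, hi].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(99, 0, 10)
    10
    """
    return max(lo, min(hi, value))

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # fails loudly if the examples above go stale
```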


> Forgetting to change the code that consumes the API means that things don't work, which is visible right away.

Oh how do I wish this were the case!

> Documentation that can break visibly when things change is better than static documentation.

Agreed, but I would not call your average run-of-the-mill unit tests "documentation", nor can all documentation be programmatically tested.


> > Forgetting to change the code that consumes the API means that things don't work, which is visible right away.

> Oh how do I wish this were the case!

It's certainly not always the case, but it's a whole lot more likely to be the case than for unchecked documentation.

> > Documentation that can break visibly when things change is better than static documentation.

> Agreed, but I would not call your average run-of-the-mill unit tests "documentation",

I think "is it documentation" is probably more of a spectrum than any particular threshold, and run-of-the-mill unit tests probably do fall on this spectrum though I'd probably agree that they're not particularly far along it (though that surely varies with the habits of those writing the tests).

> nor can all documentation be programmatically tested.

As a practical matter, that's certainly currently the case - tooling is not set up for testing documentation, and there are things we'd want to check that would be hard to check in any event. Theoretically also, there are certainly properties that can't be statically demonstrated. I'm not entirely convinced that there's nothing we're interested in that couldn't eventually be got at for the programs we care about, though it's certainly a possibility. Regardless, it seems an ideal worth pushing towards, and if tested documentation is interwoven with untestable documentation such that some conceptual locality is preserved it's less likely (though absolutely still possible, to be sure) that you'll forget to update the other when you're forced to update the one.


Don't you see the difference between changing a comment that doesn't break a build and breaking code that breaks the build because a test fails?


Which are you suggesting is worse?


Well, the point is that it's EASIER to make hard to detect errors in code comments than it is to break tests.


>> When some coder changes the function, it is very easy to forget to update the comments

> It isn't.

Yes, it is. Have done it myself loads of times.


First, I think more documentation always beats less documentation, assuming reasonable quality. What the article is getting at is the idea that documentation for the sake of documentation never results in any quality.

When I write a new module for LedgerSMB (I won't vouch for older code, either by myself or others) I actually start by writing the documentation. The reason is that the documentation is written primarily to establish the contracts under which the code operates. This includes concept-of-operation documentation as well. It isn't just aimed at other programmers. It is aimed at documenting the code contracts so that it is clear from the start what the acceptable operations are.

So if there is a fallacy it is not that more code documentation is better (since that is often true, IMO), but rather that telling people to document for the sake of building documentation works.


"What the article is getting at is the idea that documentation for the sake of documentation never results in any quality."

X for its own sake is rarely good.


One thing to bear in mind is that documentation serves two different audiences.

One audience is the people who will use your code. For them, every external method should be properly documented (sure, use a tool for this). And make sure it's good enough that (barring debugging situations, because you wrote perfect code) your "users" never have to look inside your code. (If I think about this in C++, I think you should be able to look at a properly commented header and never read the code)

And then there's the poor slob who's going to come in and debug/fix your code some day. He may not need to have every method doc'd but he damn sure needs to know what's tricky, what's interesting, where the gotchas are, etc.


I've done plenty of dynamic language work where some documentation would go a long way. In Javascript, every library function should explain what all optional (i.e. undeclared or reinterpreted) parameters do, what gets returned, and likewise for all callbacks, plus what `this` points to. It's very stylish for some insane reason not to do this, and it drives me insane.


Don't get me started on Javascript. It's a problem with dynamic languages in general - the signature does not document what interface the parameters should implement (eg, a parameter 'file' may be a file path or a file handle, but you'll only know that by looking at the implementation), but it's particularly egregious when nothing tells you at first glance which parameters are optional.

I've now started to document public functions. As my classes usually have few public functions, but many private ones, the code to comment ratio remains acceptable, though maintaining the documentation remains a challenge.
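For instance (a made-up Python sketch of the "path or file handle" case), an explicit union type plus one docstring line saves the trip into the implementation:

```python
import io
from typing import Union

def count_lines(file: Union[str, io.TextIOBase]) -> int:
    """Count the lines in a text file.

    `file` may be either a path (str) or an already-open text-mode
    file object -- spelled out here because the bare name `file`
    says nothing about which one is expected.
    """
    if isinstance(file, str):
        with open(file) as handle:
            return sum(1 for _ in handle)
    return sum(1 for _ in file)
```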


Agreed completely about how awful dynamic languages can be in this respect. I've found what helps a lot is consistent naming of parameters. For example, numberOfWidgets (an integer) versus widgets (a collection) or file (a file handle) versus path (a string).


Sure, consistent naming helps a lot (though I'm partial to short-yet-readable alternative like widgetCounts). Sadly, sometimes ambiguity is unavoidable. And other times, you're just dealing with bad code.

But consistent naming does not help with the "optionality" of parameters.


In that sense, dynamic languages often have a jump on their functional counterparts. Relying on the type annotations for all documentation is probably the other side of the coin. I understand that generic combinators sometimes need sufficiently general argument names, but even then, there's often some semantic meaning that can be attached with a self-documenting variable name. Enough with the 1- and 2-letter variable names!


It's JavaScript, they're all optional.

Just start passing a single object parameter or using the arguments object and force everything into the documentation. Then it'll be just like 90% of libraries that depend on jQuery, especially once the code has been patched, updated, and maintained a year beyond the documentation.
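The Python analogue would be funnelling everything through **options; at that point the docstring really is the only record of what's accepted. A hypothetical sketch (all names invented):

```python
def make_widget(**options):
    """Build a widget description.

    Because the signature says nothing, the accepted keys have to
    live here: label (str, default "widget"), width (int, default
    100), visible (bool, default True). Unknown keys are rejected
    rather than silently ignored.
    """
    settings = {"label": "widget", "width": 100, "visible": True}
    unknown = set(options) - set(settings)
    if unknown:
        raise TypeError(f"unknown options: {sorted(unknown)}")
    settings.update(options)
    return settings
```

Rejecting unknown keys at least turns a typo'd option into a loud error instead of a silently ignored one.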


I agree. Some Python APIs aren't clear on what the parameters to functions are. It's frustrating to not know what type is even expected - yes, Python is dynamic, and one parameter can take on multiple types, but the documentation should still say which types are allowed, and what the behavior will be.


Documentation does not exist for your own benefit. It exists to help other people on the team, or the programmer that inherits your code, quickly understand your code. The key word here is quickly. Yes, a programmer can trace code & figure it out. On a large chunk of code, however, that is exceedingly inefficient. Whether or not documentation is obvious to the person that wrote it is a very poor test for the documentation's utility.

I used to hold a similar opinion -- "Document the non-obvious". The problem is that in a project of sufficient size, almost everything can slide towards non-obvious. Is price the base price, or unit price * quantity? Is the method name cancel_subscription_and_notify_customer really effective? Of course, I could use cancel_subscription, but then I'm not telling programmers about the email that goes out, or cancel_subscription_and_notify, but who am I notifying? The marketing department? Generally I find really descriptive method names to get unwieldy very fast. Further, if you say document the non-obvious, the tendency is towards zero documentation.

The value statement depends on how fast your team grows or changes, and the expected lifetime of the project.

If you are working on a project alone, and that will never change (e.g. it isn't something a business relies on), then you probably do not need documentation. This is also true if you are bringing on a dev a year, and the team size will always remain relatively small. Similarly, if the project is relatively short-lived (like a game), then dropping documentation could be a good idea. Maybe, I'd at least concede there are merits to doing so. Documentation isn't free, of course.

On the other hand, if you are working at a company that's trying to rapidly grow, needs to bring on devs quickly, or has developers moving from one project to another frequently, then I'd say documentation is very important. You are going to save your team a huge amount of time by taking a little time upfront to explain what you are doing, why, and the consequences of each method. Even simple methods deserve documentation for consistency's sake.

If your documentation rots, then you handle it the same way as test rot. Make sure the team knows that docs are necessary, they need to spend the time on it, and if that means more time for features, so be it. I can say from experience that writing documentation after the fact is pretty gnarly.


The less that is described by the signature, the more necessary it is to document a function. In C++/C#/Java, while the parameters and return values are often fairly obvious, exceptions are completely undocumented by the type system (unless you use Java checked exceptions, which you shouldn't), and so should be documented manually (and asserted in unit tests, to ensure the documentation doesn't become incorrect).
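For example (hypothetical names, sketched in Python rather than C#/Java), the documented exception can be pinned down by a test so the docstring and the behaviour can't silently diverge:

```python
def parse_port(text: str) -> int:
    """Parse a TCP port number.

    Raises:
        ValueError: if `text` is not an integer in the range 1-65535.
    """
    port = int(text)  # int() itself raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port

# A test asserting the documented exception (a bare try here; a
# pytest.raises block does the same job more idiomatically):
def test_parse_port_rejects_out_of_range():
    try:
        parse_port("70000")
    except ValueError:
        return
    raise AssertionError("documented ValueError was not raised")
```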


This is an ever-ongoing discussion between two kinds of programmers (documenting and non-documenting). Just decide for yourself or your team what works best. I myself like to document even other people's code, once I see what a method does, if the method isn't speaking for itself. When I'm developing and using an existing method, I want my IDE to show a popup that quickly tells me what the method does, what arguments it takes and what it returns, instead of having to jump to the code every time to see what it does.

Just one simple (real life) example:

  public String convertText(String text) {
      return text.toUpperCase();
  }
This method is already named wrong in my opinion and should be refactored to something like convertTextToUpperCase to understand what it does without having to document. But if your methods get more complex I think a little comment on top of the method describing what's going on really cannot harm. Especially if the code is difficult to read for new people.

The point is in the end to keep the documentation in sync with the code, and that indeed takes some effort. I myself always write documentation for a method in Javadoc style, so only above the method, if it's more complex than a simple getter/setter method. I always tend to think in terms of the official Sun Java API documentation, which I use(d) so often to learn how all the classes/methods work, that it might also make my own code more readable/understandable when I or someone else has to work on it if I have documented it. Inside the method body I try to comment little to none.

@snowwolf: I agree, but it's just an example to show that a method name should speak for itself


Just to comment on your example, that method shouldn't exist as it adds no value to the codebase - it is purely redundant code. Especially if you were to rename it to convertTextToUpperCase.

The only situation where the method would make sense is if you wanted to be able to change the implementation in the future (TitleCase, LowerCase etc.), in which case a better renaming would be convertTextForDisplayInTitles (i.e. use the method name to document why we need to convert the text). That has the added benefit of also telling you what the method does in your IDE just from its signature.


I was thinking about this recently, and wondered how other people think of this. Working in a high level language like Python, you can do quite a lot in one line, for example in a list comprehension.

Should a "productive" one-liner be put in a separate method/function with a meaningful name, or is a comment beside it better?

I personally find short methods often decrease code readability, as you constantly have to jump around, and can't read anything from top to bottom.


> I personally find short methods often decrease code readability, as you constantly have to jump around, and can't read anything from top to bottom.

In general, if you have to jump into every method call to understand a method that calls other methods, the names are bad – probably too short and don't state intention.


My approach is frequently to "name" a complicated list comprehension by shoving it into a one or two-liner closure function in the current scope. Then you don't have to jump miles and the code which uses it becomes readable.
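Something like this (a made-up example):

```python
def summarize_orders(orders):
    # "Name" the hairy comprehension with a tiny local function, so
    # the line that uses it reads like prose instead of a puzzle.
    def overdue_totals(orders):
        return [o["total"] for o in orders
                if o["days_late"] > 30 and not o["disputed"]]

    return sum(overdue_totals(orders))
```

The helper lives right in the scope that uses it, so there's no jumping miles away to find its definition.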


So what is the advantage of this approach over having a comment beside it? Sounds slightly more complex to me.


Because comments can lie.

Also, it can be harder to express the generator in a useful way in English than the code itself if the code is well written.

The purpose is to make the code as clearly self describing as possible. I would only do it if it made the code read more like a human language and remove too much complexity from one place.


Well named short methods lets you read the code on a higher abstraction level. I usually jump into methods to see what they do the first time I encounter them. Later on, the name (if well named) should tell me what it does, so I don't have to wade through the details of how it does it. That's part of what I'm getting at in "7 Ways More Methods Can Improve Your Program" http://henrikwarne.com/2013/08/31/7-ways-more-methods-can-im...


Well, I can't really say what it should be...

But I put one-liners in a function when the one-liner is hard to read or too error prone (missing a detail won't lead to a compiler error, but to a bug).


"Cannot harm" is a fallacy. Every additional comment adds a maintenance burden.

Sometimes comments are worth the cost, but they should be a fallback to a fallback - ideally, the code should be self-explanatory. If that's not possible, unit tests should explain the usage and functionality - they're better than comments because the build system enforces that they're updated when the code changes. Only if you can't do that either should you resort to a comment.
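For example (hypothetical function), tests whose names and bodies read as usage documentation:

```python
def slugify(title):
    """Lowercase, hyphen-separated form of a title, for use in URLs."""
    return "-".join(title.lower().split())

# These document the behaviour, and the build keeps them honest in a
# way a comment never is:
def test_slugify_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"

def test_slugify_lowercases():
    assert slugify("READ ME") == "read-me"
```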


As it is, it looks indeed quite dubious.

But if it had a specification comment added, it may be perfectly justified.

Remember, in programming, there's no problem that can't be solved by one additional level of indirection.

Here we have one level of indirection. What's not clear is what problem it solves. This is what the comment should tell, or better, the name of the method. But perhaps we're in a context where converting things is the natural thing to do, and in this specific case, the conversion of text is a mere upcasing. Probably the conversion of numbers or the conversion of arrays will involve more work. Notice how I imagine (but leave unwritten) some specifications to justify this code. In a program those specifications should not be left unwritten.


Well, I think the code needs to be clear enough that you don't want to document unnecessarily. In general, I think code contracts should be documented, as restrictions on solutions that programmers may need to be aware of. But comments are not a substitute for clear code.


Probably all of us agree that being forced to comment everything will lead to some bad/useless comments. But the examples you showed were simply bad comments. Just because there are bad coders out there who don't care about the quality of their comments for whatever reason, isn't a reason to avoid comments. I'd treat anyone who wrote that first sample comment in a similar way I'd treat a programmer who writes unreadable code; that is, probably take the time to teach them some good commenting practices.

Many of the reasons that speak against commenting apply to good variable names, too. Maybe someone will come later and change the way the variable is used but won't change the name. It doesn't mean we should avoid descriptive variable names, though.

Also, with comments, as with any form of communication, the audience is the key. Let's say I'm a senior programmer somewhere and I'm writing comments. Often the train of thought seems to be "well, using this variable name/adding this comment clears it up for me". But that's usually not nearly enough for junior coders who are new to the codebase, and who are often the target audience. They'll probably still go "wtf" after reading a comment aimed at a senior programmer with an understanding of the codebase and a programming experience to match.

In addition, the obvious point to make is also that code is good at answering how, not why.

This is a bit of a pet peeve of mine, I guess since I've met relatively many coders who claim that good code should comment itself and ditched commenting altogether. Their code has usually ranged from above average to downright awful, and has, on average, been rather unreadable.


There's a principal/agent problem with documentation too. The writer of it rarely gets the benefit. Many times they will never see or meet anyone who does. But they have a lot of other competing priorities from highly visible requesters.

I am interested in examples of companies that get this right.


I agree with this assessment. Whenever I have the opportunity to get feedback from someone who has used my documentation, I make every effort to get it, and then update the documentation. It's gratifying to know that someone benefited from it.


"As a rough estimate, 95% of functions in any code base should be so simple and specific that their signature is all you need to use them."

Welcome to reality!

What is simple for one guy is really hard to understand for another one.

What you expect is that every bigger function must be split into dozens of smaller functions, only to have a cleaner parameter part. But this makes the code flow unreadable.


> What you expect is that every bigger function must be split into dozens of smaller functions, only to have a cleaner parameter part. But this makes the code flow unreadable.

That's exactly what every good, experienced developer tries to do: splitting complex stuff into more, simpler functions that are easier to understand. Obviously the end result is not unreadable, on the contrary!


I'm surprised nobody has mentioned Master Foo and the Programming Prodigy. http://catb.org/esr/writings/unix-koans/prodigy.html


In many languages, the 90% comments from this article are "doc strings", while the 10% comments are actually comments. They have different uses, and should be treated accordingly.

I also don't understand the fear that doc strings will go out of date. At least with dynamic languages that's part of the point: if the doc string is wrong, then either the contract has changed and not been updated or the code is fulfilling the wrong contract. Both are useful things to know.

More often, the doc string is correct, and serves both as a guide to the code ("here is what you are about to read") and as a quick summary. If you're trying to decide, say, between iterate-dirs and walk-dirs, that summary is perfect, while reading the code would be an annoying digression.


The "Turing test" of comments... If I can distinguish whether the comment was written by a human or generated by some auto-documenting software, then the comment may be useful; if I cannot tell whether it was written by a human or auto-generated, then it is useless.


...and for those who like concise quotes:

`If the code and the comments disagree, then both are probably wrong.' -- Norm Schryer

`Don't get suckered in by the comments -- they can be terribly misleading. Debug only code.' -- Dave Storer


My rules for good docs:

1. Begin with a usage synopsis in the form of sample code. It should hit the most important functions and show how they fit together. E.g. for a drawing library, show how to instantiate an image, draw a circle with arbitrary fill and stroke colors, and write the image to disk.

2. For each function or method, open your comment with a straightforward description of what the function does, even if it's absolutely, undeniably obvious. If you're writing a math library, you should even say what the sqrt function does. It doesn't hurt, it costs you very little effort, and you might help someone who's just beginning to learn about the problem domain.

3. For each parameter, document its possible types if your language doesn't encode that information in the function signature. Even if you're using something like Haskell where it does, you might need to comment on the type. E.g. if sin takes a float, say whether it's in degrees or radians.

4. Provide sample code for functions that have to be used in tricky ways. E.g. if the function requires special setup or context.
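A made-up example applying rules 2-4 (the function and its names are invented for illustration):

```python
import math

def chord_length(radius, angle):
    """Return the length of a chord of a circle (rule 2: stated
    plainly even though it is nearly obvious from the name).

    Args:
        radius (float): circle radius, in the same unit as the result.
        angle (float): central angle subtended by the chord, in
            RADIANS, not degrees (rule 3: comment on the type).

    Example (rule 4):
        chord_length(1.0, math.pi)  # a diameter, length ~2.0
    """
    return 2.0 * radius * math.sin(angle / 2.0)
```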


Ok, the problem is not the documentation. It's the specifications. This problem is exacerbated by management methods such as Agile/Scrum, where no specification document is built from the collection of task descriptions stored in Jira (and IF you are lucky enough to have anything significant in the task descriptions; more often, from what I've observed, it's hand waving and shin pointing rather than anything precise written down).

And even if some specification document is written, the problem remains that, going through all the phases of analysis, design, coding and debugging (whatever the period of the cycle you use), it is not updated!

Now we should probably distinguish API elements from internal implementation stuff (but the blog article mentions APIs).

When documenting internal stuff, unless you've developed internal APIs (which you should do!), the documentation can indeed be descriptive, to help maintainers orient themselves and avoid pitfalls.

When documenting an API, what you mainly need is the specification of the API. This will be the "contract" with the client code, and if there's a discrepancy between the API specification and the implementation, then it means there's a bug (somewhere; of course one could decide that the specifications were wrong, and update the specifications instead of the code). Most often it will be a bug in the code.

But the point is that either you have tools to track the specifications elements down to the line of code, so that when you create or modify a line of code, you have easy access to the specifications, or you put the specification in the docstrings (documentation comments) in the code, to get the same easy access. And note that this is a read/write access: specifications may need to be updated when the code is maintained.

So I would agree, write less documentation, write more specifications. Close to the code.


I tried a google search for "shin pointing" and only got pictures of people doing literally this.

I got it from context, obviously, but can you explain this turn of phrase to satisfy my curiosity?


I meant it quite literally. You know, in real life meetings, with body language, odors, hand waving and all that non-verbal communication. If everybody knows what we're talking about, then what's the point of writing it down? Yeah, right. Come back six months later with half the team turned over, and see how useful those old Jira tasks with no written specifications will be...


Oh, I see. I've always known "hand waving" to be a Star Wars reference. i.e. "These are not the droids you're looking for."


So basically the author is making the argument that comment quality is correlated with quantity.

As far as I can see, this is backed exclusively with "but the ones that do probably...".

Let's just say I'm not convinced yet.


I think the argument being made is that comment quality is more important than quantity. The author gave an example where the code with less documentation had more useful descriptions in order to illustrate this point; I don't think he necessarily meant that one caused the other.


I can't see why 90% of the functions cannot be properly commented. In the example, the "good" documentation is, by the way, probably less work than the verbose "bad" one.

And, in my experience at least, comments are rarely out of date with the code. When they are, once detected, it should be treated as a bug. Fix/remove the comment.


That's something we can all agree on, but then why bring quantity into the discussion? I prefer lots of useful comments over few useful comments.


To paraphrase The Incredibles, if everyone is special, then no one is.

If 10% of the functions are commented, I will assume that those 10% are more important or have more error prone or dangerous usages. If every function has a boilerplate comment, I lose that information.


You can catch garbage comments in code reviews just as easily as you can catch other forms of programming garbage during review. The fact that someone might make a lot of bad comments is not a good argument against commenting, it's a good argument against bad commenting.

I also think that if people are in the habit of having to comment it makes them more likely to document things like expected values of the input, what the return could be expected to be, and if there are any caveats.


Nobody is arguing for boilerplate. The idea that less is more, holding quality equal, is nonsensical. Of course, as documentation isn't free, you can't hold quality equal, so it's immaterial -- nobody is ever going to be presented with that choice in real life.


At its best, documentation (or a subset thereof) can and does serve as a cross-reference -- like a cross-reference in an important calculation. If the two results don't correspond, you know you have a problem. (Even if you have to explore both paths to learn what the problem really is and where it lies.)

Something to consider, the next time you find yourself inclined to complain about documentation. Is it the documentation, or the fact that it's not useful documentation?


I personally favour only commenting on the unusual. I also think that if you are doing TDD then the tests should do a good job of documenting the code. I can't count the number of times someone has asked me to explain some code I have written where my first response is "okay, let's look at the tests".

All that being said, it really depends on the scenario. My second last project was a shrink wrap product development that ran a million dollar plus accompanying piece of hardware. This project quite rightly required us to write thorough and consistent comments and docs. Expensive product, complex code, high risk = thorough docs. I am currently working for a small rapidly growing and changing business, where the software is mostly internal. Low risk, quick development required, constant change, low complexity = waste of time and money creating and maintaining good docs


I don't get it, having a few comments scattered about in the source code is better than having a separate document that fully and succinctly explains the application API?

And what's to stop you from putting the comment about race conditions in the Doxygen-style comment?


Signal-to-noise ratio.

You can put it in there, but I probably won't read it. I basically assume that all javadoc/doxygen-style summary/@param/@return-style docs are entirely noise.

I much prefer the "docstring" style comments, especially though in Clojure, or Python.

Examples...

Clojure:

    user=> (doc map)
    -------------------------
    clojure.core/map
    ([f coll] [f c1 c2] [f c1 c2 c3] [f c1 c2 c3 & colls])
      Returns a lazy sequence consisting of the result of applying f to the
      set of first items of each coll, followed by applying f to the set
      of second items in each coll, until any one of the colls is
      exhausted.  Any remaining items in other colls are ignored. Function
      f should accept number-of-colls arguments.

Python:

    Help on built-in function map in module __builtin__:

    map(...)
        map(function, sequence[, sequence, ...]) -> list

        Return a list of the results of applying the function to the items of
        the argument sequence(s).  If more than one sequence is given, the
        function is called with an argument list consisting of the corresponding
        item of each sequence, substituting None for missing values when not all
        sequences have the same length.  If the function is None, return a list of
        the items of the sequence (or a list of tuples if more than one sequence).

And on the far other end of the spectrum, here's some C# / MSDN docs:

    Syntax

      public static IEnumerable<TResult> Select<TSource, TResult>(
          this IEnumerable<TSource> source,
          Func<TSource, TResult> selector
      )

    Type Parameters

      TSource
        The type of the elements of source.

      TResult
        The type of the value returned by selector.

    Parameters

      source
        Type: System.Collections.Generic.IEnumerable<TSource>
        A sequence of values to invoke a transform function on.

      selector
        Type: System.Func<TSource, TResult>
        A transform function to apply to each element.

    Return Value
      Type: System.Collections.Generic.IEnumerable<TResult>
      An IEnumerable<T> whose elements are the result of invoking the transform
      function on each element of source.

    Usage Note
      In Visual Basic and C#, you can call this method as an instance method on any
      object of type IEnumerable<TSource>. When you use instance method syntax to
      call this method, omit the first parameter. For more information, see
      Extension Methods (Visual Basic) or Extension Methods (C# Programming Guide).

Entertainingly, this is only one of many overloads in the C# version. The Clojure and Python functions are variadic with parallel traversal of collections. Those languages spend their precious docs space covering the corner cases.


What about this?

List.map : ('a -> 'b) -> 'a list -> 'b list

In languages like OCaml, or Haskell the signature provides a lot. In this particular case, if you think about it, you conclude that it's almost impossible to build any other implementation: You get a function from 'a to 'b and a list of 'a thingies. Now how on earth do I use these to get to a list of 'b thingies? No documentation needed IMNSHO ;)


Apply reverse, tail, etc. on the result of 'map' and you get something of the same type. Types don't tell all.
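Right. To put it in Python typing terms (illustrative only — Python doesn't enforce parametricity, but the point carries to Haskell/OCaml too): two functions can share map's signature while doing different things, so the type alone doesn't pin down "apply f to each element in order":

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Both functions inhabit the map-like signature
#   (Callable[[A], B], List[A]) -> List[B]
def honest_map(f: Callable[[A], B], xs: List[A]) -> List[B]:
    return [f(x) for x in xs]

def sneaky_map(f: Callable[[A], B], xs: List[A]) -> List[B]:
    return [f(x) for x in reversed(xs)]  # type-checks, but reorders
```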


What if you use a vector whose length is indexed by a Natural in the type signature? It would still allow permutations and such (like reverse), but not tail.

Using a dependent type system, it would actually be possible to statically ensure that map can only map, and do nothing else whatsoever, but it does become rather unwieldy.


I personally think that leaving a function-level comment snippet empty is better than either filling it with useless information or not having it at all. It serves as a visual separator, and it's ready for me to fill in when I have something to write.


So true! This is what happens when comments are produced because of some coverage goal.

It's sad how comments here are (on purpose?) misrepresenting the point. Another methodology jihad?

Edit: LOL! Immediate downvote. I have my answer now.


As an amateur coder I started commenting in my code only recently. There are two main reasons for this, to enable others to read my code more easily, and secondly to enable myself to read my code more easily.

    printf("If you use C macros");
    int _i = 1729;
    while (_i--)
        printf(" that are defined in terms of other C macros");
    printf(", you will be in a world of hurt "
           "unless you have good documentation written by a human.\n");


TL;DR: The people may be stupid, or may do messy work; therefore it must be the methods they use or the requirements that are faulty in the first place.


Better TL;DR: Would you rather have 10% of functions commented or 90%? The author argues that 10% is better: if all functions are documented, then documentation is obvious, noise, and often wrong. Document the non-obvious cases.


Would you rather go to one extreme or to the other one?

Please downvote me again.


My emacs+doxymacs does it semi-automatically - I don't know how, but I hope these @param tags will show fly-over help at the mouse pointer when moving over params. Eclipse does that, but emacs currently doesn't. It's all that I need; I often forget parameter descriptions and their order.

But I agree, documentation is often older than the code. It goes stale very fast and adds a lot of noise.


Here is a box.

You can choose to have it filled 90% of the way or 10% of the way. Most would choose 90%, right? But guess what—

THE BOX IS FILLED WITH BEES!


If I'm a beekeeper, that might be what I want...


I think the effectiveness of guessing what the function does by its parameters' types and its return type depends on whether or not the function performs side effects. That said, if it doesn't, or can't (even better), then the name and types should be documentation enough.


OP is a bit extreme in his conclusions, but not entirely incorrect either.

Insofar as I've experienced, there's a mostly finite amount of goodwill that will go into commenting. Extra comments once that limit is reached tend to be sloppy or useless.

Essentially the same for unit tests.


i think this is still a half way house.

code is documentation. bad source code is hard to read, or in practical terms - slow to read. good source code has minimal comments and documentation.

writing documentation before you actually need it is often a waste of time as well. at this point surely you have a design to follow, or something other than a brainless application of 'documentation' to store that valuable information into?

as many commenters already point out what you almost always want is very high level documentation, and I think this should be present as a design (assuming you have one), if you don't have one then you need a wiki or something. Something that doesn't pollute your code with garbage...


Saw a group pitch a product that was like RapGenius for code documentation; I don't think it went anywhere. Both writers and users could add commentary in a layer separate from the code, which seemed like a clever way to address documentation.


Personally, I would rather have a useless comment (which I will ignore if the endpoint signature is self-explanatory) than no comment at all.



