But this penalizes programmers who like to use long readable names. I'm not one of them (though I used to be), but they have a strong case here.
Take any program. Replace all the names with the smallest possible character sequences. Have you made the program simpler? Or smaller in any meaningful way? Surely not. I'd say what you've done is left its logical structure precisely intact (another way of saying that token count is a good metric) while reducing its readability.
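To make the renaming point concrete, here is a minimal Python sketch using the standard `tokenize` module. The two snippets and the helper names `token_count` and `byte_count` are made up for illustration, and which token types to ignore (layout and comments) is my own assumption about what a "token count" metric should measure.

```python
import io
import tokenize

# The same function written twice: once with long names, once with short ones.
LONG = '''def average_order_value(order_totals):
    total_revenue = sum(order_totals)
    return total_revenue / len(order_totals)
'''

SHORT = '''def f(a):
    b = sum(a)
    return b / len(a)
'''

# Assumption: layout and comment tokens are not part of the logical structure.
SKIP = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def token_count(source):
    """Count the names, operators, and literals in a Python source string."""
    readline = io.StringIO(source).readline
    return sum(1 for tok in tokenize.generate_tokens(readline)
               if tok.type not in SKIP)


def byte_count(source):
    """Size of the source in bytes when encoded as UTF-8."""
    return len(source.encode("utf-8"))


if __name__ == "__main__":
    for label, src in (("long names", LONG), ("short names", SHORT)):
        print(label, "tokens:", token_count(src), "bytes:", byte_count(src))
```

Under these assumptions the two versions get the same token count while the byte count drops for the short-name version, which is the sense in which renaming leaves the logical structure intact.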
This metric relies on the assumption that people are trying to produce readable code. IMHO long variable names are much more helpful in complex codes than simple ones.
Ok, but now I'm wondering if we have opposite views of code size. In my view, code size is bad bad bad. More code means more complexity. Any time you add code, you're subtracting value; it's just that (if it's good code) you're adding more value than you're subtracting. So a higher score in a code size metric is a bad thing to aspire to, and we should greatly favor approaches to writing software that -- all other things being equal -- lead to smaller programs. I don't think that programmers who use long names for readability should have their programs discounted as longer (and thus more complex). Just because their names are longer doesn't mean their programs are.
No no no. My logic is this: Take tight, readable code with short names and replace them with long names, and you'll have worse code. The converse isn't true because complex (bad) codes are more readable with long variable names.
Complexity -> Code Size
Code Size -> Long Variable names (win for big codes)
Complexity is bad
Therefore long variable names are a symptom of a problem, but not the problem themselves. Long variable names aren't bad, but they are still a good predictor of badness. Since size metrics are meant to predict badness, long identifiers should increase size metrics.
Oh, I see. You sound like an APLer. We have similar tastes, but many good programmers disagree, so I doubt that long variable names are a predictor of program badness. Not every long name is FactoryManagerFactoryManagerFactory.
Consider a language like K, in which variables usually have one-letter names. The real code-size win for K is not that. It's that the language is so powerful that complex things can be expressed in remarkably compact strings of operators and operands. (Short variable names, I'd argue, are an epiphenomenon. It's because the programs are so small that you don't need anything longer, and longer names would drown out the logical structure of the program and make it harder to read.) Token count is a good metric here. Both line count and byte count come out artificially low, but token count can't.
I came back to say I've thought about your argument a couple more times and I think you're on to something there. The idea that long variable names, even when they add to readability, are a secondary indicator of code badness (because the code is too complex to get away with short names) is a subtle and interesting way to frame the problem. I'm surprised it didn't get more pushback from the 95+% of programmers who take the opposing view. I suppose this little corner of the thread is a quiet enough backwater that nobody noticed.
But I still don't see how you get around the objection that, according to your preferred metric, if you replace all the names with arbitrarily small character sequences, you get significantly smaller code - yet clearly not better code.