Though I must wonder: how complete is it? What does it and does it not support? It's at least complete enough to be self-hosting, but beyond that? The code doesn't use that much of C.
Judging from the comments in c4.cpp, it probably only supports enough of a subset to compile itself.
Granted, while building a parser that can parse (let alone a compiler that can compile) the full C language is nontrivial, any undergrad should be able to build a parser and compiler for a sufficiently simple subset of it. (In my undergrad, we used this subset to build a "compiler" in second year: https://www.student.cs.uwaterloo.ca/~cs241/wlp4/WLP4.html)
You can build a C parser in an afternoon. It only has a few language constructs. Declarations are the hardest. Scanners are readily available for expressions and constants.
C is not a simple language, as a CIL guy says [1]. I wrote my own C compiler [2], and I can say that writing the parser was harder than I thought. It would take more than half a day at least.
That first link is almost all about language semantics, not parsing issues.
As for your example, I'll be Solomonic and say that you and Joe are both right, in a sense (though I do think it'd be more than an afternoon).
It's certainly far more than a day's work if you handwrite a lexer and parser that do the amount of additional work that yours do (AST construction; a lot of error reporting and sanity checking). But you can get very far with C very quickly if you use parser generation tools, have prior experience writing compilers, and your goal is "just" to get something to parse it as quickly as possible - it's a tiny language.
Of course, in practice most real compilers don't use these parser-generation tools, exactly because things like proper error reporting are far harder with them, and a simple recursive descent parser is so much easier to work with.
You can put whatever code you want into the Flex phrase rules. The 'lexer hack' can be implemented there. That's what I do whenever I have to parse something like C.
Exactly. You have to drop hooks into the lexer from the parser, and by the time you're done you end up with just as much code. Only it's slower than a recursive-descent parser would be, and a lot of the time parsing speed really matters because it shows up as user-visible latency.
While I have a strong dislike for lex/yacc and descendants, the "hooks" you need for C are trivial. As far as I remember, the only thing you need is the ability for the lexer to check whether a given identifier names a variable or a type.
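To make the "hook" concrete, here is a minimal sketch of the lexer-side check, assuming a flat table of typedef names with no scoping. Every name in it (register_typedef, classify_identifier, TOK_TYPE_NAME and so on) is made up for illustration and does not come from Flex, yacc, or any real compiler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; illustrative only. */
enum token_kind { TOK_IDENTIFIER, TOK_TYPE_NAME };

#define MAX_TYPEDEFS 64

/* Flat table of typedef names seen so far (assumption: no block scoping). */
static const char *typedef_names[MAX_TYPEDEFS];
static size_t typedef_count;

/* The parser calls this once it has finished parsing a typedef declaration. */
bool register_typedef(const char *name)
{
    if (typedef_count == MAX_TYPEDEFS)
        return false;               /* table full */
    typedef_names[typedef_count++] = name;
    return true;
}

/* The lexer calls this for every identifier it scans. */
enum token_kind classify_identifier(const char *name)
{
    for (size_t i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TOK_TYPE_NAME;
    return TOK_IDENTIFIER;
}
```

In a Flex-style scanner, the action for the identifier rule would presumably call something like classify_identifier on the matched text and return the resulting token kind. A real implementation also has to handle scopes, since an inner declaration can shadow a typedef name.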
Even that is only needed if you want to report more specific information up to the parser. E.g. Clang doesn't. In Clang it is instead the parser that looks up the information in order to figure out what type of identifier it has received.
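A rough sketch of that parser-side variant, under the assumption that the lexer only ever emits plain identifier tokens and the parser owns the table of type names. This is not Clang's actual code; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcomes for an ambiguous statement such as `a * b;`. */
enum stmt_kind { STMT_DECLARATION, STMT_EXPRESSION };

/* Names currently known to be types (assumption: a real parser would
   use scoped symbol tables instead of one flat array). */
struct type_table {
    const char *names[32];
    size_t count;
};

static bool is_type_name(const struct type_table *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return true;
    return false;
}

/* The lexer handed over an ordinary identifier; the parser looks it up
   itself to decide how to parse the statement that starts with it. */
enum stmt_kind classify_statement(const struct type_table *t,
                                  const char *first_identifier)
{
    return is_type_name(t, first_identifier) ? STMT_DECLARATION
                                             : STMT_EXPRESSION;
}
```

With "T" in the table, a statement starting with T (as in `T * x;`) would be treated as a declaration of a pointer, while the same shape starting with an unknown name is parsed as a multiplication expression. The lexer never needs to see the table.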
And what are the alternatives? I've always wanted a parser object (in C++ anyway) that I can add constructions to, feed it scanner output and have it build a symbol table and semantic tree. Does such a thing exist?