
Sorry for the shameless plug! I read up a lot on this subject while working on [my own toy compiler](https://hackernews.hn/item?id=28262149).

Other well-known languages that have hand-written parsers are Rust and D.

And here are a couple of quotes from compiler developers explaining why they went with hand-crafted parsers.

Someone from the C# compiler’s team gave the following reasons [1]:

>Hello, I work on the C# compiler and we use a handwritten recursive-descent parser. Here are a few of the more important reasons for doing so:

>Incremental re-parsing. If a user in the IDE changes the document, we need to reparse the file, but we want to do this while using as little memory as possible. To this end, we re-use AST nodes from previous parses.

>Better error reporting. Parser generators are known for producing terrible errors. While you can hack around this, by using recursive-descent, you can get information from further "up" the tree to make it more relevant to the context in which the error occurred.

>Resilient parsing. This is the big one! If you give our parser a string that is illegal according to the grammar, our parser will still give you a syntax tree! (We'll also spit errors out). But getting a syntax tree regardless of the actual validity of the program being passed in means that the IDE can give autocomplete and report type-checking error messages. As an example, the code `var x = velocity.` is invalid C#. However, in order to give autocomplete on `velocity.`, that code needs to be parsed into an AST, and then typechecked, and then we can extract the members on the type in order to provide a good user experience.
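To make the resilient-parsing point concrete, here is a minimal sketch in Python of the idea: on malformed input the parser records a diagnostic and inserts a placeholder node instead of aborting, so callers always get a tree. The tokenizer, node shapes, and names are all invented for illustration; they are not how Roslyn actually does it.

```python
def tokenize(src):
    # Toy lexer: identifiers and single-character punctuation only.
    tokens, i = [], 0
    while i < len(src):
        if src[i].isspace():
            i += 1
        elif src[i].isalnum() or src[i] == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("ident", src[i:j]))
            i = j
        else:
            tokens.append((src[i], src[i]))
            i += 1
    return tokens

def parse_member_expr(tokens):
    """Parse `ident ('.' ident)*`, tolerating a trailing or missing name."""
    errors, pos = [], 0
    if pos < len(tokens) and tokens[pos][0] == "ident":
        node = ("name", tokens[pos][1]); pos += 1
    else:
        errors.append("expected identifier")
        node = ("missing",)
    while pos < len(tokens) and tokens[pos][0] == ".":
        pos += 1
        if pos < len(tokens) and tokens[pos][0] == "ident":
            node = ("member", node, tokens[pos][1]); pos += 1
        else:
            # The member name is absent, but we still build a node, so an
            # IDE could typecheck the receiver and offer completions.
            errors.append("expected member name after '.'")
            node = ("member", node, None)
    return node, errors

tree, errors = parse_member_expr(tokenize("velocity."))
print(tree)    # ('member', ('name', 'velocity'), None)
print(errors)  # ["expected member name after '.'"]
```

The `velocity.` example from the quote parses to a `member` node with a `None` name plus one diagnostic, which is exactly the shape a completion engine needs.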

GCC actually used Bison for a long time but eventually switched to a hand-written parser; the team gives some reasons in the changelog [2]:

>A hand-written recursive-descent C++ parser has replaced the YACC-derived C++ parser from previous GCC releases. The new parser contains much improved infrastructure needed for better parsing of C++ source codes, handling of extensions, and clean separation (where possible) between proper semantic analysis and parsing.

Some people argue that coding parsers by hand is error-prone. That line of reasoning certainly makes sense to me, but it's not that simple in practice for non-trivial grammars. For instance, ANTLR 4 emits a so-called parse tree, which you will likely want to convert to an AST if you have a somewhat complex grammar. That conversion takes quite a bit of manually written code; see AstBuilder.java [3] to get an idea (I'm not affiliated with the project). So there's still plenty of opportunity for errors in that amount of code. To be fair, creating ASTs is an explicit non-goal for ANTLR 4 [4].
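To illustrate why that conversion step exists, here is a hypothetical Python sketch of flattening an ANTLR-style parse tree into an AST. Parse trees mirror the grammar, so chains like `expr -> term -> NUM` appear even for a lone literal, while an AST keeps only the meaningful nodes. The node shapes and rule names here are invented for illustration and are not ANTLR's actual API.

```python
def to_ast(node):
    # Parse-tree node: (rule_name, [children]); leaf: ("token", text).
    rule, children = node
    if rule == "token":
        return ("lit", children)             # children holds the token text
    # Drop punctuation leaves and recurse into the rest.
    kids = [to_ast(c) for c in children if c[0] != "punct"]
    if len(kids) == 1:                        # collapse single-child chains
        return kids[0]
    if rule == "add_expr":                    # binary rule -> operator node
        return ("add", kids[0], kids[1])
    return (rule, *kids)

# `1 + 2` as a deep parse tree: expr -> add_expr -> (term, '+', term)
parse_tree = ("expr", [("add_expr", [
    ("term", [("token", "1")]),
    ("punct", "+"),
    ("term", [("token", "2")]),
])])
print(to_ast(parse_tree))  # ('add', ('lit', '1'), ('lit', '2'))
```

Even this toy version needs per-rule decisions (what to drop, what to collapse, what to rename), which is why real AST builders like the linked AstBuilder.java run to thousands of lines.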

[1]: https://hackernews.hn/item?id=13915150

[2]: https://gcc.gnu.org/gcc-3.4/changes.html

[3]: https://github.com/crate/crate/blob/5173b655a9fbf72028876ae7...

[4]: https://theantlrguy.atlassian.net/wiki/spaces/~admin/blog/20...



Run, don't walk, away from ANTLR, unless you are an intellectual masochist. I maintain a high-performance Haskell-like compiler for a living. Somebody added ANTLR to another tool I am now maintaining as well. It's a maintenance nightmare.



