On a semi-related note: I find it a bit strange that C, a language that has been in wide use for over 30 years, has no de facto standard container library. I mean, it's a bit ridiculous that a 4kloc project spends 1kloc implementing something as basic as a dynamic array.
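For contrast, a toy Go sketch (mine, illustrative only, not from the project) of what replaces that hand-rolled machinery: append, len and cap do all the bookkeeping the 1kloc of C does by hand.

    package main

    import "fmt"

    func main() {
        // The grow/realloc/free machinery a C project hand-rolls is
        // built into the language: append reallocates the backing
        // array as needed.
        var xs []int
        for i := 0; i < 10; i++ {
            xs = append(xs, i)
            fmt.Println(len(xs), cap(xs)) // watch the capacity grow
        }
    }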
It's nice that Go offers some improvements over C, but is it enough?
Porting a program written in one of the "lowest level" languages still in use today to one of the newest "modern" languages, one that claims high-level features, and getting only a 20% code size reduction doesn't seem like a big win to me, especially if all of the savings are attributable to a single feature (better arrays).
Maybe that is just because it was a port, and not a reimplementation in idiomatic Go style.
I would certainly like to see other solutions to the same problem written in idiomatic Go, Clojure, Haskell, C++ and Scala, just for comparison.
Enough? It is rather the observation that if you use more or less the same parsing approach in a language that looks like C, the possible improvement is rather small.
There are two things at work here: first, how small is the parser? And second, how fast is it? I know you can write some extremely fast, highly idiomatic parsers in Haskell, but I doubt they will be faster than hand-coded C.
(I'm the author of the article and also a heavy Python programmer).
That's not an easy question to answer.
If you were to do a faithful port, i.e. using the same techniques as the C/Go code, it would be very close. upskirt uses a traditional lexing/top-down-parsing approach. There's nothing in Python's syntax that makes writing such code more compact than in Go. It's a bit tedious to write, but the benefit is that (in C/Go) it gives the best speed, because it minimizes the number of times each source character is looked at.
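To make that concrete, here is a toy sketch (illustrative only, not upskirt's actual code) of the single-pass, character-at-a-time style in Go: each byte of the input is examined at most once, with no backtracking.

    package main

    import "fmt"

    func scanEmphasis(src []byte) {
        for i := 0; i < len(src); {
            if src[i] != '*' {
                i++
                continue
            }
            // Found an opening '*'; scan forward for the closing one
            // without ever re-reading consumed input.
            j := i + 1
            for j < len(src) && src[j] != '*' {
                j++
            }
            if j < len(src) {
                fmt.Printf("emphasis: %q\n", src[i+1:j])
            }
            i = j + 1
        }
    }

    func main() {
        scanEmphasis([]byte("plain *strong* more *text*"))
    }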
If you were to use a different approach, e.g. brute-forcing your way through the text several times with regexps, which is the most popular way of doing it in dynamic languages, the code clocks in at ~2 thousand lines (like the implementation I use for my home-grown blog system: https://github.com/kjk/web-blog/blob/master/markdown2.py).
If you were to use this technique in Go, the code would probably end up smaller than the manual lexing/parsing approach used in upskirt, but it would be significantly slower.
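For comparison, a toy Go sketch of that several-regexp-passes style (my illustration, not how any of the real libraries work): each pass re-reads the whole input, which is why it tends to be slower than a single-pass lexer even though the code is much shorter.

    package main

    import (
        "fmt"
        "regexp"
    )

    var (
        boldRe   = regexp.MustCompile(`\*\*(.+?)\*\*`)
        italicRe = regexp.MustCompile(`\*(.+?)\*`)
        codeRe   = regexp.MustCompile("`(.+?)`")
    )

    func inlineHTML(src string) string {
        // One full scan of the text per pattern.
        src = boldRe.ReplaceAllString(src, "<strong>$1</strong>")
        src = italicRe.ReplaceAllString(src, "<em>$1</em>")
        src = codeRe.ReplaceAllString(src, "<code>$1</code>")
        return src
    }

    func main() {
        fmt.Println(inlineHTML("some *em*, **strong** and `code`"))
    }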
Interestingly, the upskirt approach would probably be slower in Python than the regexp approach, because regexps are heavily optimized C code, while looking at individual characters of a string in Python isn't particularly fast due to Python's interpretation overhead.
Most dynamic languages use a C regular-expression library. That's why they can do so well on regular-expression benchmarks.
V8 is a notable exception: it uses its code-generation pipeline to JIT-compile regexps to machine code, making it the fastest regexp engine around, bar none (I think).
I'd like to put in a word for CL-PPCRE, the Common Lisp regex library. It does the same thing, except using the CL optimizing compiler, so it also achieves amazing speed.
Libraries don't appear in the Computer Language Benchmarks Game (formerly the Shootout), so CL doesn't do so well there.
What about LPEG? It has a superset of the functionality of most regexp engines, yet can handle in linear time many things they would handle in quadratic time.
Performance-wise, any regex-based implementation may spend a lot of time backtracking and re-parsing data that a hand-coded DFA parser would handle in linear time.
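As a concrete illustration of the linear-time alternative (a sketch with an invented heading rule, not anyone's real parser), here is a tiny hand-coded state machine in Go that decides whether a line is an ATX-style heading, touching each byte exactly once. Worth noting: Go's own regexp package is RE2-based and also guarantees linear time, unlike the backtracking engines most dynamic languages wrap.

    package main

    import "fmt"

    func isHeading(line []byte) bool {
        const (
            stateHashes = iota // consuming the leading '#'s
            stateText          // consuming the heading text
        )
        state, hashes := stateHashes, 0
        for _, c := range line {
            switch state {
            case stateHashes:
                if c == '#' {
                    hashes++
                    continue
                }
                if c != ' ' || hashes == 0 || hashes > 6 {
                    return false
                }
                state = stateText
            case stateText:
                // any byte is fine in the heading text
            }
        }
        return state == stateText
    }

    func main() {
        fmt.Println(isHeading([]byte("## Hello"))) // true
        fmt.Println(isHeading([]byte("#Hello")))   // false: no space
        fmt.Println(isHeading([]byte("plain")))    // false
    }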
I'd like to point out that there are libraries (PyPEG, PyParsing, and PyMeta) which take EBNF-style descriptions of grammars and produce decently fast parsers, PEG-style. PyMeta is my personal favorite, since it both takes a straight text description and uses OMeta instead of EBNF, turning parsing and lexing into a single operation.
Of course, the downside is that formal grammars are often a bad fit for the custom languages of the web, which tend to have incomplete/undecidable/context-dependent grammars. This would work for Markdown, though, since it does have a parseable grammar.
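In the same spirit, a scannerless PEG-style sketch in Go (an invented toy grammar, nothing to do with those libraries' APIs) shows what "parsing and lexing as a single operation" means: the parser consumes raw characters directly, with no separate token stream.

    package main

    import "fmt"

    // Toy grammar: expr <- num ('+' num)* ; num <- [0-9]+
    type parser struct {
        src string
        pos int
    }

    func (p *parser) num() (int, bool) {
        start, n := p.pos, 0
        for p.pos < len(p.src) && p.src[p.pos] >= '0' && p.src[p.pos] <= '9' {
            n = n*10 + int(p.src[p.pos]-'0')
            p.pos++
        }
        return n, p.pos > start
    }

    func (p *parser) expr() (int, bool) {
        total, ok := p.num()
        if !ok {
            return 0, false
        }
        for p.pos < len(p.src) && p.src[p.pos] == '+' {
            p.pos++
            n, ok := p.num()
            if !ok {
                return 0, false
            }
            total += n
        }
        return total, true
    }

    func main() {
        p := &parser{src: "1+22+333"}
        fmt.Println(p.expr()) // 356 true
    }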
Now that upskirt has been ported to Go as a library, it's also 5 lines.
BTW: if people want to do markdown in Go, the library to use is https://github.com/russross/blackfriday. It's a different one than the code I ported, but Russ Ross did exactly the same thing at almost exactly the same time and was a little bit ahead, so after I discovered his work, I decided to contribute to his code instead of maintaining my own, almost identical, project.
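For the record, the whole "5 lines" usage looks roughly like this (assuming blackfriday's original v1 API; v2 later replaced MarkdownCommon with blackfriday.Run):

    package main

    import (
        "fmt"

        "github.com/russross/blackfriday"
    )

    func main() {
        input := []byte("Hello *world*\n")
        fmt.Println(string(blackfriday.MarkdownCommon(input)))
    }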
Two, writing a converter is way above my head and quite likely theoretically (and practically) impossible. I've definitely had to make some decisions that I don't think even the cleverest compiler could (like noticing that array.[c|h] and buffer.[c|h] could easily be replaced with native Go arrays/slices, so I didn't have to port them at all, just change the callers to the Go equivalents).
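To illustrate that decision (a sketch; the C-side names are approximate, from memory, and this isn't the actual ported code), the buffer.[c|h] API maps onto Go's standard library roughly like this:

    package main

    import (
        "bytes"
        "fmt"
    )

    func main() {
        // C side (approximate names):      Go equivalent:
        //   struct buf *b = bufnew(64);    var b bytes.Buffer
        //   bufputs(b, "<p>hi</p>");       b.WriteString("<p>hi</p>")
        //   b->size                        b.Len()
        //   bufrelease(b);                 (garbage collected)
        var b bytes.Buffer
        b.WriteString("<p>hi</p>")
        fmt.Println(b.Len(), b.String())
    }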
I interpreted his comment as "it doesn't look like you learned Go."
If you are porting direct C logic to Go, you could be missing a lot of potential idioms that would differentiate Go from C or Python further. You said yourself it was easier to implement parsing with slices.
I think a more effective task is "understand what the 4k lines of C do, then throw them away and rewrite the whole thing as if it had been written in Go in the first place." Your mileage may vary.
I think you mean compiler - a parser just outputs an AST. Anyway, while some of the transformations are probably mechanical, I imagine that the semantic analysis required to recognize a C implementation of a growable array would be more effort than just doing it by hand.
It seems to me a comparison to C++ might be more interesting. I wonder how that would look, given that C++ has growable arrays (std::vector) and such. There's better (though not perfect) safety, given scoped_ptr, string classes, etc.