On a semi-related note: I find it a bit strange that C, a language that has been in wide use for over 30 years, has no de facto standard container library. I mean, it's a bit ridiculous that a 4kloc project spends 1kloc implementing something as basic as a dynamic array.
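For contrast, a toy Go sketch (mine, illustrative only, not from the project) of what replaces that hand-rolled machinery: append, len and cap do all the bookkeeping the 1kloc of C does by hand.

    package main

    import "fmt"

    func main() {
        // The grow/realloc/free machinery a C project hand-rolls is
        // built into the language: append reallocates the backing
        // array as needed.
        var xs []int
        for i := 0; i < 10; i++ {
            xs = append(xs, i)
            fmt.Println(len(xs), cap(xs)) // watch the capacity grow
        }
    }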
It's nice that Go offers some improvements over C, but is it enough?
Porting a program written in one of the "lowest level" languages still in use today to one of the newest "modern" languages, one that claims high-level features, and getting only a 20% code size reduction doesn't seem like a big win to me, especially if all of the savings are attributable to a single feature (better arrays).
Maybe that is just because it was a port, and not a reimplementation in idiomatic Go style.
I would certainly like to see other solutions to the same problem written in idiomatic Go, Clojure, Haskell, C++ and Scala, just for comparison.
Enough? It is rather the observation that if you use more or less the same parsing approach in a language that looks like C, the possible improvement is rather small.
There are two things at work here: first, how small is the parser? And second, how fast is it? I know you can write some extremely fast, highly idiomatic parsers in Haskell, but I doubt they will be faster than hand-coded C.
(I'm the author of the article and also a heavy Python programmer).
That's not an easy question to answer.
If you were to do a faithful port, i.e. using the same techniques as the C/Go code, it would be very close. upskirt uses a traditional lexing/top-down-parsing approach. There's nothing in Python's syntax that makes writing such code more compact than in Go. It's a bit tedious to write, but the benefit is that (in C/Go) it gives the best speed, because it minimizes the number of times each source character is looked at.
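To make that concrete, here is a toy sketch (illustrative only, not upskirt's actual code) of the single-pass, character-at-a-time style in Go: each byte of the input is examined at most once, with no backtracking.

    package main

    import "fmt"

    func scanEmphasis(src []byte) {
        for i := 0; i < len(src); {
            if src[i] != '*' {
                i++
                continue
            }
            // Found an opening '*'; scan forward for the closing one
            // without ever re-reading consumed input.
            j := i + 1
            for j < len(src) && src[j] != '*' {
                j++
            }
            if j < len(src) {
                fmt.Printf("emphasis: %q\n", src[i+1:j])
            }
            i = j + 1
        }
    }

    func main() {
        scanEmphasis([]byte("plain *strong* more *text*"))
    }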
If you were to use a different approach, e.g. brute-forcing your way through the text several times with regexps, which is the most popular way of doing it in dynamic languages, the code clocks in at ~2 thousand lines (like the implementation I use for my home-grown blog system: https://github.com/kjk/web-blog/blob/master/markdown2.py).
If you were to use this technique in Go, the code would probably end up smaller than the manual lexing/parsing approach used in upskirt, but it would be significantly slower.
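For comparison, a toy Go sketch of that several-regexp-passes style (my illustration, not how any of the real libraries work): each pass re-reads the whole input, which is why it tends to be slower than a single-pass lexer even though the code is much shorter.

    package main

    import (
        "fmt"
        "regexp"
    )

    var (
        boldRe   = regexp.MustCompile(`\*\*(.+?)\*\*`)
        italicRe = regexp.MustCompile(`\*(.+?)\*`)
        codeRe   = regexp.MustCompile("`(.+?)`")
    )

    func inlineHTML(src string) string {
        // One full scan of the text per pattern.
        src = boldRe.ReplaceAllString(src, "<strong>$1</strong>")
        src = italicRe.ReplaceAllString(src, "<em>$1</em>")
        src = codeRe.ReplaceAllString(src, "<code>$1</code>")
        return src
    }

    func main() {
        fmt.Println(inlineHTML("some *em*, **strong** and `code`"))
    }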
Interestingly, the upskirt approach would probably be slower in Python than the regexp approach, because regexps are heavily optimized C code, while looking at individual characters of a string in Python isn't particularly fast due to Python's interpretation overhead.
Most dynamic languages use a C regular-expression library. That's why they can do so well on regular-expression benchmarks.
V8 is a notable exception: it uses its code-generation pipeline to JIT-compile regexps to machine code, making it the fastest regexp engine around, bar none (I think).
I'd like to put in a word for CL-PPCRE, the Common Lisp regex library. It does the same thing, except using the CL optimizing compiler, so it also achieves amazing speed.
Libraries don't appear in the Computer Language Benchmarks Game (formerly the Shootout), so CL doesn't do so well there.
What about LPEG? It has a superset of the functionality of most regexp engines, yet can handle in linear time many things they would handle in quadratic time.
Performance-wise, any regex-based implementation may spend a lot of time backtracking and re-parsing data that a hand-coded DFA parser would handle in linear time.
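As a concrete illustration of the linear-time alternative (a sketch with an invented heading rule, not anyone's real parser), here is a tiny hand-coded state machine in Go that decides whether a line is an ATX-style heading, touching each byte exactly once. Worth noting: Go's own regexp package is RE2-based and also guarantees linear time, unlike the backtracking engines most dynamic languages wrap.

    package main

    import "fmt"

    func isHeading(line []byte) bool {
        const (
            stateHashes = iota // consuming the leading '#'s
            stateText          // consuming the heading text
        )
        state, hashes := stateHashes, 0
        for _, c := range line {
            switch state {
            case stateHashes:
                if c == '#' {
                    hashes++
                    continue
                }
                if c != ' ' || hashes == 0 || hashes > 6 {
                    return false
                }
                state = stateText
            case stateText:
                // any byte is fine in the heading text
            }
        }
        return state == stateText
    }

    func main() {
        fmt.Println(isHeading([]byte("## Hello"))) // true
        fmt.Println(isHeading([]byte("#Hello")))   // false: no space
        fmt.Println(isHeading([]byte("plain")))    // false
    }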
I'd like to point out that there are libraries (PyPEG, PyParsing, and PyMeta) which take EBNF-style descriptions of grammars and produce decently fast parsers, PEG-style. PyMeta is my personal favorite, since it both takes a straight text description and uses OMeta instead of EBNF, turning parsing and lexing into a single operation.
Of course, the downside is that formal grammars are often a bad fit for the custom languages of the web, which tend to have incomplete/undecidable/context-dependent grammars. This would work for Markdown, though, since it does have a parseable grammar.
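In the same spirit, a scannerless PEG-style sketch in Go (an invented toy grammar, nothing to do with those libraries' APIs) shows what "parsing and lexing as a single operation" means: the parser consumes raw characters directly, with no separate token stream.

    package main

    import "fmt"

    // Toy grammar: expr <- num ('+' num)* ; num <- [0-9]+
    type parser struct {
        src string
        pos int
    }

    func (p *parser) num() (int, bool) {
        start, n := p.pos, 0
        for p.pos < len(p.src) && p.src[p.pos] >= '0' && p.src[p.pos] <= '9' {
            n = n*10 + int(p.src[p.pos]-'0')
            p.pos++
        }
        return n, p.pos > start
    }

    func (p *parser) expr() (int, bool) {
        total, ok := p.num()
        if !ok {
            return 0, false
        }
        for p.pos < len(p.src) && p.src[p.pos] == '+' {
            p.pos++
            n, ok := p.num()
            if !ok {
                return 0, false
            }
            total += n
        }
        return total, true
    }

    func main() {
        p := &parser{src: "1+22+333"}
        fmt.Println(p.expr()) // 356 true
    }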
Now that upskirt has been ported to Go as a library, it's also 5 lines.
BTW: if people want to do markdown in Go, the library to use is https://github.com/russross/blackfriday. It's a different one than the code I ported, but Russ Ross did exactly the same thing at almost exactly the same time and was a little bit ahead, so after I discovered his work, I decided to contribute to his code instead of maintaining my own, almost identical, project.
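For the record, the whole "5 lines" usage looks roughly like this (assuming blackfriday's original v1 API; v2 later replaced MarkdownCommon with blackfriday.Run):

    package main

    import (
        "fmt"

        "github.com/russross/blackfriday"
    )

    func main() {
        input := []byte("Hello *world*\n")
        fmt.Println(string(blackfriday.MarkdownCommon(input)))
    }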
Two, writing a converter is way above my head and quite likely theoretically (and practically) impossible. I've definitely had to make some decisions that I don't think even the cleverest compiler could (like noticing that array.[c|h] and buffer.[c|h] could easily be replaced with native Go arrays/slices, so I didn't have to port them at all, just change the callers to the Go equivalents).
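To illustrate that decision (a sketch; the C-side names are approximate, from memory, and this isn't the actual ported code), the buffer.[c|h] API maps onto Go's standard library roughly like this:

    package main

    import (
        "bytes"
        "fmt"
    )

    func main() {
        // C side (approximate names):      Go equivalent:
        //   struct buf *b = bufnew(64);    var b bytes.Buffer
        //   bufputs(b, "<p>hi</p>");       b.WriteString("<p>hi</p>")
        //   b->size                        b.Len()
        //   bufrelease(b);                 (garbage collected)
        var b bytes.Buffer
        b.WriteString("<p>hi</p>")
        fmt.Println(b.Len(), b.String())
    }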
I interpreted his comment as "it doesn't look like you learned Go."
If you are porting direct C logic to Go, you could be missing a lot of potential idioms that would differentiate Go from C or Python further. You said yourself it was easier to implement parsing with slices.
I think a more effective task is "understand what the 4k lines of C do, then throw them away and rewrite the whole thing as if it had been written in Go in the first place." Your mileage may vary.
I think you mean compiler - a parser just outputs an AST. Anyway, while some of the transformations are probably mechanical, I imagine that the semantic analysis required to recognize a C implementation of a growable array would be more effort than just doing it by hand.
It seems to me a comparison to C++ might be more interesting. I wonder how that would look, given that C++ has growable arrays (std::vector) and such. There's better (though not perfect) safety, given scoped_ptr, string classes, etc.