Show HN: Parson, lightweight json parser in C

eps · on Oct 28, 2012

parson.c, 109 - wrong conditional

Well-written, clean and consistent. A bit too pedantic and therefore too verbose for my taste though :) Why in K&R's name this -

  JSON_Object *new_obj = (JSON_Object*)parson_malloc(sizeof(JSON_Object));

is not written like this -

  JSON_Object *new_obj = parson_malloc(sizeof *new_obj);
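A minimal sketch of the idiom eps is suggesting. It assumes plain `malloc` in place of `parson_malloc` and uses a hypothetical stand-in definition of `JSON_Object` (not parson's real one). In C, the `void *` from `malloc` converts implicitly to any object pointer type, and `sizeof *new_obj` takes its size from the pointer's own type, so the allocation stays correct if the struct is renamed or changed. A C++ compiler does not accept that implicit conversion, which is the reason given for the cast later in the thread.

```c
#include <stdlib.h>

/* Hypothetical stand-in for parson's internal object type. */
typedef struct json_object_t {
    char **names;
    size_t count;
    size_t capacity;
} JSON_Object;

/* Hypothetical constructor illustrating the idiom. */
JSON_Object *json_object_init_sketch(void) {
    /* No cast needed in C; sizeof *new_obj follows the pointer's type. */
    JSON_Object *new_obj = malloc(sizeof *new_obj);
    if (new_obj == NULL) {
        return NULL;
    }
    new_obj->names = NULL;
    new_obj->count = 0;
    new_obj->capacity = 0;
    return new_obj;
}
```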

On a more serious note - I would move json_*_t structs to the header file and eliminate a bunch of API functions that service the access to these structs. It's a C library after all, everyone knows how to shoot one's own leg.

(edit) Oh, and you don't handle realloc failures properly (at least in json_object_add).

pflanze · on Oct 28, 2012

> I would move json_*_t structs to the header file and eliminate a bunch of API functions that service the access to these structs. It's a C library after all, everyone knows how to shoot one's own leg.

From what I know, I would suggest doing that only if the code is never going to be used as a shared library; otherwise the structs can't be changed without recompiling every program that uses the library.
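A hypothetical illustration of why changing a public struct breaks shared-library users (the `_v1`/`_v2` struct names are invented for this sketch): if a later version inserts a field, the offsets of the fields after it shift. An application compiled against the old header keeps using the offsets baked into its machine code, so it reads the wrong memory when run against the new `.so`.

```c
#include <stddef.h>

/* Public struct as shipped in a hypothetical version 1 header. */
struct json_array_v1 {
    void **items;
    size_t count;
};

/* Hypothetical version 2 inserts a field before `count`. */
struct json_array_v2 {
    void **items;
    size_t capacity; /* new field */
    size_t count;
};

/* Returns nonzero if code compiled against v1 would still find `count`
   at the offset where a v2 library stores it. */
int count_offset_unchanged(void) {
    return offsetof(struct json_array_v1, count) ==
           offsetof(struct json_array_v2, count);
}
```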

There was a Debian security update to libxml2 in 2008 that made various Gnome apps, including, IIRC, the window manager, crash for many users in various circumstances (depending on the theme in use, etc.)(*), because it changed some struct. Public API functions were provided in that case, but apparently direct access to structs was not discouraged enough to prevent some programs from going the unsafe route anyway (or perhaps the API wasn't sufficiently complete? I don't know the details).

I haven't created any widespread C programs, and I don't know whether there are ABI checkers that Debian should use (or has used), or whether restricting oneself to procedure-based APIs is really the only way to avoid such problems.

(*) e.g. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=496125 http://lists.debian.org/debian-user/2008/08/msg01864.html

eps · on Oct 28, 2012

Right. This would apply if parson were built into a shared library and distributed as an .so. It is a possibility, though Debian's package dependency system exists exactly to safeguard against this sort of situation.

kgabis · on Oct 28, 2012

Thanks for pointing out those errors, I already fixed the one with the wrong conditional and will fix the realloc one in the next commit. I'm being explicit with casting, because g++ and vc++ will throw warnings without it.

eps · on Oct 28, 2012

Of course you'd get warnings compiling C code with a C++ compiler. C doesn't require a cast from void* to another object pointer type... it's one of its best features :)

malkia · on Oct 28, 2012

Some people would still like to compile this as a C++ file. lua is an example: it's written in C, but it compiles as C++ too (people then differ on how strictly the function interfaces should be declared extern "C").

Anyway, it's not that important... But there are a whole lot of people who like amalgamated sources (typical examples: juce, freetype2, lua, sqlite), where one or two files include all the rest and everything compiles fine.

cmccabe · on Oct 28, 2012

Luckily for you, you can link object files produced by the C compiler with object files produced by the C++ compiler. Crisis averted.
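Mixing the two works provided the C++ side sees the C functions declared with C linkage; otherwise the C++ compiler looks for mangled names and the link fails. A common header pattern, sketched with a hypothetical function name and with header and implementation collapsed into one block:

```c
/* --- header part --- */
#ifdef __cplusplus
extern "C" {
#endif

/* Declared with C linkage, so a C++ translation unit including this
   header refers to the unmangled symbol the C compiler emits. */
int parson_sketch_version(void);

#ifdef __cplusplus
}
#endif

/* --- implementation part, compiled as C --- */
int parson_sketch_version(void) {
    return 1;
}
```

When compiled as C, the `__cplusplus` guards drop out entirely, so the same header serves both languages.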

cmccabe · on Oct 28, 2012

"I would move json_*_t structs to the header file and eliminate a bunch of API functions that service the access to these structs. It's a C library after all, everyone knows how to shoot one's own leg."

I strongly disagree. The author did exactly the right thing by hiding the internal data representation behind accessor functions. This makes it easy for him to change the internal data representation in the future without breaking binary compatibility.

See the libabc README for a fuller explanation: https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git;a=...

Summary: "Do not expose any complex structures in your API"
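A minimal sketch of the approach cmccabe describes, using a hypothetical `counter` type rather than parson's actual API. The header only forward-declares the struct, so callers can hold pointers but never depend on its size or layout, and every access goes through functions exported by the library. The definition can then gain fields in a later version without breaking binary compatibility. Header and implementation are collapsed into one block here.

```c
#include <stdlib.h>

/* Header part: an incomplete type; callers cannot see the members. */
struct counter;
struct counter *counter_new(void);
void counter_add(struct counter *c, int delta);
int counter_value(const struct counter *c);
void counter_free(struct counter *c);

/* Implementation part: only this file knows the layout, so fields can
   be added or reordered later without recompiling callers. */
struct counter {
    int value;
};

struct counter *counter_new(void) {
    struct counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = 0;
    }
    return c;
}

void counter_add(struct counter *c, int delta) { c->value += delta; }

int counter_value(const struct counter *c) { return c->value; }

void counter_free(struct counter *c) { free(c); }
```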

Unrelated: please don't be one of those people who typedef every struct. It's really ugly and obfuscatory (see Linus' remarks on this).

d0mine · on Oct 28, 2012

Looking at the source, it seems it doesn't handle surrogate pairs, e.g.:

  ["\ud83d\ude04"]
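In JSON, characters outside the Basic Multilingual Plane are written as a UTF-16 surrogate pair: a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF). A parser has to combine the two `\u` escapes into one code point before encoding it as UTF-8. A minimal sketch of that step, with hypothetical function names; for the example above, `\ud83d\ude04` combines to U+1F604, whose UTF-8 encoding is the four bytes F0 9F 98 84.

```c
#include <stddef.h>

/* Combines a UTF-16 surrogate pair into a code point.
   Returns 0 if the pair is invalid (0 is never a valid result here). */
unsigned long combine_surrogates(unsigned int high, unsigned int low) {
    if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF) {
        return 0;
    }
    return 0x10000UL + ((unsigned long)(high - 0xD800) << 10) + (low - 0xDC00);
}

/* Writes the UTF-8 encoding of cp into out (room for 4 bytes needed).
   Returns the byte count, or 0 if cp is above U+10FFFF. It does not
   reject lone surrogates (U+D800..U+DFFF); a real parser should. */
size_t encode_utf8(unsigned long cp, unsigned char *out) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    } else if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```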

kgabis · on Oct 28, 2012

You're right, I'll try to add support for them in the future.

__david__ · on Oct 28, 2012

Thanks, I've been looking for something like this for a project I have (https://github.com/caldwell/daemon-manager). I did a survey of all the JSON libs for C and C++ about a year ago and found reasons to dislike them all. This actually looks like a reasonable interface, the code itself looks good so far, and I love the fact that it is just a couple files I can drop in.

udp · on Oct 28, 2012

Just out of curiosity, was json-parser one of the ones you looked at? If so, what did you dislike about it?

__david__ · on Oct 28, 2012

No, I haven't seen that one (looks like you wrote it?). I was looking for them at the tail end of 2011, and it looks like json-parser came into existence in March 2012.

json-parser, at first glance, appears to be nice, too. The ones I looked at before were all multiple files and seemingly very complicated--either to use or to build/integrate with my code. I'll keep a pointer to this one too for when I get back on that project.

IgorPartola · on Oct 28, 2012

I've been using http://cjson.sourceforge.net/ for a while. Single file, reasonable interface. It lets you both serialize and de-serialize JSON. Was this a library you looked at when you did your survey?

isbadawi · on Oct 28, 2012

How does this compare to jansson (http://www.digip.org/jansson/), aside from just being a parser?

kgabis · on Oct 28, 2012

Parson may be easier to embed in some projects and it has different/smaller API.

thomaslee · on Oct 28, 2012

+1 for jansson. It rocks.

DrCatbox · on Oct 28, 2012

What's the deal with saying "Only 2 files"? What if it is only 1 file, like bottle.py, but those 1 or 2 files contain 10 000 lines of code? I'm just saying.

pi18n · on Oct 28, 2012

Yeah, seriously. I just did some work with the s7 implementation of Scheme. "Only two files!"

In all fairness, I think the idea is that you should be able to just drop it into your source without needing to worry about git subrepos (or whatever). But it would perhaps be more elegant to split it up in the repo and have a script stitch it together into the "only two files".

Also s7 turned out to be much easier to include than the other ones I was considering, so I guess I shouldn't be mocking it.

mhd · on Oct 28, 2012

Well, as in the case with bottle, it's probably meant to imply that you can just use it in your project as files, not as a complete library. No need to complicate your Makefiles (or worry about your package framework).

udp · on Oct 28, 2012

Well, you beat json-parser on SLOC. :-)

Main differences are that you use recursion instead of a stack, and that you dynamically resize your buffers as you parse (whereas json-parser passes through the JSON twice so that it knows exactly how much memory to allocate).
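The two-pass idea udp describes can be sketched for a single string literal: the first call only counts how many bytes the unescaped string needs, the caller allocates exactly that much, and the second call writes into the buffer. This is a hypothetical helper, not json-parser's actual code, and it handles only the simple two-character escapes (no `\uXXXX`).

```c
#include <stdlib.h>

/* Unescapes a JSON string body (no opening quote) from `in`, stopping at
   the closing quote or NUL. If `out` is NULL, only the length is computed
   (pass 1); otherwise bytes are written and NUL-terminated (pass 2).
   Returns the unescaped length, or (size_t)-1 on an unsupported escape. */
size_t unescape_simple(const char *in, char *out) {
    size_t n = 0;
    while (*in != '\0' && *in != '"') {
        char c = *in++;
        if (c == '\\') {
            switch (*in++) {
            case '"':  c = '"';  break;
            case '\\': c = '\\'; break;
            case '/':  c = '/';  break;
            case 'b':  c = '\b'; break;
            case 'f':  c = '\f'; break;
            case 'n':  c = '\n'; break;
            case 'r':  c = '\r'; break;
            case 't':  c = '\t'; break;
            default:   return (size_t)-1; /* \uXXXX, invalid, or end of input */
            }
        }
        if (out != NULL) {
            out[n] = c;
        }
        n++;
    }
    if (out != NULL) {
        out[n] = '\0';
    }
    return n;
}

/* Pass 1 measures, one exact allocation, pass 2 copies. */
char *unescape_alloc(const char *in) {
    size_t len = unescape_simple(in, NULL);
    if (len == (size_t)-1) {
        return NULL;
    }
    char *buf = malloc(len + 1);
    if (buf != NULL) {
        unescape_simple(in, buf);
    }
    return buf;
}
```

The trade-off is scanning the input twice in exchange for never calling `realloc`; the resize-as-you-go approach scans once but may over-allocate and copy.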

bengl3rt · on Oct 28, 2012

Hmm, looks complex compared to this one that I've been using called JSMN:

http://zserge.bitbucket.org/jsmn.html

udp · on Oct 28, 2012

Have you really been using jsmn for something or did you just recall the link?

IMHO jsmn doesn't do nearly enough with the JSON to make it usable by an application (it's basically a lexer).

hendi_ · on Oct 28, 2012

jsmn is great if you use it as a lexer and you know that your input is well-formed. We use it for our games to read in our configuration, gamemodes and level files. We've just built a few wrapper functions around jsmn and called it "jsmx", and it suits our needs perfectly :-)

But you're right: if you want a JSON library to parse arbitrary JSON input, using a more complex library like this one is an easier and more viable route than using jsmn.

kgabis · on Oct 28, 2012

jsmn is more of a lexer and it doesn't support unicode.

zokier · on Oct 28, 2012

That style of C programming makes me immediately think how it would be neater in C++. Eg the frontpage example: https://gist.github.com/3969351

Key differences:

* actual namespaces, no need to prefix everything with "JSON_"

* RAII-style resource management, no need for "json_value_free" etc

* objects, so instead of get_array(root_value) you have root_value.get_array()

huhtenberg · on Oct 28, 2012

> json::Array commits = root_value.get_array();

You probably meant:

  json::Array & commits = root_value.get_array();

Don't really want to copy an array there, right? After that you'll probably be tempted to use std::vector as a basis for json::array. Then you'll realize what an abysmal PoS default vector allocator is, and so you'd want to allow customizing it. Hello templates, and then before you know it, the whole thing not only looks like one of them incomprehensible boost beauties, but also compiles into a meg of code with a dozen dependencies. Neat :)

zokier · on Oct 28, 2012

> After that you'll probably be tempted to use std::vector as a basis for json::array. Then you'll realize what an abysmal PoS default vector allocator is, and so you'd want to allow customizing it. Hello templates, and then before you know it, the whole thing not only looks like one of them incomprehensible boost beauties, but also compiles into a meg of code with a dozen dependencies. Neat :)

Writing good code means having discipline to resist all sorts of temptations, such as the one you describe. And not all slippery slopes are actually all that slippery.

The one good thing about C++ is that you can pick the features you want and mostly ignore the rest. The reason C++ is so humungous is that everyone is using a different subset of it.

huhtenberg · on Oct 28, 2012

> The reason C++ is so humungous is that everyone is using a different subset of it.

I completely agree with this. Incidentally, this is exactly the reason why C++ libraries are far less useful than the C ones.

malkia · on Oct 28, 2012

RAII would break if longjmp/setjmp are used. For example, if the code were C++ behind "C" externs (like zeromq), RAII might bite someone unexpectedly, because longjmp does not unwind the stack and call the destructors.
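The hazard can be shown in plain C: `longjmp` transfers control straight back to the `setjmp` point without running any code left in the intermediate frames. Here a hypothetical library routine releases its resource after a callback returns, but one callback jumps out, so that release never happens. In C++ the skipped code would be a destructor, and the standard makes a `longjmp` that would skip non-trivial destructors undefined behavior.

```c
#include <setjmp.h>

static jmp_buf escape_point;
static int resources_released = 0;

/* Stand-in for a destructor or cleanup step. */
static void release_resource(void) { resources_released++; }

/* Hypothetical library routine: acquire, call back, then release. */
static void with_resource(void (*callback)(void)) {
    /* ... acquire something ... */
    callback();
    release_resource(); /* skipped if callback longjmps away */
}

static void bailing_callback(void) {
    longjmp(escape_point, 1); /* jumps straight back to the setjmp below */
}

static void normal_callback(void) {}

/* Returns how many cleanups ran across one normal and one bailing call. */
int run_demo(void) {
    resources_released = 0;
    with_resource(normal_callback);      /* cleanup runs */
    if (setjmp(escape_point) == 0) {
        with_resource(bailing_callback); /* never returns normally */
    }
    return resources_released;           /* the second cleanup was skipped */
}
```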

halayli · on Oct 28, 2012

You are calling realloc and assigning the result directly, before checking the return value. realloc can fail and return NULL, in which case you overwrite the previous pointer and leak the memory it pointed to.
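The usual fix is to assign `realloc`'s result to a temporary and overwrite the original pointer only once the call has succeeded; on failure the old block is still valid and still owned by the caller. A minimal sketch with a hypothetical growable array of `char *` (not parson's actual structure):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array. */
struct str_array {
    char **items;
    size_t count;
    size_t capacity;
};

/* Appends item, doubling capacity as needed. Returns 0 on success, -1 on
   allocation failure; on failure the array is unchanged and nothing leaks. */
int str_array_push(struct str_array *a, char *item) {
    if (a->count == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 4;
        /* Guard against new_cap * sizeof(char *) overflowing size_t. */
        if (new_cap > SIZE_MAX / sizeof *a->items) {
            return -1;
        }
        /* Keep the old pointer until realloc succeeds: writing
           a->items = realloc(a->items, ...) would lose it on NULL. */
        char **tmp = realloc(a->items, new_cap * sizeof *tmp);
        if (tmp == NULL) {
            return -1;
        }
        a->items = tmp;
        a->capacity = new_cap;
    }
    a->items[a->count++] = item;
    return 0;
}
```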

I don't see why you duplicated strndup in parson_strndup.

https://github.com/kgabis/parson/blob/master/parson.c#L129

parson_strdup can return NULL and this statement will fail silently.

thomaslee · on Oct 28, 2012

strndup is part of the "standard", but not present on some platforms unfortunately. :( I can't think of a modern C runtime that doesn't offer strdup, though.

The realloc check is good advice -- take heed, OP. :)

kgabis · on Oct 28, 2012

Realloc, strdup and strndup errors are handled as of now :) (https://github.com/kgabis/parson/commit/ee9be98974b3fdac0be1...) And yes, I've written my own strndup because it's "relatively new".
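For platforms whose C library lacks `strndup`, a fallback takes only a few lines. A sketch with a hypothetical name (so it cannot clash with a system `strndup`) that returns NULL on allocation failure, so callers can check it instead of failing silently:

```c
#include <stdlib.h>
#include <string.h>

/* Copies at most n bytes of s into a new NUL-terminated buffer, stopping
   early at a NUL in s, like POSIX strndup. Returns NULL if allocation fails. */
char *portable_strndup(const char *s, size_t n) {
    size_t len = 0;
    /* Bounded scan instead of strlen: s need not be NUL-terminated
       within the first n bytes. */
    while (len < n && s[len] != '\0') {
        len++;
    }
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```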

halayli · on Oct 28, 2012

True, strndup is relatively new (~7 years) to the POSIX standard.

cmccabe · on Oct 28, 2012

I'm aware of a few different JSON parsing libraries for C.

* Metaparadigm's "JSON-C" library (see https://github.com/json-c/json-c/wiki)

* Jansson (see http://www.digip.org/jansson/)

* yajl (see http://lloyd.github.com/yajl/)

All of those are licensed extremely permissively (BSD, ISC, etc.). yajl seems kind of SAX-like, which I'm not really a big fan of, but I guess some people might like it. It would be good if you had a humungous JSON stream, for the same reasons SAX beats DOM in those cases.
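To make the SAX-versus-DOM trade-off concrete: an event-style parser reports structure as it scans and lets the caller keep only the state it needs, instead of building a tree of the whole document. Below is a hypothetical, heavily simplified sketch (not yajl's API) that fires callbacks for the start and end of objects and arrays, skips over string contents, and uses those events to compute the maximum nesting depth in constant memory. It does no validation.

```c
#include <stddef.h>

/* Hypothetical event callbacks, loosely in the SAX style. */
struct json_events {
    void (*on_open)(void *ctx, char bracket);
    void (*on_close)(void *ctx, char bracket);
    void *ctx;
};

/* Walks the text and fires events for brackets outside strings.
   No validation: malformed input just produces odd events. */
void scan_events(const char *json, const struct json_events *ev) {
    int in_string = 0;
    for (const char *p = json; *p != '\0'; p++) {
        if (in_string) {
            if (*p == '\\' && p[1] != '\0') {
                p++; /* skip the escaped character */
            } else if (*p == '"') {
                in_string = 0;
            }
        } else if (*p == '"') {
            in_string = 1;
        } else if (*p == '{' || *p == '[') {
            ev->on_open(ev->ctx, *p);
        } else if (*p == '}' || *p == ']') {
            ev->on_close(ev->ctx, *p);
        }
    }
}

/* Example consumer: tracks current and maximum depth, nothing else. */
struct depth_state { int depth; int max_depth; };

static void depth_open(void *ctx, char bracket) {
    struct depth_state *s = ctx;
    (void)bracket;
    if (++s->depth > s->max_depth) {
        s->max_depth = s->depth;
    }
}

static void depth_close(void *ctx, char bracket) {
    struct depth_state *s = ctx;
    (void)bracket;
    s->depth--;
}

int max_depth(const char *json) {
    struct depth_state s = {0, 0};
    struct json_events ev = {depth_open, depth_close, &s};
    scan_events(json, &ev);
    return s.max_depth;
}
```

A DOM-style library would instead allocate a node for every value before the caller could ask anything, which is what makes the event style attractive for very large streams.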

json-c and Jansson seem pretty similar to me. Jansson seems to be more full-featured than json-c, with stuff like a json_equal method, a json_deep_copy method, and the ability to specify custom memory allocation functions. But for most use cases there isn't too much difference between the two.

So I guess my question is really what is the use case you're targeting? It would be nice to have some context about what you see as the niche you're filling.