Yeah the raw parse speed comparison is almost a red herring at this point. The r...

magicalhippo · 2026-03-19T12:23:39 1773923019

> The real cost with JSON is when you have a 200MB manifest or build artifact and you need exactly two fields out of it.

There are SAX-like JSON libraries out there, and several of them work with a preallocated buffer or similar streaming interface, so you could stream the file and pick out the two fields as they come along.

IshKebab · 2026-03-19T13:22:41 1773926561

You still have to parse half the file on average, which is much slower than formats that let you skip directly to the relevant information.

creationix · 2026-03-19T14:04:46 1773929086

yep, this is exactly the kind of use case that caused me to design this format.

xxs · 2026-03-19T09:10:02 1773911402

As a parser: keep only indexes into the original file (the input), and don't copy strings or parse numbers at all (unless the strings fit in the index width, e.g. 32 bits).

That would make parsing faster, and there would be very little in terms of a tree (JSON can't really contain full-blown graphs), but it's rather complicated, and it would require hashing to allow navigation.
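
A minimal sketch of the offset-index idea, assuming a UTF-8 JSON document whose root is an object. The names `index_object`, `skip_value` and `skip_string` are made up for illustration, a plain Python dict stands in for the hashing step, and keys are kept as raw bytes without unescaping. It records only the (start, end) byte span of each top-level value and decodes nothing until asked:

```python
import json

QUOTE, BACKSLASH = ord('"'), ord("\\")
OPEN, CLOSE = b"{[", b"}]"


def skip_string(buf: bytes, i: int) -> int:
    """Return the index just past the closing quote of the string at buf[i]."""
    i += 1  # step over the opening quote
    while buf[i] != QUOTE:
        # A backslash escapes the next byte, so step over both.
        i += 2 if buf[i] == BACKSLASH else 1
    return i + 1


def skip_value(buf: bytes, i: int) -> int:
    """Return the index just past the JSON value starting at buf[i]."""
    if buf[i] == QUOTE:
        return skip_string(buf, i)
    if buf[i] in OPEN:
        depth = 0
        while True:
            c = buf[i]
            if c == QUOTE:
                # Brackets inside strings must not affect the depth count.
                i = skip_string(buf, i)
                continue
            if c in OPEN:
                depth += 1
            elif c in CLOSE:
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
    # Number, true, false or null: run until the next delimiter.
    while i < len(buf) and buf[i] not in b",}] \t\r\n":
        i += 1
    return i


def index_object(buf: bytes) -> dict[bytes, tuple[int, int]]:
    """Map each top-level key (raw bytes) to the byte span of its value."""
    spans = {}
    i = buf.index(b"{") + 1
    while True:
        while buf[i] in b" \t\r\n,":
            i += 1
        if buf[i] == ord("}"):
            return spans
        key_end = skip_string(buf, i)
        key = buf[i + 1:key_end - 1]  # raw key bytes, quotes stripped
        i = key_end
        while buf[i] in b" \t\r\n:":
            i += 1
        end = skip_value(buf, i)
        spans[key] = (i, end)  # offsets only; the value is not decoded
        i = end


doc = b'{"name": "demo", "deps": {"a": [1, 2, "}"]}, "version": 42, "ok": true}'
spans = index_object(doc)
start, end = spans[b"version"]
version = json.loads(doc[start:end])  # decode only the field that is needed
```

Building the index still takes one pass over every byte of the input; what it avoids is copying strings and converting numbers for fields nobody reads.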

creationix · 2026-03-19T17:45:32 1773942332

yep. I built custom JSON parsers as a first solution. The problem is you can't get away from scanning at least half the document bytes on average.

With RX and other truly random-access formats you could even optimize to the point of not even fetching the whole document. You could grab chunks from a remote server using HTTP range requests and cache locally in fixed-width blocks.
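
A rough sketch of what that could look like on the client side, assuming the server honours the HTTP `Range` header. The names `BlockCache` and `http_range_fetcher`, the 4096-byte block size, and the in-memory dict used as the cache are all assumptions for illustration, not part of RX:

```python
import urllib.request
from typing import Callable

BLOCK_SIZE = 4096  # assumed block width, chosen arbitrarily


def http_range_fetcher(url: str) -> Callable[[int, int], bytes]:
    """Return a function that fetches bytes [start, end) of url."""
    def fetch(start: int, end: int) -> bytes:
        # The Range header uses inclusive byte positions, hence end - 1.
        req = urllib.request.Request(
            url, headers={"Range": f"bytes={start}-{end - 1}"}
        )
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    return fetch


class BlockCache:
    """Serve arbitrary byte ranges from fixed-width blocks fetched on demand."""

    def __init__(self, fetch: Callable[[int, int], bytes],
                 block_size: int = BLOCK_SIZE) -> None:
        self.fetch = fetch
        self.block_size = block_size
        self.blocks: dict[int, bytes] = {}  # block number -> block bytes

    def _block(self, n: int) -> bytes:
        # Fetch a block only the first time it is touched.
        if n not in self.blocks:
            start = n * self.block_size
            self.blocks[n] = self.fetch(start, start + self.block_size)
        return self.blocks[n]

    def read(self, offset: int, length: int) -> bytes:
        """Return `length` bytes starting at `offset`."""
        if length <= 0:
            return b""
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        data = b"".join(self._block(n) for n in range(first, last + 1))
        skip = offset - first * self.block_size
        return data[skip:skip + length]
```

Usage would be something like `BlockCache(http_range_fetcher(url)).read(offset, length)`, where `offset` and `length` come from the format's own index. A server that ignores `Range` will typically send the whole body back, so a real client would want to check for a 206 status before trusting the response.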

With JSON you must start at the front and read byte-by-byte till you find all the data you're looking for. Smart parsers can help a lot to reduce heap allocations, but you can't skip the state machine scan.