Yeah the raw parse speed comparison is almost a red herring at this point. The real cost with JSON is when you have a 200MB manifest or build artifact and you need exactly two fields out of it. You're still loading the whole thing into memory, building the full object graph, and GC gets to clean all of it up after. That's the part where something like RX with selective access actually matters. Parse speed benchmarks don't capture that at all.
> The real cost with JSON is when you have a 200MB manifest or build artifact and you need exactly two fields out of it.
There are SAX-like JSON libraries out there, and several of them work with a preallocated buffer or similar streaming interface, so you could stream the file and pick out the two fields as they come along.
as parser: keep only indexes to the original file (input), dont copy strings or parse numbers at all (unless the strings fit in the index width, e.g. 32bit)
That would make parsing faster and there will be very little in terms on tree (json can't really contain full blow graphs) but it's rather complicated, and it will require hashing to allow navigation, though.
yep. I built custom JSON parsers as a first solution. The problem is you can't get away from scanning at least half the document bytes on average.
With RX and other truly random-access formats you could even optimize to the point of not even fetching the whole document. You could grab chunks from a remote server using HTTP range requests and cache locally in fixed-width blocks.
With JSON you must start at the front and read byte-by-byte till you find all the data you're looking for. Smart parsers can help a lot to reduce heap allocations, but you can't skip the state machine scan.