I was wondering when libraries like this would make their way into the world. I know I saw in Nicholas Zakas' book "High Performance JavaScript"[1] that he demonstrated how to read back (process) a large AJAX result in chunks, allowing you to begin working with the response before it finished downloading. The technique was called "multipart XHR"; he shows a neat code example and then links to a site that has sample code[2].
The main difference between multipart XHR and Oboe, as far as I can see, is that MXHR requires you to format your data in a specific manner (using a magic delimiting character), though I'm curious whether the underlying method is similar or not.
I haven't looked at MXHR but here's roughly what Oboe does:
1. Create an XHR, listen to the XHR2 progress event.
2. Use the Clarinet.js SAX parser, scoop up all its events.
3. From the SAX events, build up the actual JSON and maintain the path from the root to the current node.
4. Match that path (+ some other stuff) against registered JSONPath specs.
5. Fire callbacks if they pass.
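Steps 3 to 5 can be put in miniature code (heavily simplified and hypothetical: the event stream below is hand-written rather than produced by Clarinet, and the matcher handles only a toy `foods.*`-style subset of JSONPath):

```javascript
// Maintain a path from the root while SAX-style events arrive, and
// fire a callback whenever the path matches a (toy) JSONPath pattern.

// Hand-written events, roughly what a SAX parser would emit for:
// {"foods":[{"name":"aubergine"},{"name":"apple"}]}
const events = [
  { type: 'enter', key: 'foods' },
  { type: 'enter', key: 0 },
  { type: 'value', key: 'name', value: 'aubergine' },
  { type: 'leave' },
  { type: 'enter', key: 1 },
  { type: 'value', key: 'name', value: 'apple' },
  { type: 'leave' },
  { type: 'leave' },
];

// Toy matcher: '*' matches any single path segment.
function matches(pattern, path) {
  const parts = pattern.split('.');
  return parts.length === path.length &&
    parts.every((p, i) => p === '*' || p === String(path[i]));
}

function stream(events, pattern, callback) {
  const path = []; // path from root to the current node
  for (const ev of events) {
    if (ev.type === 'enter') path.push(ev.key);
    else if (ev.type === 'leave') path.pop();
    else if (ev.type === 'value') {
      path.push(ev.key);
      if (matches(pattern, path)) callback(ev.value, path.slice());
      path.pop();
    }
  }
}

const found = [];
stream(events, 'foods.*.name', v => found.push(v));
// found is now ['aubergine', 'apple']
```

The real library does much more (builds the partial JSON tree, handles arrays vs. objects properly), but the path-tracking idea is the same.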
Interesting. After looking at MXHR more, it appears it was adapted from Digg.com[1] (aka DUI.Stream[2]), and later adapted by Facebook[3].
Yours sounds more elegant in that it can handle JSON naturally, but I wonder if the other might be better suited for binary content (not sure in which context that would make sense, if any).
Either way, I find them all fascinating, and I've starred your project :) Will keep an eye on it.
I suppose you could make a binary equivalent if you needed to. You'd need to make some kind of binary matching language, maybe like Erlang's binary matching.
Adding XML/XPATH support would be a natural extension.
At the very bottom he describes the use case. Mobile apps optimize for battery life by preferring one big long request up front rather than lots of little ones as needed. But you only need the first 10% of the data to render the first screen of your app.
You might want to render the first screen of your app without any AJAX call at all. That's the best thing to do. There's no reason why your app can't render with the data directly.
Here's one:
How about the fact in some cases waiting on the latency of a data lookup before rendering the page makes it feel much slower and more sluggish?
In the case mentioned above of map markers, if there's real latency involved in the lookup and you can render the page sans markers and then populate them a couple seconds later, isn't that a superior UI?
Yes, if you are building a simple, moderate or low traffic website/app, it's probably a better practice to render the page with the initial JSON needed. But a lot of the people here are working on products with millions of users -- or just tons of data -- and that changes the equation a bit.
This looks really cool! I'm confused about the usecase, however. In your foods/nonFoods example[0], it allows you to request some JSON with two keys, `foods` and `nonFoods`, each with an array value, and use only `foods`, discarding `nonFoods`. You request this from `oboe('/myapp/things.json')`.
My question: why not modify the backend to accept a request like `/myapp/foods.json` and let the backend compose the JSON you need and send only that? Building your frontend to accommodate getting the wrong (or too much) data seems like fixing it "in the wrong place". Is this a contrived example that isn't the core usecase? That's my assumption.
Is this primarily for 3rd party APIs and legacy codebases where it's impractical to change the response type and moving to e.g. sockets isn't an option? Thanks for the cool project, and I apologize for my ignorance wrt whatever points I'm missing!
I suppose the example is a little artificial. It isn't really for using some of the JSON response while ignoring the rest (well, you can use it for that but it isn't the main use).
I got the idea for this project working on data vis. Not all of the data was visible and we wanted to display the first bit of data quicker without waiting for all of it to arrive. We could have just sent the visible bit but it was good to have some data ready in the off-screen section for when the user scrolled.
Before that I worked on a service where we were aggregating 6 or 7 services into a single JSON. Some of the services were quicker than others but because the AJAX lib we were using waited for the whole response they all had to go at the speed of the slowest component.
We could have done multiple requests but it was more elegant to serve a whole page's json in one call. Also, we cached the slow services so they were only sometimes slow.
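A sketch of the server-side write pattern that makes that kind of aggregation stream well (everything here is hypothetical: `res` is a mock with only a `write()` method, and the two "services" are plain values; in a real server each `writePart` call would sit in the callback of the corresponding service request):

```javascript
// Write each top-level key of the aggregate JSON as soon as its
// backing service responds, in whatever order they finish.
function writePart(res, state, name, value) {
  res.write((state.first ? '{' : ',') +
    JSON.stringify(name) + ':' + JSON.stringify(value));
  state.first = false;
}

function finish(res, state) {
  res.write(state.first ? '{}' : '}');
}

const chunks = [];
const res = { write: chunk => chunks.push(chunk) };
const state = { first: true };

// Suppose the fast weather service answered before the slow news one:
writePart(res, state, 'weather', { temp: 21 });
writePart(res, state, 'news', ['headline']);
finish(res, state);

// chunks.join('') === '{"weather":{"temp":21},"news":["headline"]}'
```

Because a JSON object's keys can arrive in any order, a streaming client can use the fast parts while the slow service is still working.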
> well, you can use it for that but it isn't the main use
Actually, you could use it for that, as long as the library actually finishes downloading the file on abort. If you need contents from the file again, you just download it again and as it will already be cached, that operation should be extremely fast.
You might need "nonFoods" sometime later but not right now at the start of the app. But instead of making 2 http requests you just make one, but use the thing you actually need as soon as it's available.
If you scroll the README file to the bottom you'll find some good use cases that explain it better than I just did :)
Yeah! There are more understandable examples further down. suggestion @joombar: since the first example is the first thing people see and it doesn't actually reflect the core usecase, perhaps add a note explaining that the example is contrived for simplicity's sake ~or~ make the primary example represent the core usecase more closely. Cool project!
This makes a lot of sense. Latency is the #1 enemy on mobile, but bandwidth tends to be relatively okay. That's why streaming a video to your phone feels surprisingly fast, while everyday browsing feels sluggish. The obvious conclusion is to use fewer but larger requests, which is why Oboe is so attractive.
This is great for that 1 time out of 1,000,000 when you have an ajax call that would benefit from a tool like this. In the overwhelming majority of usecases, this oboe.js thing is not going to be a "plug it in, automatically webscale" type of optimization. I'm not trying to rag on the authors of this project, but the wording of this submission is going to lead noobs to misunderstand the benefit. The authors should emphasize the usecase where an actual benefit comes from using this library, instead of just saying "it makes your ajax faster!!"
It should make most calls faster. The exceptions are small JSON files, or networks fast enough that there is no streaming effect (the whole file arrives very quickly).
For most sites there'll be some users where it will make it faster (mobile, slow internet) and others that it'll be about the same. If the network is unreliable it should help as well because when the connection drops you don't lose what you already downloaded.
I'm using Node.js — with an Express.js router — to power an internal site. We have a few API endpoints which would benefit from this. Does anything need changing when sending the data?
I agree. If you're doing something slow/asynchronous like aggregating several http resources it is worth it to write out as early as you can but keep server-side the same if you can generate the whole JSON quickly.
One issue I found with writing early is handling error condition(s).
On the server side, you may run into an error after you've started writing out your reply, leaving the client with an incomplete response. That may require more involved error handling between server and client. A "gather first, write later" approach gives you simpler error propagation between server and client.
Java has had good stream parsing of JSON for a while now too. Last time I had to do it, I was surprised to find GSON, the library we were already using, had support for it. XML stream parsing vs. model parsing was a much bigger change.
There must be a break-even response size below which this is pointless. The server doesn't send responses byte by byte but in chunks, so the whole response could well be on the wire already, in which case this would have zero impact on the download footprint. The effect of TCP's Nagle algorithm makes this very hard to predict too, so the chunk size will vary with server config and server workload at the time of the request.
Anyone who has developed an HTTP parser has experienced this. I feel like this is a problem to be solved on the server, using JSONPath or similar.
You could serialise a state history as it happens, allowing your users to modify the future state in real time and push to the always-unfolding history stream. Could be great for games.
If you don't care about old browsers you could use the same connection to keep the state updated as you used to download the original state.
I built this for more-or-less standard downloading, only quicker. But, yeah, if you set a server up to feed it you could use it for lots of creative things.
Hadn't seen that before. Interesting link, thanks.
Very similar, except JSON/JSONPath instead of HTML/CSS. Oboe runs fine in Node but I want to make the code a bit more standards-y, i.e. using Node's EventEmitters instead of the little pubsub I made for the browser.
Random tangent: no need to make your own pubsub by the way. I figure you might be interested in component[1]. Makes writing libraries like oboe even easier because all of the little stuff is packaged up for you already, like emitter[2]. And then tons of the little helpers in oboe could easily be their own components and useable by others. Check out the full list[3].
It would be really nice to have the same streaming interface (with EventEmitters) as Node, but I guess I can just shim that on top of Oboe. It would be great to have the same pattern as in Node.
Btw what are you using for the Node side, JSONStream?
There's only as much of a Node side as I needed to write for some component tests. It'd work with anything that writes out valid JSON.
The client side works in Node right now as well as in the browser but it is a bit browser-y. It is on my home office Agile board to make it a bit more node-y.
This is really important for perceived load time[0]. Here's an article that illustrates how this can work in practice: http://dry.ly/full-streams-ahead
I'll have to test to be sure you still get streaming. It depends how browsers handle XHR2 events with regard to gzipped HTTP. I /think/ it'll be fine but I need to check to be sure. E.g., with gzip on you still get progressive HTML rendering.
Gzip is a streaming format -- it's designed as a compressed format for communication streams. Browsers have no trouble with this. The Apache behavior you describe is probably related to buffering settings, which I think can be configured.
Very cool, but there's one thing I don't understand.
Do you need to stream JSON objects from a server to make this work? You have to get a response from the server in some kind of streaming protocol right?
Or...
Is this just reading the part of the JSON string that it currently has?
I'm inexperienced with streaming, so this might be obvious to some.
EDIT:
Ah, I get it. I was wrong to say it was "streaming". Looks like my second suggestion was the correct one.
If you've spent too much time in jQuery you might not realize that streaming is supported by the underlying XMLHttpRequest object. The readyState property passes through LOADING before reaching DONE. Most applications wait for DONE so they have a complete document to work with. A "normal" payload is small enough that the lag between LOADING and DONE is tiny, but a big download can definitely be parsed as it loads. Imagine an HTML doc with inline JavaScript: that JavaScript is evaluated as soon as it's encountered and doesn't wait for a page load that may never happen.
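The bookkeeping that makes raw XHR streaming usable is small. A sketch (XMLHttpRequest is a browser API, so the progress callbacks are simulated here with plain strings; the trick is remembering how much of `responseText` has already been consumed):

```javascript
// During LOADING, each progress event sees the whole responseText so
// far; track how much was already handled and pass on only the new tail.
function makeChunkReader(onChunk) {
  let seen = 0;
  return function onProgress(responseText) {
    const fresh = responseText.slice(seen); // only the newly arrived part
    seen = responseText.length;
    if (fresh) onChunk(fresh);
  };
}

// Simulated progress events for a growing response:
const chunks = [];
const read = makeChunkReader(c => chunks.push(c));
read('{"foo');
read('{"foods":[');
read('{"foods":[1,2]}');
// chunks is now ['{"foo', 'ds":[', '1,2]}']
```

In a real page you'd wire `onProgress` to the XHR's `progress` event and read `xhr.responseText` inside it; the new chunks would then be fed to an incremental parser.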
It will work with any JSON, but it'll work faster if you write the JSON out a bit at a time.
Consider if you are writing data from a db: you can either collect all the rows together then write them out, or you can write them out one at a time as you get them. If you write one at a time you are in a sense streaming even if you're not using a streaming protocol.
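That row-at-a-time approach can be sketched like this (assumptions: `rows` is an in-memory array standing in for a DB cursor, and `res` is a mock with just a `write()` method, shaped like Node's `http.ServerResponse`):

```javascript
// Write the JSON array out one row at a time instead of buffering the
// whole result set first; the client can start parsing immediately.
function writeRowsIncrementally(res, rows) {
  res.write('{"rows":[');
  rows.forEach((row, i) => {
    if (i > 0) res.write(','); // comma between rows, not after the last
    res.write(JSON.stringify(row));
  });
  res.write(']}');
}

const chunks = [];
const res = { write: chunk => chunks.push(chunk) };
writeRowsIncrementally(res, [{ id: 1 }, { id: 2 }]);
// chunks.join('') === '{"rows":[{"id":1},{"id":2}]}'
```

The concatenated output is exactly the same JSON a buffer-everything approach would send; only the timing of the writes changes.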
On a slow network reading any http is like a stream because you can use the first bit to arrive without waiting for the rest. The faster the network the less the benefit but it shouldn't end up worse.
This could be big for making dynamic web maps faster. Often we request a large array of geometries to display on a map and can only display them all at once after ajax is done. If we could display each geometry as they are loaded it would be a big improvement to perceived performance. I imagine this is the case for other kinds of data vis as well. Off to test!
So if you're streaming in the JSON this program must have a custom JSON parser because there would be no way to assure valid JSON on an incomplete payload. Am I understanding how this works correctly?
edit: Also how is this faster, if you're parsing an incomplete response over and over again isn't each parse blocking, wouldn't this approach kill your FPS?
Well, you could have a JSON which was valid at the start and invalid at the end. It'd parse the first bit ok and only throw an error when it got to the invalid bit.
Nothing gets parsed more than once. SAX parsers already parse streams, they're just not used very much because they're a pain to program with.
Yes, it uses SAX which seems to fire events as nodes get parsed. I don't think that the response, or any node of the JSON string of the response, gets parsed more than once.
There's a cool thing you might try where you download the historic messages, then continue to stream live ones over the same HTTP connection. If you don't care about old browsers (anything without XHR2 progress events), downloading and streaming on the same connection should work fine.
1. http://shop.oreilly.com/product/9780596802806.do
2. http://techfoolery.com/mxhr/