I was wondering when libraries like this would make their way into the world. I know I saw in Nicholas Zakas' book "High Performance JavaScript"[1] that he demonstrated how to read back (process) a large AJAX result in chunks, allowing you to begin working with the response before it finished downloading. The technique was called "multipart XHR"; he shows a neat code example and then links to a site that has sample code[2].
The main difference between multipart XHR and Oboe, as far as I can see, is that MXHR requires you to format your data in a specific manner (using a magic delimiting character), though I'm curious whether the underlying method is similar or not.
I haven't looked at MXHR but here's roughly what Oboe does:
1. Create an XHR, listen to the XHR2 progress event.
2. Use the Clarinet.js SAX parser, scoop up all its events.
3. From the SAX events, build up the actual JSON and maintain the path from the root to the current node.
4. Match that path (+ some other stuff) against registered JSONPath specs.
5. Fire callbacks if they pass.
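Steps 3 to 5 can be put in miniature code (heavily simplified and hypothetical: the event stream below is hand-written rather than produced by Clarinet, and the matcher handles only a toy `foods.*`-style subset of JSONPath):

```javascript
// Maintain a path from the root while SAX-style events arrive, and
// fire a callback whenever the path matches a (toy) JSONPath pattern.

// Hand-written events, roughly what a SAX parser would emit for:
// {"foods":[{"name":"aubergine"},{"name":"apple"}]}
const events = [
  { type: 'enter', key: 'foods' },
  { type: 'enter', key: 0 },
  { type: 'value', key: 'name', value: 'aubergine' },
  { type: 'leave' },
  { type: 'enter', key: 1 },
  { type: 'value', key: 'name', value: 'apple' },
  { type: 'leave' },
  { type: 'leave' },
];

// Toy matcher: '*' matches any single path segment.
function matches(pattern, path) {
  const parts = pattern.split('.');
  return parts.length === path.length &&
    parts.every((p, i) => p === '*' || p === String(path[i]));
}

function stream(events, pattern, callback) {
  const path = []; // path from root to the current node
  for (const ev of events) {
    if (ev.type === 'enter') path.push(ev.key);
    else if (ev.type === 'leave') path.pop();
    else if (ev.type === 'value') {
      path.push(ev.key);
      if (matches(pattern, path)) callback(ev.value, path.slice());
      path.pop();
    }
  }
}

const found = [];
stream(events, 'foods.*.name', v => found.push(v));
// found is now ['aubergine', 'apple']
```

The real library does much more (builds the partial JSON tree, handles arrays vs. objects properly), but the path-tracking idea is the same.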
Interesting. After looking at MXHR more, it appears it was adapted from Digg.com[1] (aka DUI.Stream[2]), and later adapted by Facebook[3].
Yours sounds more elegant in that it can handle JSON naturally, but I wonder if the other might be better suited for binary content (not sure in which context that would make sense, if any).
Either way, I find them all fascinating, and I've starred your project :) Will keep an eye on it.
I suppose you could make a binary equivalent if you needed to. You'd need to make some kind of binary matching language, maybe like Erlang's binary matching.
Adding XML/XPATH support would be a natural extension.
At the very bottom he describes the use case. Mobile apps optimize for battery life by preferring one big long request up front rather than lots of little ones as needed. But you only need the first 10% of the data to render the first screen of your app.
You might want to render the first screen of your app without any AJAX call at all. That's the best thing to do. There's no reason why your app can't render with the data directly.
Here's one:
How about the fact in some cases waiting on the latency of a data lookup before rendering the page makes it feel much slower and more sluggish?
In the case mentioned above of map markers, if there's real latency involved in the lookup and you can render the page sans markers and then populate them a couple seconds later, isn't that a superior UI?
Yes, if you are building a simple, moderate or low traffic website/app, it's probably a better practice to render the page with the initial JSON needed. But a lot of the people here are working on products with millions of users -- or just tons of data -- and that changes the equation a bit.
This looks really cool! I'm confused about the usecase, however. In your foods/nonFoods example[0], it allows you to request some JSON with two keys, `foods` and `nonFoods`, each with an array value, and use only `foods`, discarding `nonFoods`. You request this from `oboe('/myapp/things.json')`.
My question: why not modify the backend to accept a request like `/myapp/foods.json` and let the backend compose the JSON you need and send only that? Building your frontend to accommodate getting the wrong (or too much) data seems like fixing it "in the wrong place". Is this a contrived example that isn't the core usecase? That's my assumption.
Is this primarily for 3rd party APIs and legacy codebases where it's impractical to change the response type and moving to e.g. sockets isn't an option? Thanks for the cool project, and I apologize for my ignorance wrt whatever points I'm missing!
I suppose the example is a little artificial. It isn't really for using some of the JSON response while ignoring the rest (well, you can use it for that but it isn't the main use).
I got the idea for this project working on data vis. Not all of the data was visible and we wanted to display the first bit of data quicker without waiting for all of it to arrive. We could have just sent the visible bit but it was good to have some data ready in the off-screen section for when the user scrolled.
Before that I worked on a service where we were aggregating 6 or 7 services into a single JSON. Some of the services were quicker than others but because the AJAX lib we were using waited for the whole response they all had to go at the speed of the slowest component.
We could have done multiple requests but it was more elegant to serve a whole page's json in one call. Also, we cached the slow services so they were only sometimes slow.
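A sketch of the server-side write pattern that makes that kind of aggregation stream well (everything here is hypothetical: `res` is a mock with only a `write()` method, and the two "services" are plain values; in a real server each `writePart` call would sit in the callback of the corresponding service request):

```javascript
// Write each top-level key of the aggregate JSON as soon as its
// backing service responds, in whatever order they finish.
function writePart(res, state, name, value) {
  res.write((state.first ? '{' : ',') +
    JSON.stringify(name) + ':' + JSON.stringify(value));
  state.first = false;
}

function finish(res, state) {
  res.write(state.first ? '{}' : '}');
}

const chunks = [];
const res = { write: chunk => chunks.push(chunk) };
const state = { first: true };

// Suppose the fast weather service answered before the slow news one:
writePart(res, state, 'weather', { temp: 21 });
writePart(res, state, 'news', ['headline']);
finish(res, state);

// chunks.join('') === '{"weather":{"temp":21},"news":["headline"]}'
```

Because a JSON object's keys can arrive in any order, a streaming client can use the fast parts while the slow service is still working.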
> well, you can use it for that but it isn't the main use
Actually, you could use it for that, as long as the library actually finishes downloading the file on abort. If you need contents from the file again, you just download it again and as it will already be cached, that operation should be extremely fast.
You might need "nonFoods" sometime later but not right now at the start of the app. But instead of making 2 http requests you just make one, but use the thing you actually need as soon as it's available.
If you scroll the README file to the bottom you'll find some good use cases that explain it better than I just did :)
Yeah! There are more understandable examples further down. suggestion @joombar: since the first example is the first thing people see and it doesn't actually reflect the core usecase, perhaps add a note explaining that the example is contrived for simplicity's sake ~or~ make the primary example represent the core usecase more closely. Cool project!
This makes a lot of sense. Latency is the #1 enemy on mobile, but bandwidth tends to be relatively okay. That's why streaming a video to your phone feels surprisingly fast, while everyday browsing feels sluggish. The obvious conclusion is to use fewer but larger requests, which is why Oboe is so attractive.
This is great for that 1 time out of 1,000,000 when you have an ajax call that would benefit from a tool like this. In the overwhelming majority of usecases, this oboe.js thing is not going to be a "plug it in, automatically webscale" type of optimization. I'm not trying to rag on the authors of this project, but the wording of this submission is going to lead noobs to misunderstand the benefit. The authors should emphasize the usecase where an actual benefit comes from using this library, instead of just saying "it makes your ajax faster!!"
It should make most calls faster. The exceptions are small JSON files, or networks fast enough that there is no streaming effect (the whole file arrives very quickly).
For most sites there'll be some users where it will make it faster (mobile, slow internet) and others that it'll be about the same. If the network is unreliable it should help as well because when the connection drops you don't lose what you already downloaded.
I'm using Node.js — with an Express.js router — to power an internal site. We have a few API endpoints which would benefit from this. Does anything need changing when sending the data?
I agree. If you're doing something slow/asynchronous like aggregating several http resources it is worth it to write out as early as you can but keep server-side the same if you can generate the whole JSON quickly.
One issue I found with writing early is handling error condition(s).
On the server side, you may run into an error after you've started writing out your reply, leaving the client with an incomplete response. That may require more involved error handling between server and client. A "gather first, write later" approach gives you simpler error propagation between server and client.
Java has had good stream parsing of JSON for a while now too. Last time I had to do it, I was surprised to find GSON, the library we were already using, had support for it. XML stream parsing vs. model parsing was a much bigger change.
There must be a break-even response size below which this is pointless. The server doesn't send responses byte by byte but in chunks, so the whole response could well be on the wire already, in which case this would have zero impact on the download footprint. The effect of TCP's Nagle algorithm makes this very hard to predict too, so the chunk size will vary with server config and server workload at the time of the request.
Anyone who has developed an HTTP parser has experienced this. I feel like this is a problem to be solved on the server, using JSONPath or similar.
You could serialise a state history as it happens, allowing your users to modify the future state in real time and push to the always-unfolding history stream. Could be great for games.
If you don't care about old browsers you could use the same connection to keep the state updated as you used to download the original state.
I built this for more-or-less standard downloading, only quicker. But, yeah, if you set a server up to feed it you could use it for lots of creative things.
Hadn't seen that before. Interesting link, thanks.
Very similar, except JSON/JSONPath instead of HTML/CSS. Oboe runs fine in Node but I want to make the code a bit more standards-y, i.e. using Node's EventEmitters instead of the little pubsub I made for the browser.
Random tangent: no need to make your own pubsub by the way. I figure you might be interested in component[1]. Makes writing libraries like oboe even easier because all of the little stuff is packaged up for you already, like emitter[2]. And then tons of the little helpers in oboe could easily be their own components and useable by others. Check out the full list[3].
It would be really nice to have the same streaming interface (with EventEmitters) as Node, but I guess I can just shim that on top of Oboe. It would be great to have the same pattern as in Node.
Btw what are you using for the Node side, JSONStream?
There's only as much of a Node side as I needed to write for some component tests. It'd work with anything that writes out valid JSON.
The client side works in Node right now as well as in the browser but it is a bit browser-y. It is on my home office Agile board to make it a bit more node-y.
This is really important for perceived load time[0]. Here's an article that illustrates how this can work in practice: http://dry.ly/full-streams-ahead
I'll have to test to be sure you still get streaming. It depends how browsers handle XHR2 events with regard to gzipped HTTP. I /think/ it'll be fine but I need to check to be sure. E.g., with gzip on you still get progressive HTML rendering.
Gzip is a streaming format -- it's designed as a compressed format for communication streams. Browsers have no trouble with this. The Apache behavior you describe is probably related to buffering settings, which I think can be configured.
Very cool, but there's one thing I don't understand.
Do you need to stream JSON objects from a server to make this work? You have to get a response from the server in some kind of streaming protocol right?
Or...
Is this just reading the part of the JSON string that it currently has?
I'm inexperienced with streaming, so this might be obvious to some.
EDIT:
Ah, I get it. I was wrong to say it was "streaming". Looks like my second suggestion was the correct one.
If you've spent too much time in jQuery you might not realize that streaming is supported by the underlying XMLHttpRequest object. The readyState property passes through LOADING before reaching DONE. Most applications wait for DONE so they have a complete document to work with. A "normal" payload is small enough that the lag between LOADING and DONE is tiny, but a big download can definitely be parsed as it loads. Imagine an HTML doc with inline JavaScript: that JavaScript is evaluated as soon as it's encountered and doesn't wait for a page load that may never happen.
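The bookkeeping that makes raw XHR streaming usable is small. A sketch (XMLHttpRequest is a browser API, so the progress callbacks are simulated here with plain strings; the trick is remembering how much of `responseText` has already been consumed):

```javascript
// During LOADING, each progress event sees the whole responseText so
// far; track how much was already handled and pass on only the new tail.
function makeChunkReader(onChunk) {
  let seen = 0;
  return function onProgress(responseText) {
    const fresh = responseText.slice(seen); // only the newly arrived part
    seen = responseText.length;
    if (fresh) onChunk(fresh);
  };
}

// Simulated progress events for a growing response:
const chunks = [];
const read = makeChunkReader(c => chunks.push(c));
read('{"foo');
read('{"foods":[');
read('{"foods":[1,2]}');
// chunks is now ['{"foo', 'ds":[', '1,2]}']
```

In a real page you'd wire `onProgress` to the XHR's `progress` event and read `xhr.responseText` inside it; the new chunks would then be fed to an incremental parser.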
It will work with any JSON, but it'll work faster if you write the JSON out a bit at a time.
Consider if you are writing data from a db: you can either collect all the rows together then write them out, or you can write them out one at a time as you get them. If you write one at a time you are in a sense streaming even if you're not using a streaming protocol.
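That row-at-a-time approach can be sketched like this (assumptions: `rows` is an in-memory array standing in for a DB cursor, and `res` is a mock with just a `write()` method, shaped like Node's `http.ServerResponse`):

```javascript
// Write the JSON array out one row at a time instead of buffering the
// whole result set first; the client can start parsing immediately.
function writeRowsIncrementally(res, rows) {
  res.write('{"rows":[');
  rows.forEach((row, i) => {
    if (i > 0) res.write(','); // comma between rows, not after the last
    res.write(JSON.stringify(row));
  });
  res.write(']}');
}

const chunks = [];
const res = { write: chunk => chunks.push(chunk) };
writeRowsIncrementally(res, [{ id: 1 }, { id: 2 }]);
// chunks.join('') === '{"rows":[{"id":1},{"id":2}]}'
```

The concatenated output is exactly the same JSON a buffer-everything approach would send; only the timing of the writes changes.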
On a slow network reading any http is like a stream because you can use the first bit to arrive without waiting for the rest. The faster the network the less the benefit but it shouldn't end up worse.
This could be big for making dynamic web maps faster. Often we request a large array of geometries to display on a map and can only display them all at once after ajax is done. If we could display each geometry as they are loaded it would be a big improvement to perceived performance. I imagine this is the case for other kinds of data vis as well. Off to test!
So if you're streaming in the JSON this program must have a custom JSON parser because there would be no way to assure valid JSON on an incomplete payload. Am I understanding how this works correctly?
edit: Also how is this faster, if you're parsing an incomplete response over and over again isn't each parse blocking, wouldn't this approach kill your FPS?
Well, you could have a JSON which was valid at the start and invalid at the end. It'd parse the first bit ok and only throw an error when it got to the invalid bit.
Nothing gets parsed more than once. SAX parsers already parse streams, they're just not used very much because they're a pain to program with.
Yes, it uses SAX which seems to fire events as nodes get parsed. I don't think that the response, or any node of the JSON string of the response, gets parsed more than once.
There's a cool thing you might try where you download the historic messages, then continue to stream live ones over the same HTTP connection. If you don't care about old browsers (anything without XHR2 progress events), downloading and streaming on the same connection should work fine.
1. http://shop.oreilly.com/product/9780596802806.do
2. http://techfoolery.com/mxhr/