Author of jumprope here. Great post! It’s cool seeing so many other good rope libraries ready to use.
It’s interesting seeing how poorly jumprope does on memory efficiency. I could definitely change the code to more aggressively join small text nodes together to bring the memory overhead more in line with the other algorithms. I’m just not sure how much anyone cares. Text is really small - a raspberry pi or phone has gigabytes of memory these days. If a 1mb text document takes up 5mb of ram - well, does that matter? I suppose if you’re opening a 1gb log file I do appreciate that some of the other approaches tested here have no overhead until you start making edits.
From a performance standpoint there’s one more trick jumprope has that isn’t mentioned here, though it could easily be applied to the other algorithms. And that is, in regular text editing sessions (and in replaying CRDT editing traces) we can buffer the most recent editing run before committing it to the data structure. So, if you start typing some contiguous characters, they get stored separately, and only when you move the cursor or start deleting do we flush that change down. This improves performance of replaying editing traces by nearly 10x. But I’m not sure how applicable it is to the “regular” case of text editing. It is, however, super useful for replaying edits in a collaborative editor. It’s fast enough that my CRDT doesn’t even bother storing the most recent document on disk - we just replay the entire editing history when a document is opened. Even for large documents, files can usually still be opened in about 1ms (including replaying the entire editing trace). This case doesn’t show up in the benchmarks because you have to opt in to it by using the JumpropeBuf wrapper instead of Jumprope.
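Roughly, the idea looks something like the sketch below - a simplified illustration with made-up names, not jumprope’s actual internals:

```rust
// Simplified sketch of the buffering idea (made-up names, not the real
// jumprope internals). A pending run of typed characters lives in a plain
// String and is only flushed into the rope when the edit location moves.

trait RopeLike {
    fn insert(&mut self, char_pos: usize, text: &str);
}

struct BufferedRope<R: RopeLike> {
    rope: R,
    pending_pos: usize, // char position where the pending run starts
    pending: String,    // contiguous characters typed since the last flush
}

impl<R: RopeLike> BufferedRope<R> {
    fn insert(&mut self, char_pos: usize, text: &str) {
        // Fast path: the new text continues the current typing run.
        if char_pos == self.pending_pos + self.pending.chars().count() {
            self.pending.push_str(text);
        } else {
            // The cursor moved somewhere else: commit the buffered run
            // into the rope, then start a new run at the new location.
            self.flush();
            self.pending_pos = char_pos;
            self.pending.push_str(text);
        }
    }

    fn flush(&mut self) {
        if !self.pending.is_empty() {
            self.rope.insert(self.pending_pos, &self.pending);
            self.pending.clear();
        }
    }
}
```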
Thanks for all the work in bootstrapping this part of the ecosystem! I opened an issue[1] about the memory overhead for jumprope. It seems to really come down to the large size of skip list nodes relative to the text.
I did some testing with JumpRopeBuf, but ultimately did not include it because I was comparing things from an "interactive editor" perspective, where edits are applied immediately, rather than a collaborative/CRDT use case where edits are async. But it did perform very well, as you said! JumpRopeBuf feels similar to a piece table to me, in that edits are stored separately and then joined for reading.
Thanks for the issue. There’s nothing stopping you from using it in an interactive environment. What else does a text editor do that would make JumpropeBuf not appropriate in an interactive editor? Range queries going on for rendering? I’d be curious to know what requirements you feel aren’t being covered.
Mind you, at the speeds all these systems operate at, all of the rope libraries you benchmarked should be plenty fast enough not to be a bottleneck in interactive editing anyway. Maybe the nearby posters are right, and the biggest optimization issues remaining are the ones that only come up when handling 10gb+ documents.
> There’s nothing stopping you from using it in an interactive environment. What else does a text editor do that would make JumpropeBuf not appropriate in an interactive editor?
I wouldn't say it's "not appropriate", just that it's not applicable. JumpropeBuf is really designed (as I see it) for handling large numbers of edits to the same location, which is exactly what we have in the tracing benchmarks. But in a "real-world" "interactive" environment, after every edit there are usually follow-up operations like rendering to the screen, sending change sets to language servers or linters, running syntax highlighting, etc. So you want those edits to be applied immediately, not buffered.
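Concretely, the loop I have in mind looks something like this (all the helper names here are made up, just to illustrate why buffering wouldn't buy much):

```rust
// Illustration only: made-up editor loop showing why buffered edits would
// get flushed immediately anyway. Every keystroke is followed by consumers
// that need to read the up-to-date document.
struct Document { text: String } // stand-in for whichever rope the editor uses

fn rerender_viewport(_doc: &Document) { /* redraw the visible lines */ }
fn notify_language_server(_doc: &Document) { /* send a didChange-style update */ }
fn rehighlight(_doc: &Document) { /* rerun incremental syntax highlighting */ }

fn on_keystroke(doc: &mut Document, byte_pos: usize, ch: char) {
    doc.text.insert(byte_pos, ch); // apply the edit right away...
    rerender_viewport(doc);        // ...because these all read the new state
    notify_language_server(doc);
    rehighlight(doc);
}
```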
I understand this is probably different in the CRDT case, where you could have swarms of edits coming from other clients. In that case I can see buffering being useful. And as you pointed out, all of these data structures are more than fast enough at the scale of the tracing benchmarks.
That being said, it's interesting that piece tables are kind of a similar idea, except they never actually apply the edits. They just keep them buffered and parse the buffer when they need to get the current state of the text. I wonder if JumpRopeBuf could become something similar: basically a piece table that occasionally merges the edits to avoid fragmentation.
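To sketch what I mean (made-up types, not any real library's API) - a piece table keeps the original text plus an append-only "add" buffer, and edits only rewrite a list of spans pointing into those two buffers:

```rust
// Minimal piece-table sketch (made-up types, not a real library's API).
// Edits never move text around; they only rewrite the span ("piece") list.
// Reading the document back out walks the spans in order.

#[derive(Clone, Copy)]
enum Source { Original, Added }

struct Piece {
    source: Source,
    start: usize, // char offset into the source buffer
    len: usize,   // length in chars
}

struct PieceTable {
    original: String,   // the text as loaded, never modified
    added: String,      // append-only buffer of inserted text
    pieces: Vec<Piece>, // the document, as spans into the two buffers
}

impl PieceTable {
    fn new(text: &str) -> Self {
        PieceTable {
            original: text.to_string(),
            added: String::new(),
            pieces: vec![Piece { source: Source::Original, start: 0, len: text.chars().count() }],
        }
    }

    fn insert(&mut self, mut char_pos: usize, text: &str) {
        let new = Piece {
            source: Source::Added,
            start: self.added.chars().count(),
            len: text.chars().count(),
        };
        self.added.push_str(text);

        // Find the piece containing char_pos, split it there, and slot the
        // new piece in between the two halves.
        for i in 0..self.pieces.len() {
            if char_pos <= self.pieces[i].len {
                let right_len = self.pieces[i].len - char_pos;
                let right = Piece {
                    source: self.pieces[i].source,
                    start: self.pieces[i].start + char_pos,
                    len: right_len,
                };
                self.pieces[i].len = char_pos;
                self.pieces.insert(i + 1, new);
                if right_len > 0 {
                    self.pieces.insert(i + 2, right);
                }
                return;
            }
            char_pos -= self.pieces[i].len;
        }
        self.pieces.push(new); // insertion at (or past) the end
    }

    // "Parsing the buffer": materialize the current text by walking the spans.
    fn text(&self) -> String {
        self.pieces.iter().map(|p| {
            let buf = match p.source {
                Source::Original => &self.original,
                Source::Added => &self.added,
            };
            buf.chars().skip(p.start).take(p.len).collect::<String>()
        }).collect()
    }
}
```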
Re: a real editor, I guess my question is what queries are made to the rope data structure between edits? If those queries are trivial, we might be able to implement them without disturbing the buffer. (So, an order of magnitude extra performance). And if they’re nontrivial they’d dominate the time spent in the rope data structure for a typical text editor. So the benchmarks we’ve been writing aren’t actually testing the right things!
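For example, continuing the hypothetical buffered-rope sketch from my earlier comment (and assuming the underlying rope can report its character length), a total-length query can be answered without flushing anything:

```rust
// Continuing the hypothetical BufferedRope sketch from above: a trivial
// query like "how long is the document?" can be answered without
// committing the pending run, so the buffer stays intact.
trait RopeLen {
    fn len_chars(&self) -> usize;
}

impl<R: RopeLike + RopeLen> BufferedRope<R> {
    fn len_chars(&self) -> usize {
        self.rope.len_chars() + self.pending.chars().count()
    }
}
```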
> I wonder if JumpRopeBuf could become something similar. Basically a piece table…
Maybe, but I doubt it would help.
The reason JumpropeBuf improves performance over Jumprope is that it doesn’t need to traverse into the skip list structure to make adjacent changes in the most common case (when the new edit happens at the same location as the last change). Basically it’s a micro-optimization for a common case. And it provides value because it’s so dead simple that it manages to be an order of magnitude faster than the skip list. I think if you added a full piece table in there, the overhead of the piece table operations would be similar to the overhead of modifying the skip list itself, so at that point the benefits of the JumpropeBuf approach would vanish. And I already have a pretty fast data structure for merging changes at arbitrary locations: the skip list, with a gap buffer at each node.
In short, I think adding a piece table in front would necessitate deoptimizing for the common case (edit at the last location) without adding real value over what the skip list is already doing.
By all means try it though. I’d love to be proven wrong!
I've reformatted 1GB files with a Vim macro in the past, and just weeks ago I processed 3-4GB JSON files. It's not that unusual nowadays if you work with a bit of data.
Though note that in many cases long lines are also important to consider - many editors struggle hard and become nearly unusable if a line on screen is e.g. 1MB long (which, again, isn't as uncommon as you'd think).
Dude. TFA is literally about the core data structures for a text editor. Many people care about large text files. It's something people actually do. BBEdit on the Mac can edit files much larger than RAM without thrashing. If you don't want to support text editors, that's fine, but then what is your rope library for?
Good question. I wrote it to support my work on highly efficient text CRDTs for collaborative editing. I wrote a blog post about it a few years ago, though this is out of date now since I’ve made so many new performance improvements:
Absolutely, although sometimes I have to resort to less(1) when everything else crashes and burns on them. The advanced troubleshooting technique known as “look” is as useful as ever.