Author of jumprope here. Great post! It’s cool seeing so many other good rope libraries ready to use.
It’s interesting seeing how poorly jumprope does on memory efficiency. I could definitely change the code to more aggressively join small text nodes together to bring the memory overhead more in line with the other algorithms. I’m just not sure how much anyone cares. Text is really small - a raspberry pi or phone has gigabytes of memory these days. If a 1mb text document takes up 5mb of ram - well, does that matter? I suppose if you’re opening a 1gb log file I do appreciate that some of the other approaches tested here have no overhead until you start making edits.
From a performance standpoint there’s one more trick jumprope has that isn’t mentioned here, though it could easily be applied to the other algorithms. And that is, in regular text editing sessions (and in replaying CRDT editing traces) we can buffer the most recent editing run before committing it to the data structure. So, if you start typing some contiguous characters, they get stored separately, and only when you move the cursor or start deleting do we flush that change down. This improves performance of replaying editing traces by nearly 10x. But I’m not sure how applicable it is to the “regular” case of text editing. It is, however, super useful for replaying edits in a collaborative editor. It’s fast enough that my CRDT doesn’t even bother storing the most recent document on disk - we just replay the entire editing history when a document is opened. Even for large documents, files can usually still be opened in about 1ms (including replaying the entire editing trace). This case doesn’t show up in the benchmarks because you have to opt in to it by using the JumpropeBuf wrapper instead of Jumprope.
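Roughly, the idea looks something like the sketch below - a simplified illustration with made-up names, not jumprope’s actual internals:

```rust
// Simplified sketch of the buffering idea (made-up names, not the real
// jumprope internals). A pending run of typed characters lives in a plain
// String and is only flushed into the rope when the edit location moves.

trait RopeLike {
    fn insert(&mut self, char_pos: usize, text: &str);
}

struct BufferedRope<R: RopeLike> {
    rope: R,
    pending_pos: usize, // char position where the pending run starts
    pending: String,    // contiguous characters typed since the last flush
}

impl<R: RopeLike> BufferedRope<R> {
    fn insert(&mut self, char_pos: usize, text: &str) {
        // Fast path: the new text continues the current typing run.
        if char_pos == self.pending_pos + self.pending.chars().count() {
            self.pending.push_str(text);
        } else {
            // The cursor moved somewhere else: commit the buffered run
            // into the rope, then start a new run at the new location.
            self.flush();
            self.pending_pos = char_pos;
            self.pending.push_str(text);
        }
    }

    fn flush(&mut self) {
        if !self.pending.is_empty() {
            self.rope.insert(self.pending_pos, &self.pending);
            self.pending.clear();
        }
    }
}
```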
Thanks for all the work in bootstrapping this part of the ecosystem! I opened an issue[1] about the memory overhead for jumprope. It seems to really come down to the large size of skip list nodes relative to the text.
I did some testing with JumpRopeBuf, but ultimately did not include it because I was comparing things from an "interactive editor" perspective, where edits are applied immediately, rather than a collaborative/CRDT use case where edits are async. But it did perform very well, as you said! JumpRopeBuf feels similar to a piece table to me, in that edits are stored separately and then joined for reading.
Thanks for the issue. There’s nothing stopping you from using it in an interactive environment. What else does a text editor do that would make JumpropeBuf not appropriate in an interactive editor? Range queries going on for rendering? I’d be curious to know what requirements you feel aren’t being covered.
Mind you, at the speeds all these systems operate at, all of the rope libraries you benchmarked should be plenty fast enough not to be a bottleneck in interactive editing anyway. Maybe the nearby posters are right, and the biggest optimization issues remaining are the ones that only come up when handling 10gb+ documents.
> There’s nothing stopping you from using it in an interactive environment. What else does a text editor do that would make JumpropeBuf not appropriate in an interactive editor?
I wouldn't say it's "not appropriate", just that it's not applicable. JumpropeBuf is really designed (as I see it) for handling large numbers of edits to the same location, which is exactly what we have in the tracing benchmarks. But in a "real-world" "interactive" environment, after every edit there are usually follow-up operations like rendering to the screen, sending change sets to language servers or linters, running syntax highlighting, etc. So you want those edits to be applied immediately, not buffered.
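Concretely, the loop I have in mind looks something like this (all the helper names here are made up, just to illustrate why buffering wouldn't buy much):

```rust
// Illustration only: made-up editor loop showing why buffered edits would
// get flushed immediately anyway. Every keystroke is followed by consumers
// that need to read the up-to-date document.
struct Document { text: String } // stand-in for whichever rope the editor uses

fn rerender_viewport(_doc: &Document) { /* redraw the visible lines */ }
fn notify_language_server(_doc: &Document) { /* send a didChange-style update */ }
fn rehighlight(_doc: &Document) { /* rerun incremental syntax highlighting */ }

fn on_keystroke(doc: &mut Document, byte_pos: usize, ch: char) {
    doc.text.insert(byte_pos, ch); // apply the edit right away...
    rerender_viewport(doc);        // ...because these all read the new state
    notify_language_server(doc);
    rehighlight(doc);
}
```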
I understand this is probably different in the CRDT case, where you could have swarms of edits coming from other clients. In that case I can see buffering being useful. And as you pointed out, all of these data structures are more than fast enough at the scale of the tracing benchmarks.
That being said, it's interesting that piece tables are kind of a similar idea, except they never actually apply the edits. They just keep them buffered and parse the buffer when they need to get the current state of the text. I wonder if JumpRopeBuf could become something similar: basically a piece table that occasionally merges the edits to avoid fragmentation.
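To sketch what I mean (made-up types, not any real library's API) - a piece table keeps the original text plus an append-only "add" buffer, and edits only rewrite a list of spans pointing into those two buffers:

```rust
// Minimal piece-table sketch (made-up types, not a real library's API).
// Edits never move text around; they only rewrite the span ("piece") list.
// Reading the document back out walks the spans in order.

#[derive(Clone, Copy)]
enum Source { Original, Added }

struct Piece {
    source: Source,
    start: usize, // char offset into the source buffer
    len: usize,   // length in chars
}

struct PieceTable {
    original: String,   // the text as loaded, never modified
    added: String,      // append-only buffer of inserted text
    pieces: Vec<Piece>, // the document, as spans into the two buffers
}

impl PieceTable {
    fn new(text: &str) -> Self {
        PieceTable {
            original: text.to_string(),
            added: String::new(),
            pieces: vec![Piece { source: Source::Original, start: 0, len: text.chars().count() }],
        }
    }

    fn insert(&mut self, mut char_pos: usize, text: &str) {
        let new = Piece {
            source: Source::Added,
            start: self.added.chars().count(),
            len: text.chars().count(),
        };
        self.added.push_str(text);

        // Find the piece containing char_pos, split it there, and slot the
        // new piece in between the two halves.
        for i in 0..self.pieces.len() {
            if char_pos <= self.pieces[i].len {
                let right_len = self.pieces[i].len - char_pos;
                let right = Piece {
                    source: self.pieces[i].source,
                    start: self.pieces[i].start + char_pos,
                    len: right_len,
                };
                self.pieces[i].len = char_pos;
                self.pieces.insert(i + 1, new);
                if right_len > 0 {
                    self.pieces.insert(i + 2, right);
                }
                return;
            }
            char_pos -= self.pieces[i].len;
        }
        self.pieces.push(new); // insertion at (or past) the end
    }

    // "Parsing the buffer": materialize the current text by walking the spans.
    fn text(&self) -> String {
        self.pieces.iter().map(|p| {
            let buf = match p.source {
                Source::Original => &self.original,
                Source::Added => &self.added,
            };
            buf.chars().skip(p.start).take(p.len).collect::<String>()
        }).collect()
    }
}
```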
Re: a real editor, I guess my question is what queries are made to the rope data structure between edits? If those queries are trivial, we might be able to implement them without disturbing the buffer. (So, an order of magnitude extra performance). And if they’re nontrivial they’d dominate the time spent in the rope data structure for a typical text editor. So the benchmarks we’ve been writing aren’t actually testing the right things!
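For example, continuing the hypothetical buffered-rope sketch from my earlier comment (and assuming the underlying rope can report its character length), a total-length query can be answered without flushing anything:

```rust
// Continuing the hypothetical BufferedRope sketch from above: a trivial
// query like "how long is the document?" can be answered without
// committing the pending run, so the buffer stays intact.
trait RopeLen {
    fn len_chars(&self) -> usize;
}

impl<R: RopeLike + RopeLen> BufferedRope<R> {
    fn len_chars(&self) -> usize {
        self.rope.len_chars() + self.pending.chars().count()
    }
}
```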
> I wonder if JumpRopeBuf could become something similar. Basically a piece table…
Maybe, but I doubt it would help.
The reason JumpropeBuf improves performance over Jumprope is that it doesn’t need to traverse into the skip list structure to make adjacent changes in the most common case (when the new edit happens at the same location as the last change). Basically it’s a micro-optimization for a common case. And it provides value because it’s so dead simple that it manages to be an order of magnitude faster than the skip list. I think if you added a full piece table in there, the overhead of the piece table operations would be similar to the overhead of modifying the skip list itself, so at that point the benefits of the JumpropeBuf approach would vanish. And I already have a pretty fast data structure for merging changes at arbitrary locations: the skip list, with a gap buffer at each node.
In short, I think adding a piece table in front would necessitate deoptimizing for the common case (edit at the last location) without adding real value over what the skip list is already doing.
By all means try it though. I’d love to be proven wrong!
I've reformatted 1GB files with a Vim macro in the past, and just weeks ago I processed 3-4GB JSON files. It's not that unusual nowadays if you work with a bit of data.
Though note that in many cases long lines are also important to consider - many editors struggle hard and become nearly unusable if a line on screen is e.g. 1MB long (which, again, isn't as uncommon as you'd think).
Dude. TFA is literally about the core data structures for a text editor. Many people care about large text files. It's something people actually do. BBEdit on the Mac can edit files much larger than RAM without thrashing. If you don't want to support text editors, that's fine, but then what is your rope library for?
Good question. I wrote it to support my work on highly efficient text CRDTs for collaborative editing. I wrote a blog post about it a few years ago, though this is out of date now since I’ve made so many new performance improvements:
Absolutely, although sometimes I have to resort to less(1) when everything else crashes and burns on them. The advanced troubleshooting technique known as “look” is as useful as ever.