Text showdown: Gap Buffers vs. Ropes

raphlinus · on Oct 9, 2023

Interesting article, and I love to see performance numbers to back up engineering decisions. I'm also glad xi-rope wasn't included, as I'm sure it's performance lags :)

That said, it's not measuring what I would measure. The jumprope benchmarks are reported as the total time to complete a number of tasks. To me, the massive strength of the rope data structure is its O(log n) worst case performance when performing a simple operation such as inserting or deleting a character. That translates into user-perceivable latency. If you have a sequence of a thousand edits, and your rope accomplishes each in 200µs, while your gap buffer has a mean of 100µs but variance extending to 10ms, then I'd much prefer the former, even though this benchmark would indicate the latter is 2x faster.

loeg · on Oct 9, 2023

Also, the author is quick to dismiss 20-100ms latencies as imperceptible, or nearly so. That is too high.

chongli · on Oct 9, 2023

Yeah. 20ms is more than a full frame at 60hz. This sort of delay is unacceptable in 3D games! And here we’re talking about inserting a character into the data structure used to keep track of edits to a text file!

It’s also ignorant of the heavily pipelined nature of input and output on modern computers. If you add up all of the latency from the time you press a key on the keyboard to the time a character appears on the screen — even if you subtract all the time spent inside your text editor — it adds up to many ms of delay. Now the author thinks it’s okay to add another 20-100ms to that just for the basic data structure holding the text? No thank you!

Edit: I have to add this link to a classic post by Dan Luu [1]. The Apple IIe leads the pack with 30ms latency from keystroke down to character on the screen. 20ms for a data structure (in the key-to-screen critical path) on a 2023 computer, even if it only occurs 1% of the time, is totally unacceptable.

[1] https://danluu.com/input-lag/

_a_a_a_ · on Oct 9, 2023

I thought your first para was sarcasm. Now I'm just not sure. It sits on the razor's edge, ready to fall either way in my mind.

NB. Elite running on Emacs https://www.salkosuo.net/2015/10/22/elite-for-emacs.html

codetrotter · on Oct 10, 2023

To be sarcastic is human, but to be on the razor's edge is divine

intelVISA · on Oct 9, 2023

The true end goal of all posts here, such craftsmanship.

jenadine · on Oct 10, 2023

> And here we’re talking about inserting a character

We are talking about the worst case when the buffer needs to be resized. This is not for every character. Just once in a while while editing very large texts.

Not that it makes it acceptable. But also not as bad as what you're making it sound.

HerculePoirot · on Oct 10, 2023

Yeah, I'd have dismissed your comment before I tried running a scheme repl on a raspberry pi over ssh (local) vs a clojure repl running on the jvm on my machine. Day and night.

kaba0 · on Oct 11, 2023

Maybe reread the relevant part of the article. It’s not average case at all.

IshKebab · on Oct 9, 2023

I don't think he's quick to dismiss it. He talks about it several times. Those latencies are when editing files in the hundreds of MB, and only in the worst case non-local edit. How often do you edit I file that big?

Also I expect if that was a real problem you could probably implement a system where you have more than one gap. Does that exist?

celeritascelery · on Oct 9, 2023

> Also I expect if that was a real problem you could probably implement a system where you have more than one gap. Does that exist?

That is a really cool idea! I actually tried implementing that. However I gave up because the code became quite a bit more complex, but it would totally be possible! That being said, multiple gaps would only help the "move gap" latency, they wouldn't help with the resizing latency. The later is both less predictable and much higher then moving the gap. Also when you need to coalesce the text for searching you would loose all your gaps.

IshKebab · on Oct 10, 2023

You can possible use an array of (heap allocated) ring buffers. I heard about that data structure a while ago and it is clever but I have never seen anyone use it.

It means an insertion anywhere just involves M character moves where M is the number of ring buffers.

The data layout won't quite be as nice as a gap buffer (2 chunks), instead you get 2*M chunks. But if you make your ring buffers like 10MB it's probably fine.

funcDropShadow · on Oct 11, 2023

> they wouldn't help with the resizing latency

Could'nt you avoid resizing by massively over allocating in the first place? A decent OS, hint Linux, would only map pages to the allocated memory pages once they are touched.

Avshalom · on Oct 9, 2023

you can also cut the worst case non local edit case in half by implementing a circular gap buffer as in climacs https://flexichain.common-lisp.dev/download/StrandhVilleneuv...

gpderetta · on Oct 9, 2023

> How often do you edit I file that big?

On the other hand, does speed really matter for small files?

Handling uncommon scenarios gracefully is important.

XorNot · on Oct 10, 2023

Hundred+ meg JSON files are pretty common these days. A lot data analysis is just "make some JSON and worry about it later" and that I do like to start by pulling into a text editor.

masklinn · on Oct 10, 2023

Pulling data in a text editor to look at it has nothing to do with editing it tho. And actuallly wilfully editing hundred+ megs json by hand seems like a near nonexistent use case indeed.

funcDropShadow · on Oct 11, 2023

The text editor needs to be able to load large without needing x times more memory than the size of the file.

celeritascelery · on Oct 9, 2023

depends on the context. The rule of thumb I have seen is that anything under 100ms is perceived as instantaneous by a user. So for interactive editing those latencies are acceptable. Though is should be noted those latencies are only for 1 GB file, so they will get worse as the edited file gets larger. But if you are building something like a CRDT where you can have edits coming in from many sources then those will start to compound, and will destroy your responsiveness. Also if you are performing edits while doing something like scrolling you will start to drop frames. Jumprope and Xi-rope were specifically designed for the CRDT use case. As Raph points out, the latencies are what will really bite you. And latency is actually why I picked a gap buffer; the ropes have too high of latency with regex searches. If that was ever fixed I would be tempted to switch to ropes.

gary_0 · on Oct 9, 2023

An occasional 100ms pause while I'm editing a huge 1GB file wouldn't be a deal-breaker for me, as long as regular-sized files have <10ms latency. It's been a long time since I opened a text file that huge, and I think the editor I was using at the time chugged something awful.

VSCode is slightly laggy for me just editing a 10k line text file, so my standards are sadly not that high, even though I find typing latency really annoying.

vlovich123 · on Oct 9, 2023

I’ve always found that latency number to be suspect. In some contexts, it’s imperceivable. In others it is.

For example, dragging an item around a screen with your finger, 100ms is definitely noticeable. Typing seems to be less although I had a coworker claim he could. So it’s hard to say if we don’t notice it vs we’ve just become accustomed to interacting with text with 100ms of latency (or something about the task of typing is more latency insensitive).

I would be interested in seeing the the same data driven analysis for human factors as HCI is even less intuitive than computer algorithm performance.

Searching is not a particularly latency sensitive task as your next match is probably significantly within 1 gib where you’re paying 250ms at worst. But yeah, if regex searching is the task to optimize around, ropes in Rust won’t work well due to the lack of incremental search at this time.

userbinator · on Oct 10, 2023

I would definitely notice 100ms of latency when typing, considering that at my usual speed, the time between characters is around 70ms.

vlovich123 · on Oct 11, 2023

The problem is that your thought is at the level of sentences with your muscle memory being automatic and your visual system mainly scanning for errors. Your “what character do I type next” at speed is less likely to be driven via visual feedback and more by direct proprieception / tactile feedback. Also, it’s not like you have a fixed 100ms delay between each character -the display system is asynchronous and will effectively batch everything up. So that 100ms delay means 100ms after your most recently typed character, not an additive 100ms delay.

Again, I’m not saying you won’t notice it, but I know I’m a very fast typer and when I was working on Oculus AirLink we had a prototype virtual desktop thing and I know I personally didn’t really notice any lag despite our internal metrics saying ~100ms of latency (we’d artificially inject extra latency for studies). It may be people differences, but keep in mind that how you measure is also important and almost no one measures true key stroke to display latency nor do they run human factors studies to accurately characterize if there really is anyone who does notice it (even at Oculus - most studies are very poor quality often with limited sample size, skewed population, and very poor reproduction/analysis because the time allotted for these projects is just enough to try to get guidance on next steps)

XorNot · on Oct 10, 2023

I notice 100ms latency when flipping a mechanical light switch (found out while dialling in debounce on ESPHome).

burntsushi · on Oct 9, 2023

What regex engines do people use for this task generally? Most regex engines I'm aware of don't support stream/incremental searching.

zogrodea · on Oct 9, 2023

This comment by someone at GitHub mentioned that Atom used PCRE’s partial match functionality, but I have no idea about what is used most commonly.

https://news.ycombinator.com/item?id=15386155

vlovich123 · on Oct 9, 2023

I was just going from the linked issue where I thought I read that Go’s engine supports incremental search. Maybe I misread?

While I have you, how close is the Rust incremental search support? It sounded like regex-automata might make it possible but hard to easily get a sense of high level progress from a GitHub issue.

burntsushi · on Oct 9, 2023

Go has this: https://pkg.go.dev/regexp#Regexp.MatchReader --- That just tells you whether an io.Reader contains a match anywhere or not. I don't think it tells you anything else, like the position of the match.

> While I have you, how close is the Rust incremental search support? It sounded like regex-automata might make it possible but hard to easily get a sense of high level progress from a GitHub issue.

I'm not working on it. The current status is that other people are working on trying to write it themselves on top of regex-automata in a way that works for their specific use case. That was my high level strategy: release a separately versioned library with the regex internals[1] exposed so that others can experiment and build on top of it.

The regex-automata crate exposes the low level DFA (and lazy DFA) transition function. So you can pretty much do some kind of stream searching out of the box today (certainly at least what Go provides): https://docs.rs/regex-automata/latest/regex_automata/#build-...

Bottom line here is that if you're looking to implement stream searching in Rust, then you should be able to go out and build something to do it today with regex-automata. But you aren't going to get a streamlined experience. (And whether you ever will or not remains unclear to me.)

[1]: https://blog.burntsushi.net/regex-internals/

LukeShu · on Oct 10, 2023

> Go ... I don't think it tells you anything else, like the position of the match.

The regexp.Regexp.FindReaderIndex and .FindReaderSubmatchIndex methods tell you the position of the match.

burntsushi · on Oct 11, 2023

Despite having studied Go's regex engine in depth, I have somehow missed those methods. To the point that when you responded, I assumed they were added recently. I checked to be sure, and indeed not. They've been there since Go 1.0.

Thanks for the correction!

alpaca128 · on Oct 9, 2023

> ropes in Rust won’t work well due to the lack of incremental search at this time

Unless you don't mind implementing a couple dozen lines yourself. The regex_automata crate allows you to directly access the DFA, so with iteration through the rope and a bit of state tracking you can already do this, just not as convenient (yet) as a single method call.

celeritascelery · on Oct 10, 2023

It’s not quite that simple, but folks are working on it.

https://github.com/rust-lang/regex/issues/425#issuecomment-1...

https://github.com/helix-editor/helix/pull/211#issuecomment-...

burntsushi · on Oct 10, 2023

Note that the lazy DFA and its lower level APIs exists now and threads this needle.

alpaca128 · on Oct 10, 2023

Maybe not in the general case for arbitrary abstracted datastructures, but I have done it in a basic test to see if it's viable for my use-case, which is why I made that claim at all.

kaba0 · on Oct 11, 2023

Our brains are very good at adjusting to latency, but only if it’s a constant.

raphlinus · on Oct 9, 2023

Search is 100% a valid justification to use contiguous storage. It wasn't high in my list of considerations when I was starting out.

_dain_ · on Oct 9, 2023

>The rule of thumb I have seen is that anything under 100ms is perceived as instantaneous by a user.

for typing characters into a text box this is not imperceptible. it should happen in a single frame.

deagle50 · on Oct 9, 2023

And it gets more perceptible when moving the caret (more pixels to track and you're actively tracking).

deagle50 · on Oct 9, 2023

Users can easily feel one frame of latency at 60hz given enough visual feedback (pixels in this case). It gets harder at 120hz and above. I have a toy editor with Ropey and WGPU, just added a 100ms wait on char insert and it's rubber banding when typing fast.

4death4 · on Oct 9, 2023

Instead of the rule of thumb you’ve seen how about the rule of thumb you’ve tried? Make two simple HTML forms: one with an input that instantaneously accepts input and another with 100 ms delay. You can definitely tell the difference.

kaba0 · on Oct 11, 2023

You are not measuring what you want. You measure 0+~20ms vs 100+~20ms.

4death4 · on Oct 11, 2023

Ok, change the experiment to add 80ms artificial delay instead…

alpaca128 · on Oct 9, 2023

I'd say 100ms is perceptible in almost any context, at least if it's on top of the already existing latencies of the hardware & OS (e.g. some laptops already add 150ms latency to every keypress [0], so good luck)

[0] https://danluu.com/input-lag/

Hammershaft · on Oct 10, 2023

100ms is 6 frames of input latency at 60hz! Absolutely noticable.

josephg · on Oct 9, 2023

Author of jumprope here. Great post! It’s cool seeing so many other good rope libraries ready to use.

It’s interesting seeing how poorly jumprope does on memory efficiency. I could definitely change the code to more aggressively join small text nodes together to bring the memory overhead more in line with the other algorithms. I’m just not sure how much anyone cares. Text is really small - a raspberry pi or phone has gigabytes of memory these days. If a 1mb text document takes up 5mb of ram - well, does that matter? I suppose if you’re opening a 1gb log file I do appreciate that some of the other approaches tested here have no overhead until you start making edits.

From a performance standpoint there’s one more trick jumprope has not mentioned here, though it could easily be applied to the other algorithms. And that is, in regular text editing sessions (and in replaying CRDT editing traces) we can buffer the most recent editing run before committing it to the data structure. So, if you start typing some contiguous characters, they get stored separately and only when you move the cursor or start deleting do we flush that change down. This improves performance of replaying editing traces by nearly 10x. But I’m not sure how applicable it is to the “regular” case of text editing. It is, however, super useful for replaying edits in a collaborative editor. It’s fast enough that my crdt doesn’t even bother storing the most recent document on disk - we just replay the entire editing history when a document is opened. Even for large documents, files can usually still be opened in about 1ms (including replaying the entire editing trace). This case doesn’t show up in all benchmarks because you have to opt in to it using the JumpropeBuf wrapper instead of Jumprope.

celeritascelery · on Oct 9, 2023

Thanks for all the work in bootstrapping this part of the ecosystem! I opened an issue[1] on the memory issue for jumprope. It seems to really come down to the large size of skiplist nodes relative to the text.

I did some testing with JumpRopeBuf, but ultimately did not include it because I was comparing things from an "interactive editor" perspective where edits are applied immediately instead of a collaborative/CRDT use case where edits are async. But it did perform very well as you said! I feel like JumpRopeBuf feels similar to a piece table, where edits are stored separately and then joined for reading.

[1] https://github.com/josephg/jumprope-rs/issues/5

josephg · on Oct 9, 2023

Thanks for the issue. There’s nothing stopping you from using it in an interactive environment. What else does a text editor do that would make JumpropeBuf not appropriate in an interactive editor? Range queries going on for rendering? I’d be curious to know what requirements you feel aren’t being covered.

Mind you, at the speeds all these systems operate at, all of the rope libraries you benchmarked should be plenty fast enough not to be a bottleneck in interactive editing anyway. Maybe the nearby posters are right; and the biggest optimization issues remaining are those that only come up when handling 10gb+ documents.

celeritascelery · on Oct 10, 2023

> There’s nothing stopping you from using it in an interactive environment. What else does a text editor do that would make JumpropeBuf not appropriate in an interactive editor?

I wouldn't say it's "not appropriate" just that it is not applicable. JumpropeBuf is really design (as I see it) for handling large numbers edits to the same location. Which is exactly what we have in the tracing benchmarks. But in an "realworld" "interactive" environment usually after every edit there are follow up operation like rendering to the screen, sending change set to language servers or linters, running syntax highlighting etc. So you want those edits to applied immediately, not buffered.

I understand this is probably different in the CRDT case where you could have swarms of edits coming from other clients. In that case I can see buffering being useful. And as you pointed out, all of these data structures are more then fast enough at the scale of the tracing benchmarks.

That being said, it's interesting that piece tables are kind of a similar idea, expect they never actually apply the edits. They just keep them buffered and parse the buffer when they need to get the current state of the text. I wonder if JumpRopeBuf could become something similar. Basically a piece table the occasionally merges the edits to avoid fragmentation.

josephg · on Oct 10, 2023

Re: a real editor, I guess my question is what queries are made to the rope data structure between edits? If those queries are trivial, we might be able to implement them without disturbing the buffer. (So, an order of magnitude extra performance). And if they’re nontrivial they’d dominate the time spent in the rope data structure for a typical text editor. So the benchmarks we’ve been writing aren’t actually testing the right things!

> I wonder if JumpRopeBuf could become something similar. Basically a piece table…

Maybe, but I doubt it would help.

The reason why JumpropeBuf improves performance over Jumprope is because it doesn’t need to traverse into the skip list structure to make adjacent changes in the most common case (when the new edit happened at the same location as the last change). Basically it’s a micro optimization for a common case. And it provides value because it’s so dead simple that it manages to be an order of magnitude faster than the skip list. I think if you added a full piece table in there, the overhead of piece table operations would be similar to the overhead of modifying the skip list itself. So at that point, the benefits from the JumpropeBuf approach would vanish. And I already have a pretty fast data structure for merging changes at arbitrary locations in the skip list & gap buffer at each node.

In short, I think adding a piece table in front would necessitate deoptimizing for the common case (edit at the last location) without adding real value over what the skip list is already doing.

By all means try it though. I’d love to be proven wrong!

IshKebab · on Oct 9, 2023

> Text is really small

Tell that to my 10GB JSON files.

josephg · on Oct 9, 2023

That doesn’t sound like a text file. Sounds like you have the world’s least convenient database.

You’re on your own with that one. Sounds like a job for SQLite, not a rope library.

alpaca128 · on Oct 9, 2023

I've reformatted 1GB files with a Vim macro in the past, weeks ago I processed 3-4GB JSON files, it's not that unusual nowadays if you work with a bit of data.

Though note that in many cases long lines are also important to consider - many editors struggle hard and become near unusable if the line on screen is e.g. 1MB long (which again isn't as uncommon as you'd think).

eviks · on Oct 10, 2023

Indeed. Do you know which editors don't blink working with a 1gb-1line json file?

KerrAvon · on Oct 9, 2023

Dude. TFA is literally about the core data structures for a text editor. Many people care about large text files. It's something people actually do. BBEdit on the Mac can edit files much larger than RAM without thrashing. If you don't want to support text editors, that's fine, but then what is your rope library for?

josephg · on Oct 9, 2023

Good question. I wrote it to support my work on highly efficient text CRDTs for collaborative editing. I wrote a blog post about it a few years ago, though this is out of date now since I’ve made so many new performance improvements:

https://josephg.com/blog/crdts-go-brrr/

roland35 · on Oct 9, 2023

I like big json files and I can not lie! All the other programmers can't deny! When I load json with an itty bitty schema...

pphysch · on Oct 9, 2023

You're opening those in a text editor?

mananaysiempre · on Oct 9, 2023

Absolutely, although sometimes I have to resort to less(1) when everything else crashes and burns on them. The advanced troubleshooting technique known as “look” is as useful as ever.

eviks · on Oct 10, 2023

Why not use the common tool for some quick lookup?

kaba0 · on Oct 11, 2023

Arguably, different algorithm could be used for that and for “normally” occurring files.

KerrAvon · on Oct 9, 2023

> If a 1mb text document takes up 5mb of ram

Text files are sometimes not that small. It's not that unusual to have a multigigabyte text log, sometimes larger than system RAM on the machine.

userbinator · on Oct 10, 2023

The simplest approach is to just use a large string or array of lines. However these each suffer from poor performance as either the size or line length of text increases.

Every time this point comes up in the design of text editors, I feel compelled to mention that memory bandwidth is dozens of GB/s on modern hardware; even when it was only a few MB/s a few decades ago, text editors that used a single contiguous buffer (not even a gap) were very common and no one complained about their speed. Indeed, the benchmarks in this article confirm that.

As another popular article that's often submitted here says: computers are fast --- very fast for human timescales, and in particular manipulating text of humanly-encountered sizes. More performance is lost in unnecessary abstraction and belief that "clever" optimisations (like theoretically-optimal but considerably more complex data structures) work, when they actually turn out the opposite.

There's a lot of dogma around advanced text editor structures and what is "optimal", and IMHO a lot of that is totally unneeded accidental complexity created by those more interested in theoretical daydreaming than real-world solutions. Thus, I think even a gap buffer is "overkill", and a single "gapless" buffer is really sufficient. Don't overthink things.

hoseja · on Oct 10, 2023

And yet, search-and-replace on big files takes seconds.

kaba0 · on Oct 11, 2023

I can grep through half my file system in seconds. What do you measure?

PH95VuimJjqBqy · on Oct 10, 2023

I don't know this authoritatively, but I believe vim uses a simple array internally. And it works well.

gumby · on Oct 9, 2023

In a modern OS, you can combine the gap and the piece table approaches using the MMU to eliminate most copying in the common cases.

Basically: if you need to make a gap, split the page it’s on into two (so your region copy is always less than one page long). You can start the copy into the middle of second page or not (I don’t actually think that’s an optimization in practice but have never measured it).

If you get a lot of fragmentation you can have a background process that does coalescence away from the active insertion point.

kammerdiener · on Oct 10, 2023

Do you have any more info on this?

gumby · on Oct 10, 2023

I thought my comment was pretty comprehensive on the technique. Is there something that was unclear? I can clarify.

I’ve used this technique a few times over the decades in non-editor applications with large sorted vectors.

kammerdiener · on Oct 10, 2023

Your explanation was just fine. I was simply asking what else you knew about the technique. Perhaps where you first heard of it or some examples where it has been applied/benchmarked.

PH95VuimJjqBqy · on Oct 10, 2023

Not only was it clear but it's the approach that popped into my head immediately while reading the article.

subarctic · on Oct 9, 2023

I really like this article's structure for showing benchmarks. each benchmark has its own heading and a paragraph explaining the reason for it and analyzing the results.

I was a bit surprised, though, with how it ends right after the search benchmark. This seems like the perfect setup to talk about search/replace (aka find/replace), which seems like the best use case for non-local edits that I can think of, and therefore would be a great opportunity to show how the algorithms compare in something the ropes should be better at.

jiggawatts · on Oct 10, 2023

Notably, most modern text editors such as Visual Studio Code use neither of those. They use "piece tables", which have a number of advantages. For example they allow efficient incremental saves and the list of edit changes can be trivially wound back or replayed for undo/redo.

https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r...

https://en.wikipedia.org/wiki/Piece_table

throwaway17_17 · on Oct 10, 2023

Do you know of any assessment of performance characteristics for said ‘piece tables’? The thrust of TFA is essentially a performance comparison, so I can’t help but wonder why this implementation wasn’t addressed if it is present in one of the most widely used code editors. Also, VS Code is not known for its responsiveness for text editing, is this due to the ‘piece table’, or more likely, due to architectural decisions in the application more generally?

I’m not really up-to-date on any text editing implementation details, so if this is basic/common knowledge feel free to just tell me to Google it.

rudedogg · on Oct 10, 2023

https://www.cs.unm.edu/~crowley/papers/sds/sds.html

zogrodea · on Oct 10, 2023

I implemented a Piece Tree like VS Code's not long ago and found the insert/delete performance fine, but performance for text retrieval (line retrieval, substring) queries wwas embarrassingly bad. Like, 10x slower than the other ropes I compared it with.

I think this comes down to two things:

1. Fragmentation. When a rope splits a string into two by inserting into it, it is able to rejoin the pieces to form a new string, which "greatly reduces space consumption and traversal times" (quote from the first paper on ropes: https://www.cs.tufts.edu/comp/150FP/archive/hans-boehm/ropes... ).

2. Proximity of nodes. In a Piece Tree, all of the inner nodes contain {start, length} piece data. You can construct the whole string represented by the Piece Tree through an in-order traversal. In contrast, the Rope only stores data at the leaves of the tree.

Imagine you are at the root of a Piece Tree that looks like this (where o = a node containing a piece).

      o
     / \
    o   o
   / \ / \
  o   oo  o

Say you want to get a substring from (root - 1) to (root + 1), one character before the tree's root up to one character after the root. How many tree nodes do you need to visit? We can understand by reminding ourselves that "the whole string can be reconstructed through an in order traversal", letting us know the order we need to visit nodes in.

      c
     / \
    v   v
   / \ / \
  o   cc  o

(where c = a node whose string we copy and v = a node we visit on our way to a node we need to copy). That is two separate O(log n) queries we make from the root of the tree for our substring operation.

What are ropes like in contrast?

      v
     / \
    v   o
   / \ / \
  c   co  o

You do still need basically two O(log n) queries, but not from the tree's root. You find the inner metadata node where the start of the substring and the end of the substring (and all strings in between) can be reached and make an in-order traversal from there, which is less traversal time. This, combined with the lower fragmentation that ropes enable (or rather that high fragmentation the Piece Tables have), makes them faster than Piece Trees in my experience.

(A Piece Table/Piece Tree is able to avoid creating new nodes if you insert consecutively, one character after another without backtracking to edit a previous part, since you could just extend the length of the piece. Real life benchmarking data tells me that's not a big advantage though: the text retrieval time is still embarrassingly bad.)

It's good to see Jetbrains avoid the Piece Table and its variants for Fleet (where they mention using a rope).

https://blog.jetbrains.com/fleet/2022/02/fleet-below-deck-pa...

(Note that the rope diagram I posted copies two strings and the Piece Tree copies three, which might be considered unfair. I didn't want to extend the rope diagram to make it copy three strings, but I would say this is actually an accurate depiction of reality considering the fragmentation the Piece Table/Tree has, meaning more nodes to visit.)

jll29 · on Oct 9, 2023

I'm curious: who invented gap buffers originally? Do we know if it was RMS? What was the first peer-reviewed publication describing or mentioning them?

A quick search brings up only references in the 1990s or later...

gumby · on Oct 9, 2023

Emacs uses a gap buffer because TECO used one, long before RMS finished high school.

Emacs was initially just a bunch of macros (in Eugene Ciccarelli's TECO init file IIRC) that provided an easier interface to TECO once a realtime display mode (^R mode of course!) had been added.

Yes, in TECO, control R was a keyword in the language. If you don't know TECO it looks like line noise. For controlling an editor, that's not necessarily insane.

jwstarr · on Oct 10, 2023

The author of TECO, Dan Murphy, wrote an article about its history for the IEEE Annals of the History of Computing. A PDF copy of the article is available here: https://opost.com/tenex/anhc-31-4-anec.pdf

Alas, it does not discuss the gap buffer.

The book The Craft of Text Editing claims the gap buffer technique was first used by TECO (http://www.finseth.com/craft/#c6.6), although this claim is given no support.

gumby · on Oct 10, 2023

Thanks for that reference — I’d never seen it. I’ll be happy never to write any more TECO in my life, but that doesn’t mean I’m not interested.

cbsmith · on Oct 9, 2023

Yeah, gap buffers LONG predate RMS.

scotty79 · on Oct 9, 2023

I wonder if these benchmarks include the time spent on updating the tree that stores information where the lines begin. Because if you have a gap at the beginning of the buffer and you make an insert you need to update all of that information.

celeritascelery · on Oct 9, 2023

All the containers store the metrics (line endings, unicode codepoints, UTF-16 codepoints, etc) in a tree. The tree stores the value of all it's children summed up. So if if you do an insert at the start you only need to update a single branch. When calculating the line count you have to walk down the tree summing all the children as you go. But this means updating and lookup are both logn instead of having to update N nodes.

These benchmarks include that time.

ahefner · on Oct 9, 2023

A neat trick with gap buffers is you can track the line boundaries as indices into the gap buffer itself rather than 'real' indices into the document. In this way the indices of line starts seldom change except when the gap buffer is resized. You can then keep a second gap buffer, this one recording the start of line indices, and keep its gap in sync with cursor movement in the text gap buffer, making insertion cheap and giving a trivial way to map from line numbers to a position in the gap buffer. No trees necessary.

celeritascelery · on Oct 9, 2023

hmm. What happens when you move the gap from the front to the end of the text (or vice versa)? wouldn't that require you to update all the line boundaries? And wouldn't finding the position of the Nth line be O(n).

ahefner · on Oct 9, 2023

Moving the gap from front to end - it takes O(n) time to do that both on the text (gap) buffer and the line-tracking (gap) buffer, so you're only a small constant factor worse.

Find the position of the Nth line is O(1) because N is either before the gap, in which case you just look up the Nth entry in the line-tracking buffer to get the position in the text buffer (which can be similarly mapped back to a position not including the gap in constant time if that's what you need), or N after the gap and you adjust the arithmetic to include the size of the gap.

celeritascelery · on Oct 10, 2023

This does seem like a really neat trick I have not heard of before. I am still confused about moving the gap in the line-tracking buffer. It seems to me the only way you could make look-up O(1) is if you updated all the line indexes you passed over when you moved the gap. Because they are pointing to absolute positions (right?). So moving the line-tracking gap would technically be O(n), but you couldn't use memmove, and instead would need to iterate over one and add or subtract the gap size. Am I misunderstanding?

ahefner · on Oct 10, 2023

The contents of the line-tracking buffer (ignoring its gap for a moment, which you could move anywhere) only change when the text-buffer's gap moves. When that happens, each time a line break moves across the gap, you need to update its index in the line tracking buffer.

It doesn't directly matter where the gap is in the line-tracking buffer except ideally it's positioned consistent with the user's cursor so that if they insert a bunch of new lines, they're inserted into the line-tracking gap.

The position of the line-tracking gap doesn't effect the values stored on either side of it gap, so you can still use memmove there.

teo_zero · on Oct 10, 2023

The trick is to decouple the concept of 'position' in the text, that is independent of the gap, from 'pointer' into the buffer. If you keep track of line beginnings as positions, you can immediately get the pointers:

  ptr = pos if pos<=gapstart else pos + gapsize

teo_zero · on Oct 10, 2023

Replying to myself: it's actually a bad idea because you have to update all indices at each inserted char! Forget about it...

scotty79 · on Oct 10, 2023

Could you describe your implementation of this tree in more detail in the repo? I think it's really interesting part of your project.

celeritascelery · on Oct 10, 2023

It is actually not very novel. It is just a Fenwick tree[1]. And a Fenwick tree is basically a rope where the leafs don't store any text.

basically every leaf holds some metrics (say bytes and line endings) for some chunk of text. By just looking at the leaf we don't know which chunk it points to. But since the leafs are all in order, we can calculate their byte position and line ending by adding them up. If we add all the leaf we should get the total number of bytes and lines in the text. In order to avoid O(n) cost of summing every leaf, we store them in a tree, where each parent holds the sums of it's children. So when we want to find the byte position[2] of a particular line ending we can just walk down the tree and see if the line ending we are searching for is greater then the left child or not. If it is, we add the left child's sum to our running total and go to the right. Otherwise we go to the left. By the time we get to the leaf, we have summed all the chunks before it in O(logn) time.

Insertion/deletion are similar, except when we update the leaf, we also go and update each parent on the way back up.

[1] https://www.baeldung.com/cs/fenwick-tree

[2] https://github.com/CeleritasCelery/rune/blob/be243ea2bd385ec...

sfink · on Oct 10, 2023

When you resize a large gap buffer, it seems like you could mremap the pages after the gap and avoid almost all copying. The usual alignment problems don't apply since you can adjust the position and size of the gap slightly as needed to accommodate.

zogrodea · on Oct 9, 2023

This is an interesting article. Not 100% sure about the implementation of JumpRope, but I think it combines a gap buffer together with a rope. Ropey (and I think Crop too?) also support cheap persistence.

celeritascelery · on Oct 9, 2023

I should have added a section on that.

JumpRope combines a gap buffer with a skip list. Crop combines a gap buffer with a rope. ropey is a "traditional" rope that doesn't use a gap buffer under the hood.

zogrodea · on Oct 9, 2023

Thank you for writing the fun article and experiments! :)

nayuki · on Oct 10, 2023

A fantastic article that goes into a lot of depth and rigor until I got to the very end:

> GB here means 2^30, as it should when talking about base-2 memory. The only people who think it should be 10^9 are hard drive salesmen and the type of people who like to correct all their friends by saying “It’s centripetal, not centrifugal force!”. Also “gibibyte” sounds like Pokémon invented by a first grader.

This attitude ruined it for me. You're a technologist. You should care about vocabulary and precision. You are on the wrong side of history for both of these things - a gigabyte is 10^9 bytes as per SI, and centrifugal force is fictitious.

I challenge you: If a 1 GHz processor can process 1 byte on every cycle, how long does it take to process a 1 GB file? Hopefully you'll see the error in your ways.

Fragmenting the definition of the giga- prefix based on context leads us down some dark paths that we already traversed with the pound-mass vs. pound-force, pound Avoirdupois vs. pound Troy, fluid ounce vs. weight ounce, US gallon and UK gallon, liquid bushel vs. dry bushel, different varieties of tons, statute mile vs. nautical mile, and the list goes on.

userbinator · on Oct 10, 2023

1KB is 1024 bytes per JEDEC.

If you go to a computer parts store and ask for a 17.179869184GB DDR4 DIMM, you'll probably be laughed at and get some very strange looks.

nayuki · on Oct 10, 2023

Good luck fitting a 16 "GB" RAM dump on a 16 GB flash drive when you hibernate the computer.

Your contrived example is absurd; you just have to ask for 16 GiB of RAM, as per the proper definitions.

As for JEDEC, standards bodies are made of people and people are fallible.

userbinator · on Oct 10, 2023

Good luck fitting a 16 "GB" RAM dump on a 16 GB flash drive when you hibernate the computer.

Filesystem overhead notwithstanding, it's worth noting that NAND flash usually had true capacities too, with some additional hidden extra for error correction/wear-leveling. I have a 16MB SLC one that truly has 16,777,216 usable bytes (32M 512-byte sectors), and a slightly newer 2GB one with 2147483648 bytes.

Let's not forget the original IBM PC "10MB" hard drive held a little more than 10MB: 10653696 bytes, a bit more than the 10485760 one would expect.

No one seriously talks about "GiB", and the only ones I hear using it are pedantic assholes. It just sounds absurd and stupid.

As for JEDEC, standards bodies are made of people and people are fallible.

I can say the same goes for SI and their "iB" nonsense.

nayuki · on Oct 10, 2023

> I can say the same goes for SI and their "iB" nonsense.

The Ki-, Mi-, Gi-, etc. prefixes aren't in SI. They're defined in IEC 60027 specifically to avoid conflicting with SI.

smarks · on Oct 10, 2023

Yep. And 2^20 bytes of unreliable storage is a mebibyte.

girvo · on Oct 10, 2023

If it's really unreliable, I guess one could consider it a maybebyte

celeritascelery · on Oct 10, 2023

> you just have to ask for 16 GiB of RAM, as per the proper definitions.

If you walked in an asked for 16 "gibibytes" of RAM they would just look at you funny. Or think you were non-native speaker.

I work in the hardware software boundary, and I have never once heard someone use the term kibibyte, mebibyte, or gibibyte in real life. Occasionally I will see KiB, MiB, and GiB in an academic paper, but that's about it. When I am talking with people and they say kilobyte, they always mean 1024. Trying to pretend otherwise would just cause unneeded confusion. That's what it meant went the term was coined, that what it meant when IEC tried to change the definition 25 years ago, and (unless you are selling hard drives) that's what it means to most people today.

I agree that it is unfortunate that they reused the terms from SI prefixes, since they are not SI prefixes. And if I had a time machine I would tell them to use different prefixes. But they didn't and those are the terms the world has adopted. It's also unfortunate that hard drive manufactures played fast and loose with the definition to pretend like their drives were bigger then they really were. Words are democracy, and they mean what people say they mean, not what the IEC says they should mean.

> As for JEDEC, standards bodies are made of people and people are fallible.

Exactly, so why is the IEC redefinition somehow more valid?

nayuki · on Oct 10, 2023

You're needlessly diluting the meaning of words and making them context-dependent. You create problems at the boundaries, like when 1 GHz faces off with 1 "GB". You're helping to undo the unification of definitions that SI strives for. You're doing this in the name of keeping tradition - which is exactly how we end up with messes like US Customary units.

P.S. Don't forget how "1.44 MB" floppy disks are neither MB nor MiB; their definition is MB = 1024000 bytes. This free-for-all benefits no one. Hard drive manufacturers were right, RAM was wrong.

PH95VuimJjqBqy · on Oct 10, 2023

While I agree with you in spirit, you're being needlessly militaristic here.

It sucks, but we lost that fight. GiB is a thing that people use.

kaba0 · on Oct 11, 2023

> centrifugal force is fictitious

It is not. It exists if you change your point of reference.