1. 46
    1. 9

      It’s interesting that a much simpler gap buffer outperforms ropes in many scenarios: https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/

      1. 4

        I think part of the appeal of ropes is that it’s pretty easy to make them persistent data structures (as the article shows) which makes undo trivial and simplifies concurrency. That justifies the extra overhead.

        I’m currently in mid-rabbit-hole-dive reading about Project Xanadu’s exotic elaborations of the concept, like Enfilades and Ents, which enables a very sophisticated hypermedia system. It’s hard to follow, though, because it’s a collection of transcripts of talks with almost no diagrams. But I’ve almost figured out how the Canopy worked. (Like everything associated with Ted Nelson, these have funny names.)

        1. 2

          The metadata (what they call summaries) for the gap buffer used in those tests was also stored in a sum tree almost exactly like the one in Zed.

        2. 5

          The SumTree looks like an extension of the Segment Tree. Very cool.

          It’s interesting how you can use it as an index on secondary keys contained in the Sum struct, like the row/column, as long as those keys all increase monotonically with the primary key (byte offset.)

          Something like it is used in CouchDB (or was, ten years ago) to optimize the reduce phase of map-reduce queries. Each B-tree node in an index stores the reduced value of its children, so to reduce the range between two keys you can reuse the precomputed values of any entire nodes contained in that range. (This is why the reduce function takes a Boolean rereduce flag, to tell it that it’s being given already-reduced values in case that alters the computation.)

          Also, of course, a Merkle tree or hash-tree can be thought of as an application of a SumTree.

          [Nostalgia: I started implementing a TTY text editor in college, in the mid-80s. I got as far as a prototype that did basic editing and word-wrapping, of which I was very proud. But I graduated before I could finish. It used a gap buffer to store its text.]

            1. 1

              If the post’s author reads this, I think there’s a typo:

              On the surface, the call to rope.offset_to_point(30)

              Based on the context, I think it should be:

              On the surface, the call to rope.offset_to_point(26)

              1. 3

                Thanks for flagging!

                A correction is being deployed now.

                1. 2

                  Also, the very first example seems incorrect, missing a space probably? I would expect: "Hello World!" + "This is your captain speaking." to result in: "Hello World!This is your captain speaking.", whereas you present: "Hello World! This is your captain speaking.". Or is there some functionality of adding an extra " " built-in? 🤔

                  (Otherwise, an awesome article, thanks! 💖)

              2. 1

                Nostalgia: Had used SGI STL implementation of Rope in a CAD software implementing a compiler for CMM DMIS language back in late 90s. Need something the expand macros & inline functions when parsing a program.

                Found a link to Rope implementation: http://ld2014.scusa.lsu.edu/STL_doc/rope

                1. 1

                  Isn’t this basically the same as what xi does?

                  1. 5

                    It is indeed very similar to ropey. One difference seems to be that the leaf chunks in Zed/SumTree are fixed size, while Ropey allows for variable sized chunks (with small vector optimization).

                    One advantage of rolling out a custom version for Zed though, is that they get to control what metrics are cached in the internal nodes and tailor it to their needs.

                    Also, the data structure is generic, so they reuse it everywhere they have metrics that are a monoid:

                    Currently there are over 20 uses of the SumTree in Zed. The SumTree is not only used as the basis for the Rope, but in many different places. The list of files in a project is a SumTree. The information returned by git blame is stored in a SumTree. Messages in the chat channel: SumTree. Diagnostics: SumTree.

                  2. 0

                    So thoughtful to provide help singing along to this