I dissected a bunch of editors about a year ago and then added some more that users from Reddit wanted. In no way is this professional or completely accurate.
Great article, in the line of the other one where actual performance of alists vs trees vs hashtables was compared.
In school I learned that the only data structure that made sense for text editing was a rope – who could argue against the complexity analysis? Later I learned gap buffers also gave performance good enough for the real world. Imagine how surprised I was to learn that the text editor I use just stores an array of strings!
There was a period of time when I was implementing a toy text editor. I spent a few days writing a gapbuffer implementation after doing a bunch of reading, but eventually I hit a bug and, pressed for time, decided to just use an array of strings. And bugger me, I couldn’t tell the difference using it.
You might find this paper on data structures for text editing interesting, which I came across a couple of days ago when I was looking into how text editors manage their data as well.
The “Piece Table” approach was first developed at Xerox PARC for the Bravo editor, which was designed by Butler Lampson and Charles Simonyi (although the idea itself was apparently from J Strother Moore who was also at Xerox; he’s the Moore of Boyer-Moore String Search as well); Simonyi later went on to develop Microsoft Word, which used the same representation. AbiWord uses it as well. In the realm of text editors, Niklaus Wirth’s Lilith project (inspired by the Xerox Alto machines) had an editor named Lara, which also used a piece chain representation. Here’s a nice tutorial on piece chains with some additional links to historical uses.
As another point in the spectrum, Deuce is based on Emacs with an object-oriented concept of a “line”, etc. to facilitate text, graphics, composition of multiple sources into a buffer, and so on.
I feel like the “rope” approach would be the best, if only because it gives you undo/redo for free.
I find it hard to imagine that the gap-buffer approach would be a good idea; if I load up a hundred-megabyte log-file and hit Ctrl-End to go to the other end of the file, it’s got to copy a hundred megabytes of data across by 10 bytes? That can’t be a good idea. But if Emacs does it, I guess it must be practical after all. Somehow.
I don’t know about Emacs’s implementation, but if I’m remembering my Finseth correctly (http://web.mit.edu/~yandros/doc/craft-text-editing/), modern buffer gap implementations maintain more than just one gap.
I’ve got an editor for a programming game I’m writing in Lua that uses a table of strings (but strings are immutable in Lua), and adding undo was fairly trivial. But within the constraints of my game I can more or less guarantee that you won’t be editing files over 100kb.