The foundation is a text with attributes — a pair of a string and a map from string’s subranges to key-value dictionaries. Attributes express presentation (color, font, text decoration), but also semantics.
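To make that pair concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how Emacs or AppKit actually store things: `AttributedText`, `add`, `attrs_at` and `slice` are made-up names, and the flat list of spans is chosen for brevity rather than speed. Ranges are half-open, so `(0, 3)` covers `text[0:3]`.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedText:
    """A string plus attribute dictionaries attached to half-open ranges."""
    text: str
    # Each span is (start, end, attrs) and covers text[start:end].
    spans: list = field(default_factory=list)

    def add(self, start, end, **attrs):
        """Attach attrs to the half-open range [start, end)."""
        if not 0 <= start <= end <= len(self.text):
            raise ValueError("range out of bounds")
        self.spans.append((start, end, attrs))

    def attrs_at(self, pos):
        """Merge the attributes of every span covering pos; later spans win."""
        merged = {}
        for start, end, attrs in self.spans:
            if start <= pos < end:
                merged.update(attrs)
        return merged

    def slice(self, start, end):
        """Copy text[start:end] and carry over the attributes that overlap it."""
        out = AttributedText(self.text[start:end])
        for s, e, attrs in self.spans:
            lo, hi = max(s, start), min(e, end)
            if lo < hi:  # the span overlaps the requested slice
                out.add(lo - start, hi - start, **attrs)
        return out


doc = AttributedText("def main(): pass")
doc.add(0, 3, face="keyword")          # presentation
doc.add(4, 8, face="function-name")    # presentation
doc.add(4, 8, symbol="main")           # semantics

copied = doc.slice(4, 11)              # the text "main():" with its attributes
```

A linear scan over spans is fine for a toy, but a real editor would presumably want a tree or some other indexed structure so lookups don't slow down as attributes pile up.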
This is the same representation used by AppKit (Cocoa): NSAttributedString. I think it’s the best way to represent rich text, much better than a tree (DOM). It’s one of the reasons why text editing is so nice on macOS.
Without bothering to find the code, I’m wondering how the mapping for subranges is stored. As an interval tree or something? It seems like there would be performance issues if the data structure weren’t chosen carefully (not that I’m implying it wasn’t chosen carefully).
Yeah, I checked. It’s a balanced binary tree of intervals. According to the GNU Emacs mailing list, it is “not precisely defined anywhere.”
For anyone interested in understanding how an emacs works and is implemented under the hood, I cannot recommend The Craft of Text Editing strongly enough: https://www.finseth.com/craft/
It’s painfully dated in a lot of ways, but also full of great insight and clever architectures / solutions to problems that come up in editor design.
I credit it largely with my realization that array indices point between elements, which I consider a really key insight that simplifies a lot of otherwise-fiddly programming.
On the last point, I can sort of see why that could be more intuitive … but I think the idea of half-open intervals, traditionally written [0, 5), basically gives you all of that reasoning? While still thinking of indices pointing to the elements.
That’s how Python works (briefly mentioned in the post), and I never really have problems when thinking about ranges/slices/indices that way.
https://en.wikipedia.org/wiki/Interval_(mathematics)
I was expecting to see an engine for an editor, but in the end I was pleased with the explanation of Emacs. So cool.
I would also love to see more editors experiment with this approach. In addition to text-centric UIs, I also find grid-based ones (VisiData, Excel, etc.) to be very intuitive for certain tasks. Emacs does have some grid-based interfaces, but having first-class support along with the text and line-based interactions would be nice.
Nice post! As a Vim user, this helped me understand Emacs.
I think it would be useful to explicitly say what the M and N are, and draw it on these types of diagrams.
https://www.oilshell.org/blog/2022/02/diagrams.html#shell-and-distributed-systems
This recent post gives an example of a pair: https://lobste.rs/s/zvkxqs/benefits_everything_being_buffer
a directory listing can be represented by a buffer
spell checking is an operation on a buffer
And the author was pleasantly “tickled” by that reuse / composition.
But I wonder what the M x N really looks like for Emacs? There are probably dozens or hundreds of operations on buffers?
What are all the things that can be buffers? (I was looking at org mode recently, and buffers can also have images?)
I’m brainstorming and Googling techniques on how to make these diagrams automatically … I’m imagining some kind of SVG template, and then you could write
<top>
M things go here
</top>
<waist>
Buffers of attributed text, Windows, and Frames
</waist>
<bottom>
N things go here
</bottom>
And then some magic (maybe JavaScript) would weave that together with the diagram, do the right word wrapping, etc. … I would really appreciate any tips or help on this :-)
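For what it’s worth, here is one rough sketch of what that “magic” could look like, in Python rather than JavaScript. Everything in it is an assumption for illustration: the `render` function, the box sizes in `LAYOUT`, and the crude fixed-width word wrap are all invented, and the three tags are wrapped in a made-up `<diagram>` root so the standard XML parser will accept them.

```python
import textwrap
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SPEC = """
<top>
M things go here
</top>
<waist>
Buffers of attributed text, Windows, and Frames
</waist>
<bottom>
N things go here
</bottom>
"""

# (x, width) of each box; the waist is narrower to give the hourglass shape.
LAYOUT = {"top": (20, 360), "waist": (120, 160), "bottom": (20, 360)}
BOX_HEIGHT, GAP, LINE_HEIGHT, CHAR_WIDTH = 80, 20, 18, 8


def render(spec):
    """Turn the <top>/<waist>/<bottom> spec into an SVG string."""
    # The spec has no single root element, so wrap it before parsing.
    root = ET.fromstring(f"<diagram>{spec}</diagram>")
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="400" height="300">']
    for row, name in enumerate(("top", "waist", "bottom")):
        x, width = LAYOUT[name]
        y = 10 + row * (BOX_HEIGHT + GAP)
        parts.append(
            f'<rect x="{x}" y="{y}" width="{width}" height="{BOX_HEIGHT}" '
            'fill="none" stroke="black"/>'
        )
        # Crude word wrap: assume every character is CHAR_WIDTH pixels wide.
        text = " ".join(root.findtext(name, default="").split())
        lines = textwrap.wrap(text, width=width // CHAR_WIDTH)
        for i, line in enumerate(lines):
            parts.append(
                f'<text x="{x + width // 2}" y="{y + 24 + i * LINE_HEIGHT}" '
                f'text-anchor="middle" font-family="monospace">{escape(line)}</text>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


svg = render(SPEC)
```

Writing `svg` to a file gives three stacked boxes with the narrow one in the middle. Proper text measurement is the hard part this skips, since SVG itself won’t wrap text for you.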
We could share some code – I think it’s really important to document these powerful software architectures, since software is becoming larger, messier, and less compositional.
Another random thought: I feel like this “one program to rule them all” idea, where some people live in Emacs (making a lifelong commitment, as the quoted Steve Yegge says), some people live in the shell, some people live in the browser, etc., actually has to do with the “narrow waist”.
https://www.johndcook.com/blog/2008/04/27/one-program-to-rule-them-all/
That is, once you learn how to use a narrow waist in a compositional manner (which can take a while), it has so much inertia that it sucks in everything you do!
Great article. It’s nice seeing articles about the core of Emacs design that discuss more than “everything is a buffer” (which is true, but less interesting).
Emacs’ text representation is interesting. It’s a gap buffer, not a rope – this simplifies the implementation a bunch, especially for the regexp engine searching the currently open file.
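For anyone who hasn’t met a gap buffer, here is a toy one in Python. This is a sketch of the general technique only, not Emacs’ C implementation; the class, its method names, and the growth policy are all invented for illustration.

```python
class GapBuffer:
    """Toy gap buffer: text lives in one list with a movable hole at the cursor."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)       # first free slot
        self.gap_end = len(self.buf)     # one past the last free slot

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_gap(self, pos):
        """Slide the gap so that it starts at text position pos."""
        if not 0 <= pos <= len(self):
            raise IndexError("position out of range")
        while self.gap_start > pos:      # gap moves left: chars hop to its right side
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:      # gap moves right: chars hop to its left side
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, pos, s):
        """Insert string s at position pos, growing the gap if it is too small."""
        self.move_gap(pos)
        if self.gap_end - self.gap_start < len(s):
            extra = len(s) + 16
            self.buf[self.gap_end:self.gap_end] = [None] * extra
            self.gap_end += extra
        for ch in s:
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, count):
        """Delete count characters starting at pos by widening the gap."""
        self.move_gap(pos)
        self.gap_end = min(self.gap_end + count, len(self.buf))

    def text(self):
        """Return the contents with the gap skipped."""
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


gb = GapBuffer("hello world")
gb.insert(5, ",")        # "hello, world"
gb.delete(0, 1)          # "ello, world"
gb.insert(0, "J")        # "Jello, world"
```

The contents are only ever two contiguous runs, one on each side of the gap, which is what keeps code that scans the text simpler than it would be over a rope’s many chunks.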
Attributes (essentially arbitrary properties associated with spans of text) compose pretty well, but it’s always surprising when you copy text in Emacs and it sometimes preserves the highlighting of the source file.
The hard bit about text is representing different encodings. Emacs will allow you to open e.g. an invalid UTF-8 file and fix it. The ‘multibyte’ representation is robust and versatile, but it definitely complicates the core.
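Python has a loosely similar trick, which I’ll use as an analogy here. To be clear, this says nothing about how Emacs’ multibyte representation works internally; it just shows the same “open a broken file, keep the bad bytes, fix them” workflow using the standard `surrogateescape` error handler. The sample bytes are made up.

```python
raw = b"caf\xe9 ok"   # Latin-1 bytes, so the lone \xe9 is not valid UTF-8

# Strict decoding refuses the data outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles each bad byte through as a lone surrogate
# (0xE9 becomes U+DCE9), so the rest of the text stays editable.
text = raw.decode("utf-8", errors="surrogateescape")

# "Fix it": replace the stray byte with the character it was meant to be.
fixed = text.replace("\udce9", "\u00e9")

# Untouched text round-trips to the original bytes; the fixed text is valid UTF-8.
round_trip = text.encode("utf-8", errors="surrogateescape")
repaired = fixed.encode("utf-8")
```

The price is the same kind of complication mentioned above: every piece of code that touches the string has to cope with those placeholder characters.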
Very interesting. I’ve also been interested in something similar for a while. I really would like a modern, high-quality text editor component that could easily be embedded into other programs or built on top of. I don’t want to live in Emacs, but I want the same text-editing features and abilities no matter what program I’m using. There seem to be a bunch of interesting projects, mainly using Rust, that are building new editors; we’ll see if any of them pan out.