In most of the text editors I’ve written, the internal representation is whatever wchar_t is on the system, which is a 32-bit integer on most platforms these days (Windows, where it’s 16 bits, is the big exception).
But in my most recent editor, I assume that files, input, and display are all UTF-8; that’s a safe assumption these days.
I store a file as an array of arrays, more or less, since that’s how the user deals with text (as a ragged two-dimensional array). Each line is just treated as bytes.
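A minimal sketch of that layout in C, assuming one heap-allocated byte array per line with no newline stored. The names (Line, Buffer, buffer_append_line, buffer_free) are hypothetical, chosen for illustration rather than taken from the actual editor.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One line of text: raw UTF-8 bytes, no terminator, no newline stored. */
typedef struct {
    unsigned char *bytes;
    size_t len;
} Line;

/* The buffer is a growable array of lines; each line has its own length,
 * which is what makes the two-dimensional array "ragged". */
typedef struct {
    Line *lines;
    size_t count;
    size_t cap;
} Buffer;

/* Append a copy of text (n bytes) as a new last line.
 * Returns 0 on success, -1 if allocation fails. */
static int buffer_append_line(Buffer *b, const char *text, size_t n)
{
    assert(b != NULL && text != NULL);
    if (b->count == b->cap) {
        size_t ncap = b->cap ? b->cap * 2 : 8;
        Line *grown = realloc(b->lines, ncap * sizeof *grown);
        if (grown == NULL)
            return -1;
        b->lines = grown;
        b->cap = ncap;
    }
    unsigned char *copy = malloc(n ? n : 1);  /* avoid a zero-size malloc */
    if (copy == NULL)
        return -1;
    memcpy(copy, text, n);
    b->lines[b->count].bytes = copy;
    b->lines[b->count].len = n;
    b->count++;
    return 0;
}

/* Release every line and reset the buffer to empty. */
static void buffer_free(Buffer *b)
{
    for (size_t i = 0; i < b->count; i++)
        free(b->lines[i].bytes);
    free(b->lines);
    b->lines = NULL;
    b->count = b->cap = 0;
}
```

Nothing here knows about encoding: a line containing "hé" simply has a length of 3 bytes. Making sense of those bytes is left to whatever walks over them.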
I abstract away the encoding shenanigans by having riders (borrowing the term from Wirth and Gutknecht). A rider is attached to a buffer and then can move forward a character, up a line, back, whatever, but it’s responsible for moving the correct number of bytes to end up aligned at a character position.
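Here is a rough sketch of what a rider could look like in C. It assumes the buffer holds valid UTF-8, treats a "character" as one code point (not a grapheme cluster), and keeps the byte column when moving up a line, which a real editor would probably replace with a visual column. All names (Rider, rider_forward, rider_back, rider_up) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const unsigned char *bytes; size_t len; } Line;
typedef struct { const Line *lines; size_t count; } Buffer;

/* A rider is a cursor attached to one buffer. Its column is a byte offset
 * that always sits on a character boundary. */
typedef struct { const Buffer *buf; size_t row, col; } Rider;

/* UTF-8 continuation bytes have the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

/* Move one character forward within the line. Returns 0 at end of line. */
static int rider_forward(Rider *r)
{
    assert(r->row < r->buf->count);
    const Line *ln = &r->buf->lines[r->row];
    if (r->col >= ln->len)
        return 0;
    r->col++;  /* step over the lead byte */
    while (r->col < ln->len && is_continuation(ln->bytes[r->col]))
        r->col++;  /* then over any continuation bytes */
    return 1;
}

/* Move one character back within the line. Returns 0 at start of line. */
static int rider_back(Rider *r)
{
    assert(r->row < r->buf->count);
    const Line *ln = &r->buf->lines[r->row];
    if (r->col == 0)
        return 0;
    r->col--;
    while (r->col > 0 && is_continuation(ln->bytes[r->col]))
        r->col--;  /* back up until we reach a lead byte */
    return 1;
}

/* Move up one line, clamping to the line length and then backing up to
 * the nearest character boundary. Returns 0 on the first line. */
static int rider_up(Rider *r)
{
    if (r->row == 0)
        return 0;
    r->row--;
    const Line *ln = &r->buf->lines[r->row];
    if (r->col > ln->len)
        r->col = ln->len;
    while (r->col > 0 && r->col < ln->len && is_continuation(ln->bytes[r->col]))
        r->col--;
    return 1;
}
```

Callers only ever ask for "one character forward" or "up a line"; the byte arithmetic stays inside the rider, which is the point of the abstraction.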
A rider reports its position as a Position struct, which is a row/column in the array.
A Position has several validity flags: valid for insert (which might be, for example, one after the end of the line), valid for delete (points to an actual place within a line), etc. Obviously changing a buffer invalidates positions.
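To make the flags concrete, here is a small sketch in C. The two flags mirror the ones described above. The version counter is purely an assumption about how stale positions could be detected after an edit, and every name (Position, POS_VALID_INSERT, POS_VALID_DELETE, position_at, position_is_current) is hypothetical. It also assumes the column already sits on a character boundary, as it would if it came from a rider.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const unsigned char *bytes; size_t len; } Line;

/* version is assumed to be bumped by every edit to the buffer. */
typedef struct { const Line *lines; size_t count; unsigned long version; } Buffer;

enum {
    POS_VALID_INSERT = 1u << 0,  /* text may be inserted here */
    POS_VALID_DELETE = 1u << 1   /* there is a character here to delete */
};

typedef struct {
    size_t row, col;        /* row and byte column in the ragged array */
    unsigned flags;         /* combination of POS_VALID_* bits */
    unsigned long version;  /* buffer version when the position was taken */
} Position;

/* Build a Position for (row, col) and work out what it is valid for. */
static Position position_at(const Buffer *b, size_t row, size_t col)
{
    assert(b != NULL);
    Position p = { row, col, 0, b->version };
    if (row < b->count) {
        size_t len = b->lines[row].len;
        if (col <= len)
            p.flags |= POS_VALID_INSERT;  /* one past the end is allowed */
        if (col < len)
            p.flags |= POS_VALID_DELETE;  /* must point inside the line */
    }
    return p;
}

/* A position is stale once the buffer has changed since it was taken. */
static int position_is_current(const Buffer *b, const Position *p)
{
    return p->version == b->version;
}
```

For a line "hi", column 2 comes back valid for insert only, column 1 is valid for both, and column 3 is valid for neither.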
Anyway, it’s always neat to see how it’s done. I love text editors.