This is just moving from the unix-style cli to the old win32-style cli. Except Microsoft gave up and ended up embracing ANSI escapes and CSIs (some time after WSL landed) to support the ever-growing corpus of apps specifically targeting terminal emulators.
In the Win32 console model, apps emitted only text to stdout and used the console handle to do everything else, from setting attributes to switching between full screen buffers. And the open source world made fun of them for it.
I wouldn’t do that, it isn’t like the unix terminal design is exactly ideal either.
I don’t think the unix design is as offensive on the output side as the input though; if you’re just writing a program that wants to output some colorful text, it isn’t all bad. Just remember to check isatty and the command line flag that overrides that before adding the color sequences!
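As a minimal sketch of that dance (the flag names and parsing here are hypothetical, not lifted from any particular tool):

    /* Only emit SGR colour sequences when stdout is a terminal and the
     * user hasn't overridden it; --color=always/never are assumed flags. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int color = isatty(STDOUT_FILENO);            /* the "auto" default */
        for (int i = 1; i < argc; i++) {
            if (!strcmp(argv[i], "--color=always")) color = 1;
            if (!strcmp(argv[i], "--color=never"))  color = 0;
        }
        if (color)
            printf("\033[31merror:\033[0m it broke\n");
        else
            printf("error: it broke\n");
        return 0;
    }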
On the input side though, yikes, what a pile of disgusting hacks. You have to do a timeout check on the esc char! And woe be upon you if you need to query the terminal for something, since now that’s both input and output sequences! Worth noting a lot of the newer extended terminal emulators are introducing new features to change this stuff, but they’re tied by the need to be compatible with the large existing ecosystem of applications.
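For the curious, the usual shape of that ESC hack is roughly this (a sketch that assumes the fd is already in raw mode; the 25 ms timeout is an arbitrary choice):

    #include <poll.h>
    #include <unistd.h>

    /* Returns the byte read, 0x1b for a lone Escape key, or -2 when more
     * bytes followed the ESC and the caller should parse a CSI/SS3 sequence. */
    int read_key(int fd)
    {
        unsigned char c;
        if (read(fd, &c, 1) != 1)
            return -1;
        if (c != 0x1b)
            return c;

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, 25) == 0)        /* nothing followed: assume the key */
            return 0x1b;
        return -2;                         /* start of an escape sequence */
    }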
There’s also the separate question of API vs. underlying data. In the underlying data, one stream does make synchronization, redirection, and other things easier. But then in the API you might want it separate anyway, to make supporting --color=auto and --color=off easier. Stripping escape sequences from the output isn’t awful though, so you could just always write them in your code and have a library function do the stripping.

Nevertheless, I’m not a fan of making fun of designs… there are often fair reasons for the differences that you can learn from.
If you generalise to that level, well, duh. The argument that in-band signalling in a UI is harmful, in much the same way that smashing strings together is versus prepared statements, is not a difficult one.
Microsoft didn’t “give up” – it was a no-brainer for them to throw a few idle headcounts at writing an emulator for their WSL trajectory and at getting more developers into VS Code. They’re otherwise busy adding AI keys and ads to the UI, so perhaps their sensibilities are not something to use as an ‘appeal to authority’ or inspiration.
I am intimately familiar with the inner workings of the different console layers in the MS ecosystem and their reasons for being – from the days of conio.h, int 21h, mode con codepage prepare and onwards. Their PowerShell model is a much closer comparison, but the power they had there came specifically from having something of a functional IPC system to work with. Linux/BSD isn’t so lucky.
This is super interesting but I’m having a hard time understanding the full extent of the post. Is there a project implementation somewhere?
I’ve been thinking strongly along the same lines for a while now. We need to break free of terminal emulation and in-band signaling for text based interfaces. I’d be super curious to read more about efforts in this area.
It’s distilled experience from 10 years of slowly sifting through the ~130kloc of linux-tty; 200kloc of ncurses; 70kloc of GNU readline; 70kloc of tmux; 150kloc of openssh + a handful of emulators (excluding all other display server things) – then abstracting and implementing. All because a former colleague dared me to. Now I can claim my beer. That’s to say there are quite a few moments of ‘oh no’ that change your mental model of things, which are hard to convey without some serious code-fu.
There’s quite a few people with insight and experience by now on the Discord (dang kids rejecting IRC).
As for other examples of it being used in lieu of ncurses: https://github.com/cipharius/kakoune-arcan/tree/main and https://github.com/letoram/nvim-arcan
I recall someone reproducing ACME in Rust as well, but I can’t seem to find the link; it was very much “on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”
IRC could have progressed, but it stayed a fossil, ignoring all technological advancement in the past 30 years.
Yeah. Telling users “if you want message history or to be able to receive DMs while not logged in, just run your own bouncer on a server somewhere” is ridiculous.
I’m shocked that Arcan uses Discord. I thought free software projects were the last bastion against it and of all the projects I would never have expected Arcan to capitulate…
Understand the enemy and get behind enemy lines, talk to the locals and gather intelligence. I don’t plan on staying there for longer than necessary, but the alternative is still early in its proof of concept. In the current slopjob of an Internet I see Discord as one of the more dangerous of accidents (accident in the sense of its original scope of coordination/paratext around gaming).
The quality of information and community, if you know which ‘fake BBS’ to look for (they sell the appearance of being distributed, ‘click to spawn community infrastructure’), is something I’ve not seen elsewhere, not on IRC, not on Usenet, not in mailing lists and so on. Their owners are getting a very good training corpus while other wells are poisoned.
A case worth looking at is the Windows console API… which moved in the opposite direction. They had something free of that stuff and went toward terminal emulation, mostly in the name of compatibility with the existing ecosystem of terminal stuff. That’s a strong pull, and it’s hard to imagine adoption of something without two-way compatibility there, which does limit your design space somewhat.
But the Windows API - much maligned by some (and tbh its default user interface is kinda meh), but I like it - is to the PC hardware text mode roughly what xterm is to the vt100. You see the design influence and there’s some emulated compatibility, but also new concepts layered on top that weren’t in the original hardware (or if they were, I didn’t know how to use them lol).
In the hardware text mode, the screen buffer was composed of 2 bytes per character cell. One was the character code, which mapped to the font, the other an “attributes” byte, which was packed bits of foreground color, background color, and blink. Being an in-memory buffer, of course, you can poke those bytes in any random order or do block copies or whatever, but the print apis would generally just write the characters with a default attribute byte. Input was provided entirely separately since they’re separate hardware.
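A sketch of what poking that buffer looks like (real-mode era code; 0xB8000 is the colour text-mode buffer, and on anything modern you would have to map it yourself):

    #include <stdint.h>

    #define VGA_TEXT ((volatile uint16_t *)0xB8000)
    #define VGA_COLS 80

    /* Each cell: low byte = character, high byte = attribute
     * (bits 0-3 foreground, 4-6 background, 7 blink). */
    static void put_cell(int row, int col, char ch,
                         uint8_t fg, uint8_t bg, int blink)
    {
        uint8_t attr = (blink ? 0x80 : 0) | ((bg & 0x7) << 4) | (fg & 0xF);
        VGA_TEXT[row * VGA_COLS + col] = ((uint16_t)attr << 8) | (uint8_t)ch;
    }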
In the win32 console, they expanded on this: the character and attribute fields there are 16 bit, but it’s the same idea. They associated a current attribute with the screen buffer, then provided functions to write text separately from changing attributes, generally abstracting it into functions rather than an exposed memory block (though you can ask it to copy the memory block in and out for you - see ReadConsoleOutput), but you can see the heritage pretty clearly. So to write out some colored text, you first SetConsoleTextAttribute(handle, red); then WriteConsole(handle, "text");. This drives people coming from linux nuts because they so badly want to just printf("\033[44mtext\033[49m")… but there’s a lot to like about separating those things.

The win32 console also lets you allocate entirely separate screen buffers, write to them in the background, then swap one in to be displayed. Doing this with a unix terminal is… non-trivial, and this is one of the functions they just plain left behind in their transition to unix style. But it is very useful, letting a program work in relative isolation. (In unix terminals, they usually set the magic code for “alternate screen” - vim does this, for example - but there’s just one alternate screen; you can’t have multiple that you swap around.)
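Roughly, the two ideas above look like this (a sketch with error handling omitted):

    #include <windows.h>

    int main(void)
    {
        HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD written;

        /* out-of-band attributes: set, write plain text, restore */
        CONSOLE_SCREEN_BUFFER_INFO info;
        GetConsoleScreenBufferInfo(out, &info);
        SetConsoleTextAttribute(out, FOREGROUND_RED | FOREGROUND_INTENSITY);
        WriteConsoleA(out, "red text\n", 9, &written, NULL);
        SetConsoleTextAttribute(out, info.wAttributes);

        /* a second screen buffer, drawn in the background, then swapped in */
        HANDLE back = CreateConsoleScreenBuffer(GENERIC_READ | GENERIC_WRITE,
                                                0, NULL,
                                                CONSOLE_TEXTMODE_BUFFER, NULL);
        WriteConsoleA(back, "drawn off-screen\n", 17, &written, NULL);
        SetConsoleActiveScreenBuffer(back);
        Sleep(1000);
        SetConsoleActiveScreenBuffer(out);    /* swap the original back */
        CloseHandle(back);
        return 0;
    }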
Input also works differently than unix: there are two modes. ReadConsoleInput gives you an array of structured input events. Key down, key up, mouse moved, window resized, etc., all different tagged structs. MUCH nicer than reading random sequences on a timer and decoding them according to random environment variables! You can loop on these like any other event loop and find joy. Or you can ReadConsole, which just picks the characters out of it.

ReadConsole is interesting too because when you enable line buffering, the system provides a line editor for the user - arrow keys work, backspace works, there’s programmable history (Get/SetConsoleHistoryInfo), there’s even a built-in tab completion hook! (I could never find decent documentation of this so I wrote something myself on Stack Overflow, then expanded on my blog https://dpldocs.info/this-week-in-d/Blog.Posted_2019_11_25.html#tab-completion - I got an email once from a guy who said he worked at Microsoft and that this was the best doc he could find on it too lol). In theory, you can do some of this on a linux terminal too - cooked-mode line reading is buffered, but it is more under the kernel’s control than either the terminal’s or the application’s, hence why most everyone uses GNU readline (or one of its many similar alternatives) instead when they want a richer or customizable user experience. The Windows API, though, is reasonably rich and customizable out of the box.
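The event-loop version, as a sketch:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
        SetConsoleMode(in, ENABLE_WINDOW_INPUT | ENABLE_MOUSE_INPUT);

        for (;;) {
            INPUT_RECORD rec;
            DWORD n;
            if (!ReadConsoleInput(in, &rec, 1, &n) || n == 0)
                break;

            switch (rec.EventType) {              /* tagged union of events */
            case KEY_EVENT:
                if (rec.Event.KeyEvent.bKeyDown)
                    printf("key: %c\n", rec.Event.KeyEvent.uChar.AsciiChar);
                break;
            case MOUSE_EVENT:
                printf("mouse: %d,%d\n",
                       rec.Event.MouseEvent.dwMousePosition.X,
                       rec.Event.MouseEvent.dwMousePosition.Y);
                break;
            case WINDOW_BUFFER_SIZE_EVENT:
                printf("resize: %dx%d\n",
                       rec.Event.WindowBufferSizeEvent.dwSize.X,
                       rec.Event.WindowBufferSizeEvent.dwSize.Y);
                break;
            }
        }
        return 0;
    }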
All this said… look up the docs for those functions, and every one of them has a deprecation warning saying to use the virtual terminal sequences instead. Sigh, feels like a step backward. But I understand why they did it given the ecosystem pressure.
Still, the Windows API is worth studying as another way to break free of these terminals, and from what I’ve seen of Arcan, in this post and others, looks like they’ve studied it too.
Back in the mid-90s, I wrote my own “terminal API” for the PC text screen (my own TUI as it were). You could define a region of the screen (default region—the entire screen) and do things like ScrollUp(region,n) (to scroll the region up n lines) or ScrollLeft(region,y) (to scroll the region left y columns) or even SetFGColor(region,c). At the time, I didn’t support separate screen buffers, but I think it could have been added with no issue.

And I liked this API, but how well would it lend itself to a “remote screen”, say, across a serial or network port? Some form of protocol would have to be defined to separate the commands from the data. And as for supporting existing physical terminals, there would have to have been a translation layer, because I don’t know of any physical terminal supporting left/right scrolling (although I would love to learn otherwise).
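Not the original code, obviously, but a guess at what such a region API might look like on top of the 2-byte cells described upthread (the layout and names are assumptions):

    #include <stdint.h>
    #include <string.h>

    #define COLS 80
    extern uint16_t *screen;                  /* e.g. the text-mode buffer */

    typedef struct { int x, y, w, h; } Region;

    static void ScrollUp(Region r, int n)
    {
        for (int row = r.y; row < r.y + r.h - n; row++)
            memmove(&screen[row * COLS + r.x],
                    &screen[(row + n) * COLS + r.x],
                    r.w * sizeof(uint16_t));
        for (int row = r.y + r.h - n; row < r.y + r.h; row++)
            for (int col = r.x; col < r.x + r.w; col++)
                screen[row * COLS + col] = (0x07 << 8) | ' ';   /* blank */
    }

    static void SetFGColor(Region r, uint8_t c)
    {
        for (int row = r.y; row < r.y + r.h; row++)
            for (int col = r.x; col < r.x + r.w; col++)
                screen[row * COLS + col] =
                    (screen[row * COLS + col] & 0xF0FF) | ((c & 0xF) << 8);
    }

Every call here is a local memory operation, which is exactly why a remote screen would need a command/data protocol layered on top.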
When Unix was being developed, there were dozens of different physical terminals available, all working over serial ports (the USB of the day). But the command sets and features varied wildly. The ADM-3A was a very popular (read: cheap) terminal, but its capabilities were extremely limited (you could move the cursor, clear the screen, and scroll the entire screen up—that’s about it). But the VT-100 was way more capable, and later models even more so. Unix solved this with a library, using an environment variable to select the terminal type and managing the differences so you didn’t have to.
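That library survives as termcap/terminfo; the lookup looks roughly like this (a sketch, link against ncurses/tinfo):

    #include <stdio.h>
    #include <curses.h>
    #include <term.h>

    int main(void)
    {
        int err;
        if (setupterm(NULL, 1, &err) != OK)      /* NULL: use $TERM, fd 1 */
            return 1;

        /* whatever bytes *this* terminal uses to clear the screen */
        const char *clear = tigetstr("clear");
        if (clear && clear != (const char *)-1)
            putp(clear);
        return 0;
    }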
The only reason a timer is required is because of the ESC key—you need to determine if an ESC character is a key press, or part of a control sequence. It would have been nice had terminals sent a control sequence basically meaning “ESC was pressed”, then all the timing would have been avoided, but alas, we have to use the terminals we got, not the terminals we want.

Indeed… and they actually still kinda do, but you can’t use TERM since everybody just claims to be xterm to avoid being treated like a dumb featureless thing by ncurses. Reminds me of User-Agent: Mozilla 5.0 (compatible; MSIE…) in the browser world. Environment variables are kinda meh for this anyway (whether TERM or the TERMINFO ones) since you might move between terminals and need to adjust too; screen, tmux, etc. of course can offer a layer of stability, but it’d be nice sometimes to just send down a “relocated” event with new information. (I guess in theory you could hot-plug hardware terminals too, but in practice that’d send a SIGHUP surely anyway.)
Aye. Even now, 50 years later, we’re stuck with the terminals we got. There are some newer ones that will exclusively send the CSI character instead, but you need to detect that and swap, and probably still check the esc char for compatibility anyway, so: partial success at best.
I hope some day something like Arcan can manage to finally deliver a terminal we want, but it is definitely an uphill battle.
You have other timing channels on the input end - in the sense of the instruction decoder needing to do something when there is nothing in the immediate read buffer after an ESC, OSC, DCS and so on – even though that can happen naturally from a network stall / dropped packet retransmission / … It ends up as a heuristic in the emulator and tty implementation: do you consider blocking until you get more, even though that would interfere with the software you are hosting? You don’t notice it as clearly as with the ESC keypress (we are quite sensitive to jitter and judder). Networks are much faster these days, and ncurses et al. tried to smooth it over by using a bandwidth estimator against baudrate so they don’t emit output until it’s unlikely that a sequence would get fragmented. I’ve run into plenty of cases where inputs and commands get misinterpreted one way or another on slow serial links trying to provide a TUI (network routers, debug interfaces).
On the output end you have other timing breaks; an easy case is looking at how an emulator handles “find /”. You can’t block until it’s completed or people complain you’re unresponsive. If you align to some artificial vsync from the outer display system, others will complain that you are ‘slow’ when they benchmark emulator A vs emulator B by timing how long a command takes (yet it’s actually the heuristic being measured). At the same time you don’t know what is actually running because a shell is in the way, so another set of heuristics kicks in depending on whether you are in altscreen versus line mode and so on. This breaks when you nest terminal emulators (as the inner one becomes altscreen to the outer one even though the actual command is in line mode). Compare running find ‘raw’ and within tmux.
On the sideband end you have others in the signal propagation mess; the obvious one is SIGWINCH on drag resize. Be too quick to process and propagate resize events and contents will tear and corrupt endlessly; be too slow and it will feel laggy. This is one you try to mask by padding with background colour and retaining ‘old’ contents, hoping the user won’t notice, as well as by rate limiting or binning on something like the X11 XA_WM_SIZE_HINT width_inc and height_inc matched to cell sizes…
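A sketch of the usual compromise on the application side: the handler only sets a flag and the main loop coalesces the burst before repainting (the 50 ms settle window is an arbitrary number picked for illustration):

    #include <poll.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_winch;
    static void on_winch(int sig) { (void)sig; got_winch = 1; }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_winch;               /* no SA_RESTART: poll returns */
        sigaction(SIGWINCH, &sa, NULL);

        for (;;) {
            struct pollfd pfd = { .fd = STDIN_FILENO, .events = POLLIN };
            poll(&pfd, 1, -1);                  /* woken by input or the signal */

            if (got_winch) {
                got_winch = 0;
                poll(NULL, 0, 50);              /* let the drag-resize settle */
                struct winsize ws;
                ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws);
                /* repaint for ws.ws_col x ws.ws_row, padding with the
                   background colour until real content catches up */
            }
            /* ... handle any pending input ... */
        }
    }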
Oh dear, limited colours, no emoji … does it even underline/bold/italic?
No, but you can represent the associated emphasis with color (and on Windows, you can query the background color to adapt accordingly!). This is often done on linux terminal emulators too. For example, mine does underline, but ignores italic and treats bold as just a color modifier. xterm does bold, but treats underline and italic, I think, as a color change (that might be configurable, I’m not sure).
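The query trick is just a read of the current attributes; a minimal sketch:

    #include <windows.h>

    /* Pick an emphasis colour that stays readable on the user's background. */
    WORD emphasis_attr(HANDLE out)
    {
        CONSOLE_SCREEN_BUFFER_INFO info;
        GetConsoleScreenBufferInfo(out, &info);

        WORD bg = info.wAttributes & 0xF0;       /* keep the background bits */
        WORD fg = (bg >> 4) < 8                  /* dark background? */
                ? (FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_INTENSITY)
                : FOREGROUND_RED;
        return bg | fg;
    }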
Until Vista, you could hit alt+enter and bring a console full screen, using the actual hardware text mode too.
Of course, extending it to support these things would not be difficult (in theory), there are some bits available lol. (Tho if I was redoing it, I’d probably not keep the char/attributes packed like that, and instead put the attributes as a separate set of overlay ranges, more like a rich text component’s innards.)
Depending on when the API was defined, Unicode might not have been a thing at all (Windows 1.0 was released in 1985 after all). And it wasn’t until the mid-to-late 90s that you got sufficiently high resolution with more than 256 colors on PCs (yes, VGA, available in 1987, supported a 256-color mode, but that was only at 320x200). It was also probably developed to handle both text mode and graphics, thus some of the limitations due to text mode.
The reason I commented on the limitations is that it is so obviously constricted by the time it was designed, without leaving any space to grow.
It’s sad that the most prominent alternative to the terminal interface had such a lack of foresight, while the supposedly worse technology has been able to incrementally adapt from 1-bit to 4-bit to 8-bit and then 24-bit colour, and from ASCII to Latin-1 to Unicode with bidi and emoji.
There’s something weird going on here that I’m struggling to articulate. I think there’s an uncanny valley between the strict limitations of a terminal and the wide open possibilities of a GUI (or the web), where there isn’t a happy medium that’s as easy to program as a terminal but more graphical or more reactive or something.
I admire people who are trying to find or make that happy medium, but I fear the opposing pulls to the terminal or the web are so strong that building a middle ground is much harder than we might hope.
There’s another weird thing going on here too … stuck with a sub-optimal solution because of backwards compatibility (probably for things you don’t really want to concern yourself with), and being forced onto the upgrade treadmill (because you want the feature—it’s new! It’s fresh! It’s the new shiny!).
Another issue is that it’s easier to combine command line programs (shell scripting) than it is to combine GUI programs. While there have been attempts to script GUIs (Amiga with REXX, Apple with AppleScript) none have really caught on. I have a feeling it’s because most people aren’t computer literate (computerate?), nor are they taught (in my opinion) true computer literacy (which is “computers can do repetitive tasks”) and GUIs are not really geared for it either.
I think we share a similar feeling, though I am far from finished describing or demonstrating it. My first attempt at jotting it down: https://www.divergent-desktop.org/blog/2020/08/10/principles-overview/#p3 with https://arcan-fe.com/2021/04/12/introducing-pipeworld/ as an experiment to understand how interchange could work.
The API behind the appls of Arcan is a long-running attempt at finding such a thing, and I’m fairly content with the capabilities by now, but it took a lot of dogfooding. In that sense getting rid of curses and terminal emulation was a side-quest.
I think this API dates to Windows NT, but I’m not entirely sure. It does use Unicode nowadays, at least via the 16-bit characters.
There are links to the code at the top of the page; see “Code (Fossil)” for the main repository, and “Code (Github)” for the github mirror.
I hate saying this, but I’d love a TL;DR summary of these articles because they’re so lengthy. I’ve seen the others in this series posted here, and I still disagree with a lot of the premises. I don’t want clickable URLs in my shell buffer, I don’t think dragging a file onto the shell window should copy the file there, etc. There are just better ways of doing most of those things (IMO).
I don’t really get the focus on terminal emulation and TTYs. Nobody’s forcing you to use that stuff. There are a lot of ways to do “terminal stuff” outside of a terminal, with or without a CLI and TUI.
The obvious way would be to use Vim or Emacs and do the work there. Emacs has a TUI and can (or can be made to) do all of the things the author is asking for, and the underlying details of TTY protocols and terminal emulation are entirely hidden away.
Honestly, besides emacs, do you have anything else on your mind?

I’m an emacs user, so that’s the only one I have much direct experience with.
I’d imagine most editors can do it. I’ve seen a half-dozen “magit for $something” articles posted here, so probably those tools. VSCode has a shell/terminal thing.
Vscode’s terminal is a regular terminal emulator.
It doesn’t mean it’s correct, though. I like the separation of concerns, and so an editor should be an editor. What people are doing in e.g. the Neovim community is insane to me (package managers, many Lua rewrites of tools you can use in the CLI, etc.).
I think the evolution needs to come for “CLI replacements”, and so far, besides Cat9, I really haven’t seen anything promising (besides emacs, but it comes with its own issues).
We likely consume, and expect to consume, content on the Internet in very different ways. I personally don’t deal with short-form whatevers at all. If it’s not worth at least an hour with a printout in the armchair I simply stay away.
You are forced to use terminal emulation and ttys every step of the way (assuming Linux/BSD). The only one that successfully paved another way is Android, and it worked rather well for them, and even then it pops up here and there. I know, I worked on it. The first thing your desktop has to do in order to get something on the screen is a ridiculous dance to tell the TTY to please stop doing its thing, and even then it’s there in ways that completely undo the security model they try to implement.
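For a flavour of that dance, this is roughly the part where a display server takes the VT away from the kernel console (paths, ordering and error handling vary between servers; treat it as a sketch):

    #include <fcntl.h>
    #include <signal.h>
    #include <sys/ioctl.h>
    #include <linux/kd.h>
    #include <linux/vt.h>

    int claim_vt(const char *path)               /* e.g. "/dev/tty1" */
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return -1;

        ioctl(fd, KDSETMODE, KD_GRAPHICS);       /* stop the kernel drawing text */
        ioctl(fd, KDSKBMODE, K_OFF);             /* stop tty keyboard processing */

        struct vt_mode vm = {
            .mode   = VT_PROCESS,                /* we arbitrate VT switching */
            .relsig = SIGUSR1,
            .acqsig = SIGUSR2,
        };
        ioctl(fd, VT_SETMODE, &vm);
        return fd;
    }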
That something is ‘hidden away’ doesn’t mean that it doesn’t influence every layer above it. Emacs pays through the nose to get its TUI to work. There’s a ridiculous amount of code and workarounds to present its rather modest UI.
A tangent perhaps, but as a former Emacs user and former Vim user (neither is even close to doing the things I ask for) I find it fairly bizarre to use a text editor to do shell work and vice versa. Just as I avoid trying to use a toothbrush to clean my toilet or a toilet brush to clean my teeth. They can get the job done, sort of, but at a price and with some risks.
I’m going to disagree with your last paragraph and say it’s quite natural to use a text editor to do shell work.
Most “shell work” involves navigating around, searching through, and otherwise working with text, which is exactly the toolset built into text editors. cat, grep, sed, and awk are command line tools implementing text editor functionality in a clunky (clunky for interactive use) pipeline/streaming fashion. Likewise, ls, find, cd, and other filesystem navigation is also basic text editor functionality. They all spit out text, and what better way to view and work with it than a text editor? Maybe I’ve had too much Emacs koolaid, though.
For the rest of it, I don’t care much. On a day-to-day basis I never have problems related to TTYs, and with a few (easily fixed) exceptions in the past, it’s been that way for a long time.
With that logic you can use a hex editor for everything; it’s all “just” bytes, no? Representations matter, interventions matter. Trivialities aside, they operate on different mechanisms and assumptions. Text editing is buffers with some intended persistence and at rather small buffer sizes. You can expect seeking to work and build features assuming that it does; the backing store changing underneath you is an exception, not the default, and so on.

The cli shell is stream processing. Its data sources are hierarchies of short-lived processes that might expect something back from you to do their job and can spit out infinite amounts of data. The world can change beneath its feet between every command. To do that it gets to go through all the ugly legacy – signals, sessions, buffer bloat, line blocking and, with that, serial communication.
This brings us back to never having problems; ‘works for me’-isms just get dismissed with ‘doesn’t work for me’-isms and don’t get you anywhere. It’s systems engineering, one doesn’t get to think like that. You don’t have to dig far to find people with serious problems with the TTY and a few vulnerabilities a year. The first(?) Flatpak sandbox escape? The tty again.
Previous maintainer of the subsystem? Original gangster Alan Cox quit - https://lwn.net/Articles/343828 - (incidentally triggered by a bug where Emacs lost data while trying to be a shell). Greg tried to step up, well - https://www.youtube.com/watch?v=g4sZUBS57OQ
Whenever they try to fix something to make embedded development bearable, you commit the Linus taboo of breaking userspace, because your stack relies on hidden assumptions that all go back to emulating the fantasy computer. To edit text. It’s holding everyone back.
Just the other week I had a coffee spill. It fried my mouse and, worse, I had to make a new cup of coffee. I grabbed another rodent from the box of spare input devices. I plugged it in. Now my keyboard stopped working. I unplugged the mouse again. The kernel crashed and I lost a lot of work. Angry reboot. Plug back in. Now there’s a half-second delay between keypresses every 10 seconds. 2 minutes later the machine froze up. It turned out the USB mouse exposed itself as serial rather than as a USB human-interface device. This meant some /dev/input/eventN got routed through the tty layer again. The display server ran the device node non-blocking, but that only goes skin deep. You still have ‘the global’ lock around in tty and a handful of mutexes on top of that. Race condition into memory corruption and panic. Then race condition into deadlock.
The point you missed is that a text editor has tools to work with text, while hex editors typically do not.
I find working inside of emacs generally easier than in the terminal because it has great tools for working with buffers of text, which is what most shell commands produce. If there were a hex editor that made it even easier, then I would indeed use it instead.
Someone should write some Python bindings, an app, and report back to us on the experience!
So someone did, many years ago; it was in a startup that collapsed just before the finish line (March 2020, thanks covid). https://github.com/letoram/tui-bindings/tree/master/attic/python
It was a junior thrown into the deep end of the pool (at his request) who had zero experience with the API (unsurprising), but also none with python bindings. It’s not been kept up to date, as my personal policy with python is that if something is written in it I don’t use it unless at gunpoint.
We used it for a monitoring system for a turnkey distributed ‘from breakpoint in vscode / .qcow2 upload in webUI’ → ‘analysis / harness generation’ → distributed across cluster → crash collection into triaging → ‘UI callback’.
What someone should do, though, is take the -server end of this and embed it into whatever terminal emulator is in fashion; suddenly there’s a seamless transition path to bootstrapping in other desktop environments outside of Arcan.