Abstract: The Unix shell is a powerful, ubiquitous, and reviled tool for managing computer systems. The shell has been largely ignored by academia and industry. While many replacement shells have been proposed, the Unix shell persists. Two recent threads of formal and practical research on the shell enable new approaches. We can help manage the shell’s essential shortcomings (dynamism, power, and abstruseness) and address its inessential ones. Improving the shell holds much promise for development, ops, and data processing.
The copy of FreeBSD that I run locally has a prototype of two things that I keep meaning to share more publicly: content negotiation for pipes and multiple streams for the tty.
Pipe content negotiation is a simple protocol modelled on the OpenStep drag-and-drop protocol, implemented as three ioctls. The sender issues a blocking ioctl to advertise a set of things that it can produce, the sender issues a blocking ioctl to receive this list and a second (non-blocking) one to select the one that it wants to use. If the sender does a write before content negotiation, the receiver’s ioctl unblocks and returns an error, similarly if the receiver does a read before completing content negotiation the sender unblocks and returns an error. This lets you establish a pipe between two processes that can negotiate a content type but also fall back to whatever your default was if neither end supported it.
The second feature is ‘pipe peeling’ on the PTY layer. If the host side of the pseudo-terminal supports it then you can ‘peel off’ a pipe from any command that has access to the client side of the terminal to get a second stream for communicating with the terminal. This lets you do things like send a text-formatted table to the terminal and also send structured data that can be presented as an alternative view (e.g. as a table or outline) or explored in the screen reader.
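For readers who want to poke at the negotiation rules, here is a rough user-space model in Python. It is only a sketch of the behaviour described above, not the kernel code: the class and method names are hypothetical, the MIME-style type strings are an assumption (the post does not say how types are named), and the blocking ioctls are flattened into ordinary method calls in a single thread.

```python
from typing import List, Optional, Sequence


class NegotiationError(Exception):
    """The peer used the pipe (or never advertised) before negotiation finished."""


class NegotiablePipe:
    """Toy, single-threaded model of the negotiation state; hypothetical names."""

    def __init__(self) -> None:
        self.offered: Optional[List[str]] = None  # types advertised by the sender
        self.selected: Optional[str] = None       # type chosen by the receiver
        self.bypassed = False                     # a plain read/write came first

    def advertise(self, types: Sequence[str]) -> None:
        # Sender side. In the prototype this is a blocking ioctl.
        if self.bypassed:
            raise NegotiationError("peer did plain I/O before negotiating")
        self.offered = list(types)

    def receive_offer(self) -> List[str]:
        # Receiver side. The real ioctl would block until an offer arrives;
        # this model has no blocking, so a missing offer is reported as an error.
        if self.bypassed or self.offered is None:
            raise NegotiationError("no offer available")
        return list(self.offered)

    def select(self, content_type: str) -> None:
        # Receiver side, non-blocking: pick one of the advertised types.
        if self.offered is None or content_type not in self.offered:
            raise ValueError(f"{content_type!r} was not advertised")
        self.selected = content_type

    def plain_io(self) -> None:
        # A plain read()/write() before a selection abandons negotiation.
        if self.selected is None:
            self.bypassed = True


def negotiate(pipe: NegotiablePipe, wanted: Sequence[str],
              default: str = "text/plain") -> str:
    """Receiver helper: first wanted type that was offered, else the default."""
    try:
        offered = pipe.receive_offer()
    except NegotiationError:
        return default
    for content_type in wanted:
        if content_type in offered:
            pipe.select(content_type)
            return content_type
    return default
```

A receiver that prefers HTML but accepts XML would call `negotiate(pipe, ["text/html", "text/xml"])`; if the sender had already written plain bytes, the helper simply returns the default.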
I also modified libxo so that anything that supports libxo gets support for this for free. It will try to negotiate whichever of the outputs the receiver wants that libxo can produce (JSON, XML, HTML) and fall back to plain-text otherwise. It will also do the pipe-peeling thing if sending output to a TTY, so if the terminal supports it then it will send plain text to the TTY device and also send structured output to another channel (so you can, for example, pop up a web view with the HTML output alongside the terminal window).
The next step if I get back to this is to support sending the extra streams over SSH (SSH supports both multiple streams and synchronous commands and so would be able to support both of these) and modify a terminal emulator to do something fun with the structured output.
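To make the libxo part concrete: the idea is that a program describes its output once and the encoding is chosen late, with plain text as the fallback. The Python sketch below illustrates that idea only. It does not use or imitate the libxo API, and the `render` function, the record fields, and the MIME-style type names are all made up for the example.

```python
import json
from xml.etree import ElementTree as ET


def render(records, content_type):
    """Encode the same records in the negotiated format, else as plain text."""
    if content_type == "application/json":
        return json.dumps({"files": records})
    if content_type == "text/xml":
        root = ET.Element("files")
        for rec in records:
            item = ET.SubElement(root, "file")
            for key, value in rec.items():
                ET.SubElement(item, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    # Fallback: a fixed-width text table, one record per line.
    return "\n".join(f"{rec['name']:<12}{rec['size']:>8}" for rec in records)


if __name__ == "__main__":
    listing = [{"name": "a.txt", "size": 12}, {"name": "b.txt", "size": 340}]
    for kind in ("application/json", "text/xml", "text/plain"):
        print(render(listing, kind))
```

With pipe peeling, the same program could send the text form to the TTY and one of the structured forms down the second stream.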
I implemented this (content type negotiation across producers/consumers in a pipeline) for Linux over ioctl a few months prior to going down the route of Dawn of a new CLI interface.
The reasons that I changed direction, which relate to the article here as well:
In the end, what it ends up with is yet another display server protocol - so re-use the one you have. The path I followed thereafter: AWK for Multimedia, a lot of only partially documented parts on runtime redirection (‘crash recovery’), network redirection (‘remote desktop protocols’) and, lately, user-interface and language for modelling the whole (new) shebang in Pipeworld.
Damn, this comment shows far more originality than the paper, which is “what do you mean the POSIX shell isn’t perfect? we’re stuck with it either way”.
libxo has so much potential, but it also seems so underused unless you’re like, building a frontend to a Unix system. This has the potential to make it more interesting if rich terminals come into play.
Meh I don’t think that is a fair summary. It’s pointing out some problems with shell, some good parts, and explaining 2 current research projects.
The ioctl negotiation idea sounds very interesting; that said, I know next to nothing about ioctl, so I’m curious, does this protocol require support from the kernel, or would it work out of the box on any “POSIX” system (incl. Linux)? (Obviously, under assumption that both sides of the pipe use it; but IIUC in other case it just falls back gracefully to plain text.)
Also, in:
do I assume correctly that there’s a typo, and the second part should read: the receiver issues … to receive?
Yes, it’s a kernel feature. That said, it’s only a couple of dozen lines of code. Implementing it in Linux would be fairly easy. I’ll probably do it at some point.
Uh, yes. Ho hum. The sender specifies the things it can produce, the receiver selects the one that it wants.
I frickin’ love this! A few years ago (my notes are dated 2016 but I think I’d already tried this before, and the notes happened post-factum) I tried to come up with something similar, for a related, though slightly different case (exposing “rich” data, like structured records and images, to an interactive shell). I was always unhappy with the result because, in my obtuseness, I kept trying to keep the kernel out of it and implement it all at the shell and program level, which made fallback messy and very prone to failure. I mean, I didn’t even think of bringing the kernel into it, it seemed like something that should’ve been handled strictly in userspace, and now that I read your post it seems completely obvious!
Is this… the kind of stuff that is usually happening in FreeBSD land lately? I have long lost touch with its development (I switched to Linux after 5.4) and I’m wondering if I should maybe, uh, get back in touch? :-D
I proposed this at a FreeBSD DevSummit many years ago and the consensus was it was pretty interesting but no one had time to work on it. I put together the prototype during a Microsoft Hackathon a couple of years back. It’s not currently in a state to be upstreamed because it does not currently bound the amount of kernel memory that it can use for the buffer that holds the types. When I have a few spare days to work on it, I’ll try to get it finished and ready to upstream. The entire diff is under 1KLoC, including the libxo changes, some libc wrappers that provide a more friendly interface than the raw ioctls, and some really long comments explaining the state machine.
I’d like to do a Linux version and the ssh changes to demonstrate that it’s a portable abstraction before I try to upstream it anywhere.
I’ve determined that it is impossible to do this cleanly without involving the kernel.
Paging @andyc. Potential collaboration of the next 1576800000 seconds.
I’ve a very similar (neglected) project for GNU/Linux, with emphasis on backwards compatibility. There might be a flaw in this ioctl design if negotiating the communication protocol doesn’t peel off a separate pipe: something may print garbage to stdout, e.g. due to a library-specific debug environment variable, an unexpected error, a stray debug printf, or a noisy library.
Also, consider this scenario: { command1; command2; } | { command3; command4; }. What happens if the commands mix protocol support? What should happen if one of them gets SIGKILL in the middle of writing a long packet - undefined behavior?
I must strongly discourage any thought of embedding web views. The web’s tech stack is a flaming garbage heap, and this is an opportunity to step out of it. Secondly, ‘render HTML’ is a terribly nasty dependency to have. Thirdly, terminal developers seem an Opinionated lot, and maybe it would be best to not give cause. Fourthly, as you know there are a lot of terminal emulators in a lot of weird places; the kernel implements one for virtual terminals, vim has a fake one, phones, browsers, tmux. If the typical command is spittin’ HTML+JS the question of polyfilling has nasty answers.
Frankly, to make any progress at all with regard to shells, maybe we need a clean break from the people who have been clinging to “a stream of bytes” is all you need for half a century.
Who are those people? Pretty much all the shells here (including Oil) have some kind of support for structured data:
https://github.com/oilshell/oil/wiki/Alternative-Shells
Whether it’s useful and composes is another story, but it feels like you’re expressing the consensus opinion to me, not a contrarian one.
My comment encompassed terminal users; I missed that the grandparent comment referenced developers specifically.
I’ve occasionally idly wondered about the power you could get from just fusing the terminal emulator and the shell together. You could get a nicer UI for history completion since it wouldn’t be bound to the terminal grid, better support for fancy prompts (especially when resizing!), built-in detach/reattach a la tmux or something, and so on.
Of course, commands wouldn’t take advantage of this because they wouldn’t know it exists.
Can you explain this a bit more?
That’s an interesting case. I’ve never seen something like that in the wild, so didn’t consider it. I presume that command1 and command2 are run in sequence with the same pipe as stdout and command3 and command4 have the same pipe as standard input? In my implementation, only command1 and command2 would have the opportunity to do content negotiation, but command4 would need to handle whatever they negotiated.
That’s not great, but it’s no worse than today where none of the commands have any opportunity to negotiate. You can always provide command-line options to specify input / output formats if you don’t want to use auto-negotiation (and if you’re building a pipeline like this, you’re probably providing them already). If you want to guarantee no content negotiation, you can do { command1; command2; } | cat | { command3; command4; }. That does a redundant copy, but you could also insert something before command3 that breaks out of the content-negotiation protocol.
One of my main use cases for this was integration with GUI clipboard / drag-and-drop infrastructure. On macOS, for example, there are pbcopy and pbpaste commands, but you need to explicitly pass a type to pbpaste if you want to paste anything other than plain text. I want to be able to write pipelines like pbpaste | svg2png -whatever | pbcopy and have pbpaste automatically select the SVG representation from the pasteboard and pbcopy set the contents as PNG.
This was just an example; I think the JSON / XML output is more interesting for things like screen readers. Try typing ls in a terminal with a screen reader sometime to see the problem here! Apparently most blind people turn their rate up really high and listen for patterns in the output, which is a pretty awful experience.
The kernel mechanism just provides a communication channel to the terminal emulator. Once you’ve peeled off a pipe, it runs the same content negotiation protocol as any other pipe and it’s fine to just not generate the additional formatted data if the terminal emulator just wants to display terminal output. You can imagine something like vim or an IDE using this to get FixIt hints in the LSP format on a secondary stream when compiling, for example.
Programs have to share address space with their libraries, and there is no guarantee that they don’t write to stdout with some obscure environment variable set, or just whenever they feel like it. Peeling off a separate file descriptor gives robustness.
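The robustness argument can be tried today without any kernel changes, using an ordinary inherited pipe as a stand-in for a peeled-off one. The Python sketch below is only an analogy under that assumption: the child script, the record it sends, and the `run_with_side_channel` helper are invented for the demo, and it relies on POSIX file descriptor inheritance (`pass_fds`), so it will not run on Windows.

```python
import json
import os
import subprocess
import sys

# Child program: a "noisy library" prints to stdout while the structured
# record travels on a separate descriptor whose number is passed in argv.
CHILD = r"""
import json, os, sys
fd = int(sys.argv[1])
print("DEBUG: noisy library output")   # stray text on stdout
os.write(fd, json.dumps({"name": "a.txt", "size": 12}).encode())
os.close(fd)
print("a.txt  12")                     # the human-readable output
"""


def run_with_side_channel():
    """Run the child and return (stdout text, parsed side-channel record)."""
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(write_fd)],
        stdout=subprocess.PIPE,
        pass_fds=(write_fd,),          # keep the extra descriptor open in the child
    )
    os.close(write_fd)                 # the parent only needs the read end
    with os.fdopen(read_fd, "rb") as side:
        structured = side.read()       # EOF once the child closes its end
    stdout, _ = proc.communicate()
    return stdout.decode(), json.loads(structured)


if __name__ == "__main__":
    text, record = run_with_side_channel()
    print(record)
```

The debug line lands in the text stream, but the JSON on the second descriptor still parses cleanly, which is the property a shared stdout cannot give you.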
Yes.
Needing to add stuff to new shell scripts to use new functionality is fine, but who wants broken shell scripts? If you’re only adding new commands then there’s 0 risk of breakage, but also possibly not too much reason to modify the kernel.
What commands have you implemented?
Kinda disappointed that this paper doesn’t delve more deeply into the potential for richer inter-program interaction paradigms that “everything is a stream of bytes” doesn’t cover.
Tools like elvish and PowerShell hint at the possibilities but we’d need adoption across the entire UNIX layer cake (shell, applications, etc.) to really leverage them effectively.
Previous submission: https://lobste.rs/s/kw3yn0/once_future_shell_pdf
In case anyone is attending (virtual) HotOS, there is a presentation about this ~now (4pm EST on 6/2), and a panel on Unix shell tomorrow at 1 pm, which I’m participating in!