Spring Doors are, without a doubt, my absolute favourite IPC primitive (what, you don’t have one? Weirdo). They’re a little bit hard to understand because they combine a lot of components, many of which are separable.
The core is a very fast context switch. Normally, on *NIX systems, the receiver of an IPC message does a blocking call (to read or similar, or maybe a select descendant). The scheduler then marks the thread as blocked. When someone sends a message, the IPC channel flags any blocked threads as runnable and, at some point in the future, the scheduler wakes one up. With Doors, the scheduler is not involved at all. The send operation wakes the receiving thread and switches to it immediately. The woken thread continues to run with the caller’s quantum and, if it is preempted, resumes with the caller’s priority. This gives both a trivial way of doing priority inheritance and also gives a very low-latency IPC channel.
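To make the contrast concrete, here is a rough sketch (the helper names and buffers are mine, not from the paper) of the two shapes: a conventional blocking server loop, where the scheduler sits between every request and its reply, versus a door call, where the kernel hands the CPU straight from caller to callee and back.

```c
#include <door.h>
#include <unistd.h>

/* Placeholder for whatever the service actually does. */
extern void handle(char *req, ssize_t len, char *reply, size_t *reply_len);

/* Conventional *NIX shape: the server parks in a blocking read(); a client's
 * write() only marks it runnable, so the request waits for the scheduler to
 * run the server, and the reply waits for it to run the client again. */
void socket_server_loop(int sock_fd)
{
        char buf[512], reply[512];
        size_t reply_len;
        ssize_t n;

        while ((n = read(sock_fd, buf, sizeof(buf))) > 0) {
                handle(buf, n, reply, &reply_len);
                (void)write(sock_fd, reply, reply_len);
        }
}

/* Door shape: door_call() is one synchronous handoff. The kernel switches
 * straight to a server thread, which runs on the caller's quantum and then
 * door_return()s, switching straight back; no run queue in the middle. */
void door_client_call(int door_fd, door_arg_t *arg)
{
        (void)door_call(door_fd, arg);
}
```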
The next component is the shuttle. This is similar to a trusted stack: it ensures that there’s a return path for any message. If process A invokes a door in B, and B invokes one in C, there’s a stack of ABC. If B crashes in the middle, the return from C will wake A with a failure. You can also build timeouts on these, so if A times out waiting for B, then B and C can continue to work and B can be notified that the return failed because its caller was detached.
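To make the shuttle idea a little more concrete, here is a purely invented sketch of the bookkeeping it implies. None of these names come from Spring or Solaris; it is just the call-chain unwinding described above written down as data.

```c
/* Invented illustration, not Spring or Solaris kernel code. The kernel keeps
 * one frame per door invocation, forming the A -> B -> C chain. */
typedef int proc_id;

struct shuttle_frame {
        proc_id               caller;          /* who to resume on return     */
        int                   caller_detached; /* caller died or timed out    */
        struct shuttle_frame *prev;            /* A's frame sits under B's    */
};

/* If B dies while the chain is A -> B -> C, mark every frame whose caller
 * was B; C keeps running, and when it eventually returns the kernel sees the
 * detached frame and wakes A with a failure instead of trying to resume B. */
static void shuttle_unwind(struct shuttle_frame *top, proc_id dead)
{
        for (struct shuttle_frame *f = top; f != NULL; f = f->prev)
                if (f->caller == dead)
                        f->caller_detached = 1;
}
```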
Finally, there’s a top layer that builds an RPC mechanism on top of these primitives. This (as was the fashion after Mach) comes with an IDL that generates wrapper functions. For anything whose arguments fit in registers, there’s a very fast path in the kernel that, on the fast context switch, simply preserves the used argument registers (I think the caller stub zeroes the unused ones), restores a subset of the callee’s ones (the ones not clobbered in a system call) and executes from there. The next layer up allows small copies. The receive call is required to provide a certain amount of buffer space and the caller can do a one-copy operation to copy there (possibly two copies with modern MMUs - this was easier on SPARC). This lets on-stack arguments move very fast (memcpy speed). Finally, for very large objects there’s a page-flipping model.
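For anyone who hasn’t seen the Solaris incarnation of this, the small-copy path looks roughly like the following from the client side. This is from memory of the door_call(3C) interface, so treat the details as approximate; the request string and buffer sizes are made up.

```c
#include <door.h>
#include <stdio.h>
#include <string.h>

/* Send a small request and receive a small reply through a door.
 * Sketch from memory of door_call(3C); error handling is minimal. */
int query_door(int door_fd, const char *request, char *reply, size_t reply_len)
{
        char rbuf[512];                         /* caller-supplied result space */
        door_arg_t arg;

        arg.data_ptr  = (char *)request;        /* argument buffer              */
        arg.data_size = strlen(request) + 1;
        arg.desc_ptr  = NULL;                   /* no descriptors this time     */
        arg.desc_num  = 0;
        arg.rbuf      = rbuf;                   /* the "small copy" lands here  */
        arg.rsize     = sizeof(rbuf);

        if (door_call(door_fd, &arg) != 0)
                return -1;                      /* e.g. the server went away    */

        /* If the reply didn't fit, the kernel may hand back a new mapping in
         * arg.rbuf instead of our buffer (which a real caller should unmap). */
        (void)snprintf(reply, reply_len, "%.*s", (int)arg.data_size, arg.data_ptr);
        return 0;
}
```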
On Solaris, if I remember correctly, the page-flipping bit was built on top of a more general mechanism where some arguments were allowed to be file descriptors. This hit a slightly slower path in the kernel that did the equivalent of a dup (or a SCM_RIGHTS message over a socket) and the userspace bit did an mmap.
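The descriptor-passing path looked something like this on the server side, again from memory of the door_return(3C) and door_desc_t interfaces; the file path is invented, and a real client would fstat() and mmap() the descriptor it receives.

```c
#include <door.h>
#include <fcntl.h>
#include <stddef.h>

/* Service routine that answers with a file descriptor rather than bulk data;
 * the kernel does the dup-equivalent into the caller, which can then mmap it.
 * Sketch from memory; "/var/tmp/big-object" is a made-up path. */
static void serve_object(void *cookie, char *argp, size_t arg_size,
                         door_desc_t *dp, uint_t n_desc)
{
        door_desc_t d;
        int fd = open("/var/tmp/big-object", O_RDONLY);

        d.d_attributes = DOOR_DESCRIPTOR | DOOR_RELEASE; /* pass it, then close
                                                            our copy           */
        d.d_data.d_desc.d_descriptor = fd;

        (void)door_return(NULL, 0, &d, 1);      /* does not return on success  */
}
```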
The ability to pass file descriptors lets you build really nice things, because you can pass doors over doors. A DBUS-equivalent thing built with Doors would have a broker that just did introductions and handed you back the door for any RPC endpoint for the call-response things. This completely eliminates most of the buffering problems that the systemd version of DBUS has had to work so hard to avoid. The broker is needed on the data path only for the fan-out publish-subscribe kind of eventing.
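Here is a sketch of what the broker’s introduction path could look like under that model. Everything in it (the service table, the names, the lookup) is invented for illustration; the point is just that the broker’s only job is to door_return() the right door descriptor and then get out of the way.

```c
#include <door.h>
#include <string.h>

/* Invented registry: service name -> door fd, filled in elsewhere when
 * services register themselves with the broker. */
struct service { const char *name; int door_fd; };
static struct service services[64];
static int nservices;

/* Broker service routine: the argument is a NUL-terminated service name, the
 * reply is that service's door. After this one call, client and service talk
 * directly; the broker is off the data path. */
static void introduce(void *cookie, char *argp, size_t arg_size,
                      door_desc_t *dp, uint_t n_desc)
{
        door_desc_t d;

        for (int i = 0; i < nservices; i++) {
                if (strncmp(services[i].name, argp, arg_size) == 0) {
                        d.d_attributes = DOOR_DESCRIPTOR;
                        d.d_data.d_desc.d_descriptor = services[i].door_fd;
                        (void)door_return(NULL, 0, &d, 1);  /* no return */
                }
        }
        (void)door_return(NULL, 0, NULL, 0);    /* unknown service: empty reply */
}
```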
Solaris also added some things that I would avoid. For example, their kernel and libc were tightly coupled and the kernel managed the thread pool for doors: if you didn’t have enough threads, it could spin up more. I’d build this as a separate notification that would let you integrate it with whatever other thread pools you’re running. If you need to spin up a new thread, you’re already so far from a fast path that a few extra system calls will make very little difference.
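For reference, the coupling being described is the door_server_create(3C) hook: libc calls your creation function whenever it decides the process needs another server thread. A sketch from memory, which is also roughly where you would wire in your own thread-pool policy instead:

```c
#include <door.h>
#include <pthread.h>
#include <stddef.h>

/* A door server thread parks itself in door_return(); the kernel then hands
 * incoming calls to parked threads directly. */
static void *door_worker(void *arg)
{
        (void)arg;
        (void)door_return(NULL, 0, NULL, 0);
        return NULL;                            /* not reached */
}

/* libc invokes this when it thinks more server threads are needed. This is
 * the spot where you could consult your own pool instead of spawning blindly,
 * which is roughly the decoupling argued for above. */
static void my_create_proc(door_info_t *dip)
{
        pthread_t tid;
        (void)dip;
        (void)pthread_create(&tid, NULL, door_worker, NULL);
}

/* During startup, before creating any doors:
 *     (void)door_server_create(my_create_proc);
 */
```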
The way that Solaris attached doors to the filesystem was also odd. Rather than creating them like named pipes or AF_UNIX sockets (where the creation bound them to some location in an IPC namespace), there was an extra call to attach the file descriptor to the filesystem. This always struck me as weirdly non-orthogonal. You could attach a door to the filesystem, but not other arbitrary file descriptors. I suspect that there were plans to let you do the same with pipes and sockets, to attach ones created with pipe / pipe2 / socketpair into the filesystem, but I don’t know if it ever happened.
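The extra step being complained about is fattach(3C): the door is created as an anonymous descriptor and then overlaid onto an existing file in a separate call. A sketch from memory, with a made-up path and a do-nothing service routine:

```c
#include <door.h>
#include <stropts.h>
#include <fcntl.h>
#include <unistd.h>

static void serve(void *cookie, char *argp, size_t arg_size,
                  door_desc_t *dp, uint_t n_desc)
{
        (void)door_return(NULL, 0, NULL, 0);
}

/* Create an anonymous door, then give it a name with a separate fattach()
 * call; that works for doors (and STREAMS descriptors) but not for arbitrary
 * fds, which is the non-orthogonality complained about above. Path is made up. */
int publish_door(const char *path)              /* e.g. "/var/run/example-door" */
{
        int did = door_create(serve, NULL, 0);
        if (did < 0)
                return -1;

        /* fattach() needs an existing file to overlay. */
        int fd = open(path, O_CREAT | O_RDWR, 0644);
        if (fd >= 0)
                (void)close(fd);

        return fattach(did, path);      /* clients open(path) and door_call() it */
}
```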
I had a couple of encounters with Solaris Doors in the 1990s. Warning: ancient memories with limited fact checking, read with caution.
One was nscd, the name service cache daemon. It was designed to speed up NIS/YP user lookups, when the password file was not available locally. But all the name service switch stuff went through nscd, including host lookups. In an ISP environment, without NIS but with a lot of DNS, this turned out to be bad (TM), because nscd was single-threaded. It was much better to ensure that nscd was never running and that the server ran a proper local DNS cache with support for concurrent resolution.
The other was the automount daemon and autofs. Once again, there was a scalability limit due to the automount daemon being single-threaded. In normal use (home directories on NFS) it wasn’t a problem, because once a directory was mounted the kernel autofs no longer needed to consult the userland automount daemon. However, there was no negative cache in the kernel, so it was possible to overload the IPC channel between autofs and amd. When this happened, autofs would return an EPERM. I learned about this because our web server was looking for a nonexistent .htaccess file in an autofs directory on every HTTP request, and when the load got beyond a certain point, the web server started returning 403 errors. Sigh.
There was another weird userland/kernel interaction which I don’t think was related to Doors. Solaris had an optimization for large timeshared servers handling lots of telnet connections (before ssh was de rigueur): normally, every keypress would have to go network -> telnetd -> pty -> shell, but Solaris had a telnet fast path that bypassed the trip to userland via telnetd. This was implemented as a dynamic kernel module that telnetd loaded at run time. I discovered this because sometimes after a reboot, telnet stopped working – because on our systems telnetd was running in a chroot sandbox where the kernel module was inaccessible. Sigh.
So were those services door-based? Because AFAIU one of the selling points of doors is exactly that they make multi-threading easy (the kernel automatically spins up a thread running the service routine in the process for each incoming request).
I think so? Dunno? See the second sentence I wrote above?
I think for nscd, doors worked well for a cache hit, but failed horribly for a cache miss because it just called into libc after acquiring a big giant lock. So lookups that would normally have proceeded in parallel across multiple unix processes became serialized system-wide.
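A caricature of that failure mode, not nscd’s actual code: the door service routine itself scales across threads, but a miss funnels every caller through one lock around a non-reentrant libc resolver call.

```c
#include <netdb.h>
#include <pthread.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Cache hits (not shown) can be answered from any door server thread in
 * parallel. A miss takes the one big lock and does the slow DNS round-trip
 * inside it, so misses serialize for the whole machine. The lock is plausibly
 * there because gethostbyname() isn't reentrant; a real implementation would
 * copy the result into its cache before unlocking. */
struct hostent *resolve_miss(const char *name)
{
        struct hostent *he;

        pthread_mutex_lock(&big_lock);
        he = gethostbyname(name);
        pthread_mutex_unlock(&big_lock);
        return he;
}
```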
Dunno about amd: downcalls from the kernel to userspace are dicing with priority inversion, so it doesn’t surprise me at all that the overload situation failed messily. They did well to avoid a deadlock, but the design was asking for trouble.
The original paper about doors on Spring is much more informative, especially the error recovery in the shuttle. The Solaris adaptation made some interesting choices to shoehorn the construct into POSIX. I would do it quite differently now (with the benefit of seeing what they did) and periodically try to persuade someone to implement it for FreeBSD.
Looks like there is an implementation for FreeBSD: https://github.com/bnovkov/freebsd-doors
I’ve talked to the author of that (and hopefully persuaded him to do another one). It’s missing the most important bit of Doors: switching from the caller to the callee without going via the scheduler. It also copies a lot of the Solaris bits that, in hindsight, were probably a bad idea.
This is absolutely fascinating to read and contrast with our current widespread model of RPC-over-HTTP. Unlike the modern web:
The server can hold on to state and manipulate it per-connection, not per-request
Access is done via capabilities that can be passed to other processes
They put a lot of thought into threads and how control flow migrates from the client to the server and back, spinning up a thread per connection/request on the server, and having the schedulers on both machines coordinate with each other(?)
Jeez, I wouldn’t like to think of what happens to the client if the server code is miswritten and never does a “return” operation
It assumes the client and server already share an authentication mechanism and that latency between them is low – basically they’re both on a fast LAN with an auth server of some kind
It explicitly pretends that the server and client share a process – the “shuttle” object is mentioned as controlling signals and procfs stuff, which are in-process state.
Some of this makes sense given that they talk about using it as a local IPC mechanism as well. It’s just wild to read and see how much of it wouldn’t work as a modern internet server. I’m not an expert but it seems like so many of these assumptions either won’t scale up well, won’t work as implementation details change, won’t work as latency increases, or require assuming a lot of trust.
I think doors are for local IPC only.
That would make more sense, but using the term “RPC” to describe it then feels incorrect.