I apologize for parading my ignorance here, but could somebody who deals with this stuff more regularly than I help explain something?
What the hell is the point of signals in ‘nix systems, and how are they handled gracefully?
My understanding was that the handler for signals was invoked as, effectively, an interrupt, and the entire burden of not screwing your program state up lies with the programmer–which suggests, to me, that you need to be writing a system that internally is an event queue lest you invite weiiiird bugs.
Am I just weird in seeing signals as kinda weird cruft in an otherwise tolerable API?
Yes, that is more or less exactly right. They are async interrupts. For more fun, consider they also need to be described as “reliable signals” which implies something quite terrible about the status of signals in the past.
The only thing you can safely do in a signal handler is set a flag, then check that in your event loop, at which point you ask, why not just make it a first class part of the event loop? And that’s exactly what most event abstractions (libevent, etc.) do. Attempting to actually handle the signal in the handler is certain doom.
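For anyone who wants to see the "set a flag, check it in the loop" pattern spelled out, here is a minimal sketch. The names `install_sigint_flag` and `poll_sigint` are made up for illustration, and SIGINT is just an example signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the main loop. sig_atomic_t is the
 * one type the C standard lets a handler store to safely. */
static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int signo) {
    (void)signo;
    got_sigint = 1;            /* the handler does nothing else */
}

/* Hypothetical helper: install the handler. Returns 0 on success. */
int install_sigint_flag(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical helper, called once per event-loop iteration.
 * Returns 1 if at least one SIGINT arrived since the last call. */
int poll_sigint(void) {
    if (got_sigint) {
        got_sigint = 0;
        return 1;
    }
    return 0;
}
```

The event loop would call `poll_sigint()` next to its other readiness checks and do the real work (cleanup, shutdown, reload) there, outside the handler.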
I seem to remember some things (from reading some thickass books on programming in ‘nix environments) about signals masking lower-priority signals, and also not having reliable counts of how many you’ve received?
There’s no real priority system (although numerically lowest to highest plays out in practice). While handling one signal, others are implicitly masked and you can also manually mask some at various times, but this gets tricky fast. Right, there’s no count. The kernel side implementation is just a bitmask of things you haven’t received yet.
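The "no count" part is easy to observe. This is a sketch assuming Linux-style behavior for standard (non-realtime) signals; POSIX leaves queueing of multiple pending instances implementation-defined. `count_deliveries_while_blocked` is a made-up name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_deliveries = 0;

static void on_usr1(int signo) {
    (void)signo;
    usr1_deliveries++;
}

/* Hypothetical helper: block SIGUSR1, raise it n times, unblock it,
 * and report how many times the handler actually ran.
 * Returns -1 if setup fails. */
int count_deliveries_while_blocked(int n) {
    struct sigaction sa;
    sigset_t block, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0) return -1;

    usr1_deliveries = 0;
    for (int i = 0; i < n; i++)
        raise(SIGUSR1);        /* each raise only sets the pending bit */

    /* Restoring the old mask delivers whatever is pending. */
    if (sigprocmask(SIG_SETMASK, &old, NULL) != 0) return -1;
    return usr1_deliveries;
}
```

On systems that keep a single pending bit per standard signal, raising five times while blocked results in one handler invocation, which is why a handler can never be used to count events.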
In addition to tedu’s excellent answer, let me add a bit.
The “default” behavior when a process misbehaves is for the OS to kill it. Tried to dereference a NULL pointer, or memory in kernel space? You die. Obviously, this isn’t always what you want, so the next evolutionary step is for the OS to send signals that usually mean death (except for a few intended to be harmless, like SIGCHLD, which tells a parent that its child process died) or stoppage until further instructions (e.g. SIGSTOP to stop and SIGCONT to resume) but that, in many cases, processes can choose to handle differently. So what actually happens when you seg-fault is that the OS detects the bad memory reference and sends a SIGSEGV to the process. If you want to survive segmentation faults, you can write a SIGSEGV handler– although the only way I could get anything useful out of it was to use setjmp and longjmp, because returning to the seg-faulting instruction isn’t going to do any good…
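A rough sketch of the setjmp/longjmp approach, using the signal-aware variants `sigsetjmp` and `siglongjmp` so the signal mask is restored on the jump. The helper names are made up, and `fake_fault` uses `raise(SIGSEGV)` as a stand-in for a real bad dereference so the example stays well-defined C.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;

static void on_segv(int signo) {
    (void)signo;
    /* Do not return to the faulting instruction; jump out instead. */
    siglongjmp(recover_point, 1);
}

/* Stand-in for code that dereferences a bad pointer (assumption:
 * a real fault would arrive as the same signal). */
void fake_fault(void) { raise(SIGSEGV); }

/* A function that does not fault, for comparison. */
void harmless(void) { }

/* Hypothetical helper: run risky() with a temporary SIGSEGV handler.
 * Returns 1 if it faulted, 0 if it completed, -1 on setup failure. */
int run_guarded(void (*risky)(void)) {
    struct sigaction sa, old;
    int faulted;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) return -1;

    /* Second argument 1: save the signal mask so siglongjmp restores it. */
    if (sigsetjmp(recover_point, 1) == 0) {
        risky();
        faulted = 0;
    } else {
        faulted = 1;           /* arrived here via siglongjmp */
    }

    sigaction(SIGSEGV, &old, NULL);   /* put the previous handler back */
    return faulted;
}
```

Whatever `risky()` was in the middle of is simply abandoned, so this only makes sense around code whose half-finished state you can afford to throw away.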
As for signals, they’re very old/“legacy” in implementation and trade-offs– I believe that original implementations were designed with a single-word “signal vector”/bit-field in mind, which is why the original numbers were all 0-31 and why recurrent signals get dropped– and the general impression I get is that they shouldn’t be used for anything other than the existing use cases.
A way to recover from a SIGSEGV (by “recover” I mean “return from the SIGSEGV into a state that isn’t guaranteed to immediately cause another SIGSEGV to be generated”) is to mmap() something at the address that triggered the fault. You can implement lazy computation this way. This can be used for good.
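Here is one way the mmap() trick could look, as a sketch under some loud assumptions: Linux, where the fault address arrives in `si_addr` and returning from the handler re-executes the faulting instruction; and calling mmap() from a handler, which is not on the POSIX async-signal-safe list but is commonly done for synchronous faults like this. The `lazy_*` names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *lazy_base;                 /* start of the reserved range */
static size_t lazy_len;                 /* size of the range in bytes */
static long page_size;
static volatile sig_atomic_t pages_faulted_in;

static void on_fault(int signo, siginfo_t *info, void *ctx) {
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)lazy_base;
    (void)signo; (void)ctx;

    if (addr < base || addr >= base + lazy_len) {
        /* Not our range: restore the default action and return, so the
         * retried instruction faults again and kills the process. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    /* Map a fresh zero page over the page that faulted. */
    void *page = (void *)(addr & ~((uintptr_t)page_size - 1));
    if (mmap(page, (size_t)page_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
        _exit(1);
    pages_faulted_in++;
    /* Returning retries the faulting instruction, which now succeeds. */
}

/* Hypothetical helper: reserve `pages` inaccessible pages and install
 * the handler. Returns the base address, or NULL on failure. */
char *lazy_reserve(size_t pages) {
    struct sigaction sa;
    void *p;

    page_size = sysconf(_SC_PAGESIZE);
    lazy_len = pages * (size_t)page_size;
    p = mmap(NULL, lazy_len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    lazy_base = p;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;           /* needed to receive si_addr */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return NULL;
    return lazy_base;
}

/* Number of pages the handler has mapped so far. */
int lazy_pages_mapped(void) { return pages_faulted_in; }
```

A lazy-computation variant would fill the new page with computed or fetched data inside the handler instead of leaving it zeroed.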
Another way to recover from a SIGSEGV is to do something terrifying and non-portable, such as messing with the code that tried to dereference the bad pointer, or maybe just incrementing the program counter register on the hunch that maybe the instruction after this one might work. This can be used for evil.
> maybe just incrementing the program counter register on the hunch that maybe the instruction after this one might work.
Ah, the good old ON ERROR RESUME NEXT. Maybe those old BASIC folks were on to something after all.
> ON ERROR RESUME NEXT
Long ago, in a land far away, there was the Texas Persistent Store, which made use of this, IIRC.
Ah, a PostScript document at an FTP site. Takes me back.
Because my current terminal can’t into postscript, I’m guessing that this used SIGSEGV to fulfill memory requests from the network somehow?
Old-style handling of signals using signal(2) or sigaction(2) is tricky at best. Basically the only things you can do in a handler are set a global flag or perform some basic syscalls. This led to a fun hack using a non-blocking pipe(2) for normalizing signal and stream handling. Nowadays, one would prefer to use signalfd(2) or create a dedicated thread calling sigwaitinfo(2) to integrate the signal handling with your main loop.
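The non-blocking pipe hack (often called the self-pipe trick) might look roughly like this. It is a sketch with made-up helper names; the handler only calls write(2), which is async-signal-safe, and the main loop treats the read end like any other file descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int sig_pipe[2] = { -1, -1 };

static void on_signal(int signo) {
    int saved_errno = errno;            /* write() may clobber errno */
    unsigned char b = (unsigned char)signo;
    ssize_t n = write(sig_pipe[1], &b, 1);
    (void)n;                            /* pipe full: the byte is dropped */
    errno = saved_errno;
}

static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: route `signo` into the pipe.
 * Returns the read end to poll on, or -1 on failure. */
int signal_pipe_open(int signo) {
    struct sigaction sa;

    if (sig_pipe[0] < 0) {
        if (pipe(sig_pipe) != 0) return -1;
        if (set_nonblocking(sig_pipe[0]) != 0 ||
            set_nonblocking(sig_pipe[1]) != 0)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0) return -1;
    return sig_pipe[0];
}

/* Hypothetical helper: wait up to timeout_ms for a signal byte.
 * Returns the signal number, 0 on timeout, or -1 on error
 * (including EINTR, in which case the caller can simply retry). */
int signal_pipe_next(int rfd, int timeout_ms) {
    struct pollfd pfd = { .fd = rfd, .events = POLLIN };
    unsigned char b;
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0) return rc;
    if (read(rfd, &b, 1) != 1) return -1;
    return b;
}
```

In a real program the read end would sit in the same poll() set as your sockets and files, which is the whole point: signals become just another readable descriptor.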
On the other hand, when writing CLI utilities you often want the default behavior. In that case you leave the handlers alone and let things play out the usual deadly way.
But yeah, they do feel ancient.
I really enjoy reading posts about these things. There are a lot of issues with POSIX but these things around pipes and sockets and files are really well thought out and many of the issues are very much in retrospect. I think implementing your own shell-pipe program is a very rewarding and insightful task.
OT (sorry), but I wonder how many cats were killed (with pipes!?) in the writing of this cruel rant?
Where does SIGPIPE come from in the first place? I remember that at a company I worked at, we tried building some tooling around tail -f-ing some file that was being appended to, and then processing that, but the command would intermittently fail because of a SIGPIPE. We ended up taking another approach, so we never investigated why it was happening.
Any time you write to a pipe/socket/fifo where the read end is closed. As noticed, sometimes this happens unexpectedly. Without SIGPIPE, you’d get an error from write() which you would presumably deal with in a more ordinary fashion.
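To see the "error from write()" alternative, you can ignore SIGPIPE and look at errno. A small sketch; `write_to_closed_pipe` is a made-up name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: write one byte to a pipe whose read end is
 * already closed, with SIGPIPE ignored for the duration.
 * Returns the errno from write(), 0 if the write unexpectedly
 * succeeded, or -1 if setup failed. */
int write_to_closed_pipe(void) {
    int fds[2];
    void (*old)(int);
    ssize_t n;
    int err;

    if (pipe(fds) != 0) return -1;

    old = signal(SIGPIPE, SIG_IGN);     /* otherwise the write kills us */
    if (old == SIG_ERR) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[0]);                      /* no readers left */
    n = write(fds[1], "x", 1);
    err = (n < 0) ? errno : 0;

    close(fds[1]);
    signal(SIGPIPE, old);               /* restore the previous disposition */
    return err;
}
```

With SIGPIPE ignored the write fails with EPIPE, which a long-running tool can handle like any other I/O error instead of dying mid-pipeline.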
What might make the read end close unexpectedly, other than the process failing? For example, is it possible for a pipe to get too full, if the reader can’t process data as fast as it’s being written?
That’s about it. Either the reading process dies/quits, or it closes the pipe on purpose. If the pipe fills up, writes will just block until there’s space.
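A full pipe is easy to poke at if you make the write end non-blocking, since the write that would have blocked fails with EAGAIN instead. This is only a sketch, and `bytes_until_pipe_full` is a made-up name; the capacity it reports depends on the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fill a fresh pipe that nobody reads from and
 * return how many bytes fit before write() reports EAGAIN.
 * Returns -1 on any other failure. */
long bytes_until_pipe_full(void) {
    int fds[2];
    char chunk[512];
    long total = 0;
    int flags;

    if (pipe(fds) != 0) return -1;

    flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR) continue;
        if (n < 0 && errno == EAGAIN) break;   /* full: a blocking write would wait here */
        total = -1;                            /* unexpected error */
        break;
    }

    /* The read end stayed open the whole time, so no SIGPIPE was involved. */
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Note that nothing here raises SIGPIPE: a slow or stalled reader only makes the writer wait (or see EAGAIN), while a closed read end is what triggers the signal.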
Slightly OT, but does this site (tedunangst.com, not lobste.rs) have an archive? I can click “random” to get to random historic posts but that seems a really cumbersome way to read the old articles.