I think what is needed is a process group (instead of prctl). Then the queue manager can clean up the child processes by sending a signal to the process group when its leader exits.
This involves… PTYs if a vague horrible memory I have tried to suppress is correct?
Googling leads to setsid(), and its man page says:
If a process that is a session leader terminates, then a SIGHUP
signal is sent to each process in the foreground process group of
the controlling terminal.
I am going to try to once again suppress all knowledge in this area so I’m never tempted to try to do anything involving it.
If the queue manager is the normal sort of daemon, it’ll call setsid() to create a session without a controlling terminal, so there’s no need for ptys to be involved. Call setpgid() after forking a worker; when the worker exits, send a signal to its process group to get rid of any stragglers.
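For what it's worth, here is roughly what that looks like as a minimal Python sketch (the same calls exist in C under the same names). The function name `run_worker_and_reap_group` and the choice of SIGKILL are my own assumptions for illustration, not anything from the queue manager in question.

```python
import os
import signal


def run_worker_and_reap_group(worker):
    """Fork a worker into its own process group, wait for it to exit,
    then signal the whole group to get rid of any stragglers."""
    pid = os.fork()
    if pid == 0:
        # Child: become the leader of a new process group (pgid == own pid).
        os.setpgid(0, 0)
        try:
            worker()
        finally:
            os._exit(0)

    # Parent: set the group as well, so it is in place no matter which
    # side of the fork gets scheduled first.
    try:
        os.setpgid(pid, pid)
    except OSError:
        pass  # the child has already exited or exec'd

    _, status = os.waitpid(pid, 0)

    # The worker is gone; anything it left behind is still in its group.
    # While stragglers remain, the group id cannot be recycled as a pid.
    try:
        os.killpg(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # no stragglers, the group no longer exists
    return status
```

Calling `run_worker_and_reap_group(some_function)` returns the worker's wait status; anything the worker forked and left running gets SIGKILL unless it moved itself into a different process group.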
Since this is an application, and the problem comes either from outside (somebody killing the main process) or from a bug (a crash, or otherwise exiting without cleaning up its children), you might consider letting systemd (or something similar) do the heavy lifting. It puts all the processes into a cgroup (not a PID namespace), and when the main process exits for whatever reason, systemd kills everything left in the cgroup, which is possible because you can enumerate all the processes in a cgroup.
You can use systemd-run --user to do this in an ad hoc way.
I ran into this recently. You can kill both the direct and indirect child processes by creating a process group. It doesn't require elevated privileges and is viral (descendants inherit the group), and you can signal the whole group at once. The only way it fails is if the processes you spawn in turn create their own groups (they don't nest, sadly), but that's rare.
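A rough sketch of the viral part, using Python's `subprocess`. The helper names `spawn_in_group` and `kill_group` are made up for this example. Note that `start_new_session=True` calls setsid() in the child, which creates a new process group as a side effect; on Python 3.11+ you could pass `process_group=0` instead to get only the group.

```python
import os
import signal
import subprocess


def spawn_in_group(argv, **popen_kwargs):
    """Start argv as the leader of a fresh process group.
    Everything it spawns inherits that group unless it opts out."""
    return subprocess.Popen(argv, start_new_session=True, **popen_kwargs)


def kill_group(proc, sig=signal.SIGTERM):
    """Signal every process in proc's group, then reap the leader."""
    try:
        # The leader has not been waited on yet, so its pid (and thus
        # the group id) cannot have been reused by something else.
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # the whole group is already gone
    return proc.wait()
```

For example, `kill_group(spawn_in_group(["sh", "-c", "sleep 60 & sleep 60 & wait"]))` takes out the shell and both background sleeps, with no special privileges needed.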
I think it’s pretty neat that the cause of this bug ultimately turned out to be the sole unsafe block in the entire program.