1. 41
  1.  

  2. 21

    But it’s pretty annoying to have to tell all your daemons not to fork

    Most daemons should not fork. The double-fork-exec hack to re-parent the daemon on top of PID 1 and then write a PID file is a legacy thing. There are all sorts of ways it can go wrong, and it was only really useful for systems that didn’t have any process manager.
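    For anyone who hasn’t seen the legacy pattern, here is a rough POSIX sh sketch of it. `legacy_daemonize` is a hypothetical name, and a real C daemon would also call setsid() and chdir("/"), which this leaves out; it only shows the fork, exit and PID file part that a process manager makes unnecessary.

```sh
#!/bin/sh
# Hypothetical helper illustrating the legacy daemonize-and-PID-file pattern.
# Usage: legacy_daemonize PIDFILE COMMAND [ARGS...]
legacy_daemonize() {
    pidfile=$1
    shift
    (
        # First fork: this subshell. Second fork: the backgrounded command.
        # Stdio is detached so the daemon doesn't hold the caller's terminal.
        "$@" </dev/null >/dev/null 2>&1 &
        # Record the daemon's PID so an init script can signal it later.
        echo $! > "$pidfile"
    )
    # The subshell has exited by now, so the daemon is orphaned and gets
    # re-parented (to PID 1, or to the nearest subreaper).
}
```

    The weak spot is the PID file: if the daemon dies and its PID gets reused, a later `kill "$(cat "$pidfile")"` signals an unrelated process. A supervisor that keeps the service as its direct child doesn’t need the file at all.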

    In this specific use-case though it makes sense because Nginx is using fork to do zero-downtime restarts.

    1. 15

      I recently discovered how horribly complicated traditional init scripts are whilst using Alpine Linux. OpenRC might be modern, but it’s still complicated.

      Runit seems to be the nicest I’ve come across. It asks the question “why do we need to do all of this anyway? What’s the point?”

      It rejects the idea of forking and instead requires everything to run in the foreground:

      /etc/sv/nginx/run:

      #!/bin/sh
      exec nginx -g 'daemon off;'
      

      /etc/sv/smbd/run

      #!/bin/sh
      mkdir -p /run/samba
      exec smbd -F -S
      

      /etc/sv/murmur/run

      #!/bin/sh
      exec murmurd -ini /etc/murmur.ini -fg 2>&1
      

      Waiting for other services to load first does not require special features in the init system itself. Instead you can write the dependency directly into the service file in the form of a “start this service” request:

      /etc/sv/cron/run

      #!/bin/sh
      sv start socklog-unix || exit 1
      exec cron -f
      

      Where my implementation of runit (Void Linux) seems to fall flat on its face is logging. I hoped it would do something nice like redirect stdout and stderr of these supervised processes by default. Instead you manually have to create a new file and folder for each service that explicitly runs its own copy of the logger. Annoying. I hope I’ve been missing something.
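      As far as I know the per-service log directory is how runit is meant to work: when `./log` exists, runsv pipes the service’s stdout into `./log/run` (stderr only gets there if the run script redirects it). The boilerplate can at least be scripted. Below is a sketch of a hypothetical `make_runit_service` helper (not part of runit or Void), which writes the run script plus a `log/run` that feeds svlogd with `-tt` (human-readable UTC timestamps); the `/var/log/NAME` location is an assumption.

```sh
#!/bin/sh
# Hypothetical helper: create SVDIR/NAME/run and SVDIR/NAME/log/run.
# Usage: make_runit_service SVDIR NAME 'COMMAND'
make_runit_service() {
    svc="$1/$2"
    mkdir -p "$svc/log" || return 1

    # run: send stderr to stdout (runsv only pipes stdout to log/run),
    # then exec the daemon in the foreground so runsv supervises it.
    printf '%s\n' '#!/bin/sh' 'exec 2>&1' "exec $3" > "$svc/run"

    # log/run: svlogd reads stdin and writes logs into the directory,
    # with -tt prefixing each line with a UTC timestamp.
    printf '%s\n' '#!/bin/sh' "mkdir -p /var/log/$2" \
        "exec svlogd -tt /var/log/$2" > "$svc/log/run"

    chmod +x "$svc/run" "$svc/log/run"
}
```

      For example, `make_runit_service /etc/sv nginx "nginx -g 'daemon off;'"`, then on Void enable it with `ln -s /etc/sv/nginx /var/service/`.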

      The only other feature I can think of is “reloading” a service, which Aker does in the article via this line:

      ExecReload=kill -HUP $MAINPID

      I’d make the argument that in all circumstances where you need this you could probably run the command yourself. Thoughts?
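      For what it’s worth, “running the command yourself” under runit mostly means signalling the PID that runsv already tracks. A minimal sketch, assuming the standard runit layout where `SERVICE_DIR/supervise/pid` holds the main process’s PID; `reload_service` is a hypothetical name, and `sv reload nginx` gets the same SIGHUP delivered via runsv.

```sh
#!/bin/sh
# Hypothetical helper: send SIGHUP to a runit service's main process,
# the manual equivalent of systemd's "ExecReload=kill -HUP $MAINPID".
# Usage: reload_service SERVICE_DIR   (e.g. /etc/sv/nginx)
reload_service() {
    pid=$(cat "$1/supervise/pid" 2>/dev/null)
    # An empty or missing pid file means there is nothing to signal.
    if [ -z "$pid" ]; then
        echo "reload_service: $1 is not running" >&2
        return 1
    fi
    kill -HUP "$pid"
}
```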

      1. 6

        Where my implementation of runit (Void Linux) seems to fall flat on its face is logging. I hoped it would do something nice like redirect stdout and stderr of these supervised processes by default. Instead you manually have to create a new file and folder for each service that explicitly runs its own copy of the logger. Annoying. I hope I’ve been missing something.

        The logging mechanism works this way so that it is robust: logs are only lost if both runsv and the log service die. Another thing about separate logging services is that stdout/stderr output is not necessarily tagged, and adding all of that to runsv would just bloat it.

        There is definitely room for improvement, as logger(1) has been broken for some time in the way Void currently uses it (you can blame systemd for that). My proposal to simplify log services by centralizing how logging is done can be found here: https://github.com/voidlinux/void-runit/pull/65. For me, the ability to exec svlogd(8) from vlogger(8), for a more lossless logging mechanism, matters more than its main purpose of replacing logger(1).

        1. 1

          Ooh, thank you, having a look :)

        2. 6

          Instead you can write the dependency directly into the service file in the form of a “start this service” request

          But that solves neither starting daemons in parallel nor starting them at all if they are run in the ‘wrong’ order. Depending on the network being set up, for example, adds complexity to each of those shell scripts.

          I’m of the opinion that a DSL of whitelisted items (systemd) is much nicer to handle than writing shell scripts, along with the standardized commands instead of having to know which services accept ‘reload’ vs ‘restart’ or some other variation in commands; those kinds of niceties are gone when each shell script is its own interface.

          1. 6

            The runit/daemontools philosophy is to just keep trying until something finally runs. So if the order is wrong, presumably the service dies if a dependent service is not running, in which case it’ll just get restarted. So eventually things progress towards a functioning state. IMO, given that a service needs to handle the services it depends on crashing at any time anyways to ensure correct behaviour, I don’t feel there is significant value in encoding this in an init system. A dependent service could also be moved to another machine, in which case this would not work as well.

            1. 3

              It’s the same philosophy as network-level dependencies. A web app that depends on a mail service for some operations is not going to shutdown or wait to boot if the mail service is down. Each dependency should have a tunable retry logic, usually with an exponential backoff.
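              A sketch of such a tunable retry in POSIX sh, assuming nothing beyond `sleep`. `retry_backoff` and `backoff_delay` are hypothetical names: the delay starts at BASE seconds, doubles after every failed attempt up to MAX_DELAY, and the caller picks how many attempts to make before giving up.

```sh
#!/bin/sh
# Hypothetical helpers: retry a command with capped exponential backoff.

# backoff_delay ATTEMPT BASE MAX_DELAY -> BASE * 2^(ATTEMPT-1), capped.
backoff_delay() {
    if [ "$1" -gt 30 ]; then
        echo "$3"   # avoid huge shifts; we'd be at the cap anyway
        return
    fi
    d=$(( $2 * (1 << ($1 - 1)) ))
    if [ "$d" -gt "$3" ]; then d=$3; fi
    echo "$d"
}

# retry_backoff MAX_TRIES BASE MAX_DELAY COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 after MAX_TRIES failures.
retry_backoff() {
    tries=$1 base=$2 max=$3
    shift 3
    n=1
    while :; do
        "$@" && return 0
        if [ "$n" -ge "$tries" ]; then return 1; fi
        sleep "$(backoff_delay "$n" "$base" "$max")"
        n=$((n + 1))
    done
}
```

              In a runit run script this could wrap a dependency check like the one earlier in the thread, e.g. `retry_backoff 5 1 16 sv start socklog-unix || exit 1` before `exec cron -f`; if it still gives up, runsv’s own restart takes over.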

            2. 4

              But that solves neither starting daemons in parallel nor starting them at all if they are run in the ‘wrong’ order.

              That was my initial thought, but it turns out the opposite is true. The services are retried until they work. Things are definitely parallelized: there is no “exit” in these scripts, so there is no physical way of running them in a linear (non-parallel) fashion.

              Ignoring the theory: void’s runit provides the second fastest init boot I’ve ever had. The only thing that beats it is a custom init I wrote, but that was very hardware (ARM Chromebook) and user specific.

            3. 5

              Dependency resolving on daemon manager level is very important so that it will kill/restart dependent services.

              runit and s6 also don’t support cgroups, which can be very useful.

              1. 5

                Dependency resolving on daemon manager level is very important so that it will kill/restart dependent services

                Why? The runit/daemontools philosophy is just to try to keep something running forever, so if something dies, just restart it. If one restarts a service, then either those that depend on it will die or they will handle it fine and continue with their life.

                1. 4

                  either those that depend on it will die or they will handle it fine

                  If they die, and are configured to restart, they will keep bouncing up and down while the dependency is down? I think having dependency resolution is definitely better than that. Restart the dependency, then the dependent.

                  1. 4

                    Yes they will. But what’s wrong with that?

                    1. 2

                      Wasted cycles, wasted time, not nearly as clean?

                      1. 10

                        It’s a computer, it’s meant to do dumb things over and over again. And presumably that faulty component will be fixed pretty quickly anyways, right?

                        1. 5

                          It’s a computer, it’s meant to do dumb things over and over again

                          I would rather have my computer do less dumb things over and over personally.

                          And presumably that faulty component will be fixed pretty quickly anyways, right?

                          Maybe; it depends on what went wrong precisely, how easy it is to fix, etc. We’re not necessarily just talking about standard daemons - plenty of places run their own custom services (web apps, microservices, whatever). The dependency tree can be complicated. Ideally once something is fixed everything that depends on it can restart immediately, rather than waiting for the next automatic attempt which could (with the exponential backoff that proponents typically propose) take quite a while. And personally I’d rather have my logs show only a single failure rather than several for one incident.

                          But, there are merits to having a super-simple system too, I can see that. It depends on your needs and preferences. I think both ways of handling things are valid; I prefer dependency management, but I’m not a fan of Systemd.

                          1. 4

                            I would rather have my computer do less dumb things over and over personally.

                            Why, though? What’s the technical argument? daemontools (and I assume runit) sleeps for 1 second between retries, which for a computer is basically equivalent to being entirely idle. It seems to me that a lot of people just get a bad feeling about running something that will immediately crash.

                            Maybe; it depends on what went wrong precisely, how easy it is to fix, etc. We’re not necessarily just talking about standard daemons - plenty of places run their own custom services (web apps, microservices, whatever).

                            What’s the distinction here? Also, with microservices the dependency graph in the init system almost certainly doesn’t represent the dependency graph of the microservice as it’s likely talking to services on other machines.

                            I think both ways of handling things are valid

                            Yeah, I cannot provide an objective argument as to why one should prefer one to the other. I do think this is a nice little example of the slow creep of complexity in systems. Adding a pinch of dependency management here because it feels right, and a teaspoon of plugin system there because we want things to be extensible, and a deciliter of proxies everywhere because of microservices. I think it’s worth taking a moment every now and again and stepping back and considering where we want to spend our complexity budget. I, personally, don’t want to spend it on the init system so I like the simple approach here (especially since with microservices the init dependency graph doesn’t reflect the reality of the service anymore). But as you point out, positions may vary.

                            1. 2

                              Why, though? What’s the technical argument

                              Unnecessary wakeup, power use (especially for a laptop), noise in the logs from restarts that were always bound to fail, unnecessary delay before restart when restart actually does become possible. None of these arguments are particularly strong, but they’re not completely invalid either.

                              We’re not necessarily just talking about standard daemons …

                              What’s the distinction here?

                              I was trying to point out that we shouldn’t make too many generalisations about how services might behave when they have a dependency missing, nor assume that it is always ok just to let them fail (edit:) or that they will be easy to fix. There could be exceptions.

                          2. 2

                            Perhaps wandering off topic, but this is a good way to trigger even worse cascade failures.

                            eg, an RSS reader that falls back to polling every second if it gets something other than 200. I retire a URL, and now a million clients start pounding my server with a flood of traffic.

                            There are a number of local services (time, dns) which probably make some noise upon startup. It may not annoy you to have one computer misbehave, but the recipient of that noise may disagree.

                            In short, dumb systems are irresponsible.

                            1. 2

                              But what is someone supposed to do? I cannot force a million people using my RSS tool not to retry every second on failure. This is just the reality of running services. Not to mention all the other issues that come up with not being in a controlled environment and running something loose on the internet such as being DDoS’d.

                              1. 2

                                I think you are responsible if you are the one who puts the dumb loop in your code. If end users do something dumb, then that’s on them, but especially, especially, for failure cases where the user may not know or observe what happens until it’s too late, do not ship dangerous defaults. Most users will not change them.

                                1. 1

                                  In this case we’re talking about init systems like daemontools and runit. I’m having trouble connecting what you’re saying to that.

                          3. 2

                            If those thing bother you, why run Linux at all? :P

                        2. 2

                          N.B. bouncing up and down ~= polling. Polling always intrinsically seems inferior to event based systems, but in practice much of your computer runs on polling perfectly fine and doesn’t eat your CPU. Example: USB keyboards and mice.

                          1. 2

                            USB keyboard/mouse polling doesn’t eat CPU because it isn’t done by the CPU. IIUC the USB controller generates an interrupt when data is received. I feel like this analogy isn’t a good one (regardless). Checking a USB device for a few bytes of data is nothing like (for example) starting a Java VM to host a web service which takes some time to read its config and load its caches only to then fall over because some dependency isn’t running.

                          2. 1

                            Sleep 1 and restart is the default. You can get different behavior by adding a ./finish script next to the ./run script.
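                            One possible ./finish, sketched below, stretches the restart interval after repeated failures. It relies on runit passing ./run’s exit code to ./finish as the first argument (-1 if ./run didn’t exit normally) and restarting ./run only once ./finish has exited; the `finish_backoff` name, the `./failcount` state file and the 60-second cap are my own assumptions.

```sh
#!/bin/sh
# Hypothetical helper for a runit ./finish script: prints how long to
# sleep before runsv restarts ./run, doubling on each consecutive failure.
# Usage: finish_backoff STATE_FILE EXIT_CODE CAP_SECONDS
finish_backoff() {
    state=$1 code=$2 cap=$3
    if [ "$code" = 0 ]; then
        # Clean exit: forget earlier failures, no extra delay.
        rm -f "$state"
        echo 0
        return 0
    fi
    n=$(cat "$state" 2>/dev/null)
    n=$(( ${n:-0} + 1 ))
    echo "$n" > "$state"
    # 1, 2, 4, 8, ... seconds; skip huge shifts once we're at the cap anyway.
    if [ "$n" -gt 30 ]; then
        delay=$cap
    else
        delay=$((1 << (n - 1)))
    fi
    if [ "$delay" -gt "$cap" ]; then delay=$cap; fi
    echo "$delay"
}

# A real ./finish (runsv runs it from the service directory) would end with:
#   sleep "$(finish_backoff ./failcount "$1" 60)"
```

                            One limitation: the counter only resets on a clean exit, so a service that crashes after running fine for days starts from its previous backoff level.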

                        3. 2

                          I really like runit on Void. I do like the simplicity of systemd target files from a package manager perspective, but I don’t like how systemd tries to do everything (consolekit/logind, mounting, xinetd, etc.)

                          I wish it just did services and dependencies. Then it’d be easier to write other systemd implementations, with better tooling (I’m not a fan of systemctl or journalctl’s interfaces).

                          1. 1

                            You might like my own dinit (https://github.com/davmac314/dinit). It somewhat aims for that - handle services and dependencies, leave everything else to the pre-existing toolchain. It’s not quite finished but it’s becoming quite usable and I’ve been booting my system with it for some time now.

                        4. 4

                          I’d make the argument that in all circumstances where you need this you could probably run the command yourself. Thoughts?

                          It’s nice to be able to reload a well-written service without having to look up what mechanism it offers, if any.

                          1. 5

                            Runit’s sv(8) has a reload command, which sends SIGHUP by default. The default behavior of each control command can be changed in runit by creating a small script at $service_name/control/$control_code.

                            https://man.voidlinux.eu/runsv#CUSTOMIZE_CONTROL
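                            As an example of that hook: according to runsv(8), if `control/h` exists and is executable, runsv runs it when told to HUP the service (`sv hup` / `sv reload`), and if the script exits 0 runsv does not send the signal itself. A sketch that only lets the reload through when a config check passes; `guarded_hup` is a hypothetical name and the check command is up to you (for nginx, `nginx -t -q` tests the configuration quietly).

```sh
#!/bin/sh
# Hypothetical helper for a runit control/h script. runsv only sends
# SIGHUP itself if the control script exits non-zero, so:
#   return 1 -> check passed, let runsv send SIGHUP as usual
#   return 0 -> check failed, runsv skips the signal
# Usage: guarded_hup CHECK_COMMAND [ARGS...]
guarded_hup() {
    if "$@"; then
        return 1
    fi
    echo "guarded_hup: '$*' failed, not sending SIGHUP" >&2
    return 0
}

# /etc/sv/nginx/control/h could then be this file plus one last line,
# whose status becomes the script's exit status:
#   guarded_hup nginx -t -q
```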

                            1. 1

                              I was thinking of the difference between ‘restart’ and ‘reload’.

                              Reload is only useful when:

                              • You can’t afford to lose a few seconds of service uptime (OR the service is ridiculously slow to load)
                              • AND the daemon supports an on-line reload functionality.

                              I have not been in environments where this is necessary, restart has always done me well. I assume that the primary use cases are high-uptime webservers and databases.

                              My thoughts were along the lines of: if you’re running a high-uptime service, you probably don’t mind the extra effort of typing ‘killall -HUP nginx’ instead of ‘systemctl reload nginx’. In fact I’d rather do that than take the risk of the init system re-interpreting a reload as something else, like reloading other services too, and bringing down my uptime.

                            2. 3

                              I hoped it would do something nice like redirect stdout and stderr of these supervised processes by default. Instead you manually have to create a new file and folder for each service that explicitly runs its own copy of the logger. Annoying. I hope I’ve been missing something.

                              I used to use something like logexec for that, to “wrap” the program inside the runit script, and send output to syslog. I agree it would be nice if it were builtin.

                            3. 25

                              Nice article. I must admit that I am a systemd fan. I much prefer it to the soup of raw text in rc.d folders. Finally, an init system for the 1990s.

                              1. 13

                                I’ve never had a problem doing anything with systemd myself - I think a lot of the hate towards it stems from the attitude of the project owners, and how they don’t make any effort to cooperate with other projects (most notably, IMO, the Linux kernel folks). Here are a couple of interesting mailing list messages that demonstrate that:

                                1. 11

                                  Exactly.

                                  I was initially skeptical about the debuggability of a systemd unit, but the documentation covers things in great depth, and I’m a convert to the technical merits. Declarative service files, particularly when you use a ‘drop-in’, are a definite step up from the shell scripts of sysvinit.

                                  The way the project tries to gobble up /everything/ is a concern though, given their interactions (or lack thereof) with other parts of the community.

                                  1. 2

                                    My impression is that the resistance to systemd stems from it not being unixy. Not being Debiany, even.

                                    I use for i in ..., sed, grep, awk, find, kill -SIGHUP, lsof, inotify, tee, and tr all damned day to manage my system, and systemd has left me blind and toothless.

                                    I’m still working on my LFS-based replacement for my various Debian desktops, vms, and laptop.

                                    1. 1

                                      Declarative service files, particularly when you use a ‘drop-in’, are a definite step up from the shell scripts of sysvinit.

                                      I’ve never found “systemd vs sysvinit shell scripts” to be a particularly compelling argument. “Don’t use sysvinit shell scripts” is a perfectly fine argument, but doesn’t say much about systemd. There are loads of init systems out there, and it seemed to me that systemd was never in competition with sysvinit scripts, it was in competition with other new-fangled init systems, especially upstart which was widely deployed by Ubuntu.

                                      1. 1

                                        In the case of Debian, it’s basically sysvinit or systemd.

                                        1. 2

                                          That’s still not much of an argument for systemd; it’s just passing the buck to the Debian developers, and going with whichever they chose. That’s an excellent thing for users, sysadmins, etc. to do, but doesn’t address the actual question (i.e. why did the Debian devs make that choice?).

                                          According to Wikipedia, the initial release of systemd was in 2010, at which point Ubuntu (a very widely-deployed Debian derivative) had been using upstart by default for 4 years.

                                          Debian’s choice wasn’t so much between sysvinit or systemd, it was which non-sysvinit system to use; with the highest-profile contenders being systemd (backed by RedHat) and upstart (backed by Canonical). Sticking with sysvinit would have been an abstain, i.e. “we know it’s bad, but the alternatives aren’t better enough to justify a switch at the moment”. In other words sysvinit’s only “feature” is the fact that it is already in widespread use, with all of the benefits that brings (pre-existing code snippets, documentation, blogposts, troubleshooting forums, etc.).

                                          These days systemd has that “feature” too, since it’s used by so many distros (including Debian, as you say), which was the last nail in sysvinit’s coffin: at this point sysvinit is mostly hanging on as a legacy option (Debian in particular cares very deeply about stability and compatibility). Choosing between Debian sysvinit and Debian systemd isn’t so much a choice of init system, it’s a choice of whether or not to agree with the Debian developers’ choice to switch init system. And that choice was between systemd, upstart, initng, runit, daemontools, dmd, etc. They abstained (stuck with sysvinit) for many years, until around 2015 when the systemd vs upstart competition was resoundingly won by systemd, with Ubuntu switching away from upstart and Debian switching away from sysvinit.

                                          As I saw all of this going on, my interpretation was:

                                          • Around 2005 every popular distro was using sysvinit because of its entrenched base, a few users advocated for alternatives like initng but the distros didn’t find the improvements to be worth the cost.
                                          • Ubuntu switched to upstart, making init systems a hot topic: sysvinit became viewed as legacy, upstart was being looked at closely by other distros and it seemed like, once it got enough real-world usage, many might switch over.
                                          • Systemd appeared, inspired by Apple’s launchd, and gradually gained users. At this point sysvinit was already seen as legacy and the question was what systemd offered that upstart didn’t.
                                          • Debian debated switching init system, and with input from Ubuntu developers they both agreed that systemd was the better option (from my understanding, systemd’s “lazy” approach was a fundamentally better fit to the init problem than upstart’s “eager” approach). Upstart basically died at this point.
                                          • All subsequent debates about systemd focus on how it’s better than sysvinit, which was never really in question.

                                          To me, comparing systemd to sysvinit is like those shampoo adverts which claim their product gives an X% improvement, but the fine-print says that’s compared to not washing ;)

                                          1. 1

                                            OpenRC is drop-in and works perfectly fine. I dropped it in and I’m using it on all my installs with no issues.

                                            1. 1

                                              I think maybe you’ve misunderstood me.

                                              I don’t mean you can install systemd and it will continue to work with sysvinit scripts.

                                              I’m referring to systemd’s “drop-in” unit configurations. You can override specific parameters of a unit without having to replace the whole thing.

                                              https://coreos.com/os/docs/latest/using-systemd-drop-in-units.html
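                                              To make it concrete: a drop-in is a `.conf` file in a `UNIT.d/` directory (e.g. `/etc/systemd/system/nginx.service.d/`) whose settings are merged over the packaged unit; `systemctl edit nginx.service` creates one interactively. A scripted sketch, where `write_dropin` and the `10-restart` file name are hypothetical and `Restart=`/`RestartSec=` are ordinary `[Service]` options:

```sh
#!/bin/sh
# Hypothetical helper: install a systemd drop-in whose contents come from stdin.
# Usage: write_dropin UNIT_DIR UNIT NAME < settings
write_dropin() {
    dir="$1/$2.d"
    mkdir -p "$dir" || return 1
    # Only the settings in this file change; the rest of the packaged
    # unit stays as shipped.
    cat > "$dir/$3.conf"
}

# Example (as root):
#   write_dropin /etc/systemd/system nginx.service 10-restart <<'EOF'
#   [Service]
#   Restart=on-failure
#   RestartSec=5
#   EOF
#   systemctl daemon-reload
```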

                                  2. 6

                                    If systemd had restricted itself to services, it would have been a nice init replacement. The problem I have with systemd is everything else it does.

                                    1. 1

                                      IIRC those can be configured off if you only want to keep the init part.

                                    2. 3

                                      I have administered Linux systems since 1995-ish (mostly amateur, though some was for real work at the uni).

                                      I am far from an expert, but the startup of Linux services has always been a nightmare. Maybe there is a great philosophy behind it, but starting service A after service B was horrendously complicated.

                                      It took me 20 minutes to understand how to do that in systemd and move all my services there. Never looked back, never had issues.
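                                      For reference, “start A after B” in systemd is usually two directives: `Requires=` pulls B in (and stops or restarts A when B is explicitly stopped or restarted), while `After=` only controls ordering. Either one alone does half the job. A sketch that writes such a unit; `write_unit_after`, the unit names and the ExecStart path are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: write UNIT_DIR/A.service, ordered after B.service.
# Usage: write_unit_after UNIT_DIR A B EXEC_START
write_unit_after() {
    {
        printf '%s\n' '[Unit]'
        printf '%s\n' "Description=$2 (started after $3)"
        # Requires= pulls B in and stops/restarts A along with it;
        # After= makes A's start wait for B's start-up to complete.
        printf '%s\n' "Requires=$3.service" "After=$3.service"
        printf '%s\n' '' '[Service]' "ExecStart=$4"
        printf '%s\n' '' '[Install]' 'WantedBy=multi-user.target'
    } > "$1/$2.service"
}
```

                                      After writing it to /etc/systemd/system, `systemctl daemon-reload` and `systemctl enable --now a.service` would pick it up.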