1. 77

  2. 41

    DHCP is a protocol that lets machines on a network discover what config they should use by shouting aimlessly at everyone on the network until someone tells them what they want.

    Is a great description of DHCP.

    1. 11

      I’ll be sure to let the person I shamelessly stole that description from know that! She’ll probably be ecstatic to know her shitpost hit the mark.

      EDIT: she was

    2. 8

      One fun incompatibility I ran into a while ago is that GNU libc will try the first nameserver first, and then move on to the second if the first doesn’t work. Musl libc, however, will just always pick a random nameserver from the list.

      So if you do something like:

      nameserver    # Nameserver with some special resolving rules.
      nameserver       # Fallback in case the above doesn't work.

      You’re going to run into trouble on musl-based systems.

      1. 5

        One fun incompatibility I ran into a while ago is that GNU libc will try the first nameserver first, and then move on to the second if the first doesn’t work. Musl libc, however, will just always pick a random nameserver from the list.

        I was curious what musl’s documented behavior is, so I went searching. Evidently it issues queries to all your DNS servers in parallel and then uses the first response it gets back, in an effort to reduce latency for lookups.
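To make the difference concrete, here is a minimal simulation of the two strategies described in this thread. It is a sketch, not either libc's actual code: the server names, delays, record and the 10.0.0.5 address are hypothetical, and it assumes a negative answer (NXDOMAIN) counts as a valid response rather than a failure.

```python
import concurrent.futures
import time

# Hypothetical stand-ins for two nameservers. "special" knows an internal
# name but answers slowly; "fallback" answers quickly but has no such record.
SERVERS = {
    "special": {"delay": 0.2, "records": {"internal.example": "10.0.0.5"}},
    "fallback": {"delay": 0.01, "records": {}},
}


def query(server, name):
    """Simulate one query: sleep for the server's delay, then answer."""
    cfg = SERVERS[server]
    time.sleep(cfg["delay"])
    # A missing record is still a valid response (NXDOMAIN), not a failure.
    return cfg["records"].get(name, "NXDOMAIN")


def resolve_sequential(servers, name):
    """Ask servers in listed order; move on only if a query fails."""
    for server in servers:
        try:
            return query(server, name)
        except OSError:
            continue  # timeout or unreachable: try the next server
    raise OSError("no nameserver answered")


def resolve_parallel(servers, name):
    """Ask all servers at once and keep whichever response arrives first."""
    with concurrent.futures.ThreadPoolExecutor(len(servers)) as pool:
        futures = [pool.submit(query, s, name) for s in servers]
        for done in concurrent.futures.as_completed(futures):
            try:
                return done.result()
            except OSError:
                continue  # this server failed: wait for another response
    raise OSError("no nameserver answered")
```

With the list `["special", "fallback"]`, the sequential strategy returns the special server's record, while the parallel one tends to return the faster fallback's NXDOMAIN. That is the kind of intermittent breakage a "special rules first, fallback second" setup runs into.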

        1. 1

          Ah yes, you’re right; I remember it now. I actually ran into this about 5 years ago and misremembered the details. I ran into this when I was working on my DNS-based proxy/filter program, which worked great on my OpenBSD laptop that I initially developed it on while I was away for a month, but had weird intermittent problems on my Void Linux musl desktop system after I got home. It took me quite a while to track it down to this 😅

          1. 1

            I can imagine. It definitely wouldn’t be the behavior I would expect out of the box if you asked me to predict how musl’s lookups would work.

        2. 2

          Which distributions use musl? The behavior you describe for musl is what I remember in whatever we were using in production circa 2003, either suse or redhat.

          1. 5

            Alpine Linux for one. Alpine is commonly used with Docker setups too (because the images are smaller it means you can pack more images per gigabyte of production storage); not to mention Alpine recently added Tailscale as a package so we at Tailscale really have to make it Just Work™️.

          2. 1

            glibc (and most other libcs I’ve encountered) also lets you do this with options rotate (among other things). Granted that’s not default behaviour in most places, but I’ve used that in the past where I wanted some internal systems to do more of a round-robin between internal nameservers.
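A rough sketch of what round-robin selection means here: each lookup starts from the next server in the list instead of always starting from the first. This only models the idea; `make_rotating_picker` is a hypothetical helper, and the real resolver's bookkeeping may differ.

```python
from itertools import count


def make_rotating_picker(nameservers):
    """Return a function that yields the try-order for each new lookup."""
    counter = count()

    def next_order():
        # Start one position further along the list on every call,
        # wrapping around at the end.
        start = next(counter) % len(nameservers)
        return nameservers[start:] + nameservers[:start]

    return next_order
```

Calling the returned function three times for a list `["a", "b", "c"]` gives the orders a-b-c, b-c-a, c-a-b, so the load spreads across all listed servers while each lookup still has the others as fallbacks.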

          3. 7

            Except where I tell systemd-resolved to use my office’s DNS server and it doesn’t, or I tell it to not use the office’s DNS server and it does, and steadfastly refuses to change. Then I go through the whole previous flowchart trying to figure out how to actually tell it to change its behavior, because doing what it tells me to alter what servers it queries silently fails.

            So really it’s just https://xkcd.com/927/

            1. 4

              That’s odd. What version of systemd-resolved do you have?

              1. 3

                The one included in Ubuntu 18.04: systemd 237 +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

                1. 2

                  In my experience the one in Ubuntu 20.04 does work for this: systemd 245 (245.4-4ubuntu3.6) +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

                  It does have an out-of-tree patch applied that produces some logspam on NXDOMAIN though.

                  1. 1

                    Someday my office will upgrade to 20.04, I look forward to giving this a good hard try. Thanks.

                  2. 2

                    Apparently the version in 20.04 is a lot better. Let’s hope you can upgrade soon.

                2. 2

                  The dot diagrams seem to imply they thought about and incorporated the other use-cases and “standards”, so that xkcd is hardly relevant.

                3. 7

                  Love the decision tree diagrams! DNS on Linux has in the past been surprisingly immature; it is great to see the progress made with the consolidation around systemd.

                  1. 8

                    I’m grateful to still be running the BSD dream. I absolutely love the simplicity of plain-text config files.

                    1. 4

                      I don’t see how the config files are related? It just seems like all the daemons that handle DNS lookups all suck except for systemd-resolved which sucks the least and is closest to feature parity to either Windows or Mac resolvers.

                      1. 6

                        The article mentions how /etc/resolv.conf comes from BSDlandia. Some of us still run a BSD derivative, even on our laptops. :)

                        So all I have to do is edit /etc/resolv.conf and be done with it.

                        1. 16

                          I feel like you completely missed the point of the article. The thing we wrote about was when software on the computer (like Tailscale) has alternate opinions on what the DNS config should be.

                          1. 3

                            Gotcha. Well, there’s only two applications I permit to modify /etc/resolv.conf: dhclient and IPv6 SLAAC. I don’t permit modification of /etc/resolv.conf from any other application, even OpenVPN. If I wanted to disallow modifications to resolv.conf by dhclient or SLAAC, it’s not much more than editing one or two more config files.

                            There was a time when I didn’t even allow those two to modify resolv.conf. I used to run unbound directly on my laptop. But now that my home network blocks outbound DNS except through my actual DNS server, I don’t do that anymore.

                            Either way, it’s really not nearly as complicated as Tailscale’s image shows for Linux.

                            1. 10

                              I feel like there’s just a lot of useful stuff happening behind the scenes which those of us who come from too Unix-y a background can’t quite grasp at times.

                              Because, I mean, I’ve read a bunch of articles about what systemd-resolved does and why it’s useful and necessary and whatever and as far as I can recall those arguments made sense…

                              …but I admit the whole thing is utterly incomprehensible to me at this point, and that when I see # Generated by resolvconf and especially #This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8). I just know that what’s going to follow is four hours of trying to fix the network today, usually followed by another half hour or so after the next reboot, and I’ll eventually get it working but I have no idea how.

                              Oh, how the wheel has turned, I am now one of those lusers whom, twenty years ago, I was flaming because they couldn’t get their Linux boxes to speak PPP. Linux is user-friendly, it’s just picky about its users, indeed :-).

                              1. 4

                                The traditional UNIX use-case is that a machine sits in a rack or on or under a desk, and the network doesn’t change until something rare happens, on the order of months or years. Obviously you have two DNS caching/resolving/authoritative servers, both of them go in your /etc/resolv.conf, and all is well. If you need private domain name space, you just make sure your DNS servers take care of it.

                                The house network case is similar, except that maybe you don’t run your own DNS server.

                                But then you get into laptops. If you don’t use a VPN, then you expect dhclient to get you an address and a default route and a couple of DNS servers; this will overwrite your resolv.conf but you don’t care much.

                                Finally, the complex case: you have a laptop that’s moving between networks and also using one or more VPNs to either be at home (route everything via VPN) or be partially at home (VPN adds routes for internal IP space). This is the one where everything falls down, because what we really want has to be determined by the responsible human, not by automation.

                                In my ideal world, we have a resolv.conf.d/* structure with a couple of new features: include and scope.

                                /etc/resolv.conf: include /etc/resolv.conf.d/*.on

                                and then in /etc/resolv.conf.d, files with each of those names that specify nameservers and search domains and a new feature, scope:

                                dhcp.on: scope . nameserver …

                                dhcp6.on: scope . nameserver …

                                slaac.off: scope . nameserver …

                                wireguard-house.on: scope .local nameserver …

                                tailscale-access.off: scope .company.tld nameserver …

                                tailscale-override.on: scope override nameserver …

                                If we did this, then:

                                • the base case remains the same

                                • various daemons and services would control their own subentries in /etc/resolv.conf.d, presumably with conflict limited to misconfiguration. Only files named *.on are active; a person or daemon who wants the config to stick around but not be used can change the name to *.off (or anything else).

                                • scope means that it’s much easier to add-in DNS for private systems, including multiple private systems. Scopes are consulted from most specific match to least (.) except for override, which demands to be the only stanza in use. There is no sensible situation in which multiple override stanzas make sense, so most recent timestamp wins and an error gets sent to the log. Multiple ‘scope .’ stanzas do make sense, so they get consulted in most-recent timestamp order, failing to the next as necessary.
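The selection rules in the bullets above can be sketched in a few lines. This models the proposal only; nothing like it exists in glibc. The `Stanza` type, `consult_order`, and the 192.0.2.x addresses (a documentation range) are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stanza:
    filename: str      # e.g. "dhcp.on"; only "*.on" files are active
    scope: str         # ".", a suffix such as ".local", or "override"
    nameservers: list  # placeholder addresses, not real configuration
    mtime: float       # file modification time; newer wins ties


def matches(scope, qname):
    """True if the query name falls under the given scope suffix."""
    if scope == ".":
        return True
    return ("." + qname.rstrip(".").lower()).endswith(scope.lower())


def specificity(scope):
    """Number of labels in the scope; '.' is the least specific."""
    return 0 if scope == "." else scope.strip(".").count(".") + 1


def consult_order(stanzas, qname):
    """Return the active stanzas to consult for qname, in order."""
    active = [s for s in stanzas if s.filename.endswith(".on")]
    overrides = [s for s in active if s.scope == "override"]
    if overrides:
        # An override demands to be the only stanza; the newest one wins.
        return [max(overrides, key=lambda s: s.mtime)]
    matching = [s for s in active if matches(s.scope, qname)]
    # Most specific scope first, then most recent timestamp.
    return sorted(matching, key=lambda s: (-specificity(s.scope), -s.mtime))


EXAMPLE = [
    Stanza("dhcp.on", ".", ["192.0.2.1"], 100.0),
    Stanza("dhcp6.on", ".", ["192.0.2.2"], 150.0),
    Stanza("slaac.off", ".", ["192.0.2.3"], 300.0),
    Stanza("wireguard-house.on", ".local", ["192.0.2.53"], 200.0),
]
```

For a name such as `printer.local`, `consult_order(EXAMPLE, ...)` puts `wireguard-house.on` first and then falls back to the two `scope .` stanzas, newest first; `slaac.off` is ignored because of its name. The sketch leaves out the error log entry for multiple overrides.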

                                This also makes it convenient to bootstrap from a generic server to DNS-over-HTTPS or DNS-over-TLS or other schemes that might arise in future.

                                If you spot obvious flaws other than “this requires getting changes into glibc”, please let me know.

                                1. 1

                                  Oh, I understand the problems; what I admit to not being able to understand is how I’m supposed to employ NetworkManager, systemd-resolved, resolvconf and all else in order to fix them. Honestly, I can barely get these things to work together in the first place – I regularly end up troubleshooting systemd-resolved on systems where all it has to do is set up a static DNS address on a machine that never moves anywhere. Getting it to handle anything like what you describe is beyond my abilities; I’m just a lowly programmer who definitely can’t administer a Linux box these days :-).

                                  The gist of it, IMHO, is this one, which you mention before introducing a solution that, to my lowly programmer’s eye, seems very sensible:

                                  This is the one where everything falls down, because what we really want has to be determined by the responsible human, not by automation.

                                  Linux has a long, tortuous history of adopting automated solutions that make things worse than back when you had to do things by hand.

                          2. 2

                            most people don’t want to manage their dns servers by hand, that’s what DHCP is for

                            1. -1

                              Driving Home Cat’s Podiatrist?

                              1. 7

                                Nope, it stands for Device Hopefully Can Ping!

                      2. 4

                        if you have a better term in mind we are more than happy to take suggestions, but for the sake of this article we’re going to call it “split DNS”

                        I’ve always heard it called “Split-Horizon DNS”.

                        1. 3

                          I have noticed that zerotier doesn’t even bother with DNS on linux.

                          One interesting idea that hadn’t crossed my mind before was to just use public DNS. I saw this in the tailscale docs and while it feels wrong, I don’t think it is wrong.

                          Sadly, some routers (or even perhaps ISPs) block this, so I have to run a private server also. Private takes priority, then linux users and servers can use public DNS. If that fails then they have to manually configure the private server - this hasn’t happened yet. The drawback is that you have to update two places, but I have a script for that…

                          1. 4

                            Some internal Tailscale services use public DNS pointing to their Tailscale IP addresses (such as the grafana instance). There’s no shame in doing it with public DNS in ways that only work behind your private network. It may not be the thing you want for all circumstances, however it does work at the risk of potentially making it easier for attackers to do service enumeration via DNS.

                          2. 1

                            Once I somehow got systemd and dnsmasq to fight over who owns resolv.conf (on Ubuntu 18.04). First one would overwrite it, and then the other would notice and overwrite it again.

                            1. 1

                              I appreciate systemd-resolved’s function and purpose but it is absolutely impossible to debug. I say this as the flowchart for me is missing netplan. Netplan itself is also nearly silent and invisible, like a random chrooted file doing something unexpected.

                              Throwing Docker (which confusingly caches resolv.conf… forever?) into the mix is still pure pain.

                              At home I use it happily, professionally I turn everything off and try to rely only on resolv.conf.

                              1. 1

                                This covers the laptop scenarios nicely. For Tailscale between laptops and servers, there’s also netplan, and then stuff like Kubernetes installations, which look at the IPs in /etc/resolv.conf and, if they don’t like them, generate a new /tmp/resolv.conf pointing to external services and set that as the default for pod creation.
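As an illustration of the kind of check involved, here is a small sketch that scans resolv.conf text for nameservers on a loopback address, such as a local stub resolver. That loopback entries are what gets rejected is an assumption on my part, and `loopback_nameservers` is a hypothetical helper, not code from any Kubernetes distribution.

```python
import ipaddress


def loopback_nameservers(resolv_conf_text):
    """Return the loopback addresses listed as nameservers in the text."""
    found = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything that is not a
        # "nameserver <address>" entry.
        if len(fields) < 2 or fields[0] != "nameserver":
            continue
        try:
            # Drop an IPv6 zone suffix such as "%eth0" before parsing.
            addr = ipaddress.ip_address(fields[1].split("%", 1)[0])
        except ValueError:
            continue  # not a valid IP address: ignore the entry
        if addr.is_loopback:
            found.append(str(addr))
    return found
```

A tool that finds any loopback entries this way could decide that pods, which have their own network namespace, cannot reach that resolver and substitute a different file.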

                                I love that systemd-resolved has a fairly sane conceptual approach to managing roaming and VPNs. I loathe that it repeatedly breaks DNSSEC whether roaming or not and I end up having to manually edit a config file when I leave home or get back, so as to have the least amount of breakage. And generally a lot of other things systemd-resolved does in DNS protocol land, rather than administration land, end up causing frustration.

                                Which is why on servers which don’t roam, I nuke it with prejudice and use Unbound. For laptops, there’s unbound-anchor which tries to manage the admin side of roaming, but in practice on Ubuntu tends to be an unreliable core-dumping mess which makes systemd-resolved look good.

                                1. 1

                                  Am I the only one who hears Arlo Guthrie’s voice in my head singing about “circles and arrows” every time I come to another DNS diagram while reading this article?