1. 44
  1. 14

    One good reason for having a low DNS TTL that isn’t mentioned in the article is DDNS setups.

    Residential internet connections are flaky, and sometimes a router cycles through multiple dynamic IPs in a matter of minutes, with nothing the customer can do about it.

    1. 6

      Yes, that’s true. We really need to get over this shitty dynamic IP for home users. IPv6 to the rescue.

      1. 20

        In practice stability of IPs has nothing to do with IPv4 vs IPv6. Some providers will give you the same IPv4 address for years, others will rotate your IPv6 prefix all the time.

        1. 3

          Yep, anecdote here: AT&T Fiber has given me the same IPv4 address for years, even across multiple of their “gateways” and a plan change.

          1. 2

            Anecdata: this is true in theory but I’m not sure in practice? Specifically, I used to get the same IPv4 address for weeks at a time - basically until my modem was rebooted. Then in 2014 ARIN entered phase 4 of their IPv4 exhaustion plan (triggered by them getting down to their last /8 block) and all of a sudden my modem’s IPv4 address refreshed far, far more often, IIRC every couple days.

            I guess maybe this was not technically required though, and was potentially just my ISP overreacting? 🤷

            CenturyLink in the Seattle area, FWIW.

            1. 1

              At least here in Germany, you get a different IPv4 address every 24 hours or on reconnect on the vast majority of residential internet connections. The point of IPv6 is that you don’t have just one public address. But then again, in Germany it’s hard to find residential internet access whose IPv6 prefix doesn’t change regularly. They think it’s more protective against the state… like cash. Crazy thoughts.

              1. 5

                Ideally, a v6 provider would give you two subnets, one for inbound connections that remained stable, one for outbound connections that was changed frequently. Combined with the privacy extensions randomising the low 64 bits, this should make IP-based tracking difficult.
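
                The outbound half of that already half-exists on the host side, by the way: the privacy extensions (RFC 4941) just mentioned. As a rough example (the stable/rotating prefix split itself would still be up to the provider), on Linux temporary addresses can typically be enabled and preferred with sysctls along these lines:

                # /etc/sysctl.d/ipv6-privacy.conf (example) - prefer temporary, randomised source addresses
                net.ipv6.conf.all.use_tempaddr = 2
                net.ipv6.conf.default.use_tempaddr = 2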

                1. 3

                  They think it’s more protective against the state…

                  The origin story of the 24h disconnect is that it used to be the differentiator between a leased line and a dial-up line, which belonged to different regulatory regimes (the most obvious aspect to customers was cost, but there were a few backend differences, too). The approach has stuck around ever since.

                  It’s also a (rather bare-bones) privacy measure against commercial entities, not the government: the latter can relatively easily obtain the mapping from IP address to subscriber by giving a more or less well-founded reason.

                  1. 2

                    Commercial entities have “solved” the tracking issue by using cookies etc.

                  2. 1

                    They think it’s more protective against the state… like cash. Crazy thoughts.

                    I.e. customers prefer changing IP addresses for privacy reasons?

                    1. 1

                      Which is really weird, because implementing an “I’d like my IP to change / not change” checkbox would be trivial. I don’t get why that’s not more common.

                      1. 5

                        The checkbox isn’t the complicated part here.

                  3. 2

                    That’s not going to be an easy problem to solve. We’ve embraced the current pseudo-standards of home internet connectivity (maybe-static, maybe-dynamic IPs, asymmetric speeds, CGNAT, no IPv6, etc.) for so long that many people think these are real industry standards with cost structures and cost savings behind them, and that we must suffer with them if we want cost-effective internet connectivity at all. A lot of home ISP customers suffer from Stockholm syndrome.

                  4. 2

                    I think DynDNS.org uses a 60s TTL on its A records, so for a DynDNS/residential setup I think the article’s author would approve of something like this:

                    foo.mydomain.com. 14400 IN CNAME bar.my-dyndns-server.com.

                    bar.my-dyndns-server.com. 60 IN A 203.0.113.10

                    Specifically, I don’t think the original author is complaining about the 60s TTL on records at my-dyndns-server.com, since that company has to deal with the lack of caching in their DNS zones. He finds sub-1-hour TTLs in the CNAME records to be a problem, and CNAME TTLs shorter than the A record TTLs to be a bigger problem. Honestly, even Amazon does this 60s-TTL-on-dynamic-resources trick.
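
                    The client side of such a setup is also simple enough; here’s a rough sketch of the update loop, where the “what is my IP” service, update URL, hostname and token are all placeholders (real providers differ):

                    # Minimal DDNS update loop (sketch only); URL, hostname and token are placeholders.
                    import time
                    import urllib.request

                    CHECK_IP_URL = "https://ifconfig.me/ip"                 # any "what is my IP" service
                    UPDATE_URL = "https://my-dyndns-server.com/update"      # hypothetical update endpoint
                    HOSTNAME = "bar.my-dyndns-server.com"
                    TOKEN = "replace-me"                                    # placeholder credential

                    last_ip = None
                    while True:
                        ip = urllib.request.urlopen(CHECK_IP_URL, timeout=10).read().decode().strip()
                        if ip != last_ip:
                            # Ask the DDNS provider to repoint the short-TTL A record.
                            url = f"{UPDATE_URL}?hostname={HOSTNAME}&ip={ip}&token={TOKEN}"
                            urllib.request.urlopen(url, timeout=10)
                            last_ip = ip
                        time.sleep(60)  # roughly in line with the 60s TTL on the A record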

                  5. 12

                    I went into this expecting to be called out for having a few records with a 15 minute TTL - fair enough, it’s on the low end, it’s not ideal, but it’s convenient.

                    But seriously, 2 minutes?! Really? I had no idea this was such a common practice, and fair enough that the author is complaining, it’s really annoying. Even when I was running a DDNS setup at home I rarely had issues with a 15 minute TTL - that means an average refresh time of 7.5 minutes, and given it typically coincided with a modem reboot it was never a big issue. You could reduce it to 5 minutes - giving you an average refresh time of 2.5 minutes - and still be in the top 50% of the internet according to this article. If you can’t afford an average 2.5 minutes of downtime every time you restart your modem, it’s, er, time to invest in a static IP allocation.

                    1. 3

                      I usually set mine to 30s before making real changes, wait for the old TTL to expire, make the change, and then turn it back up afterwards.
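
                      In the record notation used upthread, with example names and documentation IPs, the sequence is roughly:

                      www.example.com. 3600 IN A 198.51.100.7    ; normal state
                      www.example.com.   30 IN A 198.51.100.7    ; lower the TTL, then wait out the old 3600s
                      www.example.com.   30 IN A 203.0.113.42    ; make the actual change
                      www.example.com. 3600 IN A 203.0.113.42    ; raise the TTL again once it looks good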

                    2. 10

                      I’m not including ‘for failover’ in that list, as this has become less and less relevant. If the intent is to redirect users to a different network just to display a fail whale page when absolutely everything else is on fire, having more than one-minute delay is probably acceptable.

                      A place I worked at in the past did this, but it wasn’t for “fail whale”, it was so that we could serve people the fully-functional site over a completely different CDN if one of them was on fire. Because, well, one of our CDNs was on fire once, and we switched DNS, and someone told management “okay, we made the change, but it will take up to 60 minutes to get full recovery for everyone because of DNS caching”, and of course the response was “make it so that DNS caching isn’t a factor. We don’t want a single minute of unnecessary downtime. Downtime is lost money.”

                      1. 3

                        Exactly this. It’d be nice if DNS clients would forcefully clear a cached entry when connections to the IP time out or get an RST, or better yet, on anything that isn’t a success for the protocol involved. That requires feedback from the application level, though, and it wouldn’t work with caching intermediates.
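
                        The closest you can get today is doing it inside the application: treat a connection failure as a hint that the address may be stale and resolve again rather than reusing it. A rough sketch, which of course only refreshes the application’s own view, not any intermediate caches:

                        # Re-resolve on connection failure (sketch). This only bypasses
                        # application-level reuse of an address; intermediate DNS caches
                        # are still out of reach, as noted above.
                        import socket

                        def connect_fresh(host, port, attempts=3, timeout=5):
                            for _ in range(attempts):
                                # Fresh lookup on every attempt (the OS stub resolver may still cache).
                                for *_, addr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
                                    try:
                                        return socket.create_connection(addr[:2], timeout=timeout)
                                    except OSError:
                                        continue  # timeout/RST: fall through and re-resolve
                            raise OSError(f"could not connect to {host}:{port}")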

                        1. 2

                          It would be nice to build something like this into QUIC with server public keys and host names in DNS and a flag saying ‘cache this and retry the lookup if the server doesn’t respond or has a different key’. This would catch cases (like the DynDNS case above) where the IP is recycled and someone happens to be running something listening on the same port.

                          1. 2

                            Yeah, that’d be even better. But it’s the same issue - DNS was never designed with a feedback mechanism in place, and intermediate caching makes this effectively impossible.

                          2. 1

                            Multicast DNS (RFC 6762) does have a flag to indicate that a record should be flushed from the cache (section 10.2), but that’s multicast, not unicast, and it can only be sent as part of a reply.

                            1. 1

                              Part of the reason for the lack of invalidation is that DNS was designed as a stateless protocol with single-packet requests and responses. I wonder if that’s still necessary now that it’s often done over TCP and even TLS. If you’re keeping a connection open then you might already be tracking enough state to send explicit invalidations.

                              I’m also curious whether it would be possible to build something like a Bloom filter into DNS that would let you periodically query whether a record in a coarse-grained set had been invalidated, so DNS caches could keep a long-TTL record but still invalidate it explicitly.
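
                              A toy sketch of the Bloom-filter idea, purely to illustrate the shape of it (the hashing, sizing and however the filter would actually get distributed are all made up here):

                              # Toy Bloom filter over invalidated names (sketch). A false positive only
                              # costs an unnecessary revalidation, never a wrong answer.
                              import hashlib

                              class InvalidationFilter:
                                  def __init__(self, bits=8192, hashes=4):
                                      self.bits, self.hashes = bits, hashes
                                      self.bitmap = bytearray(bits // 8)

                                  def _positions(self, name):
                                      for i in range(self.hashes):
                                          digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
                                          yield int.from_bytes(digest[:8], "big") % self.bits

                                  def invalidate(self, name):         # authoritative side
                                      for pos in self._positions(name):
                                          self.bitmap[pos // 8] |= 1 << (pos % 8)

                                  def maybe_invalidated(self, name):  # caching resolver: if True, revalidate early
                                      return all(self.bitmap[pos // 8] & (1 << (pos % 8))
                                                 for pos in self._positions(name))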

                          3. 1

                            Exactly. In my experience, the need for failover is surprisingly common, and usually the TTL is the minimum time required for the failover to take full effect. Furthermore, if you only use the CDN for static asset caching, a long TTL can mean a split-brain situation in which clients get old data from the CDN and new data from the live server.

                          4. 5

                            Stop designing blindly cached systems whose invalidation policies amount to wishes and dreams. Nobody wants them except the people running them.

                            1. 2

                              I’d be really interested in seeing the distribution of cache fills which got exactly the same response as last time.

                              1. 2

                                I don’t think DNS is the proper place to implement failover, which makes the argument moot. The correct approach is to use failover IPs/subnets, which most (serious) hosting providers offer (Hetzner, for example). This way, you have one IP address that can automatically be pointed at another server (in another data center) when the main server fails.

                                Regarding DDNS, I have learnt to appreciate WireGuard as a stateless VPN solution. It doesn’t matter that my mobile computers have dynamic IPs when I can always address them in my virtual 10.0.0.0/24 subnet. If you want to host from a residential computer, I would set up a minimal VPS at a hosting provider and have it forward all incoming traffic to the residential computer via WireGuard. It’s not like this reduces your autonomy, given that you would have to do a lot more to become truly “independent”.
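
                                For reference, the WireGuard part of that is just two small configs; a sketch with made-up keys and names (forwarding the public traffic onwards on the VPS, e.g. via DNAT or a reverse proxy, is a separate step):

                                # VPS side, wg0.conf (sketch) - stable public endpoint
                                [Interface]
                                Address = 10.0.0.1/24
                                ListenPort = 51820
                                PrivateKey = <vps-private-key>

                                [Peer]
                                # residential machine: dynamic public IP, always reachable as 10.0.0.2
                                PublicKey = <home-public-key>
                                AllowedIPs = 10.0.0.2/32

                                # Residential side, wg0.conf (sketch)
                                [Interface]
                                Address = 10.0.0.2/24
                                PrivateKey = <home-private-key>

                                [Peer]
                                # the VPS; keepalives keep the NAT mapping open so the VPS can reach us
                                PublicKey = <vps-public-key>
                                Endpoint = vps.example.com:51820
                                AllowedIPs = 10.0.0.1/32
                                PersistentKeepalive = 25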

                                I currently have my TTL set to 600, but I tend to let it “cool down” (comparable to Kelvin versioning) over time as a given web project becomes more and more stable.