1. 20

When setting my watch, anything within a minute is good enough. TOTP tokens are valid for a few minutes after Google Authenticator displays them. Meetings at work always start a minute or sometimes two after the hour. And when writing distributed software, systems other than cloud spanner (which rely on special hardware) must be built to tolerate clock differences of several minutes between systems.

So if we don’t ever rely on millisecond accurate time (without cesium and satellites being involved the worst case delta is still bad), why even attempt to achieve it? Why is NTP worth the complexity on most systems versus something like time protocol or course time?

  1.  

  2. 37

    Several points here. First, NTP isn’t really that complex compared to, e.g., register allocation in modern compilers, sending packets via WLAN or mobile data, or even minor decision trees in modern software.

    Second, it doesn’t have to be needed by a majority, it just has to be needed by enough to make a suitable default. Some number of servers need synchronised time, or time that cannot be easily manipulated by an attacker. Rather like how you might be happy with rsh, but some number of servers store sensitive data etc. and need ssh. So systems like Debian have to ask: Is accurate time, or reliable time, or encryption, necessary for enough Debian users to make it the default? That’s not a majority question, it’s a tradeoff. Functionality versus cost.

    Third, we rely on accurate time more often than you might think, for various meanings of “accurate”. Accurate can mean that the servers in a load balancing cluster agree, or that communicating software agrees, or that time is monotonous across reboots, or that a server agrees with the stratum-0 NTP servers far away. Right now I’m running something that uses 84 third-party libraries; how much would you bet that all of those 84 require zero of the four varieties of accuracy?

    NTP was developed for telecommunications. Telecoms requires accurate time, in the second and/or third senses. The second probably matters for you right now: Your portable device and the base station have to agree at the nanosecond level. They don’t need to agree with distant stratum-0 servers, but sharing airtime efficiently requires close agreement among the broadcasters and listeners in the local area. There’s no cesium involved, and the worst case is still in the nanoseconds, because the use case doesn’t involve stratum 0.

    I’ve seen that in another context, which might be an illustrative anecdote. It concerns a large server farm hosted at two cloud providers. Now and then a load balancer would create another backend instance and the instance would come up with a bad clock, and that eventually led to slow pageviews for a page that involved a specific kind of database write. It didn’t have to; the software could have been written to tolerate time skew. But it wasn’t, and that wasn’t caught in the test environment, because the test environment used fewer instances and so had little chance to get bad time. You say software “must be built to tolerate”. No. We had a choice, and the possibilities were to have the developers be on guard against another kind of bug, or expand the test environment such that this kind of problem would be caught by automated testing, or sync time in the production environment.

    We did the last, because it was both simplest and cheapest. We didn’t sync time with stratum 0, of course, we synced time within the environment. NTP provided excellent agreement, with no cesium, satellites or any other non-standard hardware.

    1. 10

      I want to add something here.

      As developers, we generally experience monotonous time, so of course that’s what we develop for. If you write and test anything, there’s a good chance that your computer and your closest coworker’s computer are synchronised well enough that your code experiences monotonous time. Even higher if you run your code in a debugger, because that slows down the program.

      Both your brain and the code running in the debugger experience monotonous, synchronised time.

      Having the code run without monotonous, synchronised time in production is begging for trouble because of that. Introducing a difference between the development environment and the production environment is dangerous, and should not be done without good reason. “Because NTP is complicated” is IMO not even nearly good enough.

      1. 24

        As developers, we generally experience monotonous time

        Only on boring projects. Sorry, I know you meant “monotonic” but it was a lovely typo.

        1. 2

          Not a typo, I was slack and typed in too much hurry.

          This has made me so curious about why the Greek original ended up as three different loan words in English, and as one in Norwegian and German. Loan word grammar is full of interesting oddnesses. (Interesting to people like me, anyway. ;)

          1. 2

            Appears to be the same root word in English, but “tonic” indicates use in scientific context while “tonous” indicates use in figurative context. I guess you could say “there was a monotonic quality to the professor’s voice as he described monotonic time”. According to the online dictionary, the figurative use of monotonic in English is an import from France and it makes sense that French should have innovated on ways of describing boredom.

    2. 22

      It is because your core premise is incorrect. We do rely on synchronized time on the order of milliseconds (and less) for modern communications systems to operate. The human-scale use case is mostly irrelevant.

      1. 7

        cloud spanner (which rely on special hardware)

        I had to google this, and it’s a Google product: https://cloud.google.com/spanner/. It’s better to capitalize a product, otherwise it might be confused with a generic concept.

        The special hardware is probably some sort of GPS receiver to sync time.

        So if we don’t ever rely on millisecond accurate time

        This is a wildly inaccurate premise, as @varjag has mentioned. You might not rely on millisecond time, but many other people, systems, and organizations do.

        Why is NTP worth the complexity on most systems versus something like time protocol or course time?

        (I believe you meant to write “coarse time” here).

        This is the heart of the question. NTP is used because the infrastructure existed when the Internet started to take off. It was a reliable system (as designed) with a widely available spec. As such it exhibited all the characteristics of “worse is better” and maybe tragedy of the commons. By the time people identified issues with security and scalability, the inertia of the entrenched infrastructure was enormous.

        Now, I imagine everyone from the maintainers of the NTP pools on down would be thrilled if every Tom, Dick or Sally didn’t ping a timeserver every few minutes just in case the internal clock has changed. But someone, somewhere has to take the lead in implementing an alternative that works transparently in the same way NTP does.

        1. 2

          It was some really clever stuff that challenged many people’s assumptions. You might enjoy the Spanner (2012) and F1 RDBMS (2012) papers. Both PDFs. The main competitors are FoundationDB and CockroachDB, with early versions of FoundationDB having rigorous verification. I don’t know if the FOSS’d versions are getting that kind of verification.

        2. 7

          Accurate timekeeping can also improve performance, e.g. log correlation window can be smaller.

          1. 5

            I know for sure that when I was running GitLab on my desktop for development, the 2FA was failing, and I worked out it was because my desktop’s time was out of sync by a few seconds.
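To illustrate why a few seconds can be enough: TOTP (RFC 6238) derives the code from the current 30-second time step, so two clocks that sit on opposite sides of a step boundary compute different codes. The following is a minimal sketch, not GitLab's actual validation logic. The `accepts` helper and its `window` parameter are hypothetical, and the secret is the one from the RFC's test vectors.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1: an HOTP code over the time-step counter."""
    counter = int(unix_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def accepts(secret: bytes, code: str, server_time: float,
            window: int = 0, step: int = 30) -> bool:
    """Hypothetical server-side check: accept codes from +/- `window` steps."""
    return any(
        totp(secret, server_time + k * step, step) == code
        for k in range(-window, window + 1)
    )


SECRET = b"12345678901234567890"  # shared secret from the RFC 6238 test vectors

# The client clock reads 59 s; the server clock is 3 s ahead and reads 62 s,
# so the two sit on opposite sides of the step boundary at 60 s.
client_code = totp(SECRET, 59)
print(client_code)                                   # 287082
print(accepts(SECRET, client_code, 62))              # False: strict check fails
print(accepts(SECRET, client_code, 62, window=1))    # True: one step of tolerance
```

Whether a given server tolerates neighbouring steps is a configuration choice; with a strict check, a skew of a few seconds fails whenever the code is generated near a boundary.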

            1. 5

              As someone who started in the workforce before computers all used NTP by default (and before everyone carried mobile devices / smartwatches that always have correct time), I feel like meetings start much closer to “on time” than they did back then. On the occasion that I join my daily 9:45 AM video call at 9:46, there’s a very good chance they’ve already started without me.

              1. 4

                Agreed. I also remember trying to check two wall clocks before changing my watch–a single wall clock was no more likely to be correct than my watch!

                I believe that mobile devices can get time from the cellular network, GPS satellites, and NTP (once the IP network is up). My first cellphone (which only had the first of those three options) would think it was decades in the past if you turned it on without a SIM card.

                1. 4

                  It’s comforting in a weird way to know that it’ll always be the first moment of 1970, 1980, or last of 1969 (maybe others?) in a few small corners of the planet.

              2. 4

                The basic protocol is trivial: send a message asking for the time; adjust the clock based on what you get back. Do it again in a few seconds.

                It’s a lot more complex when you need good accuracy, but one thing people often don’t get is that without some clock sync it is very easy to diverge by seconds, minutes, or hours.
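The "adjust clock based on what you get back" step can be sketched with the standard four-timestamp calculation that NTP uses (RFC 5905). This assumes the network delay is roughly the same in both directions; the function name and the numbers are made up for illustration.

```python
from typing import Tuple


def offset_and_delay(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    """Estimate clock offset and round-trip delay from one request/reply.

    t0: client clock when the request was sent
    t1: server clock when the request arrived
    t2: server clock when the reply was sent
    t3: client clock when the reply arrived
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # how far the server is ahead of the client
    delay = (t3 - t0) - (t2 - t1)          # time spent on the network, both legs
    return offset, delay


# Made-up example: the server clock is 5 s ahead, each network leg takes 0.1 s,
# and the server spends 0.01 s handling the request.
offset, delay = offset_and_delay(t0=100.0, t1=105.1, t2=105.11, t3=100.21)
print(round(offset, 6), round(delay, 6))
```

The extra complexity in a full NTP implementation comes largely from filtering many such samples, choosing among several servers, and slewing the clock gradually instead of stepping it.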

                1. 4

                  Without synchronized clocks it certainly makes understanding the logs of firewalls and applications rather difficult when you’re trying to understand the steps that were taken in an attempted attack or a successful security intrusion.

                  1. 2

                    The fundamental database commit protocols all depend on synchronous time.

                    1. 2

                      With a coarse resolution or an inaccurate system for synchronising time, we’d see much larger differences between the clocks on different systems, and those could easily exceed the few minutes other systems are willing to tolerate.

                      1. 2

                        Because errors accumulate. It’s okay for your first tier to be a minute off, but it’s not okay for your fifth tier to be five minutes off.

                        1. 2

                          If you’re looking for a less complicated system, I submit HTP:

                          http://rkeene.org/oss/htp/

                          http://rkeene.org/docs/oss/htp/htp.pdf

                          1. 1

                            This is a good question. I was even wondering about cryptographic purposes, but as you’ve mentioned, there are some cases where it’s not a big deal.

                            1. 1

                              Can anyone recommend a way to just set the time for an embedded device (which does not have a clock battery)? An error of even 5 minutes is acceptable. It should either have public servers or be easy to install on my server.

                              There are daytime (RFC 867), time (RFC 868), and SNTP (RFC 4330). Or it’s possible to just use the Date header from HTTP responses. What to choose? It looks like SNTP is the standard and simple way, and it seems to be a subset of NTP, so any NTP server can be queried with SNTP (not sure about it).

                              1. 6

                                There’s no point using SNTP unless you have to implement it yourself; and it is NTP in a way. The first two you mentioned are on the obscure side these days. NTP is something all administrators are familiar with.

                                If it’s a *nix system, there’s ntpdate, which does just that and can be installed standalone. For embedded Linux specifically, BusyBox has an ntp client applet.
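If you do have to implement it yourself, a one-shot SNTP-style query is small. The sketch below is an assumption-laden minimum: it ignores network delay and does none of the sanity checks a real client should perform, which seems acceptable when a 5-minute error is fine. The function names are made up; the packet layout (48-byte header, transmit timestamp in bytes 40 to 47, seconds counted from 1900) follows the NTP header format.

```python
import socket
import struct

NTP_TO_UNIX = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    """48-byte client request: LI=0, version=4, mode=3 (client), rest zero."""
    return bytes([0x23]) + bytes(47)


def parse_transmit_time(reply: bytes) -> float:
    """Return the server's transmit timestamp (bytes 40..47) as Unix seconds."""
    if len(reply) < 48:
        raise ValueError("reply is shorter than an NTP header")
    seconds, fraction = struct.unpack("!II", reply[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2 ** 32


def query(server: str, timeout: float = 2.0) -> float:
    """Send one request to UDP port 123 on `server` and return its time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        reply, _ = sock.recvfrom(512)
    return parse_transmit_time(reply)
```

Calling `query("pool.ntp.org")`, or your own server's hostname, returns Unix seconds that the device can use to set its clock.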

                                1. 4

                                  Parsing the Date header out of an HTTP response has the following benefits:

                                  • It’s straightforward; the request and response are plaintext and the latter is easy to search for the needed header.
                                  • It’s accurate to within network latency and the skew of the server’s clock (usually fairly precise).
                                  • It punches through firewalls. (I’ve used it for precisely this reason. Even if your target is blocked, the firewall itself may return a response with a usable Date header.)
                                  • If your device is network-connected, it’s (depressingly) likely it’s already making HTTP requests, so you’ve got the infrastructure there.
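As a rough sketch of the Date-header approach, the following parses the header out of a plaintext response head using only the Python standard library. The sample response is invented for illustration (including the 403, standing in for a firewall's own reply), and `date_from_response` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def date_from_response(raw_head: str) -> Optional[datetime]:
    """Find the Date header in a plaintext HTTP response head and parse it."""
    for line in raw_head.split("\r\n")[1:]:   # skip the status line
        if not line:                          # blank line ends the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "date":    # header names are case-insensitive
            return parsedate_to_datetime(value.strip())
    return None


SAMPLE = (
    "HTTP/1.1 403 Forbidden\r\n"
    "Server: example\r\n"
    "Date: Tue, 15 Nov 1994 08:12:31 GMT\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

when = date_from_response(SAMPLE)
print(when.isoformat(), int(when.timestamp()))
```

The Date header only has one-second resolution, so this suits the "within a few minutes" use case and not anything that needs sub-second agreement.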
                                2. 1

                                  In older releases of ntpd, you were warned to choose your NTP servers carefully; e.g. see the manual from NeXTSTEP 3.3.