Threads for tylerholien

  1. 8

    Redlock assumes a semi synchronous system model where different processes can count time at more or less the same “speed”.

    This is flawed reasoning. See https://aphyr.com/posts/299-the-trouble-with-timestamps.

    1. 1

      You’re correct that it is a huge problem in distributed systems, but the reasoning itself is not flawed - just something that needs to be known and compensated for.

      Most systems that depend on distributed nodes keeping roughly the same time handle the problem by using timeframes much longer than any small deviation in clock rate or synchronization could affect. With NTP configured properly, for example, it’s generally pretty safe to assume your clocks are in sync to within a couple of milliseconds on a LAN, so if you build into the protocol a safety factor much greater than any reasonable deviation, you can achieve good (but not perfect) safety. For example: acquire a lock for 30 seconds, from timestamp A to timestamp B; wait 5 seconds after A; do your work to prepare to commit; but abort if B is less than 5 seconds away. Unless another system in the network is so far out of sync that it believes the lock has already expired, the system should be safe.
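
      To make the 30-second / 5-second example concrete, here is a minimal sketch of the two guard checks. The names (`Lease`, `new_lease`, `may_start_work`, `may_commit`) and the `GUARD_SECONDS` constant are made up for illustration and are not part of Redlock or any library. The caller supplies `now`, and every node's clock is assumed to be within far less than the guard of the others.

```python
from dataclasses import dataclass

# Assumed safety margin: much larger than any clock skew we expect to see.
GUARD_SECONDS = 5.0


@dataclass(frozen=True)
class Lease:
    acquired_at: float  # timestamp A, in seconds
    expires_at: float   # timestamp B, in seconds


def new_lease(now: float, ttl: float = 30.0) -> Lease:
    """Build a lease running from A = now to B = now + ttl."""
    return Lease(acquired_at=now, expires_at=now + ttl)


def may_start_work(lease: Lease, now: float, guard: float = GUARD_SECONDS) -> bool:
    """True once at least `guard` seconds have passed since A."""
    return now >= lease.acquired_at + guard


def may_commit(lease: Lease, now: float, guard: float = GUARD_SECONDS) -> bool:
    """True only while B is still at least `guard` seconds away; otherwise abort."""
    return lease.expires_at - now >= guard
```

      With a lease taken at t=100, work may start at t=105 and a commit is allowed up to t=125. That leaves 5 seconds of slack on each side of the usable window, which is the "good (but not perfect)" margin described above and nothing stronger.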

      It does mean that your system is vulnerable to clock/network/config issues, though, so those need to be monitored. I once saw a system mistakenly configured to use some random external NTP source (which should’ve been blocked by the network config anyway, but was not) that was out of sync with the rest of the network by tens of seconds. One of the nice things about NTP is that it’ll give you an estimate of how far out of sync a node is, so such nodes should remove themselves from service if they’re approaching safety bounds or if their skew is unknown due to failures to sync.
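
      The self-removal rule could look roughly like the sketch below. It assumes you already have the NTP daemon's offset estimate and the time since the last successful sync from whatever monitoring you run. The function name and both thresholds are hypothetical and would need tuning against the safety margin your protocol actually uses.

```python
from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
MAX_SAFE_OFFSET_SECONDS = 1.0   # well below the protocol's guard interval
MAX_SYNC_AGE_SECONDS = 300.0    # older than this, treat the skew as unknown


def should_stay_in_service(
    offset_seconds: Optional[float],
    seconds_since_sync: Optional[float],
    max_offset: float = MAX_SAFE_OFFSET_SECONDS,
    max_sync_age: float = MAX_SYNC_AGE_SECONDS,
) -> bool:
    """Return False if the clock offset is unknown, stale, or too large."""
    # Unknown skew (e.g. sync failures): the node cannot vouch for its clock.
    if offset_seconds is None or seconds_since_sync is None:
        return False
    # The last estimate is too old to trust.
    if seconds_since_sync > max_sync_age:
        return False
    # Stay in service only while the estimated offset is inside the bound.
    return abs(offset_seconds) < max_offset
```

      A node a couple of milliseconds off with a recent sync stays in; the misconfigured node that was tens of seconds off, or any node that can't reach its time source, takes itself out.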

      1. 6

        While your response might be true, Martin’s analysis covered it fairly well by saying that if mutual exclusion is important for the correctness of your system, Redlock is not a good choice. Saying things “should” be safe and hoping that your CPU doesn’t just hit a bug one day and start telling the time wrong is an insufficient guarantee. It still means that Redlock is broken (or at least that’s the claim) in the face of a known possible error mode, an error mode with known solutions.