1. 15
    1. 8

      The title is a bit of clickbait because while I guess the Raft paper does not describe a solution for this one liveness issue, Diego Ongaro’s PhD thesis did describe the solution. And this blog post talks about that solution. Still, I thought this was interesting.

      1. 7

        Well, this could easily have been “Stop using Raft” or “Does Raft guarantee liveness in the face of network faults?”. I read it and didn’t feel it’d been oversold.

        But more topically, are any quorum protocols provably free of metastable failures when the network is partially connected in arbitrarily annoying ways?

    2. 3

      Given that TLA+ supports detection of liveness issues, and Raft has been modeled in TLA+, was this an issue the modeling showed, or only discovered after?

      This covers shortcomings in the initial Raft paper, and how they’ve been solved, but it also relies on human reasoning rather than an automated proof system pointing out the issues (and when it’s been solved). How do we know no liveness issues remain?

      1. 4

        From the Raft paper (page 116):

        We leave specifying and proving cluster membership, log compaction, and client interaction to future work, along with liveness and availability properties of the basic Raft algorithm.

        So these basic liveness properties were never verified, just the basic safety properties. If you look at the official TLA+ spec there aren’t any safety or liveness properties specified.

        1. 1

          Thanks for the clarification! That is a bit surprising no liveness properties are specified in the official spec.

          1. 1

            I was real surprised too!

    3. 3

      Given the blog is titled “Decentralised Thoughts” I thought it was funny that they ended the piece with:

      “Please answer/discuss/comment/ask on Twitter.”