1. 5
  1.  

  2. 2

    This seems like an interesting algorithm.

    There are a couple of things missing from this description:

    • Clearly it doesn’t allow updates on a minority side of a network partition, which is fine, but should be stated up front.
    • How do you decide who’s currently a voter and who’s not? If different processes disagree on how many voters they are, they might disagree about who won the election. But if the set of voters is static, then eventually, permanent node failures will leave you with less than a quorum of voters, and you won’t be able to update any more. Under some circumstances, “eventually” might mean “after several years”, and human intervention is a reasonable solution.
    • If you have three or more candidates at the same time, or two candidates if at least one node has failed, it’s possible that neither one will initially get a majority of votes. It says, “Every process will consider the election aborted after some time that is smaller than the retry time,” which allows you to recover from this; but what if some processes consider the election won at the same time that others consider it aborted?
    • At startup time, your frequency is -1. I guess eventually you have some timeout after which you decide that nobody has a valid frequency, so you start an election?

    A thing I’m not clear about: the distinction between currentEpoch and frequencyEpoch suggests that sometimes the currentEpoch gets incremented even when you are not choosing a new frequency. When does that happen? If it doesn’t happen, the algorithm can be simplified by eliminating the distinction, and the frequencyEpoch becomes simply a version number for the frequency variable. Maybe it’s when you see a proposed new frequency, but don’t accept it?