Looking forward to his more in-depth response as his current response is limited to statements like “I think …” rather than any kind of proof.
He is also claiming that while there might be a theoretical problem, in practice clock error stays within a bound of around 10%:
and that the fact different processes can count relative time, without any absolute clock, with a bound error (like, 10%) is absolutely credible
What I find problematic about this response (and we’ll see what he says in his blog post) is that it seems to be a bare assertion in the face of numerous points of evidence to the contrary. On top of that, it leaves out the issue of risk assessment. That is, we know that it is possible for clocks to skew by more than 10% for a variety of reasons, but it’s unlikely. The real question, then, is how bad it is when they do. If it’s catastrophic, then using this algorithm is a bad idea. If it’s just an annoyance, then using it might be acceptable. Either way, one has to take into account the cost of the consequences of the algorithm failing. And as the blog post states, if it’s acceptable for the lock algorithm not to guarantee mutual exclusion, then just two Redis instances that one fails over between is probably a simpler solution with guarantees just as good.
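To see why the assumed drift bound matters, Redlock-style algorithms compute a lock’s remaining validity window by subtracting both the acquisition time and a clock-drift allowance from the TTL. A minimal sketch (function and parameter names are mine, not from the article) shows how the window shrinks as the assumed drift grows:

```python
def remaining_validity(ttl_ms, elapsed_ms, drift_factor):
    """Approximate Redlock-style validity window: the lock is only
    trusted for the TTL minus the time spent acquiring it, minus an
    allowance for clock drift (drift_factor * ttl)."""
    drift_ms = drift_factor * ttl_ms
    return ttl_ms - elapsed_ms - drift_ms

# With a 10,000 ms TTL and 50 ms spent acquiring the lock:
print(remaining_validity(10_000, 50, 0.01))  # 1% assumed drift  -> 9850.0
print(remaining_validity(10_000, 50, 0.10))  # 10% assumed drift -> 8950.0
```

If real skew ever exceeds the assumed factor, the computed window is simply wrong, and the client believes it holds the lock after it has expired; the arithmetic cannot detect that.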
What lease expiry period is this article talking about?
I mean - of course the code is broken if holding a lock doesn’t mean that you… keep the lock.
The crucial piece of code in this article is unfamiliar to me, but the end message of the article is correct: by distributing cookies/Lamport clocks/sequence numbers you can avoid conflicting writes to the shared resource.
A distributed lock manager will only allow a client to hold a lock for a specified period of time, because it cannot distinguish a client which is slow from a client which has crashed. Since clients do crash, the locked resource will be made available to other clients after the expiry period. Since there is some chance the client was just slow instead of crashed, you have to check for an expired lock when the client attempts to use the locked resource.
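That expiry check is what the fencing-token (cookie/sequence-number) scheme mechanizes: the storage service itself rejects any write carrying a token older than one it has already seen, so a client whose lease expired while it was paused cannot clobber data. A toy sketch of that conditional write (class and method names are mine, not from the article):

```python
class FencedStore:
    """Toy storage service that tracks the highest fencing token seen
    and rejects writes carrying a stale (not strictly newer) token."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        # Conditional write: only accept tokens newer than any seen so far.
        if token <= self.highest_token:
            raise PermissionError(f"stale fencing token {token}")
        self.highest_token = token
        self.value = value

store = FencedStore()
store.write(33, "from client 1")   # accepted
store.write(34, "from client 2")   # accepted: newer token
try:
    store.write(33, "late write")  # paused client 1 resumes: rejected
except PermissionError as e:
    print(e)                       # prints: stale fencing token 33
```

The lock manager only has to hand out monotonically increasing tokens; the safety check lives at the resource, where it belongs.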
The underlying storage not providing a conditional write/transaction cookie just seems like an outright bug to me; this article is really shining a light on Redis’s implementation for me. Thanks for your comments.
@antirez response on the HN thread can be found here:
https://news.ycombinator.com/item?id=11061062
@antirez’s rebuttal:
https://lobste.rs/s/lko2eh/is_redlock_safe