Worth mentioning: this might work some of the time, but in an asynchronous (e.g. real) network, it could be unsafe. The described locking scheme does not actually ensure no two processes hold the lock at the same time. Even if it did, it would not ensure that side effects, like writing to block storage, would be safe. Martin Kleppmann has a terrific overview of why “distributed locks” generally don’t do what people think, and what to do instead: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
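To make the hazard concrete, here is a minimal sketch (all names are illustrative, not from any real library) of why a lease-style lock check can be stale the moment it returns: a pause longer than the TTL silently invalidates the lock while the holder still believes it is safe to write.

```python
import time

# Hypothetical lease-based lock client; illustrative only.
class LeaseLock:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.expires_at = 0.0

    def acquire(self) -> None:
        self.expires_at = time.monotonic() + self.ttl

    def still_held(self) -> bool:
        # A local check like this can be stale the instant it returns:
        # the process may be paused (GC, scheduler, page fault) right after.
        return time.monotonic() < self.expires_at

lock = LeaseLock(ttl_seconds=0.05)
lock.acquire()
assert lock.still_held()
time.sleep(0.1)  # simulate a pause longer than the TTL
# The lease has expired; another process may now hold the lock,
# yet this process would have kept writing had it not re-checked.
assert not lock.still_held()
```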
Thanks for the feedback. My design actually already takes this problem into account.
Kleppmann warns that a process can freeze for an arbitrary amount of time due to garbage collection, network problems, CPU starvation, etc. My design mitigates this in two ways, both described in the section “Long-running operations”.
Kleppmann proposes fencing tokens, which do rule out these problems, but only if every downstream system can check the token. My design doesn’t provide as strong a guarantee, but the probability of a conflict can be made arbitrarily small by tuning the timing settings (TTL, refresh interval).
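For readers unfamiliar with fencing tokens, here is a minimal sketch of the idea from Kleppmann’s post, assuming the storage service can remember the highest token it has seen (the class and method names are illustrative):

```python
# Sketch of a fencing-token check at the storage layer; illustrative only.
class FencedStorage:
    def __init__(self):
        self.highest_token = -1
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        # Reject writes carrying a token older than one already seen:
        # a client that paused and lost its lock cannot corrupt state.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.data[key] = value
        return True

store = FencedStorage()
assert store.write(33, "k", "from-client-A")     # A holds the lock, token 33
assert store.write(34, "k", "from-client-B")     # lock passes to B, token 34
assert not store.write(33, "k", "late-write-A")  # A wakes up late; rejected
assert store.data["k"] == "from-client-B"
```

This is what “sufficient support by all systems” means in practice: every service that accepts writes must implement a check like this, which is exactly the requirement the timing-based approach trades away.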