Timeouts are traditionally used to fix this problem and work well. With the advent of better clock sync, timeouts can be more precise, but the failure problem is one that was solved in the 1960s and is described in many textbooks.
That was an awesome read! I’ll add that fault-tolerant architectures like NonStop address both coordinator failure and other failures. It might make some of this easier to just build a low-cost version of NonStop.
Where can I learn more about NonStop?
I have some links here you might find helpful.
This website is absolutely heinous on mobile lol
Is it? Works fine for me on Android+Firefox.
It’s a blogspot thing - it’s atrocious on iPhone and has been for a long time. It’s a Google property; they post a lot of official blogs there, and even those are unreadable by default. There’s no way they don’t know about it, so I can only assume it’s deliberately unfixed.