This is another fantastic article about distributed systems testing by @aphyr. I have been really glad to see that he’s been breaking down how to actually use Jepsen; perhaps we’ll start seeing more and more people applying Jepsen to their projects.
With respect to etcd in particular, this has finally pushed me over the edge to the point where I’d consider building a system with etcd, rather than only considering ZK.
I would actually hold off if safety is critical. Given how ZK, Doozer, Chubby, etc. went, it’ll be another five years or so before they iron out all the kinks, haha.
Yeah I was gonna say, this seemed to me to argue that sticking with Zookeeper for now is probably the right choice.
One relevant question for @aphyr: IIRC your original ZK article was with an old version of Jepsen without the linearizability checker, have you tested ZK with knossos at all?
Not yet, no. Each post is between 50 and 100 hours of work, and this is all nights+weekends, so it takes a while.
Yeah I understand, just curious. Thank you for doing these, we all appreciate it :)
Is there any way I can tip you? A paypal account maybe? I have learnt a lot about distributed systems from your blog. I want to show my appreciation by sending some funds.
As someone who has been looking into implementing Raft and something like etcd/ZooKeeper for fun this post was great.
I’m not surprised the reads were a source of trouble. The Raft paper explicitly says to do a heartbeat on a read, and I think pretty much anyone first thinks “but no, that’s too expensive!” and convinces themselves that they can just use the regular heartbeat interval. IMO, it’s dangerous that etcd does stale reads by default. Defaults should always be the most correct, otherwise you get situations like MongoDB’s API.
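To make the heartbeat-on-read point concrete, here is a rough toy sketch in Python of the difference between a stale read and a read that first confirms leadership with a majority. It is not etcd's code or any real Raft library: `Peer`, `Leader`, `stale_read` and `linearizable_read` are all names made up for illustration, and it leaves out the other things the paper asks for (for example, the leader having committed an entry in its own term).

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical follower: acks a heartbeat only if it is reachable
    and has not already moved on to a higher term."""
    term: int
    reachable: bool = True

    def ack_heartbeat(self, leader_term: int) -> bool:
        return self.reachable and leader_term >= self.term


@dataclass
class Leader:
    """Hypothetical node that believes it is the leader for `term`."""
    term: int
    peers: list
    store: dict = field(default_factory=dict)

    def confirmed_by_majority(self) -> bool:
        # Count our own vote plus every peer that acks a fresh heartbeat.
        acks = 1 + sum(p.ack_heartbeat(self.term) for p in self.peers)
        cluster_size = len(self.peers) + 1
        return acks > cluster_size // 2

    def stale_read(self, key):
        # Serves local state without checking that we are still leader.
        return self.store.get(key)

    def linearizable_read(self, key):
        # Heartbeat round first; refuse the read if a majority does not ack.
        if not self.confirmed_by_majority():
            raise RuntimeError("leadership not confirmed; refusing read")
        return self.store.get(key)


leader = Leader(term=1, peers=[Peer(term=1), Peer(term=1)], store={"k": "old"})
print(leader.linearizable_read("k"))  # majority acks, read is served

# Simulate the deposed-leader case: both peers have moved to term 2.
for p in leader.peers:
    p.term = 2
print(leader.stale_read("k"))  # still answers from possibly outdated state
```

In the second scenario `linearizable_read` raises instead of answering, which is the extra round trip people are tempted to skip.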