A few days ago we open sourced a bunch of the pieces of the Mesosphere DCOS, and I think this one may be particularly interesting to people here. My teammate Sargun wrote it because we needed fast failure detection and highly available configuration dissemination for minuteman. The foundation is a modified implementation of the [HyParView](asc.di.fct.unl.pt/~jleitao/pdf/dsn07-leitao.pdf) membership protocol. Each node gets a full copy of all of the data being stored, so exercise discipline when deciding what truly qualifies as dynamic state that is valuable everywhere in the cluster and needs to be highly available. This is still a work in progress, but it’s being used today to make our distributed load balancer extremely robust against network partitions and node failures. We plan on making it much more generic and easier to plug into other systems.