    There is a lot of interesting research in this paper. One thing I noticed is that "unavailable node" failures cause only 24% of catastrophic failures. I wonder whether we could build automated tools, similar to @aphyr's Jepsen, for the other 76%.

    Edit: err, I'm an idiot, given that they describe an automated tool for doing this. I need to look at it.