1. 7
  1. 1

    Interesting results, though obviously only directly useful for your distributed transactional database workloads.

    Latency measurements are a little bit odd here, because you want good latency but not at the cost of correlated failures.

    1. 1

      Yeah, the correlated failure issue was a question I had. My experience with AWS was that asking for a bunch of VMs simultaneously would inevitably lead to correlated failures.

        1. 1

          Placement groups are great for a few specific use cases. If I understand correctly, AWS only allows seven placement groups per zone for the types that are useful for availability. That’s far from nothing, and it’s great for things like a zonal etcd deployment. However, it’s not particularly useful when you have hundreds or thousands of VMs. There’s really no way for a public cloud to provide the kind of rack diversity you can get with your own data center.

          1. 1

            Almost correct: it’s seven instances per PG, but you can have multiple PGs (multiple groups of 7).
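
            To make “multiple groups of 7” concrete, here’s a rough boto3 sketch (group names, AMI, and instance type are placeholders; the point is just that each “spread” group caps at 7 running instances per AZ, so k groups get you up to 7*k rack-diverse instances):

            ```python
            import boto3

            ec2 = boto3.client("ec2")  # region/credentials come from your environment

            # Each "spread" placement group holds at most 7 running instances
            # per AZ, so k groups give up to 7*k rack-diverse instances.
            groups = [f"db-spread-{i}" for i in range(3)]  # hypothetical names
            for name in groups:
                ec2.create_placement_group(GroupName=name, Strategy="spread")

            for name in groups:
                ec2.run_instances(
                    ImageId="ami-0123456789abcdef0",  # placeholder AMI
                    InstanceType="m5.large",
                    MinCount=7,
                    MaxCount=7,
                    Placement={"GroupName": name},
                )
            ```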

            You almost certainly never need size-7 groups unless you’re trying to implement your own disk controllers. Size-5 gives you N+2 (3 operational, 1 down for planned maintenance, 1 unplanned failure). Size-7 is only needed for N+3 (4 operational, 1 down for planned maintenance, 2 unplanned failures, because rebuilding an entire drive really does take long enough for a second failure to land).
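
            The arithmetic behind those sizes, as a minimal sketch assuming a standard majority-quorum protocol (Raft/Paxos-style):

            ```python
            def quorum(n: int) -> int:
                """Votes needed for a majority in an n-replica group."""
                return n // 2 + 1

            def tolerated_down(n: int) -> int:
                """Replicas that can be down (planned + unplanned) with a majority intact."""
                return (n - 1) // 2

            for n in (3, 5, 7):
                print(f"size-{n}: quorum {quorum(n)}, tolerates {tolerated_down(n)} down")
            # size-3: quorum 2, tolerates 1 down
            # size-5: quorum 3, tolerates 2 down (1 planned + 1 unplanned)
            # size-7: quorum 4, tolerates 3 down (1 planned + 2 unplanned)
            ```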

            1. 1

              I think you mixed it up: as far as I read it, they only allow seven instances per placement group per AZ for the one type that’s useful for availability. It can still be useful when you have a ton of VMs, but at that point you need to start treating the AZs themselves as your failure domains (don’t put everything into a single AZ!)
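
              A minimal boto3 sketch of the AZ-as-failure-domain idea (AZ names, AMI, and instance type are placeholders):

              ```python
              import boto3

              ec2 = boto3.client("ec2")
              azs = ["us-east-1a", "us-east-1b", "us-east-1c"]  # placeholder AZ names

              # One replica per AZ, so a whole-AZ outage costs any replica
              # group at most one member.
              for az in azs:
                  ec2.run_instances(
                      ImageId="ami-0123456789abcdef0",  # placeholder AMI
                      InstanceType="m5.large",
                      MinCount=1,
                      MaxCount=1,
                      Placement={"AvailabilityZone": az},
                  )
              ```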