I think that this article shows a very interesting application of a technique known as “derandomization”. Usually, derandomization means taking a randomized algorithm and removing all sources of randomness. In this application, we see it only partially derandomized, to great effect: copysets improve reliability by reducing overall randomness, but still leverage randomness to approximate the solution to an NP-complete problem.
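To make the “partial derandomization” concrete, here is a toy sketch in Python. It is a simplification of the copysets idea (it ignores the paper's scatter-width constraints): randomness is still used up front to build copysets from random permutations, but replica sets are then drawn only from that small constrained pool, rather than from all possible node combinations.

```python
import random

def build_copysets(nodes, r, permutations=2, seed=0):
    """Chunk random permutations of the nodes into groups of r.
    Randomness is used here, at setup, to build a small pool of
    candidate replica sets (a simplified take on the copysets idea)."""
    rng = random.Random(seed)
    copysets = []
    for _ in range(permutations):
        perm = list(nodes)
        rng.shuffle(perm)
        for i in range(0, len(perm) - r + 1, r):
            copysets.append(tuple(sorted(perm[i:i + r])))
    return copysets

def pick_replicas_random(nodes, r, rng):
    """Fully random replication: any r-subset of nodes is possible."""
    return tuple(sorted(rng.sample(list(nodes), r)))

def pick_replicas_copyset(copysets, rng):
    """Copyset replication: choose only among the precomputed sets."""
    return rng.choice(copysets)

nodes = range(9)
r = 3
copysets = build_copysets(nodes, r)
rng = random.Random(1)
random_sets = {pick_replicas_random(nodes, r, rng) for _ in range(1000)}
copyset_sets = {pick_replicas_copyset(copysets, rng) for _ in range(1000)}
# Far fewer distinct replica sets exist under copysets, so far fewer
# r-node failure combinations can lose all copies of some object.
print(len(random_sets), len(copyset_sets))
```

With 9 nodes and r = 3, fully random placement eventually touches most of the 84 possible 3-subsets, while the copyset pool here contains at most 6 sets, which is the reliability win the article describes.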
I wonder if other algorithms would benefit from partial derandomization?
I think the best way to describe the insights of copysets and chainsets is that they add constraints to achieve a better result. As I understand it, derandomization helps in the case where sources of randomness are sparse and the algorithm requires more randomness than is available.
Both techniques are limiting the amount of randomness used, but they’re doing it for different reasons. One is optimizing the outcome; the other is reducing resource usage.
Another model I thought of to conceptualize the insight in this paper is that they move the randomization in the algorithm out of the runtime and into the initialization. The algorithm is deterministic at runtime, so they’ve reduced the scope of when randomization affects the behavior.
I think that’s an inaccurate characterization. Copysets and chainsets are about picking replica sets in a way that provides protection against correlated failures. There’s no “runtime” or “initialization” time inherent to either algorithm.
In fact, the way we use chainsets in HyperDex hides the details from every component except the system coordinator. Clients and servers simply see replica sets and have no knowledge of how those sets were chosen. It’s entirely possible to swap random replication in place of chainsets and see zero difference at runtime.
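That swap-ability can be sketched as an interface boundary. This is not HyperDex’s actual code, just a hypothetical illustration: both strategies satisfy the same `replicas_for` contract, so callers see only a replica set and cannot tell which placement policy produced it.

```python
import random
import zlib
from abc import ABC, abstractmethod

class ReplicaPlacement(ABC):
    """Hypothetical coordinator-side interface: clients and servers
    only ever see the replica sets returned by replicas_for()."""
    @abstractmethod
    def replicas_for(self, key: str) -> list:
        ...

class RandomPlacement(ReplicaPlacement):
    """Pick r replicas uniformly at random from all nodes."""
    def __init__(self, nodes, r, seed=0):
        self.nodes, self.r = list(nodes), r
        self.rng = random.Random(seed)
    def replicas_for(self, key):
        return sorted(self.rng.sample(self.nodes, self.r))

class ChainsetPlacement(ReplicaPlacement):
    """Map each key deterministically onto one precomputed chainset."""
    def __init__(self, chainsets):
        self.chainsets = [list(c) for c in chainsets]
    def replicas_for(self, key):
        idx = zlib.crc32(key.encode()) % len(self.chainsets)
        return self.chainsets[idx]

# Either strategy plugs into the same slot; the caller is unchanged.
strategies = [
    RandomPlacement(range(9), 3),
    ChainsetPlacement([(0, 1, 2), (3, 4, 5), (6, 7, 8)]),
]
for s in strategies:
    print(type(s).__name__, s.replicas_for("some-object"))
```

Swapping one implementation for the other behind this boundary is exactly the “zero difference at runtime” point: only the coordinator knows which policy is in play.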