The fascinating subtitle of the blog post: “Using less information to make better decisions.”

The summary here: if you load-balance by picking two random candidate servers and sending your request to the less-loaded of the two, you do almost as well as sending each request to the least-loaded of all servers. Crucially, the two-choice strategy is much more robust against outdated load information; when everybody picks the same apparently-least-loaded server, the resulting thundering herd can overwhelm it.
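A minimal sketch of the two-choice policy (names like `pick_server` are my own, and the loop below counts each assignment as one unit of load, which real balancers wouldn't do synchronously):

```python
import random

def pick_server(loads, rng=random):
    """Power of two choices: sample two servers at random, send to the less loaded."""
    a = rng.randrange(len(loads))
    b = rng.randrange(len(loads))
    return a if loads[a] <= loads[b] else b

# Illustrative comparison: assign 10,000 requests to 100 servers.
rng = random.Random(42)
p2c = [0] * 100
uniform = [0] * 100
for _ in range(10_000):
    p2c[pick_server(p2c, rng)] += 1       # two choices, pick the emptier
    uniform[rng.randrange(100)] += 1      # one uniformly random choice
```

With fresh load information, the maximum load under two choices stays very close to the mean, while a single uniformly random choice overshoots it noticeably.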

(Mitzenmacher, Richa, and Sitaraman, 2001, is the paper the blog post is about; it turns out to be a 60-page survey of fifteen years of research rather than a report of a new result.)

This is a really interesting result, and I wonder if analogous results might hold in cases like these:

Minimizing hash lookup time: hash to two buckets, and insert into the bucket with the shorter chain? Or, for read-heavier workloads, insert into both buckets, and do lookups on the bucket with the shorter chain? (Reading the paper, it turns out that they investigated this; the answer is yes! They only considered the first of these two variations, though.)
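The first variation is easy to sketch; this is two-way chaining with a made-up second hash, purely illustrative:

```python
class TwoChoiceHashTable:
    """Chained hash table that inserts into the shorter of two candidate chains."""

    def __init__(self, n_buckets=64):
        self.buckets = [[] for _ in range(n_buckets)]

    def _indices(self, key):
        n = len(self.buckets)
        h1 = hash(key) % n
        h2 = hash((key, 0x9E3779B9)) % n   # second, roughly independent hash (illustrative)
        return h1, h2

    def insert(self, key, value):
        h1, h2 = self._indices(key)
        # Two choices: append to whichever chain is currently shorter.
        bucket = h1 if len(self.buckets[h1]) <= len(self.buckets[h2]) else h2
        self.buckets[bucket].append((key, value))

    def lookup(self, key):
        # The item could be in either chain, so check both.
        for b in self._indices(key):
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return None
```

Lookups probe at most two short chains, which is the point: the longest chain grows like log log n rather than log n.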

Lowering the load factor on bloom filters: generate two sets of bit-indices, and set the bits on the one that has more bits already set? Or making bloom filters robust to bit errors: store under both sets of bit-indices, and return true if either of them has survived with all bits set?
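A sketch of the first bloom-filter variation (my own illustrative construction, not from the paper): each key gets two candidate index sets, inserts flip bits in whichever set already has more bits set (so fewer new bits get flipped), and lookups must then accept either set:

```python
import hashlib

class TwoChoiceBloom:
    """Bloom filter variant: two candidate index sets per key, insert into the
    one that already has more bits set; lookups accept either set."""

    def __init__(self, m=1024, k=5):
        self.bits = [False] * m
        self.m, self.k = m, k

    def _indices(self, key, salt):
        return [int.from_bytes(
                    hashlib.sha256(f"{salt}:{i}:{key}".encode()).digest()[:8],
                    "big") % self.m
                for i in range(self.k)]

    def add(self, key):
        a = self._indices(key, "a")
        b = self._indices(key, "b")
        # Choose the set with MORE bits already set, so fewer new bits flip.
        chosen = a if sum(self.bits[i] for i in a) >= sum(self.bits[i] for i in b) else b
        for i in chosen:
            self.bits[i] = True

    def __contains__(self, key):
        # No false negatives: an inserted key fully set one of its two sets.
        return (all(self.bits[i] for i in self._indices(key, "a"))
                or all(self.bits[i] for i in self._indices(key, "b")))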

Improving the worst-case response time by sending your request to both servers and taking whichever result comes back first? That’s maybe a bit obvious.
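Obvious or not, it is easy to sketch with asyncio (the servers here are stand-ins with made-up latencies):

```python
import asyncio
import random

async def hedged(request, replicas):
    """Send the same request to every replica; keep the first reply, cancel the rest."""
    tasks = [asyncio.create_task(request(r)) for r in replicas]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return done.pop().result()

async def fake_server(name):
    # Stand-in replica: "slow" simulates a server having a bad day.
    await asyncio.sleep(0.5 if name == "slow" else random.uniform(0.01, 0.05))
    return f"reply from {name}"

result = asyncio.run(hedged(fake_server, ["slow", "fast"]))
```

The catch is that this doubles the offered load, which is why hedging is usually reserved for the slowest percentile or two of requests rather than all of them.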

Keeping any disk from running out of capacity in a distributed storage system by storing on the less full of two randomly chosen (e.g. via a DHT) disks.

Longish-term investing in the stock market: randomly choose two large-capitalization stocks and buy the one that looks less terrible. (The objective here is to avoid things like buying when the P/E has suddenly shot up to 50 or when the company is manifestly about to go bankrupt, and secondarily to buy stocks that are likely to have higher returns. If only a few people are doing this, it depends on a much weaker form of the result, since even if all the people buy the same stock, its price will remain unchanged; but this choose-from-2-random-stocks approach is one that all the investors could use without screwing up the market collectively.)

Falling in love with the more attractive of two randomly chosen eligible people. (If everybody falls in love with the most attractive person, almost everybody will have their heart broken; even polyamorists have limits.)

Going to the more-interesting-sounding of two randomly chosen open houses advertised this Sunday, so that you’re less likely to end up at a house that will have five above-market offers placed on it that day.

A super simple multi-armed bandit algorithm: choose two random arms of the bandit and pull the one that has most frequently hit the jackpot for you in the past.

…?
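The bandit variant above is easy to sketch (Bernoulli arms as a hypothetical stand-in; comparing raw win counts, per the text, rather than win rates):

```python
import random

def two_choice_bandit(pull, n_arms, rounds, rng):
    """Each round: sample two random arms, pull whichever has hit the
    jackpot more often so far (raw win counts)."""
    wins = [0] * n_arms
    total = 0
    for _ in range(rounds):
        a, b = rng.randrange(n_arms), rng.randrange(n_arms)
        arm = a if wins[a] >= wins[b] else b
        reward = pull(arm)
        wins[arm] += reward
        total += reward
    return total

# Stand-in arms: nine poor arms and one good one.
probs = [0.1] * 9 + [0.9]
rng = random.Random(7)
pull = lambda arm: 1 if rng.random() < probs[arm] else 0
two_choice = two_choice_bandit(pull, 10, 5000, rng)
uniform = sum(pull(rng.randrange(10)) for _ in range(5000))
```

The good arm only gets played when it lands in the sampled pair, so this never fully converges on it; but it beats pulling a uniformly random arm, and it never collapses onto one arm the way a greedy policy can.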

A thing that a lot of these have in common is that, with a population of candidates of variable and mostly poor underlying quality — e.g. lots of really terrible open houses, lots of potential lovers who would be just barely acceptable, lots of always-lose bandit arms — they won't perform well unless there's some kind of bias in the original random candidate selection toward better candidates. But if you have that kind of bias, with some arbitrary distribution over the candidates, then this best-of-N-random-choices thing is just a way of transforming one such arbitrary distribution into another one, no? It may be a particularly computationally parsimonious way of doing so (maybe your open-house candidates are first selected according to their price distribution, and only then do you go to the trouble of reading the listings), but it doesn't seem super clear.

You may find cuckoo hashing interesting; it's similar to your first idea, but with the effect of increasing the practical load factor, which on a lot of real hardware is a pretty good performance improvement. Not coincidentally, Michael Mitzenmacher, as well as doing a stint as dean of the CS department, has been studying cuckoo hashing for a while: http://www.eecs.harvard.edu/~michaelm/postscripts/esa2009.pdf

But if you have that kind of bias, with some arbitrary distribution over the candidates, then this best-of-N-random-choices thing is just a way of transforming one such arbitrary distribution into another one, no?

The delay in the simulated model is the key to why this approach works so well for the load-balancing problem. Generally, in this problem, it's not prohibitively expensive to keep track of the state of all downstream services, but you don't want to do that synchronously. In control theory terms, acting on stale data introduces a “phase shift” which turns the negative feedback that should tame the system into positive feedback that causes oscillation.
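A toy simulation of that effect, with all parameters invented: servers drain one request per tick, but clients only see a load snapshot refreshed every ten ticks.

```python
import random

def simulate(pick, n=10, ticks=200, staleness=10, seed=3):
    rng = random.Random(seed)
    load = [0] * n
    snapshot = list(load)
    worst = 0
    for t in range(ticks):
        if t % staleness == 0:
            snapshot = list(load)              # load reports arrive only periodically
        for _ in range(n):                     # n new requests per tick (at capacity)
            load[pick(snapshot, rng)] += 1
        load = [max(0, x - 1) for x in load]   # each server serves one request per tick
        worst = max(worst, max(load))
    return worst

def herd(snapshot, rng):
    # Everyone chases the same (stale) least-loaded server.
    return min(range(len(snapshot)), key=snapshot.__getitem__)

def p2c(snapshot, rng):
    # Two random choices, compared on the same stale snapshot.
    a, b = rng.randrange(len(snapshot)), rng.randrange(len(snapshot))
    return a if snapshot[a] <= snapshot[b] else b

worst_herd = simulate(herd)
worst_p2c = simulate(p2c)
```

With these invented numbers, the herding policy piles roughly a whole staleness-window of arrivals onto one server, while two random choices spread the damage even though they act on exactly the same stale data.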

In other cases, as described by Mitzenmacher et al., it is parsimony of information or computation that drives this choice of algorithm.

Thank you very much! I hadn’t thought of it in terms of an op-amp oscillating due to phase shift, but you’re clearly right.

I guess the thing I’m wondering about is how to make this kind of approach work (either for computational parsimony or to damp oscillations) in an environment where most uniformly-random choices are going to be unacceptably bad. You could bias your two random choices toward likely-good choices, or maybe you could just make more than two uniformly random choices, say, ⌈Φ/P(a uniformly random choice is acceptable)⌉ choices, or something.

Another perhaps too-obvious example: using median-of-N to select a partitioning element for Quicksort, particularly if you choose the N candidates at random.
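For completeness, a sketch of that pivot rule (N=3 here, with the candidates drawn at random from the input):

```python
import random

def quicksort(xs, n_candidates=3, rng=random):
    """Quicksort whose pivot is the median of N randomly chosen candidates."""
    if len(xs) <= 1:
        return list(xs)
    sample = [rng.choice(xs) for _ in range(n_candidates)]
    pivot = sorted(sample)[n_candidates // 2]   # median of the random sample
    lo = [x for x in xs if x < pivot]
    eq = [x for x in xs if x == pivot]
    hi = [x for x in xs if x > pivot]
    # pivot came from xs, so eq is nonempty and both recursions strictly shrink
    return quicksort(lo, n_candidates, rng) + eq + quicksort(hi, n_candidates, rng)
```

The effect is the same shape as the load-balancing result: a small random sample makes a catastrophically unbalanced partition (and thus quadratic behavior on adversarial inputs) very unlikely, without paying for an exact median.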

We have empirically verified this result, and we are seriously considering making a P2C load balancer the default for Finagle.