1. 6
    1. 2

      Footnote to the footnote: right, choosing without replacement helps bound your runtime. So why are we going back to the beginning on failure? Amend the iterative algorithm to:

      1. Choose k random bins.
      2. If the best of those k has available capacity, choose it.
      3. Otherwise, continue choosing random bins, one at a time, without replacement, until you find one that has available capacity (which automatically makes it the best one you’ve seen so far, so you return it).
      4. If step 3 exhausts the available bins, you can report a failure. Or wait a moment and go back to 1, hoping that the system state shifted in the meantime. Whatever makes sense.

      Yes, I suppose it adds the cost of maintaining a “without replacement”-tracking data structure, but if c is low that pays for itself by cutting the number of iterations, and if c is high the expected number of iterations is already low, so you’re unlikely to have to track many items anyway. It seems like a reasonable tradeoff in many cases; a quick sketch of the whole loop follows below.
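
      Roughly, in Python (a minimal sketch; `bins`, `capacity_of`, and `needed` are hypothetical placeholders for however your system names these things):

      ```python
      import random

      def pick_bin(bins, k, capacity_of, needed=1):
          # One without-replacement order drives the whole run; shuffling
          # the full list up front is the simplest possible tracker.
          order = list(bins)
          random.shuffle(order)

          # Steps 1-2: take the first k, return the best if it has room.
          best = max(order[:k], key=capacity_of)
          if capacity_of(best) >= needed:
              return best

          # Step 3: keep drawing one at a time; the first bin with room
          # is automatically the best seen so far, so return it.
          for b in order[k:]:
              if capacity_of(b) >= needed:
                  return b

          # Step 4: exhausted every bin; report failure (or wait and
          # retry from the top, hoping the system state shifted).
          return None
      ```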

      1. 1

        I agree your version has nicer properties. You could save the O(N) storage by doing a deterministic shuffle with a random seed: then you’d only need to store the seed (which, I guess, would require O(log N) space).
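
        For illustration, a sketch of the seed trick (`order_from_seed` is a made-up name; `random.Random(seed).shuffle` is the real stdlib call):

        ```python
        import random

        def order_from_seed(bins, seed):
            # Recreate the same "without replacement" order from just the
            # seed, instead of persisting the shuffled list between draws.
            order = list(bins)
            random.Random(seed).shuffle(order)  # deterministic given the seed
            return order

        # Same seed, same order, every time:
        assert order_from_seed(range(8), 42) == order_from_seed(range(8), 42)
        ```

        Recomputing still costs O(N) time per call, of course; if that matters, a seeded format-preserving permutation over the indices gives you lazy per-draw access.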

        I think I was too oblique about this point in the post: my purpose in analyzing the iterative variant was to show the behavior of a stateless request-response system that does a single best-of-k per request. Then each trip around the loop is a separate request, and the number of tries maps to the error rate (false “out of capacity” errors) that each request would observe.
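
        To put a rough number on it: assuming each request samples its k bins uniformly with replacement, and a fraction f of bins are full, a request hits a false “out of capacity” error exactly when all k draws land on full bins (without replacement the rate is slightly lower):

        ```python
        # False "out of capacity" rate per request under the assumptions above.
        def false_error_rate(f, k):
            return f ** k

        print(false_error_rate(0.9, 5))  # ~0.59: near-full cluster, frequent errors
        print(false_error_rate(0.5, 5))  # ~0.03: half-full cluster, rare errors
        ```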

        1. 2

          Ah. Yeah, that makes sense but I didn’t get it from what you wrote. Maybe my fault, I wasn’t really willing to think about a “false out-of-capacity error” :)

      2. 2

        If you had a single dispatcher with global knowledge of the cluster, how about:

        • keep your random bins in a tree, ordered by capacity
        • store the size on each subtree

        Since they are sorted, you can query just the subset of nodes with at least k units of capacity, with an O(log n) search.

        Since the subtrees are labelled with their sizes, you can also do uniform random selection of nodes inside the tree if you want.
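
        A minimal sketch of that idea, with a sorted list standing in for the balanced tree (bisect gives the O(log n) search; a real tree annotated with subtree sizes would also make the updates O(log n), instead of the O(n) list surgery here):

        ```python
        import bisect
        import random

        class Dispatcher:
            """Hypothetical single dispatcher with global knowledge."""

            def __init__(self, capacities):
                # (free_capacity, bin_id), kept sorted by free capacity.
                self.bins = sorted((c, i) for i, c in enumerate(capacities))

            def pick(self, needed=1):
                # O(log n): index of the first bin with >= needed free units.
                lo = bisect.bisect_left(self.bins, (needed, -1))
                if lo == len(self.bins):
                    return None  # truly out of capacity; no false errors here
                # Uniform random choice among every qualifying bin: this is
                # what the subtree-size labels buy you in the tree version.
                cap, bin_id = self.bins.pop(random.randrange(lo, len(self.bins)))
                bisect.insort(self.bins, (cap - needed, bin_id))  # re-file it
                return bin_id

        d = Dispatcher([3, 0, 5, 1, 2])
        print(d.pick(2))  # id of some bin that had at least 2 free units
        ```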

        1. 2

          Yeah!

          What’s great about best-of-k is that it works super well (for larger ‘c’) when you have many dispatchers that don’t talk to each other, and only have stale knowledge of the busyness of the set of workers. That makes fault-tolerant distributed load balancing easy. No need for replication, no need for consensus protocols, no need for coordination of any kind.

          Once ‘c’ gets small you start to lose the ability to go down that simple path. Then you either need to go to the “elected leader” pattern (and have to rediscover the worker state on failover) or build some kind of distribution system that keeps the data fresh enough. That’s definitely a possible road to go down, but it adds a bunch of complexity. Once you’re there, as you point out, you have some nice data structures for finding empty slots (including your tree variant, a free list, a free heap, etc.).

          1. 1

            Thanks. It does immediately make sense that “single dispatcher with global knowledge of the cluster” is a massive pain in the butt to arrange above a certain size.