
      If you look at their code, this isn’t “conventional” CH. Instead of assigning each pod a slice of the keyspace by hashing its name and having that hash be the edge of a bucket, it divvies up the keyspace evenly among pods by sorting all of the names and then pairing them up with bucket boundaries at [0, 1/n, 2/n, … (n-1)/n] where n is the number of pods.
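
      For concreteness, here's a minimal Python sketch of that scheme as I've just described it (my own illustration, not their actual code; hash01 and assign_even_split are names I made up):

      ```python
      import hashlib

      def hash01(s: str) -> float:
          """Hash a string to a float in [0, 1)."""
          digest = hashlib.sha256(s.encode()).digest()
          return int.from_bytes(digest[:8], "big") / 2**64

      def assign_even_split(client: str, pods: list[str]) -> str:
          """Even-split scheme: after sorting, pod i owns the slice [i/n, (i+1)/n)."""
          pods = sorted(pods)
          n = len(pods)
          return pods[min(int(hash01(client) * n), n - 1)]
      ```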

      This avoids one problem with conventional CH, which is uneven loads: because the bucket boundaries in conventional CH are placed effectively at random, some buckets cover more of the keyspace than others, giving their pods a greater share of the clients. But the tradeoff is reduced assignment stability: whenever a pod is added or removed, n changes, so every bucket boundary moves, and clients can be displaced between pods that weren’t the one added or removed. In fact, roughly one in three clients will be moved between pods every time a pod is added or removed under this scheme (in the limit of large n). Conventional CH only moves about a 1/n fraction of the clients on average, and only moves them to or from the added or removed pod.
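
      To watch those numbers fall out, here's a rough churn check that builds on the sketch above (assign_ring is my own toy one-point-per-pod ring, not any particular implementation):

      ```python
      from bisect import bisect_left

      def assign_ring(client: str, pods: list[str]) -> str:
          """Conventional CH: one point per pod; a client goes to the next point clockwise."""
          ring = sorted((hash01(p), p) for p in pods)
          i = bisect_left(ring, (hash01(client),))
          return ring[i % len(ring)][1]

      def churn(assign, pods, clients, removed):
          """Fraction of clients whose pod changes when `removed` goes away."""
          survivors = [p for p in pods if p != removed]
          return sum(assign(c, pods) != assign(c, survivors) for c in clients) / len(clients)

      pods = [f"pod-{i}" for i in range(20)]
      clients = [f"client-{i}" for i in range(5000)]
      for name, assign in [("even-split", assign_even_split), ("conventional", assign_ring)]:
          avg = sum(churn(assign, pods, clients, r) for r in pods) / len(pods)
          print(name, round(avg, 3))  # should come out to roughly 0.35 vs. 0.05 (= 1/20)
      ```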

      The most common way of dealing with the uneven loads problem in CH is usually called “virtual bins”: instead of giving each pod one slice of the keyspace based on a hash of its name, we give it many slices of the keyspace based on a family of hashes of its name (there are different ways to accomplish this, but what’s important is that the various hashes for one pod should be as uncorrelated as the hashes for different pods). This keeps the stability properties of conventional CH (it still only moves clients to or from an added/removed pod, while leaving all others alone), but since each pod gets many independently-chosen slices of the keyspace, the total fraction it gets will have a lower variance, meaning that clients balance more evenly among the pods. And since those different slices are likely to have several different predecessors in the hashring, a removed pod distributes its load somewhat evenly across many different pods (likewise, an added one takes a portion of load somewhat evenly from many different pods) instead of having a single “victim”.
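
      Sketching that, again on top of the helpers above (hashing "name#i" is just one easy way to generate the hash family; real implementations differ in the details):

      ```python
      from bisect import bisect_left
      from collections import Counter

      def build_ring(pods: list[str], vnodes: int) -> list[tuple[float, str]]:
          """Each pod gets `vnodes` points on the ring, one per member of its hash family."""
          return sorted((hash01(f"{p}#{i}"), p) for p in pods for i in range(vnodes))

      def assign_vnodes(client: str, ring: list[tuple[float, str]]) -> str:
          i = bisect_left(ring, (hash01(client),))
          return ring[i % len(ring)][1]

      pods = [f"pod-{i}" for i in range(10)]
      clients = [f"client-{i}" for i in range(10000)]
      for v in (1, 100):
          ring = build_ring(pods, v)
          loads = Counter(assign_vnodes(c, ring) for c in clients)
          # expect a much tighter min/max spread at vnodes=100 than at vnodes=1
          print(f"vnodes={v}: min={min(loads.values())} max={max(loads.values())}")
      ```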

      This is much better, but it’s still not perfect. CH with enough virtual bins distributes the load about as well as simply assigning each client to a random pod (with replacement). When # clients / # pods is large-ish that gives a fairly even distribution, but when it’s smaller it can be uneven enough to cause problems. If your #1 concern is ensuring evenness of load, there’s an algorithm called Consistent Hashing with Bounded Loads that came out of Google in 2016. They developed it for a very similar use-case to Knock’s: distributing Pub/Sub shards across servers. It starts off like regular CH but it uses a simple “forwarding” scheme to ensure that no server ever receives a load more than (1+ε) times the average, where ε is under user control. So if you had 10 pods and 100 clients (giving an average load of 10), and chose ε=0.5, the algorithm would guarantee that no more than 15 clients ever land on one pod at a time. This dovetails nicely with autoscalers: if you tell your autoscaler to keep the average server at 60% of capacity, and use ε=0.5, then the worst-case server will be at 90% of capacity.
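
      And here's a toy, static version of the bounded-loads idea, reusing the ring above (the real algorithm also handles clients and servers arriving and leaving incrementally, which this glosses over):

      ```python
      import math
      from bisect import bisect_left
      from collections import Counter

      def assign_bounded(clients: list[str], ring, n_pods: int, eps: float) -> dict[str, str]:
          """Place each client at its hash point, walking clockwise past any pod
          that's already at the (1 + eps) * average-load cap."""
          cap = math.ceil((1 + eps) * len(clients) / n_pods)
          loads: Counter = Counter()
          placement = {}
          for c in clients:
              i = bisect_left(ring, (hash01(c),))
              while loads[ring[i % len(ring)][1]] >= cap:  # this pod is full, keep walking
                  i += 1
              pod = ring[i % len(ring)][1]
              loads[pod] += 1
              placement[c] = pod
          return placement

      pods = [f"pod-{i}" for i in range(10)]
      ring = build_ring(pods, vnodes=100)
      clients = [f"client-{i}" for i in range(100)]
      placement = assign_bounded(clients, ring, len(pods), eps=0.5)
      print(max(Counter(placement.values()).values()))  # never more than ceil(1.5 * 10) = 15
      ```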

      Naturally, there’s a cost in terms of assignment stability: a client’s placement depends not just on its hash and the hashes of the available servers, but also on the assignments of all the other clients. At high ε the effect is negligible, and it becomes more substantial as you decrease ε, but it’s still always more stable than what Knock is suggesting here (while also providing a bound on uniformity that doesn’t depend on never getting a hundred clients that all have nearby hashes).


        The 1/3 comes from the fact that if you pick a point uniformly at random on the line segment [0,1], the squared distance from that point to either end averages 1/3 (the average of x² over [0,1]). Here the removed pod’s position plays the role of that random point: the boundary shifts on each side of it sum to roughly half the square of that side’s length, and the two sides together average out to 1/3.

        The math turned out to be a bit too much to express in this format, but suppose you remove one pod in Knock’s scheme: each pod to the left of it has both of its boundaries shift right a bit to take up the slack, and each pod to the right of it has both of its boundaries shift left. The distance each boundary shifts is proportional to its distance from the edge it’s moving away from, and the sum of the distances that all of the boundaries move is the total fraction of the keyspace that gets displaced.

        That sum works out to be 1/3 + O(1/n), but also, incidentally, clients that hash somewhere towards the middle of the keyspace will get shuffled to a different pod much more often than ones that hash towards either end.
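
        If you'd rather check that numerically than grind through the algebra, here's a standalone sketch (exact fractions, no hashing) that measures the displaced fraction and averages it over which pod gets removed:

        ```python
        from fractions import Fraction

        def displaced(n: int, k: int) -> Fraction:
            """Fraction of the keyspace reassigned when pod k of n is removed (even-split scheme)."""
            total = Fraction(1, n)  # the removed pod's own slice always moves
            for j in range(n):
                if j == k:
                    continue
                old_lo, old_hi = Fraction(j, n), Fraction(j + 1, n)
                new_index = j if j < k else j - 1
                new_lo, new_hi = Fraction(new_index, n - 1), Fraction(new_index + 1, n - 1)
                kept = max(Fraction(0), min(old_hi, new_hi) - max(old_lo, new_lo))
                total += Fraction(1, n) - kept  # the part of pod j's old slice it no longer owns
            return total

        n = 30
        avg = sum(displaced(n, k) for k in range(n)) / n
        print(avg, float(avg))  # comes out to (n + 1) / (3n), i.e. 1/3 + O(1/n)
        ```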


          OP here - thank you for the feedback. The algorithm we were going for here was mostly aimed at simplicity of maintenance and solving the immediate problem, which was shard balancing. The number of pods in play at any given time is relatively fixed - we seldom scale in or out at this point. So the tradeoffs were essentially ease of computation & use in exchange for a naive algorithm that was still very effective for our needs. We can make the algorithm more sophisticated as the tradeoffs involved shift over time (e.g. if we introduce autoscaling so pods come and go unexpectedly); however, I’d frankly be more interested in migrating off of Kinesis than in developing more robust CH for our situation.

          Even so, the richness of the literature around this is great to consider for other contexts!


            Yeah, I started off just meaning to write something quick about virtual bins and CHBL. Since the whole point of the post was “consistent hashing => better balancing” I figured it was worthwhile to point out those improvements that give better balancing (virtual bins, in particular, were in the KLL+ paper in 1997 under the name “replicated buckets”, and implementations have been widely available since at least Ketama in 2003).

            But then while I was checking my facts on that, I took a second look and realized: oh, they’re not doing KLL+ CH sans virtual bins, they’re doing something interestingly different. And then I had to figure out what the consequences of that unusual choice actually were and write them down. I think it’s clever, it’s undeniably simple, and it solves the uniformity problem in a way that I’ve never seen before. So don’t get me wrong, I’m not trashing it. It does throw away one of the things that people usually expect from consistent hashing (the expectation of no more moves than necessary when a bin comes and goes), but yes, it did cross my mind that in your deployment scenario that doesn’t actually cause much trouble at all.
