One mechanism of sharding that I think is much simpler and easier to scale is range-based sharding. In this scenario, you’d have the shards:
customers-1-100
customers-101-200
customers-201-300
customers-301-infinity
Here, when you start, you can simply have customers-1-infinity. As the database approaches, say, 50% capacity, you cap that shard at customers-1-100 and then create customers-101-infinity.
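To make the lookup concrete, here is a minimal sketch of how a router might map a customer ID onto these range shards. The shard table, the `None`-as-infinity convention, and the `shard_for` helper are all hypothetical names for illustration, not anything the post prescribes:

```python
# Hypothetical routing table: (low, high, shard_name) triples.
# A high bound of None stands in for "infinity" (the open-ended shard).
SHARDS = [
    (1, 100, "customers-1-100"),
    (101, None, "customers-101-infinity"),
]

def shard_for(customer_id):
    """Return the name of the shard whose range contains customer_id."""
    for low, high, name in SHARDS:
        if customer_id >= low and (high is None or customer_id <= high):
            return name
    raise ValueError(f"no shard covers customer id {customer_id}")
```

When you cap the open-ended shard, the only change is to this table: set the `None` bound to a concrete cap and append a new open-ended entry.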
Once the customers-1-100 shard grows to, say, 90% capacity, you can go further and split it into customers-1-50 and customers-51-100, using fairly simple replication topologies to do this with little-to-no downtime.
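The split itself is just arithmetic on the range bounds. A sketch, assuming a bounded shard and a midpoint split (the function name is made up for illustration):

```python
def split_shard(low, high):
    """Split the inclusive ID range [low, high] into two halves at the midpoint."""
    mid = (low + high) // 2
    return (low, mid), (mid + 1, high)
```

So splitting customers-1-100 yields the ranges (1, 50) and (51, 100); the data migration behind the split is what the replication topology handles.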
This range-based mechanism means you don’t have to guess up front how many shards you want to hash across.
Another way to do this would be to provision customers-1-100 with no -infinity shard, monitor the highest customer ID you have, and preemptively provision another shard when you get “close” to the customer ID cap of the existing shard.
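That monitoring check is a one-liner. A hedged sketch, where the 90% headroom threshold is an assumed choice of what “close” means, not something the post specifies:

```python
def needs_new_shard(highest_id, shard_cap, headroom=0.9):
    """True once the highest allocated customer ID reaches the chosen
    fraction of the current shard's ID cap (here, 90% by default)."""
    return highest_id >= shard_cap * headroom
```

A cron job or metrics alert evaluating this against the current max customer ID would be enough to trigger provisioning the next range shard ahead of time.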