Fascinating read. I always love to see conclusions about performance based on statistics rather than hazy mythology.
I found this especially interesting:
“One final cool fact I want to leave you with is that if you have a server with α≤1, it has infinite average latency. You can give the service finite average latency by replicating it enough times to make α>1. This means it’s possible to build a service with finite average response time out of servers with infinite average response time!”
Jeff Dean and Luiz André Barroso have a good paper (and talk) called “The Tail at Scale” that covers this technique as well as other tricks.
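For anyone who wants to poke at the quoted claim, here is a minimal sketch. It rests on my own assumptions rather than anything taken from the article's code: that α is the shape parameter of a Pareto latency distribution with minimum latency x_m, that the replicas are independent, and that each request goes to all n replicas and the first response wins. Under those assumptions the fastest of n responses is again Pareto with shape n·α, so the mean becomes finite once n·α > 1. The function names are made up for illustration.

```python
import math
import random


def mean_latency(alpha, n=1, x_m=1.0):
    """Mean of the fastest of n independent Pareto(x_m, alpha) latencies.

    Assumption: the minimum of n i.i.d. Pareto(x_m, alpha) draws is
    Pareto(x_m, n * alpha), whose mean is finite only when n * alpha > 1.
    """
    a = n * alpha
    return math.inf if a <= 1 else a * x_m / (a - 1)


def min_replicas_for_finite_mean(alpha):
    """Smallest n with n * alpha > 1 (hypothetical helper)."""
    return math.floor(1 / alpha) + 1


def simulated_mean(alpha, n, samples=100_000, seed=0):
    """Empirical check: average the minimum of n Pareto draws (x_m = 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += min(rng.paretovariate(alpha) for _ in range(n))
    return total / samples


if __name__ == "__main__":
    alpha = 0.5
    print("single server mean:", mean_latency(alpha))
    print("replicas needed:", min_replicas_for_finite_mean(alpha))
    print("analytic mean, n=8:", mean_latency(alpha, n=8))
    print("simulated mean, n=8:", simulated_mean(alpha, n=8))
```

I picked n=8 for the simulation rather than the bare minimum because the sample mean converges very slowly when the effective shape is only just above 1. Real replicas are rarely fully independent, so treat this as a best case.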
If you’re into modeling, queueing theory may be a good next step for understanding how to model chains of requests together.