Ooooh, shiny graphs. Never heard of Little’s Law before but this is a pretty cool demonstration of how it works.
My take on it is that it’s the main thing you should know if you want to understand “why things go wrong in distributed systems”. Queuing theory gets complex fast, but Little’s Law is fairly intuitive. It answers questions like, “why do tasks take exponentially longer to process at 100% CPU utilization?”
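To put one textbook equation behind that question (the M/M/1 single-server queue, which is an assumption on my part, not something the article commits to): with utilization rho = lambda/mu, the average time in the system is W = (1/mu)/(1 - rho). At rho = 0.5 that's twice the bare service time, at rho = 0.9 it's ten times, and as rho approaches 1 it blows up, which is exactly the "everything gets slow near 100% utilization" effect. Little's Law then converts that W into a queue length via L = lambda*W.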
Any insight into why queueing in a service like this is better than queuing in a socket connection backlog queue? Is it just because read timeouts tend to be longer than connection timeouts (if this is even true)?
The Amazon Builders’ Library has a good article on load shedding. If you wait for timeouts, you’ll get into a situation where you’re spending 100% of your resources processing requests where the client has already given up and sent a retry; this is the cause of a lot of self-inflicted DDoS attacks. Instead of completing old requests, you should proactively shed the load (e.g. via HTTP 429) as it comes in. If you shed load properly, you’ll recover from the metastable failure state sooner.
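To make that concrete, here’s a minimal sketch of front-door shedding, written against Python’s standard http.server purely as an illustration (it’s not the article’s or Amazon’s code, and MAX_IN_FLIGHT is a made-up fixed threshold; the sketch after the next comment derives a threshold dynamically): count in-flight requests and answer 429 immediately once the limit is hit, instead of letting the backlog grow until clients time out and retry.

    import threading
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    MAX_IN_FLIGHT = 32          # hypothetical capacity, tune for your service
    _in_flight = 0
    _lock = threading.Lock()

    class SheddingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            global _in_flight
            with _lock:
                admitted = _in_flight < MAX_IN_FLIGHT
                if admitted:
                    _in_flight += 1
            if not admitted:
                # Shed proactively: a fast 429 beats a slow timeout, and a
                # well-behaved client can back off instead of retrying hot.
                self.send_response(429)
                self.send_header("Retry-After", "1")
                self.end_headers()
                return
            try:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"ok\n")   # the real work would happen here
            finally:
                with _lock:
                    _in_flight -= 1

    if __name__ == "__main__":
        ThreadingHTTPServer(("127.0.0.1", 8080), SheddingHandler).serve_forever()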
When you start implementing a load shedding service, the first challenge is knowing your own health — when do you start shedding load? when do you return to normal? This service uses Little’s Law to dynamically tune the shedding algorithm.
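Roughly, I’d expect the tuning to look something like the sketch below (my own illustration, not the service’s actual algorithm, and every name and number in it is hypothetical): keep an EWMA of observed request latency, and since Little’s Law says L = lambda*W, a queue drained at rate lambda = workers/latency can hold at most lambda*target_delay requests before the target queueing delay is exceeded.

    class LittlesLawLimit:
        def __init__(self, target_delay_s=1.0, alpha=0.1):
            self.target_delay_s = target_delay_s   # max acceptable time spent queued
            self.alpha = alpha                     # EWMA smoothing factor
            self.ewma_latency_s = None             # observed per-request service time

        def observe(self, latency_s):
            """Record the latency of a completed request (call on every response)."""
            if self.ewma_latency_s is None:
                self.ewma_latency_s = latency_s
            else:
                self.ewma_latency_s = ((1 - self.alpha) * self.ewma_latency_s
                                       + self.alpha * latency_s)

        def max_queue_depth(self, workers=1):
            """Little's Law, L = lambda * W: the queue drains at roughly
            lambda = workers / avg_latency, and W is the target queueing delay,
            so it may hold at most lambda * W requests."""
            if not self.ewma_latency_s:
                return workers                     # no data yet: admit only what we can serve
            drain_rate = workers / self.ewma_latency_s           # requests per second
            return max(1, int(drain_rate * self.target_delay_s))

    # Hypothetical usage: admit a request only while the queue is below the limit.
    limit = LittlesLawLimit(target_delay_s=0.5)
    limit.observe(0.05)                      # a request that took 50 ms
    print(limit.max_queue_depth(workers=4))  # -> 40: about 0.5 s of work in the queue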
Woah, thanks for the details. So if I understand correctly, instead of just waiting for the connection to die with a fixed timeout (which usually takes a while), it actively says something along the lines of “With my current service latency I can handle up to X clients in the queue per time T”, and if client X+1 tries to enter the queue it pre-emptively says “sorry, no”, so the client can fail early instead of sitting at the end of the queue twiddling its thumbs saying “when is it going to get to me?”. And this setup proactively measures what X should be for a given T and adjusts the queue size accordingly.
That’s how I’m interpreting their readme
Thanks for sharing that Amazon guide. It makes a good point about deep SOA stacks leading to cascading retries. Even if an application can do dynamic load shedding after accept(), doing it in aggregate at the front door gives you better averages and avoids some cascades.
I’m confused why there were multiple papers on proving Little’s Law. It seems elementary; anyone know what I’m missing or defining differently?
If n customers arrive every 24 hour window, and the average amount of time it takes to serve a customer is t, then: the arrival rate is lambda = n/24, each customer spends W = t in the system, and the average number of customers in the system is L = n*t/24,
which obeys Little’s Law, W*lambda=L.
What you’re missing is it holds true regardless of what class of distribution is generating the arrival times.
I’m not making any assumption about the distribution of arrival times. Everything I said holds true if all customers arrive simultaneously, or uniformly spaced throughout the day, or anything else. I said “if n customers arrive every 24 hour window”, but the same things hold true if we’re just talking about a single 24 hour window. Modulo the customers at the end whose wait time falls outside the 24 hour window. Is all the complexity just dealing with that fussy detail?
On average, n arrive per 24h; that’s not necessarily true of any specific period. E.g., a distribution where a year’s traffic arrives on one day and none the rest of the year.
The things I said hold true for the 24-hour period at the beginning of the year, and the next 24-hour period, and also for the whole year. Modulo the customers at the end whose wait time falls outside the window.
Skimming the paper, the complexities arise from these end periods together with the fact that the theorem isn’t about any fixed amount of time, it’s about the limit as time goes to infinity. The theorem holds even in some (but not all) cases where the customer density increases without bound. https://pubsonline.informs.org/doi/pdf/10.1287/opre.22.2.417
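If I’m reading it right, the statement the paper actually proves is in terms of limits (using its L(t) for the number of customers in the system at time t and W_i for customer i’s wait):

    lambda = lim as T->inf of n(T)/T                    (long-run arrival rate)
    W      = lim as n->inf of (W_1 + ... + W_n)/n       (long-run average wait)
    L      = lim as T->inf of (1/T) * integral of L(t) over [0, T]

and the theorem is that whenever the first two limits exist and are finite, the third exists too and L = lambda*W. The end effects, and the fact that “n per 24 hours” is only an average, all wash out in the limit.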
The paper actually addresses my question :-P. It says “A common heuristic argument for L=lambda W involves observing that, apart from certain end effects, the total waiting incurred by all customers until time T can be calculated either by integrating L(t) over [0, T] or adding the wait times of all customers that arrived in [0, T].” (I was the second one.)
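A quick way to convince yourself of both halves of that (the arrival distribution doesn’t matter, and the only slack in the heuristic argument is the end effects) is to simulate a single FIFO queue and compute the two quantities independently. This is just an illustrative sketch of mine, not anything from the paper: if the window [0, T] covers every customer’s full wait, the two numbers agree exactly no matter how the arrivals are distributed.

    import random

    def littles_law_check(arrivals, service_time=0.5):
        """FIFO, single server: compare lambda*W with the time-average of L(t)."""
        arrivals = sorted(arrivals)
        n = len(arrivals)
        departures, prev_dep = [], 0.0
        for a in arrivals:
            prev_dep = max(a, prev_dep) + service_time
            departures.append(prev_dep)
        T = departures[-1]                 # window covers every wait: no end effects
        W = sum(d - a for a, d in zip(arrivals, departures)) / n   # avg time in system
        lam = n / T                        # arrival rate over the window
        # Independently integrate L(t) over [0, T] by sweeping arrival/departure events.
        events = sorted([(t, +1) for t in arrivals] + [(t, -1) for t in departures])
        area, in_system, prev_t = 0.0, 0, 0.0
        for t, delta in events:
            area += in_system * (t - prev_t)
            in_system += delta
            prev_t = t
        L = area / T                       # time-average number in the system
        print(f"lambda*W = {lam * W:9.3f}   time-average L = {L:9.3f}")

    n = 1000
    littles_law_check([random.uniform(0, 24) for _ in range(n)])  # spread over a "day"
    littles_law_check([0.0] * n)                                  # everyone at once
    littles_law_check([i * 0.01 for i in range(n)])               # evenly spaced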