Maybe I’m just dense, but I have some questions/problems.
The point Gil makes is that the 99th percentile is what most of your web pages will see. It’s not “rare.”
Parts of this discussion are technically true, but not presented fairly imo. It’s true that the individual resources of a page have varying latencies, so with enough resources per page you are bound to run into an “outlier,” which ultimately skews the latency of the page upwards.
But it’s not really fair to characterize page load latency as always falling at the 99th percentile. The resource latencies are part of the total page load latency, so your percentiles on total page load will fairly represent the distribution. Yes, the average load time will be higher than you’d think it should be (because some resources are hitting high latency), but no, the distribution won’t always be in the 99th…which is mathematically impossible (otherwise it wouldn’t be the 99th!)
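To make this concrete, here’s a quick simulation with a made-up exponential latency model (the distribution and numbers are illustrative only, not from the article): most pages do contain at least one resource at or above the per-resource p99, and yet the total page-load times still form their own distribution with their own percentiles:

```python
import random

random.seed(42)

# Hypothetical per-resource latency model: exponential with a 50ms mean.
def resource_latency():
    return random.expovariate(1 / 50)  # milliseconds

# Establish the per-resource 99th percentile empirically.
samples = sorted(resource_latency() for _ in range(100_000))
p99 = samples[int(0.99 * len(samples))]

# Simulate 10,000 page loads, each fetching 100 resources.
pages = [[resource_latency() for _ in range(100)] for _ in range(10_000)]
frac_hit = sum(max(p) >= p99 for p in pages) / len(pages)
totals = sorted(sum(p) for p in pages)

print(f"pages with at least one >=p99 resource: {frac_hit:.2f}")  # ~0.63
print(f"page-load p50: {totals[5_000]:.0f}ms  p99: {totals[9_900]:.0f}ms")
```

So Gil’s point (most pages touch a 99th-percentile resource) and mine (page loads still have a full distribution, with their own median and 99th) are both visible in the same data.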
The average over the next 100 seconds is 50 seconds.
Does anyone know how this number was calculated? Wouldn’t all 100 requests block for the duration of the pause (100s each), meaning the average latency over the next 100s is exactly 100s? This is exactly the point the author is trying to make in the following “load generator” scenario…but I simply can’t rationalize where the 50 comes from.
And in general, I feel like a lot of the complaints against percentiles could be addressed by monitoring percentiles over time with sliding windows? For example, take the case where a load generator issues one request that pends (after the 10,000 earlier successful results), and those good results pull all the metrics down. If you are plotting the percentiles over time, the “good” results will eventually slide out of the window and you’ll see a giant bump in the metrics. You’d also see a big bump in variance immediately, and a huge drop in throughput.
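Rough sketch of what I mean by a time-based sliding window (all names, window sizes, and latencies here are made up for illustration): steady fast traffic looks fine, but once the service stalls and the fast results age out, the one slow result dominates the window’s p99:

```python
from collections import deque

WINDOW = 60.0  # seconds of history kept

samples = deque()  # (timestamp, latency_seconds), oldest first


def record(now, latency):
    samples.append((now, latency))


def p99(now):
    # Evict anything older than the window before computing.
    while samples and samples[0][0] < now - WINDOW:
        samples.popleft()
    lat = sorted(l for _, l in samples)
    return lat[min(int(0.99 * len(lat)), len(lat) - 1)] if lat else None


# Steady traffic: one 10ms response every 10ms for 60 seconds.
t = 0.0
while t < 60:
    record(t, 0.010)
    t += 0.010
print(p99(60))    # 0.01 -- everything looks fast

# Then the service stalls; the one pending request completes 100s later.
record(160, 100.0)
print(p99(160))   # 100.0 -- the fast results have slid out of the window
```

With a count-based window you wouldn’t see this (nothing new arrives during the stall to push the old results out), which is why I’m suggesting a time-based one.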
So I guess I agree with the conclusion of the article (don’t just look at simple latency). But there are ways to monitor effectively; you just need a more holistic approach?
EDIT: The more I think about it, there seem to be other problems too. What about multiple threads hitting the service concurrently? Or clustering of latencies (latency is probably not i.i.d.; some users will experience higher latency because of location or service provider), timeouts causing early returns and increasing back-pressure, etc.
Disclaimer: I haven’t read this article above but I have watched Gil’s (amazing) StrangeLoop talk.
the 99th percentile is what most of your web pages will see
Gil’s justification for stating that most of your users will see the 99th percentile latency is that (slightly handwaving) your application probably requires 100 requests in order to complete the task that they want to use your application for. Over the course of making 100 requests, you expect on average 1 of your requests to take at least the 99th percentile time. The user will be sitting around waiting for at least the 99th percentile time because they can’t complete their task until all the requests that they need to make for their task are completed.
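A back-of-the-envelope version of that argument (assuming, handwavily, that the request latencies are independent): the chance that at least one of n requests lands at or above the per-request 99th percentile is 1 − 0.99ⁿ, which grows quickly with n.

```python
# Chance that a task needing n requests hits at least one request
# at or above the per-request 99th percentile (assuming independence).
for n in (1, 10, 100, 500):
    p = 1 - 0.99 ** n
    print(f"{n:>3} requests: {p:.1%}")
# 1 request:  1.0%
# 100 requests: 63.4%
```

So “most users” is roughly right at 100 requests per task, and it’s nearly certain for heavier pages.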
The average over the next 100 seconds is 50 seconds
Gil’s justification for stating that the true request latency for the 100 seconds while the service is suspended is 50 seconds is (roughly, please excuse if I get the maths wrong) that if you were to try to make a request at a uniform random moment within this 100 second period, the expected time for your request to complete would be about 50 seconds.
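A quick simulation of that reasoning (assuming the request would otherwise complete instantly, so the wait is purely the remaining portion of the stall): a request arriving at a uniform random moment in the 100-second pause waits until the pause ends, so on average it waits 50 seconds.

```python
import random

random.seed(1)

FREEZE = 100.0  # seconds the service is stalled
total = 0.0
N = 100_000
for _ in range(N):
    t = random.uniform(0, FREEZE)  # request arrives mid-stall
    total += FREEZE - t            # it can't complete until the stall ends
avg = total / N
print(f"average latency during the stall: {avg:.1f}s")  # ~50.0
```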
Thanks, this helps clarify a bit!
Gil’s justification for stating that most of your users will see the 99th percentile latency […]
Gotcha, I see what he’s getting at. I agree with the sentiment: load time is always slowed because you always have one resource acting as an “outlier”, meaning the “sum of the parts” is always slower than you’d expect.
I guess I’m just quibbling with the semantics now, which feel massaged a bit. Everyone can’t actually be sitting at a page-load 99th percentile…there has to be a distribution there too. The distribution just happens to be shifted a lot higher (and is probably skewed rather than normal) compared to the individual resource latency distributions.
Gil’s justification for stating that the true request latency for the 100 seconds
Ahhhhh ok, this makes sense. The article didn’t explain that at all, it just says “We can intuitively characterize this system […]” which was very confusing. Thanks!
I’ll go watch Gil’s talk instead, since this summary article rather confused me with missing details :)
Everyone can’t actually be sitting at a page-load 99th percentile
Maybe it’s a little overstated in order to try to get over people’s hesitance to look at the outliers? How many people get to feel the 99%lie depends on the application and how many of those requests are asynchronous (*).
(* asynchronous with regard to what the human user wants to do, of course, not just “did I use XMLHttpRequest?”. If I’m staring at a loading GIF until a particular request completes then it isn’t asynchronous from my perspective. If it’s preloading an image that I haven’t scrolled the viewport to yet then it’s asynchronous from my perspective.)