
    Creating a fully descriptive model of modern web app latency is tough. There are a whole bunch of considerations to keep in mind, and some abstractions that most folks don’t really “pierce the veil” behind:

    • Absent any fan-out or caching, service response times can be modeled as exponential distributions (and can be decently approximated by half-normal distributions, since both are members of the exponential family)
    • In practice, multi-modal distributions are often observed due to fan-out and caching. At $WORK, we often see bimodal distributions split between cached and non-cached response times.
    • Latency measurements themselves are time-varying. As the rate of incoming requests increases, resource utilization on the service rises, which can cause latency to increase. Time-varying models are much more complicated than stationary ones. Little’s Law offers some visibility into what’s happening (it tells us that time averages in the system equal ensemble averages across runs), but only at the mean; when measuring tail latency, it doesn’t give us much.
    • No model of a web app is complete without also looking at how deep the incoming request queues get and modelling that as part of a larger queueing model
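    The bimodal cached/non-cached shape above can be sketched as a mixture of two exponentials. This is a minimal simulation with made-up parameters (cache hit rate, per-mode mean latencies are illustrative, not measurements from any real service):

    ```python
    import random
    import statistics

    random.seed(0)

    # Hypothetical parameters: cache hits average ~2 ms, misses ~40 ms,
    # and 70% of requests hit the cache.
    CACHE_HIT_RATE = 0.70
    HIT_MEAN_MS, MISS_MEAN_MS = 2.0, 40.0

    def sample_latency_ms() -> float:
        """One request: exponential service time, mode chosen by cache outcome."""
        mean = HIT_MEAN_MS if random.random() < CACHE_HIT_RATE else MISS_MEAN_MS
        return random.expovariate(1.0 / mean)

    samples = sorted(sample_latency_ms() for _ in range(100_000))
    p99 = samples[int(0.99 * len(samples))]  # nearest-rank 99th percentile
    print(f"mean={statistics.mean(samples):.1f} ms  p99={p99:.1f} ms")
    ```

    Note how far the p99 sits from the mean: the tail is dominated almost entirely by the slow (cache-miss) mode, which is exactly why a single-mode fit to this data is misleading.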

    I’ve thought about writing a modeling framework for web apps based on my own observations, but the whole project does seem like a lot of work. At $WORK we’ve had good success measuring mean and 99th-percentile latencies (as means are always a safe summary statistic and are also meaningful for exponential distributions), but that owes more to how we set our SLAs than to an informed view of the experience we want to offer users.
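    One way to see why the mean is meaningful for exponential distributions: for a pure exponential, the mean pins down every percentile, e.g. p99 = mean × ln 100 ≈ 4.6 × mean. A quick sanity check (the 20 ms mean is an arbitrary example value):

    ```python
    import math
    import random

    random.seed(1)
    MEAN_MS = 20.0  # hypothetical service's mean latency

    # Draw from Exp(1/mean) and compare the empirical p99 to the analytic one.
    samples = sorted(random.expovariate(1.0 / MEAN_MS) for _ in range(200_000))
    empirical_p99 = samples[int(0.99 * len(samples))]
    analytic_p99 = MEAN_MS * math.log(100)  # inverse CDF of Exp at 0.99

    print(f"empirical p99={empirical_p99:.1f} ms, analytic p99={analytic_p99:.1f} ms")
    ```

    This relationship breaks down as soon as the distribution is multi-modal, which is one reason the mean-plus-p99 pairing above only works as well as the underlying distribution is well-behaved.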

    1. 1

      I’ve had decent success with counting the number of requests that exceed some latency threshold (as an SLI), in addition to using histograms.
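      A threshold counter like that is cheap to keep alongside histogram buckets. A minimal sketch, where the threshold, bucket bounds, and `LatencyTracker` class are all hypothetical, not from any particular metrics library:

      ```python
      import bisect

      SLO_THRESHOLD_MS = 100.0  # hypothetical SLI threshold
      BUCKET_BOUNDS_MS = [5, 10, 25, 50, 100, 250, 500]  # upper bounds; last bucket is overflow

      class LatencyTracker:
          def __init__(self):
              self.total = 0
              self.over_threshold = 0
              self.buckets = [0] * (len(BUCKET_BOUNDS_MS) + 1)

          def observe(self, latency_ms: float) -> None:
              """Record one request in both the threshold counter and the histogram."""
              self.total += 1
              if latency_ms > SLO_THRESHOLD_MS:
                  self.over_threshold += 1
              self.buckets[bisect.bisect_left(BUCKET_BOUNDS_MS, latency_ms)] += 1

          def breach_ratio(self) -> float:
              return self.over_threshold / self.total if self.total else 0.0

      tracker = LatencyTracker()
      for ms in (12, 48, 130, 7, 260, 95):
          tracker.observe(ms)
      print(f"{tracker.over_threshold}/{tracker.total} over {SLO_THRESHOLD_MS:.0f} ms")
      ```

      The counter answers the SLI question ("what fraction of requests breached the threshold?") exactly, while the histogram only answers it to bucket resolution, so keeping both avoids having to align bucket bounds with every SLO.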