    I think all the points raised in this article are reasonable. There are two big problems with it, however:

    1. The only bottleneck you can observe in production is the one you’re already hitting. In my line of work that is often much too late.
    2. Having identified a putative bottleneck, you lack a good way to confirm your hypothesis. You can fix the problem, which will be wasted effort if you were wrong, or try to make the problem worse, which will make your customers angry if you were right.

    I would suggest that the best way to find performance bottlenecks is to build a scale model of production using exactly the same specifications for everything, and then test it.
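    A minimal sketch of that idea, assuming a hypothetical `handle_request` standing in for a real endpoint, with a lock playing the part of the serialized resource you’re hunting for — drive it at increasing concurrency and watch where throughput stops scaling:

    ```python
    # Toy "scale model": synthetic load at stepped concurrency levels.
    # handle_request is a hypothetical stand-in for a real service call;
    # the lock models a serialized resource (a single DB connection, say).
    import threading
    import time
    from concurrent.futures import ThreadPoolExecutor
    from statistics import quantiles

    _db_lock = threading.Lock()  # the hidden bottleneck in this model

    def handle_request() -> float:
        """One simulated request; returns its latency in seconds."""
        start = time.perf_counter()
        time.sleep(0.001)          # parallelizable work (e.g. templating)
        with _db_lock:             # serialized work behind the lock
            time.sleep(0.002)
        return time.perf_counter() - start

    def run_load(concurrency: int, requests: int = 200) -> tuple[float, float]:
        """Returns (throughput in req/s, p95 latency in seconds)."""
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            latencies = list(pool.map(lambda _: handle_request(), range(requests)))
        elapsed = time.perf_counter() - start
        p95 = quantiles(latencies, n=20)[-1]
        return requests / elapsed, p95

    if __name__ == "__main__":
        for c in (1, 4, 16):
            tput, p95 = run_load(c)
            print(f"concurrency={c:2d}  throughput={tput:6.0f} req/s  p95={p95 * 1000:.1f} ms")
    ```

    In a real scale model you’d point the load generator at the replica environment rather than an in-process stub, but the shape of the experiment is the same: step up the offered load, watch throughput plateau and tail latency climb, and the plateau tells you which resource saturated first.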

      I imagine specific problem domains do end up with different best approaches, yeah, so I’d love to hear what your specific line of work is; for context, my main focus on the site is batch jobs / data-processing pipelines.

      If the things you care about are e.g. “we hit 2x the visitors unexpectedly and now there’s a cascading backlog in all the website’s backend services”, then yeah, you want to prevent that, not diagnose it.

        “I would suggest that the best way to find performance bottlenecks is to build a scale model of production using exactly the same specifications for everything, and then test it.”

        I believe we have established, pretty authoritatively, that it is not possible to effectively model “production” of any nontrivial application in a staging or test context. The real-world interactions among the uncountably many components that make up your prod system are simply too numerous for the cost of modeling them to come out below the value those models deliver. This is the root of the “test in prod” line of thinking that underlies the current (and highly effective!) model of observability.