1. 14
  1. 7

    One of the most useful observations I’ve heard regarding parallelism is that

    All computers wait at the same speed.

    In other words, if you are introducing parallelism to get more work done per unit of time, look long and hard at every source of blocking in your code. That could be sleep calls in a polling loop, it could be monitors or other synchronisation constructs like locks (!), or, dear Lord, barriers.

    For each source of blocking, ask yourself, “Am I blocking here because I’m truly out of useful things to do and this wait is a wait for more work to be produced? Or am I actually waiting for something here despite there being more work that could be done?”

    Very frequently, in my experience, it’s the latter.

    And in hindsight, it looks ridiculous, of course. Here is a part of the code that literally will not go faster, no matter what hardware you put it on. It has been designed with a hard limit on how fast it is allowed to run.

    Just by restructuring the parallelism around this principle and scoring some cheap wins (a small sketch of the antipattern follows below), I have saved so much money on expensive hardware that was never needed and would barely have helped anyway.
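
    To make that concrete, here is a minimal Go sketch of the two shapes (the queue, the worker count, and the `process` function are all invented for illustration). The polling variant carries its own designed-in speed limit; the blocking variant waits only when there is genuinely no work:

    ```go
    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // process stands in for the real work (hypothetical).
    func process(item int) { fmt.Println("processed", item) }

    // pollingWorker is the antipattern: the sleep puts a hard ceiling on
    // how fast this loop can drain the queue, no matter the hardware.
    func pollingWorker(queue <-chan int, done <-chan struct{}) {
        for {
            select {
            case item := <-queue:
                process(item)
            case <-done:
                return
            default:
                time.Sleep(10 * time.Millisecond) // designed-in speed limit
            }
        }
    }

    // blockingWorker blocks on the channel instead: it waits only when
    // there is truly nothing to do, and otherwise runs at full speed.
    func blockingWorker(queue <-chan int, wg *sync.WaitGroup) {
        defer wg.Done()
        for item := range queue {
            process(item)
        }
    }

    func main() {
        queue := make(chan int, 100)
        var wg sync.WaitGroup
        for w := 0; w < 4; w++ {
            wg.Add(1)
            go blockingWorker(queue, &wg)
        }
        for i := 0; i < 10; i++ {
            queue <- i
        }
        close(queue)
        wg.Wait()
    }
    ```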

    1. 3

      I find thinking about concurrency as a code design problem a better way to reason about it than as an execution model. It’s about designing your application so that the individual units are agnostic of execution order: they could execute sequentially as defined by the source, or be reordered, without affecting the overall outcome of the application.
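
      A tiny sketch of the idea in Go (the inputs and the squaring are made up): each unit touches only its own input and combines its result through a commutative, associative accumulation, so source order and scheduler order give the same answer:

      ```go
      package main

      import (
          "fmt"
          "sync"
          "sync/atomic"
      )

      // unit is order-agnostic: it reads only its own input and adds into
      // a commutative, associative accumulator.
      func unit(n int64, total *int64) { atomic.AddInt64(total, n*n) }

      func main() {
          inputs := []int64{1, 2, 3, 4, 5}

          // Sequentially, in source order.
          var seq int64
          for _, n := range inputs {
              unit(n, &seq)
          }

          // Concurrently, in whatever order the scheduler picks.
          var par int64
          var wg sync.WaitGroup
          for _, n := range inputs {
              wg.Add(1)
              go func(n int64) {
                  defer wg.Done()
                  unit(n, &par)
              }(n)
          }
          wg.Wait()

          fmt.Println(seq == par) // true either way
      }
      ```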

      1. 1

        That’s definitely a good way to design an application. However, I am arguing in the blog post that you also need to consider the execution model if performance is critical for your application. And for many applications out there, performance is important because of latency constraints and volume of data. Today, to maximize performance, you are pretty much forced to exploit parallelism because single-threaded performance is stagnating. To exploit parallelism, you need to consider your specific workload and find a programming model and an application architecture that suits it best.

        1. 2

          Oh, yeah, I wasn’t arguing against that at all. Just stating a difference between how I view parallelism and concurrency. They both have their benefits, and with their powers combined you can have a well-tuned application, barring any hard-to-debug errors.

      2. 2

        Pretty nice introductory article, although I didn’t grasp the idea behind Gustafson-Barsis’ law at first. It might be me, though, since I found the Wikipedia article on Gustafson’s law a bit hard to follow as well.

        1. 2

          Reading it now, I didn’t do a very good job of explaining it either.

          One of the key observations from Gustafson’s paper is:

          One does not take a fixed-size problem and run it on various numbers of processors except when doing academic research; in practice, the problem size scales with the number of processors.

          I updated the blog post to hopefully reflect that difference better.
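
          For anyone else who tripped over it: the usual statement of Gustafson’s scaled speedup, with α the serial fraction of the work and N the number of processors, is

              S(N) = N - α(N - 1)

          so, unlike Amdahl’s fixed-size bound of 1/α, the speedup keeps growing as the problem is scaled up along with N.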

        2. 2

          Is Little’s law a useful model for the systems you deal with? The systems I work on tend to have power-law and/or multi-modal latencies, and in that context knowing the mean latency is surprisingly uninformative.

          1. 3

            Mean latency is essentially the inverse of per-worker throughput, and indeed uninformative from a system latency perspective (which is dominated by the tail latency). The reason I mention Little’s Law in the blog post is that it establishes the relationship between latency, throughput (or bandwidth), and concurrency/parallelism. A key point is Gustafson’s observation that there is some lower bound on latency dictated by physical constraints, which means that at some point more throughput requires more parallelism (https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-09766-4_79).
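
            To put rough numbers on that relationship (the figures here are invented, purely to illustrate Little’s Law, L = λ × W):

            ```go
            package main

            import "fmt"

            func main() {
                // Little's Law: concurrency L = throughput λ × mean latency W.
                // Hypothetical numbers: if latency is floored at 1 ms by physics,
                // then sustaining 100,000 req/s needs at least 100 requests in
                // flight, i.e. more throughput demands more parallelism.
                const latency = 0.001       // seconds; assumed physical lower bound
                const throughput = 100000.0 // requests per second; assumed target
                fmt.Println("required concurrency:", throughput*latency)
            }
            ```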

            1. 1

              Good reply. Thanks!