1. 18

  2. 9

    This is my favourite secret superpower. Secret because nobody else seems to care about it, superpower because after starting to apply it, I am more efficient than ever.

    But what is inspection? Inspection is when you take a produced widget (or printed circuit board, or car), measure its properties, and compare to some specification. On the basis of that check you decide whether to ship the product, or throw it away (or re-melt it), or fix it.

    There are two things about inspection that I think are important to understand, that this article omits:

    1. It’s not that inspection doesn’t work. It works, it is just extremely expensive to throw out a portion of your production, compared to making things right in the first place.
    2. This applies also to a more general, abstract concept of “inspection.”

    As a technical example of what the second point means: My team maintains an application that has fairly bad response time characteristics. I’ve heard the suggestion multiple times that we should have an alert that wakes someone up if the response time of the deployed application exceeds one of our soft requirements. Of course, if you take a page from Shewhart/Deming, you’d see the response time is in statistical control. It is terrible, and exceeds the soft requirements about 4% of the time, but it is still in statistical control. Adding alerting on the soft requirements just gets you an alert that is guaranteed to fire falsely about 4% of the time. Very bad for alarm fatigue.

    The key takeaway is that taking a Shewhart/Deming look at the system reveals that we know what the distribution of response times in production will be. If we want to improve that, we have to improve those latency characteristics of the code. We can’t “inspect quality into the product” by alerting on when the soft requirements are exceeded. Well, we can, but it has a great cost that can trivially be avoided.
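
    To put a rough number on that cost, here is a minimal sketch. The 4% figure is the one from above; the once-a-minute check interval and the function name `expected_false_pages` are assumptions made up for illustration, not how any real alerting setup is configured.

    ```python
    def expected_false_pages(exceed_fraction: float, checks_per_day: int, days: int = 7) -> float:
        """Expected number of pages from a stable process over `days` days.

        Each check is assumed to exceed the soft requirement with
        probability `exceed_fraction`; nothing about the system has changed.
        """
        return exceed_fraction * checks_per_day * days


    # 4% exceedance (from the comment above), one check per minute (assumed).
    per_week = expected_false_pages(0.04, checks_per_day=24 * 60)
    print(f"{per_week:.0f} pages per week with nothing unusual going on")
    ```

    With a coarser check interval the count drops, but the proportion of pages that carry no new information stays the same.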

    A more non-technical example which also happens to be a peeve of mine: When there is an increase in violent crime in a society, politicians often speak loudly and almost exclusively about how they will increase the size of the police force, the police will be better armed, and so on and so forth. If you ask the domain experts what to spend money on, they will talk about preventative work, enrolling youth in activities, and trying to steer humans away from a criminal career in the first place.

    There’s a conflict here. The politicians focus on what sounds like a strong, firm response to a terrible problem. The experts’ suggestion sounds like coddling and being nice to would-be criminals. How awful!

    But through the eyes of Shewhart/Deming, it is obvious that the experts are right. The politicians suggest “inspecting quality into the product” (i.e. removing the offenders from society after they have become offenders), whereas the experts suggest changing the system to produce fewer defects in the first place. The latter will be cheaper and have greater effects in the long run, but it doesn’t appear as serious.

    Oh, and that’s not even getting into how politicians and media always focus on how measurements have changed from one period to the next. It’s always “crime is up by 15% since the corresponding period last year!” or “our students’ reading abilities have dropped by 5% since last year.” Well… maybe that’s part of the natural variation of these measurements, and nothing has changed about the system at all. Maybe it’s a bad idea to go poking the system based on individual incidents, when you haven’t shown that there is a system change in the first place?

    This applies to software issues too. It’s not worth changing the system in response to every incident. I’m going to take another example from performance engineering because that’s what I have most readily in memory: a streaming processing application appears to be lagging behind by more than three days, and there is a soft requirement of at most two days of lag. Why didn’t we have alerting on this? Someone, please set it up and investigate why it’s lagging behind now.

    Well, the student of Shewhart/Deming would do one more thing first: check how much it has lagged behind historically. Turns out three days of lag is common. The amount of lag is in statistical control. This is just what the performance of the application is. There’s no point in investigating what the root cause of this particular case of lag is. Desire less lag? Fix the underlying performance issues instead – or accept that it can lag up to 7 days with the current level of performance. Only if it lags more than 7 days is there cause to be concerned about an uncommon issue that warrants investigation.
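
    A minimal sketch of that “check the history first” step, using the individuals (XmR) chart as described by Wheeler: the limits are the mean plus or minus 2.66 times the average moving range. The lag history below is made-up illustration data, not measurements from the application above, and `natural_process_limits` and `is_signal` are hypothetical helper names.

    ```python
    from statistics import mean


    def natural_process_limits(values):
        """XmR-chart limits: mean +/- 2.66 * average moving range."""
        if len(values) < 2:
            raise ValueError("need at least two measurements")
        # Moving range: absolute difference between consecutive measurements.
        moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
        centre = mean(values)
        spread = 2.66 * mean(moving_ranges)
        return centre - spread, centre + spread


    # Hypothetical daily lag measurements, in days.
    lag_history = [2.0, 3.5, 1.5, 4.0, 2.5, 3.0, 1.0, 3.5, 2.0, 4.5, 3.0, 2.5]

    lower, upper = natural_process_limits(lag_history)
    lower = max(lower, 0.0)  # lag cannot be negative


    def is_signal(lag_days):
        """True only if the lag is outside what the process routinely produces."""
        return lag_days > upper or lag_days < lower


    print(f"routine lag range: {lower:.1f} to {upper:.1f} days")
    print("3 days worth investigating?", is_signal(3.0))
    print("8 days worth investigating?", is_signal(8.0))
    ```

    With this invented history the upper limit lands a little above 7 days, so a three-day lag is routine and only something like eight days would count as a signal.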

    Sorry, I have way too much to rant about this. I should really get around to writing my article series.

    As far as I understand it, there isn’t a hard theorem for why the control limits should sit at +/-3σ, but there is both empirical evidence for it and statistical arguments that suggest values in that ballpark.

    The control limits are not even guaranteed to sit at +/-3σ! Statistical process control inverts the traditional statistical method for extracting signal out of variation. Instead of picking a percentage of measurements (or, equivalently, a number of sigma based on a probabilistic model) and seeing what limits that yields, in statistical process control, we (intelligently) pick the actual limits first. We don’t know whether those limits are at 3 sigma, 4 sigma, 1 sigma, or 57 sigma, because knowing that depends on knowing the underlying distribution of values, and we don’t (generally) know that either. Fitting any distribution to the measurements so far is more perilous than one expects.

    So how does this even work? Well, the (intelligently) picked limits are picked in such a way that if they were applied to a probability model – any model – they would happen to filter out almost 100% of the values, but not quite. The limits are chosen because for any sane probability model, they happen to filter out somewhere around 98% to 100% of the data. That’s all we need. We don’t need to know exact p values or what sigmas this corresponds to. It’s a very practical approach.
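
    One rough way to see the “any sane probability model” claim: the sketch below computes, in closed form, how much of three textbook distributions falls within three standard deviations of the mean. It is only an illustration. In practice the limits are computed from the data rather than from a known sigma, and the choice of distributions here is mine.

    ```python
    import math


    def normal_coverage(k=3.0):
        # P(|Z| <= k) for a standard normal variable.
        return math.erf(k / math.sqrt(2))


    def uniform_coverage(k=3.0):
        # Uniform on [0, 1]: mean 0.5, standard deviation 1/sqrt(12).
        sd = 1 / math.sqrt(12)
        lo, hi = 0.5 - k * sd, 0.5 + k * sd
        # Clip the limits to the support [0, 1] before measuring coverage.
        return max(0.0, min(1.0, hi) - max(0.0, lo))


    def exponential_coverage(k=3.0):
        # Exponential with rate 1: mean 1, standard deviation 1.
        lo, hi = max(0.0, 1 - k), 1 + k
        # CDF is 1 - exp(-x), so coverage is exp(-lo) - exp(-hi).
        return math.exp(-lo) - math.exp(-hi)


    coverages = {
        "normal": normal_coverage(),
        "uniform": uniform_coverage(),
        "exponential": exponential_coverage(),
    }
    for name, fraction in coverages.items():
        print(f"{name:12s} {fraction:.4f}")
    ```

    A symmetric bell shape, a flat distribution, and a heavily skewed one all end up between roughly 98% and 100%, which is the range mentioned above.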

    1. 4

      There are two things about inspection that I think are important to understand, that this article omits:

      A third thing: Deming used a bunch of specific technical terminology, and if you read Out of the Crisis one of the things he says is that knowledge work is something that should be reviewed by other people before it’s accepted. So there are cases where he thinks that inspection is a necessary process.

      1. 3

        Oh, yes, that is a critical point. Thank you for bringing it up.

        To be honest, it is something I’ve struggled with understanding myself, and I must have missed it when reading Out of the Crisis. Do you have a page number or more specific reference/quote at hand?

        1. 3

          In my copy it’s chapter 8 page 261, “administration of inspection for extra high quality”.

          1. 3

            Ah, yes. So essentially there seem to me to be two conditions:

            • When you are creating something that is one-of-a-kind, it’s impossible to talk about statistical control, so “100% inspection” (a strict requirement on review) is required.

            • When you are doing something with great consequences (he mentions as an example calculations involving interest rates, in which any errors obviously compound), strict requirement on review might make economic sense.

            Makes total sense to me!

      2. 2

        Is there any reading you would recommend to learn more?

        1. 2

          Deming’s books are very readable. Shewhart is interesting too, for a slightly different perspective on the same idea. For more statistical reasoning, Wheeler is a great resource – he even has a lot of stuff freely available on the Quality Digest website.