1.

  2.

    When discussing failures, people need to feel safe to share all relevant information, with the understanding that they will be judged not on how they fail, but on how their handling of failures improved the team, their product and the organization as a whole. Teams with operational responsibilities need to come together and discuss outages and process failures. It’s essential to approach these as fun learning opportunities, not root-cause-obsessed witch-hunts.

    Having worked in a “witch-hunt” environment in the past, I still struggle to articulate information about failures. If they’re my own, I wonder how they will impact my future. If they’re someone else’s, I feel like I’m throwing them under the bus. It’s very difficult to overcome that, even though I know I’m no longer dealing with the same people.

    1.

      “…Each time someone wrote something they learned on the whiteboard, we toasted them. The team that left that room was utterly different to the one that entered it.”

      1.

        That is indeed a good bit, and I thought it was even better when I read it in full. It also resonated with my own experiences: I’ve been in a few rooms where somebody thought ‘let’s take the time to cheer all the things that went well’, and, well, it just felt good. Knowing you’re appreciated is one thing, but feeling appreciated (basking in appreciation?) is quite another, and a very nice high to get from time to time. (A little voice in my head is saying this sounds a lot like ‘happiness is mandatory’ culture. That little voice is confusing ‘saying it out loud doesn’t come naturally’ with ‘the feeling isn’t genuine’. They were nice companies, and nice people.)

        Anyway, here’s the bit I liked in full.

        […] the team spent three long days, and three long nights rebuilding the zone. Once it was done, they – and I – were dejected. Demoralized. Defeated. An amazing manager who was visiting our office realized I was down, and pointed out that we’d just learned more about our new storage stack in those three days, than we had in the previous three months. He reckoned a celebration was in order.

        I bought some cheap sparkling wine from the local supermarket, and with another manager, took over a big conference room for a few hours. Each time someone wrote something they learned on the whiteboard, we toasted them. The team that left that room was utterly different to the one that entered it.

        I also appreciated the bit about “the standard you walk past is the standard you accept”.


        1.

          This should be part of the sprint retrospectives (if doing Scrum). At $job we often follow the “Good, Bad, Angry” pattern: we put up post-its (yeah, post-it hell) for what’s been good, what’s been bad, and what made you angry. At the end, we aggregate them, discuss all the Angry ones, and pick two from Good and two from Bad to discuss, deciding which efforts should be continued and which should be started.

          For example, Good can be:

          • We now have a test env per PR, which makes it easier to test the deployment for each PR => We can then discuss what could be improved further (and we obviously celebrate it)

          Bad can be:

          • I didn’t like how the last meeting ended; I found the conclusion unclear => How can we make sure this doesn’t happen again?

          Angry can be:

          • I’m frustrated by how we resolved the last outage; I found myself struggling to help and there was panic everywhere. => Since it’s an “Angry” point, we need to resolve it ASAP, and it is a priority for the team to work well.

          We discuss all this as a team, and every sprint we have some allocated time to work on these items. Everybody was skeptical about this at first, but after months, it’s the one “meeting” everybody looks forward to.

        2.

          While I certainly feel that most of the practices the article recommends are good, I don’t think it makes a very good argument for them. Frankly, most of it is simply unsupported declaration of fact; the “findings” linked at the beginning of the article are actually a guide (in its own words) that offers more of the same: advice that sounds good but is generally ill-supported by evidence.

          In particular this quote bothered me:

          Unsafe teams can deliver for short periods of time, provided they can focus on goals and ignore interpersonal problems. But eventually, unsafe teams will break or underperform drastically because people can’t introduce change.

          because one has to wonder what evidence we might gather to support this proposition. An observational study will never produce this result: if you have a team that is “unsafe” but high-functioning, then, absent a crystal ball, you have no idea whether it’s on its way to “breaking” or not. Certainly it’s fair to say our intuition is that “unsafe” teams aren’t sustainable, but as a hypothesis it’s unfalsifiable, because any evidence to the contrary can be explained away as “well, eventually this isn’t going to work anymore”.

          And half the article is spent elaborating on all the negative outcomes of a hypothetical situation. One could just as easily say “Karen was embarrassed by her mistake, but she learned from it and became a more capable engineer as a result”, or concoct some opposite scenario where a safe team fails to impress the importance of reliability and Karen’s mistake is repeated. Of course, I think our shared intuition is that that’s not what happens, but short of anecdotal evidence, each such hypothetical is about as defensible as the next.

          I suspect the article is correct in that “psychological safety” as described is important, but it doesn’t seem to make a very good case for it.

          1.

            It’s good to have good arguments for correct things, but yeah, honestly, that quote describes something I’ve observed myself.