1. 21

  2. 6

    “Everybody tests in production. Some people are careful enough to also test elsewhere.”

    1. 1

      I heard it as:

      Everyone has a staging environment, but some people are fortunate enough to have a separate production environment.

    2. 5

      The meme “I don’t always test, but when I do, I do it in prod” is funny and sarcastic because it implies that you don’t even try to test your code before pushing to production. Simple checks such as unit tests and manual testing of your own changes are something you should always do. They double-check your work and limit the tech debt that comes with code that has no automated tests.

      I’m also one of the people who work on systems too complex to test reliably in a fully automated way (payment systems), where the “real world” is responsible for most failures. So yes, I also have to test in prod a lot. I previously wrote a post on what I learned operating and monitoring large systems, which overlaps with some parts of this post.

      It’s actually a great wake-up call when you realize your system is too complex to write test cases for every failure mode. It forces you to invest in monitoring, alerting and instrumentation. Those are the things that will help you catch failures you would have otherwise missed.

      Also, not mentioned in the article: the most common scenario for testing in prod is migrations. Beyond some level of manual testing, a migration between backends or frameworks is usually too complex to test fully in advance. If you find yourself in the middle of a migration, the monitoring strategies in this post are really good ones.

      1. 3

        Working on payment systems has definitely made me less dogmatic in my thinking about automated testing. Rarely does a week go by without some new never-before-seen failure that has nothing to do with our code.

        You can have a perfect test suite with 100% condition coverage and lightning-fast execution time, and you can have a huge QA group with absolute veto power over releases and a predilection for nitpicking… and then in the middle of the night a partner company suddenly adds a column to the middle of the headerless CSV file they send you for reconciliation, and your customers’ account balances stop updating for however long it takes you to figure out what happened and fix your parser.

      2. 3

        End users are the best at finding bugs; they are also the worst at describing them.

        1. 1

          Nice list of things to have in your monitoring system on production. I’ll be adding a few lines that will report firmware versions back to me as a consequence of this post.

          1. 1

            Tiny reminder that A/B testing is, by definition, testing in production. It usually means testing user behavior rather than the code, but that’s not always true :)
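
For what it’s worth, the headerless-CSV failure described earlier in the thread is the kind of thing a parser can at least fail loudly on instead of silently misreading. Below is a minimal sketch, not anyone’s actual production code: the three-column layout (account id, amount, currency), the `FeedFormatError` name and the `parse_reconciliation` function are all assumptions made up for illustration. The idea is to check the column count and the shape of each field on every row, so that an inserted column raises an error you can alert on.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Hypothetical layout of the partner feed: account_id, amount, currency.
EXPECTED_COLUMNS = 3


class FeedFormatError(ValueError):
    """Raised when a row does not match the layout we assume."""


def parse_reconciliation(text):
    """Parse a headerless CSV feed, rejecting rows with an unexpected shape."""
    rows = []
    for line_no, fields in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not fields:
            continue  # csv.reader yields an empty list for blank lines
        if len(fields) != EXPECTED_COLUMNS:
            raise FeedFormatError(
                f"line {line_no}: expected {EXPECTED_COLUMNS} columns, "
                f"got {len(fields)}"
            )
        account_id, amount, currency = fields
        try:
            value = Decimal(amount)
        except InvalidOperation:
            raise FeedFormatError(
                f"line {line_no}: amount {amount!r} is not a number"
            ) from None
        if not value.is_finite():
            raise FeedFormatError(f"line {line_no}: amount {amount!r} is not finite")
        if len(currency) != 3 or not currency.isalpha():
            raise FeedFormatError(
                f"line {line_no}: currency {currency!r} is not a 3-letter code"
            )
        rows.append((account_id, value, currency.upper()))
    return rows
```

This does not prevent the outage, since the partner can still change the file overnight, but it turns silently wrong balances into an explicit exception that monitoring can pick up. The per-field checks matter as much as the column count: a feed that swaps two columns keeps the same width and would only be caught by the amount or currency validation.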