1. 12
  1. 2

    Second, the new system was built in an elegant way, without any workarounds. However, the legacy system it replaced has a lot of edge cases: edge cases we discovered during the shadowing/rollout phase. In some cases, we needed to make tough business decisions whether to support an edge case or drop it.

    This sounds like a roundabout way of saying “The new system was made smaller than the old system, and smaller systems are easier to make.” Am I getting that right? Not criticising the choice of words; genuinely curious.

    Very salient learnings, either way. A lot of it echoes Demings teachings of quality and continuous improvement.

    1. 3

      The new system we built was more complex than the old one, as it was more future-proof. What was not clear when we did the rewrite was

      1. The amount of undocumented workarounds the old system had (“hidden business logic”)
      2. How some known bugs in the old system were, in fact, features and fixing these led to unexpected behaviours. So by fixing these, we had clients fail as they built on top of these.

      The crazy part was we knew this might happen so we did our research. But it was with rolling out and monitoring we caught rouge clients using the old system in weird ways or teams who built on top of explicitly deprecated APIs that were supposed to be retired 12-18 months ago but never were.

      Note that the system I’m talking about had a good 20-30 known consumers and a few new ones we discovered during migration.

    2. 2

      @gregdoesit what has the holocaust memorial in Berlin got to do with distributed systems?

      1. 1

        @james - it doesn’t and thanks for flagging. I did not check what the photo was taken of. I’ve replaced the image with something that hopefully is more neutral.