    Really enjoyed this, but I do want to stand up for “treating the symptoms” as a useful approach. You should always try to fix the underlying cause, but I’ve never seen a large distributed system that didn’t turn up new, surprising failure modes. Systems that can’t self-heal via backpressure, incremental progress, or similar techniques are just one weird edge case away from a really bad day.