1. 20
  1.  

    1. 18

      One important part of this imo is:

      On-call attempts to re-enable the R2 Gateway service using our internal admin tooling, however this tooling was unavailable because it relies on R2.

      I knew when we saw this outage that something exactly like this had happened. When you have such a useful generic tool with high SLAs, it’s too easy to build your own internal tooling against it, which creates dependency loops during incidents.

      Glad they figured it out, but yeah, that’s spooky.