    Overall, I think this event is the worst I’ve experienced with Amazon. Some systems took twenty-two hours to recover fully, and the analysis here indicates that this was at least partly due to some poor architectural choices: having every server in the farm run a thread for each server in the farm, and running so many critical subsystems on the same substrate with no damage control measures.