This is a good example of how one should think about caching. The issue here is that the DB cannot take the natural read load of the users, so when their caching got its legs taken out from under it, the DB could not keep up. This is where circuit breakers and graceful degradation need to be part of the core of the system. In that case they could have automatically shown a simpler version of the website and expanded it back out as the caches warmed up and the latencies went down.
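To make the circuit breaker plus graceful degradation idea concrete, here is a rough sketch of what I mean. Everything in it is hypothetical (`CircuitBreaker`, `render_page`, the thresholds, the dict used as a cache) and it is not how the site in question works; it only shows the shape: serve from cache when possible, stop hitting the DB after repeated failures, and fall back to a simple page until a cool-down has passed.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows a retry after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # assumed threshold, tune per system
        self.reset_after = reset_after    # assumed cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        # Closed: always let the call through.
        if self.opened_at is None:
            return True
        # Open: let a trial call through only once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def render_page(cache, breaker, load_full_from_db, render_simple):
    """Serve from cache if possible, else the DB if the breaker allows, else a simple page."""
    cached = cache.get("page")
    if cached is not None:
        return cached
    if not breaker.allow():
        return render_simple()  # degrade instead of piling onto the DB
    try:
        page = load_full_from_db()
    except Exception:
        breaker.record_failure()
        return render_simple()
    breaker.record_success()
    cache["page"] = page  # warm the cache so later requests skip the DB
    return page
```

The point of the design is that a cold cache plus a struggling DB results in a plainer page rather than an outage, and the full page comes back on its own once a trial request succeeds and the cache refills.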
Not really a post-mortem. They don’t give any interesting details of what went wrong, claiming they’re still doing forensics; also interesting that they regard the sporting events as not worth catching up on, and do not offer any consolations for the inconvenience.