I read through the DynamoDB problem and was surprised that they had to add special logging to find and disable hot keys like ‘user_id’. If they had set up structured logging, a quick Python script over the logs would have shown this issue.
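For illustration, a minimal sketch of the kind of script meant here, assuming the logs are JSON lines and each write record carries the key in a field. The field name `partition_key`, the event name, and the sample records are made up for the example and are not from the blog post.

```python
import json
from collections import Counter


def hottest_keys(log_lines, field="partition_key", top=3):
    """Count how often each key value appears in JSON-lines log records."""
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured log records
        if isinstance(record, dict) and field in record:
            counts[str(record[field])] += 1
    return counts.most_common(top)


# Hypothetical sample records; real logs would be read from a file.
sample = [
    '{"event": "dynamodb_write", "partition_key": "user_id"}',
    '{"event": "dynamodb_write", "partition_key": "a1b2"}',
    '{"event": "dynamodb_write", "partition_key": "user_id"}',
    'plain text line that is not JSON',
    '{"event": "dynamodb_write", "partition_key": "user_id"}',
]

for key, count in hottest_keys(sample):
    print(key, count)
```

This only works if every key is logged, which is exactly the trade-off discussed in the replies.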
(I debugged this issue) This is roughly what we did. I guess it didn’t come across in the blog that way?
From the blog:
So we dreamt up a simple hack to give us the data we needed: anytime we were throttled by DynamoDB, we logged the key
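A rough sketch of what "log the key only when throttled" could look like. `ThrottledError` and `put_item` are hypothetical stand-ins, not the blog authors' code; with boto3, throttling would typically surface as a botocore `ClientError` whose error code is `ProvisionedThroughputExceededException`.

```python
import logging

logger = logging.getLogger("dynamo_throttle")


class ThrottledError(Exception):
    """Hypothetical stand-in for the client's throttling exception."""


def write_with_throttle_logging(put_item, key, item):
    """Call put_item(key, item); log the key only if the write is throttled."""
    try:
        return put_item(key, item)
    except ThrottledError:
        # Only throttled keys reach the log, so volume stays low and
        # every logged key is a hot-key candidate.
        logger.warning("throttled key=%s", key)
        raise
```

Successful writes log nothing, so the log ends up containing only the keys worth investigating.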
Is there a reason you don’t want to log every single key?
a) It’s sensitive data belonging to third parties, and b) at the scale they are dealing with, the logs would impose significant size and time penalties.
Isn’t that unnecessary? You really only care about when there’s a problem, or else your logs will be noisy. If you log every time, it’ll be kind of hard to differentiate which keys are okay and which keys are a problem.