Really enjoyed this post!
Cool to see how Slack deploys to prod via checkpointing and rolls new updates out onto servers. Personally, I use containers for everything, so a quick docker push to my container registry automatically rolls out updates (via Docker Swarm). If any problems occur, a rollback is issued and I work through the error logs.
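For anyone curious, the update-then-rollback part of that flow could look roughly like the sketch below. This is not my exact setup: the function name `deploy_with_rollback`, the service name, and the image tag are all made up for illustration, the registry hook that triggers it is not shown, and it assumes `docker service update` exits non-zero when the rollout fails.

```python
import subprocess


def deploy_with_rollback(service, image, run=subprocess.run):
    """Point a Swarm service at a new image; roll back if the update fails.

    `service` and `image` are hypothetical names supplied by the caller.
    `run` is injectable so the logic can be exercised without Docker.
    """
    update = ["docker", "service", "update", "--image", image, service]
    result = run(update, capture_output=True, text=True)
    if result.returncode == 0:
        return "updated"

    # Assumption: a non-zero exit code means the rollout did not converge,
    # so revert the service to its previous spec.
    rollback = ["docker", "service", "rollback", service]
    run(rollback, capture_output=True, text=True)
    return "rolled_back"


if __name__ == "__main__":
    # Example values only; replace with a real service and image.
    print(deploy_with_rollback("web", "registry.example.com/web:2"))
```

Passing the runner in as a parameter keeps the decision logic testable on a machine that has no Swarm cluster at all.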
Overall this looks like a pretty good deploy process, but I’m a little surprised by the combination of 12 deploys per day and “additional manual testing” in the staging tier. This sounds pretty tedious for whoever gets to be the “deploy commander” that day.
Hopefully this implies very good tooling for performing that testing, though in that case I’d wonder why it’s still “manual” and not automated. ;-)