I thought this was a pretty awesome talk. I do wish he had expanded on some of the points he made, especially at the start when he called out STONITH in passing.
Oh, hey, there’s my name on lobste.rs.
I can handle questions here, if you’d like to ask some. I’ll be in and out today, so responses may take a while.
Specifically on STONITH, I was hoping you could expand on why it’s a bad practice. We currently use it with our PostgreSQL cluster, and at least on the surface it seems like a good way to manage clusters of databases that don’t have built-in cluster mechanics.
The shortest answer is that machines lie, and they lie often, and if your system doesn’t have reconciliation strategies (which it doesn’t, or else you wouldn’t need STONITH), you’re asking for the machines to knock each other out. Or for the unhealthy machine to knock out the healthy one.
This happens a lot, and if you haven’t seen it yet, you’ve gotten lucky.
By the way, lying here can take many forms. Maybe there was a temporary network problem (when both think the other is being too slow or refusing connections, what happens?), or maybe the metrics used to determine “health” were out of date (one machine is rejecting all normal requests but is happily serving the heartbeat, while the other one is handling more than its share and struggling to do so), and so on.
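To make those two cases concrete, here is a toy Python simulation. Everything in it (`Node`, `heartbeat_ok`, `stonith_round`, the `serving` flag) is hypothetical and made up for illustration; it is not how any real fencing agent works. The only assumption it encodes is that each node judges its peer purely from the heartbeats it receives.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alive: bool = True    # powered on and answering heartbeats
    serving: bool = True  # actually handling real requests (hypothetical flag)


def heartbeat_ok(target: Node, link_up: bool) -> bool:
    # The observer only sees what the network delivers. A dead link and a
    # dead peer look identical, and `serving` is never consulted.
    return link_up and target.alive


def stonith_round(a: Node, b: Node, link_up: bool) -> list[str]:
    """Each node decides from its own view, then fences. Returns fenced names."""
    # Both decisions are made before either fence takes effect.
    a_shoots = not heartbeat_ok(b, link_up)
    b_shoots = not heartbeat_ok(a, link_up)
    fenced = []
    if a_shoots:
        b.alive = False
        fenced.append(b.name)
    if b_shoots:
        a.alive = False
        fenced.append(a.name)
    return fenced


# Case 1: temporary network problem. Both nodes are healthy, both get shot.
a, b = Node("a"), Node("b")
partition_result = stonith_round(a, b, link_up=False)

# Case 2: "a" rejects all real requests but still answers the heartbeat,
# so nobody fences it.
sick, healthy = Node("a", serving=False), Node("b")
stale_result = stonith_round(sick, healthy, link_up=True)
```

In the first case the cluster takes itself down over a blip; in the second the heartbeat says everything is fine while one node serves nothing.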
It’s really just broken. Leader election or manual intervention is much preferred.
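As a rough illustration of why election behaves better under a partition, here is a minimal Python sketch of just the majority rule. The function name `can_lead` and its parameters are assumptions for this example, and it leaves out everything a real election protocol needs (terms, logs, timeouts).

```python
def can_lead(reachable_peers: int, cluster_size: int) -> bool:
    """A node may act as leader only with votes from a strict majority.

    `reachable_peers` counts the other nodes this node can currently talk to;
    the node's own vote is added on top.
    """
    votes = reachable_peers + 1
    return votes > cluster_size // 2


# Three nodes split into {a, b} and {c}: only the larger side can lead.
majority_side = can_lead(reachable_peers=1, cluster_size=3)
minority_side = can_lead(reachable_peers=0, cluster_size=3)

# Two nodes split apart: neither side can lead, so nothing gets shot.
two_node_split = can_lead(reachable_peers=0, cluster_size=2)
```

With a strict majority required, two sides of a partition can never both win, which is the property mutual fencing lacks. The cost is that a two-node cluster simply stops accepting writes during a split instead of guessing.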
Meta comment: What should be the policy around including speaker / author names in titles? We have a few stories on the front page that have a person’s name at the front or the back of the title. My preference would be that we remove them in the future, as I think it clutters the headlines.
Agreed, and fixed. (This is one of the more common causes of moderator