The server looked livelocked and it would have taken too long to get a remote console on it, so I just remotely power cycled it, and once it came back up I took the opportunity to quickly upgrade to OpenBSD 5.3 for an unrelated issue. After doing that, I realized the site hadn’t come back up after the reboot: mysqld refused to start, claiming there was a corrupted page “somewhere” without showing where (but graciously dumping tons of hex versions of the corrupted page to the log file).
I set innodb_force_recovery = 4 and mysqld was able to start in recovery mode, so I dumped all of the databases. I CHECK TABLE’d each table and they all reported fine, but apparently that doesn’t mean much, since there was still corruption somewhere. I ran innochecksum on each .ibd file and found one that reported corruption, so I deleted it and re-imported the table. mysqld still claimed corruption.
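For reference, the recovery dance looked roughly like this (paths and database names are placeholders, and I’m assuming innochecksum’s exit status reflects whether it found a bad page):

```shell
# /etc/my.cnf -- temporarily force InnoDB into recovery mode
# [mysqld]
# innodb_force_recovery = 4

# with mysqld up in recovery mode, dump everything
mysqldump --all-databases > /var/backups/all.sql

# run innochecksum over each per-table data file to find bad pages
for f in /var/mysql/*/*.ibd; do
        innochecksum "$f" || echo "corrupt: $f"
done
```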
I finally wiped out ibdata1, the binlogs, and all of the table data files, and re-imported everything from the dump. That appeared to fix it, with no actual data corruption that I could see. I’m not sure what ibdata1 was even being used for, since I have innodb_file_per_table set. Anyway, mysqld started and wasn’t reporting any corruption.
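(For what it’s worth: even with per-table tablespaces enabled like this, InnoDB still keeps its data dictionary, undo logs, and doublewrite buffer in the shared ibdata1 file, which is presumably where the corruption lived.)

```
# /etc/my.cnf
[mysqld]
innodb_file_per_table = 1
```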
But then none of the Rails/Unicorn processes were starting properly, and each was spinning at 100% CPU. I ktraced one and found it looping over and over in a syscall (I forget which), so, assuming some kind of incompatibility between the compiled Ruby modules and the freshly upgraded system, I wiped out the installed Bundler bundles and did bundle install --deployment again.
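Concretely, that amounted to something like this (the app path and pid are placeholders; bundle install --deployment installs gems into vendor/bundle by default):

```shell
# see what the spinning process is actually doing
ktrace -p 12345          # attach to the looping Unicorn worker
kdump | tail -100        # read the trace back out of ktrace.out

# nuke the gems compiled against the old system and rebuild
cd /var/www/app          # placeholder path
rm -rf vendor/bundle
bundle install --deployment
```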
The processes started properly and were no longer eating CPU, but nginx was just returning 404s for everything. During the upgrade I had switched from the previous nginx package to the one that now comes in base, but I forgot that the one in base is now chroot()ed. I fixed that and now the site is back up.
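The gotcha is that the nginx in OpenBSD base chroot()s itself to /var/www by default, so root directives in nginx.conf resolve inside the chroot. The fix is either to keep the site under /var/www with chroot-relative paths, or to disable the chroot via OpenBSD’s patched -u flag (check nginx(8) on your release; this is a sketch, not copied from my config):

```
# /etc/nginx/nginx.conf -- with the default chroot, this path is
# relative to /var/www, so it actually serves /var/www/htdocs/site
server {
    root /htdocs/site;
}

# or disable the chroot entirely:
# /etc/rc.conf.local
# nginx_flags="-u"
```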