The server looked livelocked and getting a remote console on it takes too long, so I just remotely power cycled it and, once it came back up, took the opportunity to quickly upgrade to OpenBSD 5.3 for an unrelated issue. After doing that, I realized the site hadn’t come back up after the reboot: mysqld refused to start, claiming there was a corrupted page “somewhere” without saying where (but graciously dumping tons of hex versions of the corrupted page to the log file).

I set innodb_force_recovery = 4, and mysqld was able to start in recovery mode, so I dumped all of the databases. I CHECK TABLE’d each table and they all reported fine, but apparently that doesn’t mean much, since there was clearly still corruption somewhere. I then ran innochecksum on each .ibd file and found one that reported corruption, so I deleted it and re-imported the table. mysqld still claimed corruption.
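
For anyone who ends up in the same spot, the recovery pass looked roughly like this; the datadir and dump paths are just placeholders for whatever your setup uses:

    # /etc/my.cnf (or wherever your my.cnf lives) -- let mysqld limp up
    # far enough to dump from
    [mysqld]
    innodb_force_recovery = 4

    # with mysqld up in recovery mode, dump everything
    mysqldump --all-databases > /var/backups/all.sql

    # run CHECK TABLE across the board (reported everything fine here)
    mysqlcheck --all-databases --check

    # then, with mysqld stopped, verify page checksums on each per-table
    # file; this is what actually flagged the bad one
    for f in /var/mysql/*/*.ibd; do
        innochecksum "$f" || echo "bad checksums: $f"
    done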

I finally wiped out ibdata1, the binlogs, and all of the table data files, and re-imported everything from the dump. That appears to have fixed it, with no actual data corruption that I can see. I’m not sure what ibdata1 was even being used for, since I have innodb_file_per_table set. Anyway, mysqld started and wasn’t reporting any corruption.
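
(As far as I know, even with innodb_file_per_table, ibdata1 still holds the InnoDB data dictionary, undo logs, and doublewrite buffer, so it can carry corruption of its own.) The wipe-and-reload amounted to something like the following; the rc.d script name, datadir, and database name are assumptions about a fairly stock OpenBSD MySQL package setup, not a recipe to copy blindly:

    # with mysqld stopped, clear out the system tablespace, redo logs,
    # binlogs, and the app database's per-table files
    /etc/rc.d/mysqld stop            # or however mysqld is managed on your box
    cd /var/mysql                    # assumed datadir
    rm -f ibdata1 ib_logfile0 ib_logfile1 mysql-bin.*
    rm -rf -- myappdb/               # hypothetical application database

    # InnoDB recreates ibdata1 and the log files on startup; then reload the dump
    /etc/rc.d/mysqld start
    mysql < /var/backups/all.sql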

But then none of the Rails/Unicorn processes were starting properly; they were spinning at 100% CPU. I ktraced one and found it looping over and over on the same syscall (I forget which), so, assuming there was some kind of incompatibility between the installed compiled Ruby modules and the freshly upgraded system, I wiped out the installed Bundler bundles and did bundle install --deployment again.
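
If you haven’t chased one of these down before, the ktrace/kdump dance and the bundle rebuild were roughly as follows; the PID and app path are placeholders:

    # attach to one of the spinning unicorn workers for a few seconds
    ktrace -p 12345              # writes ktrace.out in the current directory
    sleep 5
    ktrace -c -p 12345           # stop tracing that pid
    kdump | tail -n 100          # shows which syscall it keeps looping on

    # throw away the gems with native extensions built against the old
    # system and rebuild them (--deployment installs into vendor/bundle)
    cd /var/www/myapp            # hypothetical app directory
    rm -rf vendor/bundle
    bundle install --deployment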

The processes then started properly and stopped eating CPU, but nginx was just returning 404s for everything. During the upgrade I had switched from the old nginx package to the nginx that now ships in base, forgetting that the base version runs chroot()ed. I fixed that and now it’s back up.
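
For anyone else who trips over this: the nginx in base chroots to /var/www by default, so document roots, sockets, and anything else nginx touches while serving requests have to resolve inside that chroot. A minimal sketch of what a working server block looks like under those rules; the app paths and the Unicorn socket location are assumptions, not my actual config:

    # nginx.conf fragment -- nginx is chrooted to /var/www, so this "root"
    # is really /var/www/myapp/current/public on the actual filesystem
    server {
        listen 80;
        root /myapp/current/public;

        location / {
            try_files $uri @unicorn;
        }

        location @unicorn {
            # the unicorn socket also has to live somewhere nginx can
            # reach from inside the chroot
            proxy_pass http://unix:/myapp/shared/unicorn.sock;
            proxy_set_header Host $host;
        }
    }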

    Happened again overnight, though with no corruption this time.

    I have a remote console on it and ddb.console enabled, so hopefully I’ll at least be able to see where everything is hanging when it gets into this state again.
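
    In case it’s useful to anyone else: with ddb.console enabled you can break into the kernel debugger from the console when it wedges and at least see what everything is blocked on. Roughly (the exact break sequence depends on the console type):

        # /etc/sysctl.conf -- allow entering ddb from the console
        ddb.console=1

        # when it hangs, send a BREAK on a serial console (or Ctrl-Alt-Esc
        # on a keyboard/glass console) to drop into ddb, then:
        #   ddb> ps             list processes and what they're waiting on
        #   ddb> trace          stack trace of the current process
        #   ddb> boot crash     write a crash dump and reboot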

      Bummer. Every time I’ve ever said, “You know, I’ll take this opportunity to do a quick update,” it’s kinda screwed me over.

        Yeah, if I had been paying attention and realized mysqld hadn’t come back up properly, I would have just fixed that and gotten the system back up as it was. An important OpenBSD patch needed testing, and I figured I’d take advantage of the (previously minor) downtime to start running with it.