this is kind of weird. no-one used “explain” at any point before that?
Maybe it’s worth mentioning SQL Performance Explained again. It shows exactly how to get decent performance and how to read “explain” output, and it even covers various ORM oddities (my review got no love here).
I’m not sure what history led to such a system being put in place the way it was.
But yes, now that my team owns it, “explain” is the workhorse that will pull us through.
“We need a new architecture, with read-optimized databases and queue-based messaging.”
What?? Nobody even knows what the problem is and they have already decided what the solution should be?? No no, this isn’t how you solve problems!
In hindsight, it’s super clear. In the heady panic of those first two days, when one of us came forward with a prototype of the new database schema using an off-the-shelf ORM, we said, “sure, you keep working on it”. We still had at least a couple of days of work ahead to get enough test data and to get the system up and running on our machines. I don’t think letting a few of them poke around on it was a huge mistake, but that is why I wanted to log this story: to remember how I was feeling and thinking, so that next time I am better equipped. I do not regularly take over half-million-line legacy codebases, so when I make mistakes, even minor ones, I want to ensure they make me better.
The main problem with a redesign like that is that it’s almost guaranteed to put you in a worse situation. You have just inherited a huge codebase that you don’t understand. It’s clear that your team doesn’t really understand databases either. And someone wants to redesign the most important component in your entire system. This would not have ended well (and, to your team’s credit, it was not the solution they chose).
In my opinion, the biggest lesson to be learned from this (based on the content of the writeup) is that you should spend half a day reading up on how other people profile their databases. For example, turning the Slow Query Log on would have shown you what was going on pretty quickly. Running an EXPLAIN on the slow queries would have made the cost pretty clear.
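To make the EXPLAIN point concrete, here is a minimal sketch. The thread never says which database engine is involved, so this uses SQLite from Python's standard library purely as a stand-in, and the `orders` table, the index name, and the helper functions are all made up for illustration. SQLite's `EXPLAIN QUERY PLAN` is not the same as EXPLAIN in MySQL or PostgreSQL, but the idea carries over: ask the database how it intends to run the query, before and after adding an index.

```python
import sqlite3

# Hypothetical query used only for this illustration.
QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def build_demo_db():
    """Create an in-memory database with a made-up 'orders' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 1000, i * 1.5) for i in range(10_000)],
    )
    return conn


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); only 'detail' is of interest.
    return [row[3] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()

    # Without an index the planner has to scan the whole table.
    print("before index:", query_plan(conn, QUERY, (42,)))

    # After adding an index on the filtered column, the plan should
    # switch to an index search instead of a full scan.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print("after index: ", query_plan(conn, QUERY, (42,)))
```

The exact wording of the plan differs between SQLite versions, so the thing to look for is the change from a scan of the table to a search using the index.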
In your position I probably would have had one group working on getting the system running locally with some data in it, and the other researching the unfamiliar parts of the stack, specifically the parts related to performance, while also looking for obvious places to add some timing/logging to the codebase. Once the first group got the system up, the second group could immediately say “apply these options and look at these log files”.
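As a rough idea of what "add some timing/logging" could look like, here is a small sketch. It assumes the code talks to the database through a DB-API style connection (SQLite is used here only so the example runs on its own), and `timed_query`, the logger name, and the 0.1 second threshold are all hypothetical choices, not anything from the writeup.

```python
import logging
import sqlite3
import time

logger = logging.getLogger("slow_queries")


def timed_query(conn, sql, params=(), threshold_s=0.1):
    """Run a query, and log it if it takes at least threshold_s seconds.

    Returns (rows, elapsed_seconds) so callers can also inspect the timing.
    """
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed >= threshold_s:
        # Log the statement text only; parameters may hold sensitive data.
        logger.warning("slow query (%.3fs): %s", elapsed, sql)
    return rows, elapsed


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    conn = sqlite3.connect(":memory:")
    # threshold_s=0.0 forces the log line so the behaviour is visible.
    rows, elapsed = timed_query(conn, "SELECT 1", threshold_s=0.0)
    print(rows, f"{elapsed:.6f}s")
```

Wrapping the handful of call sites that issue queries in something like this gives an application-side slow query log, which is handy when you cannot (or do not yet know how to) change the database server's own logging settings.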
But inheriting 500,000 lines of code is terrifying no matter what you do.