That is neat, the kind of thing you’re really thankful to the blog’s author for writing, as it’s a rather obscure edge case (at least from my point of view).
For those (like me) wondering why there’s no such mechanism in MySQL: https://stackoverflow.com/questions/25153532/why-is-it-a-vacuum-not-needed-with-mysql-compared-to-the-postgresql
PS: I’d suggest removing the [ruby] tab and going with a [postgre] one.
Just one clarification, since PostgreSQL’s vacuum does a lot of things, and I think this behavior could also have happened with MySQL.
In this specific blog post, the problem was caused by outdated table statistics.
If you look at the query plan: https://explain.depesz.com/s/7qJI
You can see that the planner wrongly estimated the number of rows the query would return (compare the “rows x” column and the up/down arrows with the “rows” column).
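For reference, this is the kind of plan you get from EXPLAIN ANALYZE, which reports the planner’s estimated row counts next to the actual ones for every node. A rough sketch against the table mentioned below (the filter value is just a placeholder):

-- Runs the query and shows estimated vs. actual rows per plan node.
-- 'PO-12345' is a made-up value, purely for illustration.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM order_items
WHERE purchase_order_number = 'PO-12345';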
Vacuum in PostgreSQL is a process for reclaiming removed rows, preventing transaction ID wraparound, updating statistics, flushing the pending list of a GIN index into the main index tree, and other maintenance tasks. It is a costly process and takes time.
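For reference, a plain vacuum of a single table (optionally combined with an analyze) looks like this - just a sketch using the table from the post:

-- Reclaims dead row versions in one table without rewriting it.
VACUUM order_items;
-- Does the same and also refreshes planner statistics in one pass.
VACUUM (VERBOSE, ANALYZE) order_items;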
There is also VACUUM FULL, which rewrites the whole table. That’s something you almost never run or even want to run - it takes A LOT of time.
There are also autovacuum processes that run automatically and do the above in smaller incremental steps.
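If you want to check whether autovacuum and autoanalyze have been keeping up with a table, the pg_stat_user_tables view has that information - a quick sketch:

-- Last manual/automatic vacuum and analyze times, plus live/dead tuple estimates.
SELECT relname,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze,
       n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'order_items';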
Now, in this particular case, in my humble opinion a vacuum was not needed. The author could have just run
ANALYZE order_items;
or even
ANALYZE order_items (purchase_order_number);
and it would have had the same result in a fraction of the time.

I’m not a MySQL expert, but that database also has statistics (https://dev.mysql.com/doc/refman/8.0/en/analyze-table.html), and a step like this might also help the planner after large imports or after adding new columns to big tables. I don’t see anything in the linked Stack Overflow post indicating that MySQL would be immune to outdated statistics leading to an inefficient plan.
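For completeness, the MySQL counterpart for refreshing those statistics would be something along these lines (an untested sketch, per the manual page linked above):

-- Rebuilds the key distribution statistics the MySQL optimizer uses for this table.
ANALYZE TABLE order_items;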