1. 8
  1. 2

    I don’t have personal experience, but @hynek recommended it on https://talkpython.fm/episodes/show/352/running-python-in-production

    1. 2

      I’ve been a huge fan of PgMustard for the last couple years. They’ve helped me improve (and understand) more than a handful of slow queries. It’s one of those tools I keep in my back pocket and it always comes in handy. I recommend trying it out next time you run EXPLAIN.

      1. 1

        see also: https://explain.depesz.com/

        and there is another graph-like explain visualize which I can’t remember

        1. 2

          FTR the big difference is that pgMustard doesn’t just visualize the EXPLAIN, but also interprets it and makes practical suggestions how to improve your query and/or indexes. Which is a big deal if SQL performance isn’t your bread-and-butter.

          1. 1

            shrug I tried it once and all it suggested was that I had already created the indices I needed.

            1. 1

              That’s odd, in my case it told me that my index was wrong and why so they clearly should have an awareness. Might be different case, might be they improved since you tried.

              1. 3

                OK lets consider https://app.pgmustard.com/#/explore/51f2da78-0b10-4a45-b82c-a67fd8942097 and compare it with https://explain.depesz.com/s/MH7Y (slightly different query with a different parameter)

                This corresponds to this query (schema) which is the data source for this page. My site is a bit unusual in that I mostly do “OLAP” queries, and this page is perhaps the worst offender. It calculates statistics for a player grouped by other players they’ve played with. This query is rather pathological, since the player_stats_peers index’s column (logid aka the primary key for the games played) isn’t clustered by the access pattern (by steamid64 aka the player’s primary key). This results in a lot of blocks being read for each row. I’ve had a lot of trouble optimizing this query because of this.

                So what does mustard suggest?

                • Operation #11: Heap Fetches: 29,942
                • Operation #11: Read Efficiency
                • Operation #11: Cache Performance: 69.4%

                The cache performance is expected. This query completes around 10x faster if it is run in quick succession. Unfortunately, most pages are not reloaded immediately, and I use caches closer to the user to deal with those situations. This ties in with the poor read efficiency, which is due to the index’s column mismatch. These are both somewhat highlighted by the “read” column of depesz, though it shows absolute numbers and not relative.

                Mustard suggests that there are a lot heap fetches because the visibility map is out of date and that I should run vacuum. Indeed, running vacuum did increase the performance. Much of my dataset is effectively read-only, while most of the churn happens in the most recent hour of rows. This causes the autovacuum threshold to be set too high with the default scale factor of 0.2. I’ve been meaning to partition my tables to address this a bit, but I haven’t gotten around to it. I adjusted my autovacuum settings so hopefully this gets taken care of automatically next time.

                The heap fetches are visible in depesz, but not highlighted as concerning. So that’s a win for mustard I’d say. Is it worth $100/year? Maybe once I start spending more than $120/year on the server :)