1. 37
    1. 2

      Thanks a lot, I haven’t found any of these links previously. Turns out that PostgreSQL gathers statistics for arrays, so using arrays with @> can produce better plans than jsonb.

      1. 1

        Source for array statistics? I don’t doubt you, I’d just like to learn more.

        1. 2

          It was mentioned in the thread linked by root comment, look here https://www.postgresql.org/message-id/54C96AAA.9050901%40agliodbs.com

          Also, you can run a query

          select oprname,typname,oprrest from pg_operator op
              join pg_type typ ON op.oprleft= typ.oid where oprname = '@>';
          

          and there will be a row for arrays with a different… oprrest? This one: @> | anyarray | arraycontsel

          So I searched PostgreSQL repo for arraycontsel and its implementation looks far more elaborate than return 0.001.

          So I haven’t dug more to understand exactly what PostgreSQL collects for arrays and how it affects different queries, but it does collect some statistics indeed.

          1. 2

            Digging deeper into the code, it looks like the main statistics are a histogram of distinct elements, and the opt-in most common element statistics. I only really skimmed the code but from my quick read it looks like you need to enable MCE stats manually to get useful estimates on tables with mostly unique array elements. That is, if the most common element appears 3 times in a 100k row table, you know straight away that searching for any element will return max 3 rows.

            I don’t know what const means in this context, but it looks like selectivity stats are only available when one side of the operator is a literal. PostgreSQL may consider the value const if it comes from a lateral join loop or subquery loop, but I have no idea.

            Interesting, thank you!

            Edit: multi-column MCV statistics are opt-in, but I don’t see any evidence the array MCE statistics need enabling. But I don’t truly know either way.

            1. 2

              JFYI, I’ve added a small note to my post about arrays. I don’t plan to write a separate post about arrays, I want to have a blog about many things that bother me, and now it’s all about databases, lol

    2. 6

      We use jsonb heavily in Pleroma and have also had to deal with this. It can be quite annoying because things work most of the time, but sometimes you get these query-breaking bad plans. Especially with the lack of query hints to tell postgres to use a certain index, this leads to essentially unsolvable situations where you just need to extract data from jsonb into a ‘real’ column.

      It’s not too bad, but I hope jsonb will get some form of proper statistics some day, it would really help a lot.

      1. 5

        I like the article, I think it’s valuable, but please…

        Since time immemorial PostgreSQL supports JSON fields and can even index them. By immemorial I mean …

        I perfectly remember the world where PostgreSQL had no JSON support

        why yes, that is exactly the opposite of “immemorial”! And yes, I know that the phrase “since time immemorial” is almost always used hyperbolically, but what’s the use of a hyperbole that you unceremoniously shoot down just one sentence later?

        1. 2

          I interpreted that comment as a joke