1. 42

  2. 7

    EDIT: subthread TL;DR for future readers: the overall query plan has a wildly inaccurate cost estimation, based on some dubious row count estimation on both sides of a hash join. The massive cost estimate trips the thresholds for JIT compilation, even though the real cost of the query is negligible. The 13 second JIT compilation time is a red herring: PostgreSQL measures wall time rather than CPU time, so the times were artificially inflated by the extreme system load of numerous competing compilations. Run in isolation, the query compiled in ~100ms.

    A few things jump out at me, that I’ll address one at a time:

    • The initial cost estimate is quite small: cost=0.86..12.89
    • JIT was slow for every query, not just one.
      • The query has an inlined constant: Index Cond: (user_id = 7002298).
    • Initial code generation ran shockingly long: Timing: Generation 108.775 ms.
      • Consider that code generation is fairly similar to writing a C file into memory with sprintf.

    The initial cost estimate was basically 13. Why on earth was this query JIT optimized? The PostgreSQL 12 query planning docs for the JIT cost parameters specify that, by default, queries won’t be JIT optimized at all until their estimated cost exceeds 100,000; inlining and optimization passes default to 500,000. Yet all three occurred.
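
    For reference, those thresholds are ordinary settings, so they’re easy to check directly. The values in the comments are the documented PostgreSQL 12 defaults, not necessarily what production runs with:

    ```sql
    -- Documented PostgreSQL 12 defaults; verify on the actual server.
    SHOW jit_above_cost;           -- 100000: minimum estimated cost before JIT kicks in
    SHOW jit_inline_above_cost;    -- 500000: minimum cost before inlining
    SHOW jit_optimize_above_cost;  -- 500000: minimum cost before expensive optimization passes
    ```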


    Why was the JIT slow for every query? Surely PostgreSQL caches query plans and JIT code? If so, could the inlined constant interfere with plan caching in this case? I actually don’t know, so I looked it up and found the PostgreSQL 12 docs for the plan_cache_mode planner option. Here’s the relevant snippet:

    Custom plans are made afresh for each execution using its specific set of parameter values, while generic plans do not rely on the parameter values and can be re-used across executions. Thus, use of a generic plan saves planning time, but if the ideal plan depends strongly on the parameter values then a generic plan may be inefficient.

    From this I surmise it’s entirely possible PostgreSQL built a fresh custom plan each time rather than reusing a generic one. @piranha mentioned in another Lobste.rs comment that prepared statements did not help. It might be worth trying this query again with plan_cache_mode = force_generic_plan, just to see what happens.
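
    A minimal sketch of that experiment, with a made-up prepared statement standing in for the real query (table and column names are guesses):

    ```sql
    -- Force generic plans for this session (PostgreSQL 12+).
    SET plan_cache_mode = force_generic_plan;

    -- Hypothetical stand-in for the real query.
    PREPARE user_lookup (integer) AS
        SELECT * FROM users WHERE user_id = $1;

    EXECUTE user_lookup(7002298);  -- should now reuse one generic plan across executions
    ```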


    Why did code generation take so long? This query seems extremely simple, both from what we see in the explain and from what @piranha tells us the query does.

    Going out on a limb here, I guess that the long code generation time indicates a lot of code was generated. Then the inlining phase took 1s and the emission phase took a whopping 5s! I further guess that whatever took so long to inline exploded the code size even more, leading to the massive emission time.

    But what could cause this? The PostgreSQL 12 JIT accelerated optimizations docs specify:

    Currently PostgreSQL’s JIT implementation has support for accelerating expression evaluation and tuple deforming.

    Not much. Either the query contains some expression with lots of underlying complexity, or the table schema is quite complex to decode. For both cases I suspect something like JSON field extraction. If the JIT pulls in a full JSON decoder for this query, that would definitely increase the code size and compile time.

    But I’m really reaching on the JSON guess, with hardly a shred of evidence.


    That’s what comes to mind so far, but there’s really no way to tell without seeing explain (analyze, verbose, buffers), the table schema, the full query, and all of the query planner settings used in production.

    1. 3

      Quickly replying on the last part: I’ll get to my laptop in an hour or so and update the post with the isolated query, schemas, and full explain. I’m not sure we changed any query planner settings at all…


      OTOH, I’m a bit wary of leaking too much information, but looking at the query plan it’s understandable: PG just thinks too many rows will come back from this join, so JIT kicks in because its default cost configuration permits that. :) No problem here.

      Why it takes so much time is indeed a more interesting question!

      1. 2

        For the part of the plan you posted, PostgreSQL correctly estimates the result will have 1 row. Does the redacted part of the plan have a higher cost= field?

        1. 3

          https://paste.in.ua/4215/ - that’s the whole plan, but without verbose, buffers; I’ll check on that later. :)

          EDIT: argh, something has changed since April, and when I enable JIT for this query now the results look like this:

           Planning Time: 2.240 ms
           JIT:
             Functions: 101
             Options: Inlining false, Optimization false, Expressions true, Deforming true
             Timing: Generation 13.719 ms, Inlining 0.000 ms, Optimization 3.280 ms, Emission 83.755 ms, Total 100.753 ms
           Execution Time: 102.812 ms
          

          EDIT2: I think that’s because of load. When I enable JIT for a single query it takes 105 ms, but when it’s enabled for everything, this being one of our most common queries just destroys the CPUs. So that’s where the weird 13 seconds came from. Updated my post.

          1. 2

            I thought about that possibility, and I hoped PostgreSQL was recording CPU time rather than wall time for JIT compilation. Sadly you are correct: PostgreSQL’s ExplainPrintJIT uses instr_time, which in turn uses clock_gettime.

            This time around your query only compiled 101 functions, not 138, so there may be a difference there. But under tremendous load 100ms could become 13s; I’ve seen worse.

            I think that closes the book on the 13s compilation time. Now you just need to convince PostgreSQL to reuse the query plan, or to estimate properly. It looks like these are your two biggest problems estimation-wise:

            ->  Seq Scan on role role_1  (cost=0.00..2.49 rows=98 width=64) (actual time=0.013..0.013 rows=1 loops=1)
            

            Based on the table name I assume you could slap a unique index on role and get a correct rows=1 estimate. Honestly that looks like an oversight, and should probably be fixed if so. In theory there’s nothing wrong with a sequential scan on a small table, but it can confuse the query planner.

            ->  Bitmap Heap Scan on supplier sup  (cost=9.23..145.41 rows=62 width=85) (actual time=0.070..0.070 rows=0 loops=1)
                  Recheck Cond: (emails @> ARRAY[(u.email)::text])
                  ->  Bitmap Index Scan on supplier_emails  (cost=0.00..9.22 rows=62 width=0) (actual time=0.057..0.057 rows=0 loops=1)
                        Index Cond: (emails @> ARRAY[(u.email)::text])
            

            For this one, I’d guess the emails behind supplier_emails are also relatively unique. I’m not sure how PostgreSQL analyzes array values, so if you filter on emails often, converting the array column to a join table may help out the optimizer.
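
            A sketch of what I mean by a join table, with guessed names; adjust to the real schema:

            ```sql
            -- Hypothetical replacement for the emails text[] column on supplier.
            CREATE TABLE supplier_email (
                supplier_id integer NOT NULL REFERENCES supplier (id),
                email       text    NOT NULL,
                PRIMARY KEY (supplier_id, email)
            );
            CREATE INDEX supplier_email_email_idx ON supplier_email (email);

            -- The emails @> ARRAY[(u.email)::text] filter then becomes a plain indexed join:
            --   ... JOIN supplier_email se ON se.email = u.email ...
            ```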

            1. 2

              The role part looks like this:

              LEFT JOIN
                (SELECT ru.user_id,
                        unnest(role.permissions) AS permissions
                 FROM ROLE
                 INNER JOIN roles_users ru ON ru.role_id = role.id) role ON role.user_id = u.id
              

              roles_users is:

              # \d roles_users
                           Table "public.roles_users"
               Column  |  Type   | Collation | Nullable | Default 
              ---------+---------+-----------+----------+---------
               user_id | integer |           | not null | 
               role_id | integer |           | not null | 
              Indexes:
                  "roles_users_pkey" PRIMARY KEY, btree (user_id, role_id)
                  "roles_users_role_idx" btree (role_id)
              

              role is:

              # \d role
                                                    Table "public.role"
                 Column    |          Type          | Collation | Nullable |             Default              
              -------------+------------------------+-----------+----------+----------------------------------
               id          | integer                |           | not null | nextval('role_id_seq'::regclass)
               name        | character varying(80)  |           |          | 
               description | character varying(255) |           |          | 
               permissions | text[]                 |           |          | 
              Indexes:
                  "role_pkey" PRIMARY KEY, btree (id)
                  "role_name_key" UNIQUE, btree (name)
              

              Not sure what I could make unique here. Am I missing something?

              As for supplier_emails… it’s a new addition; we slapped it on to do something quickly, and yeah, it’s being converted to a join table because it’s used more and more.

              1. 2

                Ah. PostgreSQL thinks that the average user has 3 roles and the average role has 10 permissions. Because of that, it’s using a hash join for role inner join roles_users, instead of a nested loop lateral join on index seek. You have all the unique indexes you need, the query planner just doesn’t care.

                I’ve seen this plenty of times with PostgreSQL. You could try running ANALYZE on roles_users and role to update the statistics, but it may never get better. Autovacuum should analyze tables automatically, but those tables may not have changed enough to trip the automatic analyze, as they’re presumably relatively small. Worth a shot.
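
                Concretely, the manual refresh is just:

                ```sql
                -- Rebuild planner statistics for the two tables in question,
                -- then re-run EXPLAIN and compare the row estimates.
                ANALYZE roles_users;
                ANALYZE role;
                ```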

                Luckily this doesn’t actually create a real problem for you, as the query plan works alright as long as the JIT is disabled. It’s much more frustrating when this kind of thing creates an outrageously slow query plan and you have to separate your query into subqueries at the application layer, just to avoid PostgreSQL picking a wildly inappropriate join strategy.

                I’ll look at this again in the morning but for now it’s 3AM in my time zone, so I’m going to bed. I don’t get to analyze SQL queries at my current job, so I’m getting a kick out of it.

                1. 2

                  PostgreSQL thinks that the average user has 3 roles and the average role has 10 permissions. Because of that, it’s using a hash join for role inner join roles_users, instead of a nested loop lateral join on index seek. You have all the unique indexes you need, the query planner just doesn’t care.

                  Luckily this doesn’t actually create a real problem for you, as the query plan works alright as long as the JIT is disabled. It’s much more frustrating when this kind of thing creates an outrageously slow query plan and you have to separate your query into subqueries at the application layer, just to avoid PostgreSQL picking a wildly inappropriate join strategy.

                  What if the JOIN was transformed into a LATERAL one? It’d be perfect if Postgres figured it out by itself, but perhaps query separation at the application layer isn’t necessary.

                  1. 2

                    I’ve had this problem before, and tried exactly that. It ignored the lateral join same as it’s already ignoring the unique index on role. It thinks it’s doing the cost-effective thing as-is.

                    Unfortunately even if the lateral join worked, I don’t think it would change the row estimate or overall cost estimate. A lateral join would be more efficient here, but the query is already fast enough (unlike the other times I’ve seen this happen). The cost analysis is what’s improperly triggering the JIT.
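
                    For reference, the lateral shape I mean, sketched against the schema upthread (the users table name is a guess):

                    ```sql
                    SELECT u.id, r.permissions
                    FROM users u
                    LEFT JOIN LATERAL (
                        SELECT unnest(role.permissions) AS permissions
                        FROM roles_users ru
                        JOIN role ON role.id = ru.role_id
                        WHERE ru.user_id = u.id   -- correlated: one index seek per user row
                    ) r ON true;
                    ```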

      2. 2

        I looked at this yesterday as well, and I couldn’t really figure out why the JIT was being used. One thing that struck me about the JIT query plan, though, is Functions: 138, which seems rather high. I suspect this may be the cause, and may also explain why it takes so long to compile?

        Bit difficult to say without looking at the actual query/schema though, and I couldn’t really find what “Functions” means exactly (it doesn’t seem to be documented as far as I can find, but I think it’s the number of generated functions).

        CC: @piranha

        1. 2

          https://lobste.rs/s/r6ydjp/postgresql_query_jit#c_qfxrsk - I think EDIT2 captures the essence of the problem. :-)

      3. 4

        You left us hanging… What effect did the “Emissions” optimization have on the query? How did you solve it eventually? Just disabled JIT completely?

        I just upgraded a large (Django) DB from 11 -> 12. I can’t say I’ve encountered weird plans like the one described in the article.

        1. 3

          Ha-ha, funny how errors repeat. :) “Emissions” is not an optimization; it’s the time the JIT took to emit the optimized query! That whole line is statistics about the JIT, not about the query itself.

          So yeah, we’ve just disabled JIT for now. “Large” is relative: our DB is ~300GB, but it seems the number of rows in auth_user and user_userprofile pushes the cost estimate a little over the default JIT thresholds.
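
          For anyone else who needs it, jit is a regular setting, so disabling it doesn’t require a restart:

          ```sql
          -- Per session:
          SET jit = off;

          -- Or cluster-wide, without hand-editing postgresql.conf:
          ALTER SYSTEM SET jit = off;
          SELECT pg_reload_conf();
          ```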

          1. 1

            I assume auth_user and user_userprofile have roughly the same number of rows, so a nested loop is probably not the right choice. I wonder what made the DB pick that plan.

            I haven’t had a chance to explore JIT yet, so I was glad to see an article that mentioned it. Thanks 👌

            1. 1

              Nested loop is the right choice when both sides of the join are point lookups of a single row: 1 * 1 = 1. You can see in the analyze that’s what happened: loops=1. The plan is fine; JIT is the problem here.

              1. 1

                Right! I missed the fact it’s fetching only one user.

        2. 3

          This talk https://www.youtube.com/watch?v=oF_lIZHJPig from CMU on Deepgreen DB (a Postgres/Greenplum commercial offering with JIT) goes into some good details around the benefits / drawbacks of JIT in databases.

          1. 1

            I have watched that talk twice, it is damn excellent!

          2. 2

            Does your application use prepared statements?

            1. 2

              Nope, but it seems they don’t help anyway.

            2. 1

              Whatever Postgres is doing in order to JIT compile, how could it take 13 seconds? I imagine it must be producing something equivalent to C code, or maybe LLVM bitcode, that executes the query plan, and I can’t imagine that code being very long (how complicated is the overall query plan?). So how does it take 13 seconds to compile?

              1. 2

                Haha, I’d like to know as well! I actually reproduced the thing on just an isolated join between those two tables. I guess I need to read the docs to see if it’s possible to debug this. :)

                1. 1

                  I actually reproduced the thing on just an isolated join between those two tables.

                  Wow, this has to be a bug; one rarely runs queries long enough to justify a 13-second startup cost for some percentage of run-time improvement.

                  Though I think one point I missed initially is that the Optimisations and the Emissions are steps performed by Postgres, not by the compiler (i.e. it’s not just compiling with -O3), so one can imagine very inefficient, buggy corner cases that your query (or DB state) might be running into. Like maybe you have a thousand indexes on a related table, etc.

                  1. 2

                    Hmmm… okay, maybe I really need to report this to the PostgreSQL developers. :)

                    I think I’ll get to it next week.

                    1. 1

                      Sounds great! Meanwhile I’ll go celebrate this rare moment when my aimless whining might have a positive influence on the world :)

                      1. 1

                        After I dug a bit deeper it seems it’s not a bug in pg: https://lobste.rs/s/r6ydjp/postgresql_query_jit#c_qfxrsk

                        1. 1

                          Thanks for pointing it out, that makes total sense and solves the mystery.

                    2. 1

                      Wow, this has to be a bug; one rarely runs queries long enough to justify a 13-second startup cost for some percentage of run-time improvement.

                      Turns out I need to re-check what I post. :-) It was a different thing, not related to JIT; you need a higher cost estimate to get JIT running. :)