1. 18

  2. 7

    CTEs are great, but it’s important to understand the implementation characteristics as they differ between databases. Some RDBMSs, like PostgreSQL, treat a CTE like an optimization fence while others (Greenplum for example) plan them as subqueries.

    1. 2

      The article mentions offhand they use SQL Server, which AFAIK does a pretty good job of using them in plans. I believe (not 100% sure) its optimiser can see right through CTEs.

      1. 2

        … and then you have RDBMSs like Oracle whose support for CTE is a complete and utter disgrace.

        I praying for the day Oracle’s DB falls out of use, because I imagine that will happen sooner than them managing to properly implement SQL standards from 20 years ago.

        1. 2

          At university we had to use Oracle and via the iSQL web-interface for all the SQL-related parts in our database-courses. It was the slowest most painful experience, executing a simple select could take several minutes and navigating the interface/paginating results would take at least a minute per operation.

          I would always change it to show all results on one page (no pagination) but the environment would do a full reset every few hours requiring me to spend probably 15-30minutes changing the settings back to my slightly saner defaults. Every lab would take at least twice as long because of the pain in using this system. I loved the course and the lecturer, it was probably one of the best courses I took during my time at university, but I did not want to use Oracle again after that point.

          I’ve heard that they nowadays have moved the course to use PostgreSQL instead which seems like a much more sane approach, what I would have given to be able to run the code locally on my computer at that time.

        2. 1

          I didn’t know this, so using a CTE in Postgres current would be at a disadvantage compared to subqueries?

          Haven’t really used CTEs in Postgres much yet but I’ve looked at them and considered them. Is there any plans on enabling optimization through CTE’s in pg? Or is there a deeper more fundamental undelaying problem?

          1. 5

            would be at a disadvantage compared to subqueries

            it depends. I have successfully used CTEs to circumvent shortcomings in the planner which was mi-estimating row counts no matter what I set the stats target to (this was also before create statistics).

            Is there any plans on enabling optimization through CTE’s in pg

            it’s on the table for version 12

            1. 2

              It’s not necessarily less efficient due to the optimization fence, it all depends on your workload. The underlying reason is a conscious design decision, not a technical issue. There have been lots of discussions around changing it, or at least to provide the option per CTE on how to plan/optimize it. There are patches on the -hackers mailing list but so far nothing has made it in.

            2. 1

              Does anyone know if CTEs are an optimization fence in DB2 as well?

            3. 3

              Understanding CTEs was a revelation for me. For me it is so much easier to compose a non-trivial query from custom sets built with CTEs as opposed to sub-queries, which tend to get cluttered and don’t stand out as obviously to my perception as set building blocks.

              1. 2

                Recursive CTEs are also very cool. They somewhat remind me of “unfolds” or “anamorphims” from functional programming, in that they start from and initial “seed set” of rows and add new rows by repeatedly applying a query to the results from the previous step. https://stackoverflow.com/questions/3187850/how-does-a-recursive-cte-run-line-by-line/3188127#3188127 Maybe they should be called co-recursive CTEs instead!

                1. 3

                  We use recursive CTEs at work to traverse graph structures that we store within the database. It’s really neat, but also a performance nightmare. For large tables, recursive CTEs have a tendency to hold a lot more stuff in memory than they need to. This has caused lots of unintuitive problems for us e.g. our database running out of space because a large recursive CTE query ate up a ton of memory, swapped it all to disk, then timed out and never cleaned the swap files up.

                  1. 1

                    What database engine are you using?

                    1. 1

                      We’re running on PostgreSQL.