1. 65

  2. 1

    One thing I’m missing in this article. Having this knowledge, how and should it affect the way we write queries or it’s just good to know this and completely relay on implementation optimizations?

    1. 12

      From the article:

      this diagram answers questions like:

      • Can I do WHERE on something that came from a GROUP BY? (no! WHERE happens before GROUP BY!)
      • Can I filter based on the results of a window function? (no! window functions happen in SELECT, which happens after both WHERE and GROUP BY)
      • Can I ORDER BY based on something I did in GROUP BY? (yes! ORDER BY is basically the last thing, you can ORDER BY based on anything!)
      • When does LIMIT happen? (at the very end!)

      Once you know that you can’t GROUP BY a window function result, you know (or can learn) that a query like this (10 lowest public transport bike usages) won’t work:

          dense_rank(usage) OVER year AS usage_rank
      FROM ov_fiets
      WHERE usage_rank < 10

      and must be rewritten to move the window function to a separate table expression, such as a WITH clause:

      WITH tmp AS
          SELECT rank(usage) OVER year AS usage_rank
          FROM ov_fiets
      SELECT usage_rank
      FROM tmp
      WHERE usage_rank < 10

      (SQL written on phone and not checked or tested.)

      1. 3

        thanks, that additional explanation really helped!

      2. 4

        It’s knowledge like this that allows you to write more sophisticated SQL queries with less headaches from bashing your head against the wall going “Why won’t you let me write this?”.

        Optimizing SQL queries is a rather different kettle of fish, and has a lot more to do with understanding your specific database engine, and how to set up indices, and queries that target those indices well. It also involves laying out your tables in a way that’s easier to optimize, and being able to pick apart execution plans and see what they have to tell you. There are some general principles, such as only selecting the data you need; making sure that you’re able to properly filter your biggest tables based on the indexes you have, and so on.

        You can, and I have in the past, optimize SQL queries without understanding this concept explicitly, though eventually, working in SQL enough, you’ll internalize some version of execution order.

        This does help, however, in being able to build up larger queries, akin to Unix pipelines.

        1. 1

          BTW, if you found anything unclear or missing or feel like the article would be better if expanded: jvns is very approachable about concrete and actionable criticism.

          She takes no bullshit, though, in a good way.

        2. 1

          I like the post, and there’s really not much to add, but..

          You can order by some aggregate you did in group by, but that group by can also have its own ordering, and that ordering can even have its own grouping.. so I guess what I’m trying to say is there’s not just one order by unless we’re just counting the top level.

          Also, part jokingly: SQL queries don’t start with SELECT, they often start ;with .. a CTE, or at least a declare/\set/&c. There’s like a whole nother topic of which CTE, if any, a query starts working first, and that depends on db + version (PostgreSQL 12 finally inlining?).

          1. 1

            (I really want to find a more accurate way of phrasing this than “sql queries happen/run in this order” but I haven’t figured it out yet)

            Maybe precedence?

            1. [Comment from banned user removed]

              1. 20

                I don’t agree with that, many people have no structured knowledge of SQL at all.

                Also, a ton of jvns work is writing down stuff other people find “too basic” in a clean fashion, so “um, yes” is a bit of a jerk response.