1. 13
  1. 5

    Step 2 would be to use a “real programming language” (Python) with an “extensible templating system” (Jinja) … and then you’ll have reinvented DBT.

    1. 2

      My first thought on reading the article title was “Wow, that sounds horrible”.

      Then, I read (quickly scanned, to be honest, but I got the gist) the article, and … Yeah, it still sounds absolutely horrible.

      Constraints for a less horrible solution would vary a lot depending on constraints of real-world scenarios, like do your users know only SQL, how many of them there are, technologies allowed in the company, etc., but I’d definitely draw a hard line on templating (which is what macro preprocessing is, really) for code generation. It never results in a good user experience, see C macros, helm.

      If users are limited to just SQL and unwilling to change, I’d try for a fully external DSL that tries to be as compatible as possible with SQL. If they know python, I’d seriously consider building a thin wrapper around sqlalchemy core (not the ORM, put down the pitchforks) and see how that goes.

      1. 2

        M4 is a rich macro definition language which also supports features like […] loops […]

        Uhhhh citation needed here; I’ve used m4 for years and while it’s good for a few things, the lack of loops is one of the most annoying shortcomings.

        You can fake it with recursive macros but it’s kind of hideous.

        1. 2

          Now let me pose this question: would this be even better with Tex as the preprocessor? The library definitions wouldn’t need the funky quoting for a start.

          1. 2

            m4 is a a general macro processor, TeX is focussed on typesetting.

            I’ve never heard of anyone using TeX outside typesetting, I’d love to learn more about it!

          2. 2

            Have a look at PRQL (www.prql-lang.org)!

            Full disclosure: I’m a PRQL contributor.

            There will be a big PRQL release in January. Best source for current state is the prql-lang.org website and the github repo (github.com/PRQL/prql).

            Also try the playground(www.prql-lang.org/playground) - there you can try live queries on real data in your browser.

            1. 1

              I was trying to translate the query in the article, and I got stuck at both parts of count(distinct $trans_fields(S)) as transactions.

              First, for distinct $trans_fields and with distinct represented as group [field] ( cnt = count ), it appears I am allowed to define a function which names specific columns, as in func trans_fields row -> [row.day, row.store, row.till, row.transaction], but I can’t seem to figure out how to get PRQL to then accept any form of that as the fields for a group? If this was grouping by some custom function over multiple keys, I could probably derive to introduce a new column of that result, and then group over the derived column, but that doesn’t work for distinct where the author of the post specifically wants a layer of indirection for the field names.

              Second, this distinct is already in a group by, so I would need to nest a group within a group? How would I translate that correctly?

              I also can’t seem to find a PRQL-native version of if or case, to use to translate sum_if, but bailing out to using S-strings seems to have worked just fine: func sum_if p x -> s"case when {p} then {x} else 0 end"

              https://gist.github.com/thisismiller/33726cce15c764f852313f4c404e6ec0 is the current state of what I have in the playground, which hits errors for (at minimum) the above reasons. I think my sum_if in the groupby probably isn’t allowed either though…

              EDIT: I kept working at it, and found that count_distinct is implemented, but not documented. I also realized group can contain a full pipeline, which means I… almost got something working, but I appear to have discovered a compiler bug as I get an error about IR not lowering. I’ll update the gist with my current state and go open an issue.

              EDIT 2: The bottom of the gist now contains a working query (the top is preserved for the bug report), but I still don’t have a solution for trans_fields.

            2. 1

              I think the author correctly identified that raw SQL is difficult to maintain for complex read operations. I would say that the right approach to manage this complexity is a domain specific language (DSL).

              The DSL should provide development/compile time feedback on what the DSL request does and whether it fits business and performance constraints.

              The implementation technology for DSL can be varying – and I think that part would be driven by functional-capabilities, ‘inhouse/dev knowledge’, build-vs-buy, enterprise-tech-stack-compliance, readiness-for-security-and-access-control integration, and similar constraints.

              In this case apparently a templating language (eg M4) was chosen, but usually in a larger enterprise there are more constraints than ‘inhouse/dev knowledge’.

              1. 1

                Reminds me of the 2000s, one would find find this practice in the wild in many PHP scripts and projects.