My first thought on reading the article title was “Wow, that sounds horrible”.
Then, I read (quickly scanned, to be honest, but I got the gist) the article, and … Yeah, it still sounds absolutely horrible.
Constraints for a less horrible solution would vary a lot depending on constraints of real-world scenarios, like do your users know only SQL, how many of them there are, technologies allowed in the company, etc., but I’d definitely draw a hard line on templating (which is what macro preprocessing is, really) for code generation. It never results in a good user experience, see C macros, helm.
If users are limited to just SQL and unwilling to change, I’d try for a fully external DSL that tries to be as compatible as possible with SQL. If they know python, I’d seriously consider building a thin wrapper around sqlalchemy core (not the ORM, put down the pitchforks) and see how that goes.
Now let me pose this question: would this be even better with Tex as the preprocessor? The library definitions wouldn’t need the funky quoting for a start.
I was trying to translate the query in the article, and I got stuck at both parts of count(distinct $trans_fields(S)) as transactions.
First, for distinct $trans_fields and with distinct represented as group [field] ( cnt = count ), it appears I am allowed to define a function which names specific columns, as in func trans_fields row -> [row.day, row.store, row.till, row.transaction], but I can’t seem to figure out how to get PRQL to then accept any form of that as the fields for a group? If this was grouping by some custom function over multiple keys, I could probably derive to introduce a new column of that result, and then group over the derived column, but that doesn’t work for distinct where the author of the post specifically wants a layer of indirection for the field names.
Second, this distinct is already in a group by, so I would need to nest a group within a group? How would I translate that correctly?
I also can’t seem to find a PRQL-native version of if or case, to use to translate sum_if, but bailing out to using S-strings seems to have worked just fine: func sum_if p x -> s"case when {p} then {x} else 0 end"
EDIT: I kept working at it, and found that count_distinct is implemented, but not documented. I also realized group can contain a full pipeline, which means I… almost got something working, but I appear to have discovered a compiler bug as I get an error about IR not lowering. I’ll update the gist with my current state and go open an issue.
EDIT 2: The bottom of the gist now contains a working query (the top is preserved for the bug report), but I still don’t have a solution for trans_fields.
I think the author correctly identified that raw SQL is difficult to maintain for complex read operations.
I would say that the right approach to manage this complexity is a domain specific language (DSL).
The DSL should provide development/compile time feedback on what the DSL request does and whether it fits business and performance constraints.
The implementation technology for DSL can be varying – and I think that part would be driven by functional-capabilities, ‘inhouse/dev knowledge’, build-vs-buy, enterprise-tech-stack-compliance, readiness-for-security-and-access-control integration, and similar constraints.
In this case apparently a templating language (eg M4) was chosen, but usually in a larger enterprise there are more constraints than ‘inhouse/dev knowledge’.
Step 2 would be to use a “real programming language” (Python) with an “extensible templating system” (Jinja) … and then you’ll have reinvented DBT.
My first thought on reading the article title was “Wow, that sounds horrible”.
Then, I read (quickly scanned, to be honest, but I got the gist) the article, and … Yeah, it still sounds absolutely horrible.
Constraints for a less horrible solution would vary a lot depending on constraints of real-world scenarios, like do your users know only SQL, how many of them there are, technologies allowed in the company, etc., but I’d definitely draw a hard line on templating (which is what macro preprocessing is, really) for code generation. It never results in a good user experience, see C macros, helm.
If users are limited to just SQL and unwilling to change, I’d try for a fully external DSL that tries to be as compatible as possible with SQL. If they know python, I’d seriously consider building a thin wrapper around sqlalchemy core (not the ORM, put down the pitchforks) and see how that goes.
Uhhhh citation needed here; I’ve used m4 for years and while it’s good for a few things, the lack of loops is one of the most annoying shortcomings.
You can fake it with recursive macros but it’s kind of hideous.
Now let me pose this question: would this be even better with Tex as the preprocessor? The library definitions wouldn’t need the funky quoting for a start.
m4 is a a general macro processor, TeX is focussed on typesetting.
I’ve never heard of anyone using TeX outside typesetting, I’d love to learn more about it!
Have a look at PRQL (www.prql-lang.org)!
Full disclosure: I’m a PRQL contributor.
There will be a big PRQL release in January. Best source for current state is the prql-lang.org website and the github repo (github.com/PRQL/prql).
Also try the playground(www.prql-lang.org/playground) - there you can try live queries on real data in your browser.
I was trying to translate the query in the article, and I got stuck at both parts of
count(distinct $trans_fields(S)) as transactions
.First, for
distinct $trans_fields
and withdistinct
represented asgroup [field] ( cnt = count )
, it appears I am allowed to define a function which names specific columns, as infunc trans_fields row -> [row.day, row.store, row.till, row.transaction]
, but I can’t seem to figure out how to get PRQL to then accept any form of that as the fields for agroup
? If this was grouping by some custom function over multiple keys, I could probablyderive
to introduce a new column of that result, and thengroup
over the derived column, but that doesn’t work for distinct where the author of the post specifically wants a layer of indirection for the field names.Second, this
distinct
is already in a group by, so I would need to nest a group within a group? How would I translate that correctly?I also can’t seem to find a PRQL-native version of
if
orcase
, to use to translatesum_if
, but bailing out to using S-strings seems to have worked just fine:func sum_if p x -> s"case when {p} then {x} else 0 end"
https://gist.github.com/thisismiller/33726cce15c764f852313f4c404e6ec0 is the current state of what I have in the playground, which hits errors for (at minimum) the above reasons. I think my sum_if in the groupby probably isn’t allowed either though…
EDIT: I kept working at it, and found that
count_distinct
is implemented, but not documented. I also realized group can contain a full pipeline, which means I… almost got something working, but I appear to have discovered a compiler bug as I get an error about IR not lowering. I’ll update the gist with my current state and go open an issue.EDIT 2: The bottom of the gist now contains a working query (the top is preserved for the bug report), but I still don’t have a solution for
trans_fields
.I think the author correctly identified that raw SQL is difficult to maintain for complex read operations. I would say that the right approach to manage this complexity is a domain specific language (DSL).
The DSL should provide development/compile time feedback on what the DSL request does and whether it fits business and performance constraints.
The implementation technology for DSL can be varying – and I think that part would be driven by functional-capabilities, ‘inhouse/dev knowledge’, build-vs-buy, enterprise-tech-stack-compliance, readiness-for-security-and-access-control integration, and similar constraints.
In this case apparently a templating language (eg M4) was chosen, but usually in a larger enterprise there are more constraints than ‘inhouse/dev knowledge’.
Reminds me of the 2000s, one would find find this practice in the wild in many PHP scripts and projects.