1. 49

Hi everyone,

I’ve been working on Preql for the past year. This is my first time posting it online, and I can’t think of a better community for introducing it than lobsters.

Preql came out of my love for compiler theory, and my long frustration with SQL, both as a data engineer and as a web developer. The main insight that propelled me to write it was to treat SQL the same way C treated Assembly - as a solid base on which to build high-level programming.

This is still a very young project, with a lot of missing features, but I believe it is already useful, and can be used to do real work.

I’m looking forward to hearing your thoughts, ideas, and even criticisms :)

  1.  

  2. 13

    The license is interesting; I can’t recall ever seeing this before. Is that something you added yourself? If I search for “Interface-Protection Clause” then it just turns up this project.

    It’s kind of contradictory though, because first it says:

    Without limiting other conditions in the License, the grant of rights under the License will not include, and the License does not grant to you, right to Sell the Software.

    But then in the MIT license:

    Permission is hereby granted [..] to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software.

    Wouldn’t it be clearer to just modify/write a new license instead of putting a clause on top which contradicts the next clause?

    1. 6

      I’ve spent a lot of time thinking about the license, and I could not find anything that fit what I wanted. It’s a long discussion, but I believe the popular licenses are flawed, in that they don’t properly support those who want to be able to write open-source for a living.

      The one that came the closest was The Commons Clause (https://commonsclause.com/), and I modified it. But the “contradiction” you mention was already there, purposely put by a well-respected lawyer.

      Wouldn’t it be clearer to just modify/write a new license

      Maybe I think too much like a programmer, but I don’t see why I shouldn’t write my license using a “library license”, well-tested and recognized, instead of reinventing the entire wheel from scratch.

      1. 23

        Maybe I think too much like a programmer, but I don’t see why I shouldn’t write my license using a “library license”, well-tested and recognized, instead of reinventing the entire wheel from scratch.

        This is akin to having a lawyer write your auth code.

        1. 3

          Right; fair enough. I had temporarily forgotten about the Commons Clause 😅 I never really liked the “tacking on” of that either, but if it works (legally speaking), it works, so whatever.

          I think your license and modifications to the Commons Clause make sense, by the way; by removing the “consulting/ support services related to the Software” and changing “whose value derives, entirely or substantially, from the functionality of the Software” to “the third party having access to the interface of the Software” it limits the scope a lot to what you actually want to prevent.

          That being said; I’m very far removed from being a lawyer. Perhaps @kemitchell wants to chime in?

          1. 3

            don’t properly support those who want to be able to write open-source for a living.

            I recommend using a strong copyleft license (GPLv3 or AGPL) and dual-licensing. You can sell licenses that allows use inside proprietary software or services (that do not provide source code and rights to the end-user).

            And if someone complies with the copyleft license and provides all the rights and code to the end-user and make some money while doing it – does it harm you in any way?

            1. 2

              What the commons clause claims in its FAQ vs what it says in its content are very different. To quote the clause itself:

              For purposes of the foregoing, “Sell” means practicing any or all of the rights granted to you under the License to provide to third parties, for a fee or other consideration (including without limitation fees for hosting or consulting/ support services related to the Software), a product or service whose value derives, entirely or substantially, from the functionality of the Software. Any license notice or attribution required by the License must also include this Commons Clause License Condition notice.

              I specifically want to call out this bit:

              a product or service whose value derives, entirely or substantially, from the functionality of the Software.

              What does “substantially” mean? When does use of software under the commons clause become substantial enough to be in violation of the license? More than anything, lawyers hate ambiguity. This line is very ambiguous.

              Your modified version of the license reads:

              a product or service whose value depends, entirely or substantially, on the third party having access to the interface

              “Access” is not defined. One could argue that “access” means that the code paths are being exercised by code available to a third party. This modification, in my opinion, is more restrictive and more ambiguous than the original commons clause.

              I understand that your objective is to prevent this from being repackaged and sold directly as a product. This is a pretty hot topic in general right now and I’d say no one has quite figured it out. That said, based on my experience with 3p OSS licensing, no company would knowingly accept this license for any use case. Commons clause is already a tough sell and the modifications made make it near impossible. If you’re open to feedback, I would strongly consider using the Commons Clause language without modification.

              1. 2

                Thanks, I appreciate the feedback. I was also concerned about the meaning of substantially, but since it only applies to access to the interface, that shouldn’t be a problem. I suppose I can be more specific about the use of the word “access”. In essence, it can be solved by being described as a dataflow query (i.e. taint analysis).

                But the Commons Clause, as it is, is too restrictive, and will drive away many of the early adopters.

                I’m definitely open to feedback. Like you said, no one has figured it out yet. And I might change my license in the future. But at the moment, I don’t see an alternative.

              2. 2

                Maybe I think too much like a programmer, but I don’t see why I shouldn’t write my license using a “library license”, well-tested and recognized, instead of reinventing the entire wheel from scratch.

                Common practice is that amendments* to a license loosens original rules and adds more rights to the licensee. So using a name of a well-known license as your „library license“ and adding a restricting amendment is at least confusing.

                Using the name of an original free software / open-source license is quite parasitic and harmful. Common users will see Apache or MIT + something else that they do not read. Yes, it is their fault that they do not read and understand it fully. But I consider it unethical to exploit someone else’s mistakes this way. Such common users will then doubt whether Apache, MIT or any other free software license grants them rights that it actually grants them.

                Nobody prevents you from creating a new license or a new category of software. But please, do not exploit existing licenses and categories, well known-names and do not confuse users.

                *) for example the Classpath Exception that could be added to GPLv2 – licensee can do everything allowed by GPLv2 + something else allowed by the amendment.

                1. 1

                  Have you thought about using something like https://spdx.org/licenses/BUSL-1.1.html ?

                2. 6

                  Good to notice. Then it is neither free software nor open-source.

                  1. 4

                    It is free to use, and the source is available for everyone to see and modify.

                    1. 19

                      This license will ensure that nobody will touch this software.

                      1. 9

                        I don’t see why; it doesn’t restrict almost any use case, and I don’t see any fundamental objections, and minor issues of language (if any) are fixable. Who cares about some arbitrary list that some arbitrary organisation made decades ago?

                        1. 4

                          It essentially makes it against the license to pay anyone else to make modifications of the software, and that could include employees.

                          That arbitrary list is fundamental to the essence of the license, so who cares? The legal department cares.

                          1. 4

                            It essentially makes it against the license to pay anyone else to make modifications of the software

                            What makes you think so?

                            1. 3

                              For purposes of the foregoing, “Sell” means practicing any or all of the rights granted to you under the License to provide to third parties, for a fee or other consideration (including without limitation fees for hosting the Software), a product or service whose value depends, entirely or substantially, on the third party having access to the interface of the Software.

                              So this would seem to limit any ability to retrieve monetary compensation for any of the rights later listed in the license: “including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,”

                              1. 4

                                “Sell” means … a product or service whose value depends, entirely or substantially, on the third party having access to the interface of the Software.

                                If the Preql language itself isn’t a substantial part of your product (and in most cases, it wouldn’t be), then you’re not selling it, the clause doesn’t apply, and your license is effectively MIT.

                                1. 8

                                  If the Preql language itself isn’t a substantial part of your product (and in most cases, it wouldn’t be)

                                  Imagine that you are in that position – author of such product or a manager. How could you be sure that it is „not a substantial part“?

                                  1. 1

                                    If you are not sure, then it probably is. I don’t see how it could become such, unless it’s on purpose.

                        2. 2

                          I hope you’re wrong.

                          1. 16

                            Not a lawyer, but in many of the companies I worked for, there was a whitelist of licences. If we wanted to use something and license is on the whitelist, we were free to do so. If the license is on another list, if we build a good enough case and endure several meetings with managers, we could use it. If the license is completely new, that means involving lawyers, which almost no one bothered to do and people just went looking for other stuff or implementing things from scratch.

                            1. 7

                              I know you don’t create these policies and that it’s not your intention to say exactly this, but what you’re saying pretty much comes down to “gimme free labour or I don’t want it” in practice.

                              1. 10

                                I doubt it’s about it being free; most companies would gladly shell out money to buy nonfree software, yet still be very wary about using code under a legally untested or unclear license. It’s all about ass-covering; companies are terrified they might get sued by some patent troll or copyright holder (think stuff like Oracle suing Google over their reimplementation of the Java APIs).

                                1. 6

                                  Yes, exactly. Most of the companies I worked for were even requiring that we pay for support (when it existed), as any downtime costs money. As you nicely put, it’s all about risk mitigation – no one want to have their reputation tarnished or involved lawyers, as it means loss of money or reputation.

                                  1. 5

                                    Yeah, but how would you actually convince them to pay you if you use a standard open source license? There’s no obligation at all to pay anything, and in practice most won’t, or throw only breadcrumbs your way.

                                    I’m absolutely not a money-driven person, but “sending back patches” puts just as little bread on the table as “make me free stuff for exposure” does for artists. I feel a lot of people are rather naïve about this. Sure, open source ideals and all of that are wonderful and in a perfect world we’d all be happily coding together and paying our fair share. But not everyone is playing by the same rulebook here. Jeff Bezos’ goal is to make profit, nothing more, nothing less. If he can make profit by paying you $1 instead of $10 while netting $100k then he will have no qualms in doing so. He doesn’t care one iota if that $1 is a living wage or anywhere near “fair” or proportional. Jeff Bezos is hardly an outlier here.

                                    How much money does CodeYellow send to Debian, PostgreSQL, PHP, Laraval, and all the other projects you use (assuming you still work there)? Perhaps CodeYellow is better most – that would be nice – but all companies I worked for that amount was literally “$0”. And sure, CodeYellow provides plenty of added value, but just as a builder pays for their raw materials, it doesn’t strike me as unreasonable that CodeYellow (and AWS, etc. etc.) also pays for them.

                                    At any rate, if you’re concerned about the non-standard license then you can simple contact the author to buy a different one. Sure, it’s a bit more effort, but you can’t have your cake and eat it too.

                                    1. 2

                                      Yeah, but how would you actually convince them to pay you if you use a standard open source license?

                                      That’s one of the great unsolved problems of our age. I think there are some nice initiatives around (like for example GitHub’s sponsoring option) but they’re not gaining a lot of traction yet.

                                      There’s also the question of distribution: how do you fairly distribute any payment/sponsorship across a large open source project? One-man projects are a different story of course, but a lot of the widely used projects are multi-person efforts.

                                      I’m absolutely not a money-driven person, but “sending back patches” puts just as little bread on the table as “make me free stuff for exposure” does for artists. I feel a lot of people are rather naïve about this.

                                      Totally agree on this. For myself, open source was always more about freedom of source and didn’t care as much about money, but then the vast majority of companies don’t subscribe to these ideals and just take the free software and lock it away. And it could be argued the world would be a much better place if people would be able to make a living off of free software, because it would mean more viable free software could make its way into the hands of actual users, and less developers would feel like they’d have to work on proprietary software to support their free software producing “hobby”.

                                      How much money does CodeYellow send to Debian, PostgreSQL, PHP, Laravel, and all the other projects you use (assuming you still work there)?

                                      I left a couple of months ago, I’m now working at bevuta IT on a project for the EU, which is likely to be open sourced.

                                      Code Yellow actually sponsored a CI test server for the CHICKEN project and of course I was paid to work on some open source projects of our own. But I agree with you, most companies don’t pay at all, and as far as I remember, Code Yellow did not sponsor any of the projects we were actually using day to day (unless you count the odd code contribution and bugfix) that weren’t developed in-house.

                                      1. 5

                                        I think GitHub sponsors is great and I don’t want to knock it, but it’s also not really a viable business model. You certainly can make a living out of it (people are), but it’s not easy, mostly works well for individual devs (rather than business), and is not suitable for all projects either (generally only for “user visible” projects, and even thn only for “dev users”, rather than the general population).

                                        Take something like vim-go for example; the current maintainer nets $102/month (the previous maintainer got a bit more, about $400/month IIRC). All of that is nice pocket change, but is it really in proportion to the usage? Even with just 5,000 companies using it (probably a low estimate) times $1/month we would be talking about $5k/month. We could discus what a “fair” amount would be exactly for a long time, but these are tiny investments, and if you consider your devs will get a better editor in return it’s an investment that will actually give you a return too.

                                        The entire world and their mother is running OpenSSH, but OpenBSD still gets in to intermittent financial troubles. They run on $300k/year (2020 fundraising goal; CAD, about 240k USD), which may sound like a lot, but this is the salary of what, one dev in silicon valley? Maybe two? Even by the lower European wages we’d be talking about ~4 devs. Spread out over all users of the software this is nothing. The kind of money you won’t even bother picking up if you drop it.

                                        If every company would pay $1 for ever open source tool, library, or application they use we’d be talking about a $100-$200/month investment, at the most, and often probably much smaller (let’s ignore npm and it’s 20k dependencies, that’s a special problem). This wouldn’t solve everything, but it’s a simple start.

                                        Yet it’s not happening; not even close. In my observation most people sponsoring projects tend to be private individuals like you and me, who care about these things. I feel a big reason it’s not happening is because companies don’t have to and just can’t be bothered. So what you need is the proverbial “stick behind the door”, which is what these kind of licenses are. It’s sad that this is needed, but it is what it is.

                                        Or, maybe, what we need is a single organisation that you can tell “we use these projects” and pay that $100/month (or whatever is appropriate), and they distribute it to the projects. This would certainly simplify the administration and burden for the companies, as well as lower payment processing overhead and the like. One would imagine that something like the FSF would do that, but they seem more concerned with their crusade to fight the Open Source pagan heretics, or something, and the OSI doesn’t seem to be doing much at all except bicker amongst themselves and write blogspam.

                                        It’s kind of hilariously sad that GitHub Sponsors is an initiative from a company mostly writing proprietary software. I’m not a purist and don’t see it as a terribly great problem as such, but my irony meter is going in to overload.

                                        For myself, open source was always more about freedom of source and didn’t care as much about money

                                        Yeah, I don’t really care about it as an end-goal either; just as a means to ensure viability.

                                        I left a couple of months ago, I’m now working at bevuta IT on a project for the EU, which is likely to be open sourced.

                                        Aha, nice; it mentions that “CHICKEN Scheme is one of our favourite programming languages” :-) Looks like Felix is working there as well?

                                        1. 4

                                          Looks like we’re in total agreement about this.

                                          So what you need is the proverbial “stick behind the door”, which is what these kind of licenses are. It’s sad that this is needed, but it is what it is.

                                          Here you run into the problem that these licenses are “new” and untested in court, so legal departments won’t touch them with a ten-foot pole. It also doesn’t help that they don’t match the OSI’s criteria (or the FSF’s), which such licenses won’t be endorsed by trusted parties.

                                          maybe, what we need is a single organisation that you can tell “we use these projects” and pay that $100/month (or whatever is appropriate), and they distribute it to the projects.

                                          This sounds like a good solution and something the FSF (or FSFe) could be doing. Have you contacted them to suggest something like this? It seems that recently the FSF is doing some good advocacy (like the public money, public code campaign and the ethical tech giving guide) instead of just preaching to the choir like they used to.

                                          Aha, nice; it mentions that “CHICKEN Scheme is one of our favourite programming languages” :-) Looks like Felix is working there as well?

                                          Yep, I have several colleagues that I’ve known for a while and am happy to finally be working with them!

                                          1. 1

                                            While I’ve argued the that FSF/OSI should do more direct funding of projects many times before, I didn’t have that specific idea until I wrote that comment. So, no, I haven’t contacted them yet :-) You’re the first I told it to! Consider yourself special!

                                            I hope to finish my article about this soon™; I’ll be sure to post it to those kind of channels as well. It’s good to hear the FSF is improving (finally…)

                                            As for the license, you can still sell access to a more standard license if you want, like SQLite does (even though it’s public domain).

                                  2. 5

                                    This is exactly how corporations or software departments often work. If something is free of charge, they will use it. Why not? On the other hand, they are usually ready to pay for proprietary software or external services. So the strong copyleft + dual-licensing is the way to go. Your software will then fall in the „proprietary software“ category for them, but it is no problem. Their lawyers will be happy in the same way as if the company buys a license for Photoshop, AutoCAD, Oracle DB, Total Commander, WinRAR etc. and you will get money.

                                2. 3

                                  Well with regular software licensed in common open source ways, nobody else is prevented from modifying it or hosting it for a fee. You have essentially limited the users of to exactly one person who can do anything involving any kind of money for the software. That is a non-starter for almost anyone outside of the plain hobbyists - and it effectively preventing any adoption besides a few insignificant users.

                                  Looking at the license, it isn’t entirely clear what “interface of the Software” means either.

                                  1. 4

                                    Then I think you misunderstood.

                                    It is possible to create commercial software that uses Preql, with or without modifications, and sell it however you want. The only limitation is that in that case, you can’t provide the language itself to your users (i.e. the interface). That restricts specific use-cases, such as offering it as a cloud service to developers, or selling a Preql plugin to existing database software.

                                    1. 2

                                      It is possible to create commercial software that uses Preql, with or without modifications, and sell it however you want.

                                      you can’t provide the language itself to your users (i.e. the interface).

                                      Can you explain what exactly that means? How do I include the language without modifications but not provide the language to the users??

                                      1. 6

                                        It means you can’t provide a product which is built around the ability for users to use the language directly.

                                        Integrating it in your desktop or web app and selling that is okay.

                                        Making AWS SuperQL™® or CloudPostMyMariaSQLiteBlockchainML++ which are essentially cloud versions of of Preql (and thus expose the language/interface to users directly, as a major feature) would not be okay.

                                        In short, it’s intended to prevent exploitation from these kind of people, as they have done before and will undoubtedly continue to do in the future.

                                        1. 3

                                          Let’s say you are writing a product that uses Preql to save and fetch user data from the database. It is called from internal routines, that respond to unrelated user activity. In that case, there is no reason for the user to ever write any Preql code, which means he isn’t being given access to the language. In that case, Preql can be used under MIT conditions, and you can sell your product without a worry.

                                3. 3

                                  “free to use” and “free software” are not the same thing. I don’t know if your rider actually excludes you from being free software or not.

                            2. 7

                              First off, congrats on putting this out there. There have been many attempts to replace or abstract over SQL with few successes. In the README you note the target audience:

                              It is designed for use by data engineers, analysts and data scientists.

                              Based on my experience these folks have different degrees of experience with SQL (ranging from 0-100) and likewise the comfort/willingness to learn alternate languages.

                              Can you speak more towards who the actual audience is? Likewise, what are the actual advantages of Preql over well-written, properly managing SQL code? It’s all code at the end of the day, so what advantages would inspire me to use this?

                              1. 8

                                Thanks!

                                There have been many attempts to replace or abstract over SQL with few successes

                                My approach is to use the existing SQL databases, and always allow a fallback to SQL if necessary. I’m not aware of other attempts that took this approach (there is one library for R that does it halfway)

                                what are the actual advantages of Preql over well-written, properly managing SQL code?

                                I’d say there are many advantages. For example, it is very hard to re-use code in SQL, because there is very little support for functions. Even Postgres, that supports defining them, won’t let you pass them as arguments. In Preql functions are first-class, and act just like functions do in Python or Javascript. Also, since the functions (and the rest of the code) are part of the client, and not the server, you can use version control, such as git, to keep history, code review, accept pull-requests, and so on.

                                Another one is type-safety. SQL’s approach to types causes a lot of trouble, for example SELECT "a" + 1; returns 1, which makes no sense, and is hard to debug in a long query. In Preql, "a"+1 throws an exception. In addition, SQL’s handling of NULLs is a source of many issues.

                                I don’t want to get too long-winded. A few more points are mentioned in this page: https://preql.readthedocs.io/en/latest/comparison_sql.html

                                who the actual audience is

                                I think there are several audiences -

                                • Anyone who has to deal with complex business logic using databases. I’ve seen SQL queries that were hundreds of lines long, and they were very hard to understand and maintain.

                                • Data Engineers, who need to pipeline structured data, change its configuration or structure. That’s very common for ML projects, but not only.

                                • Analysts who work with databases, and need to interact with them on a daily basis. The short syntax, REPL autocompletion, and the ability to create your own library of functions (e.g. for research), can be very useful in the daily work.

                                I hope that clears things up a little!

                                1. 5

                                  SQL’s approach to types causes a lot of trouble, for example SELECT “a” + 1; returns 1, which makes no sense, and is hard to debug in a long query. In Preql, “a”+1 throws an exception. In addition, SQL’s handling of NULLs is a source of many issues.

                                  This depends on the dialect, right? SQLite is very dynamic, but PostgreSQL will error out on this unless you explicit cast things (and 'a'::int will error out as well). In MySQL/MariaDB it’s dynamic by default (or used to be, not sure what the current defaults are), but can be configured to be more strict. I’m not sure about other engines as I never worked with them.

                                  At any rate, the biggest question I have for something like this is one of performance; sometimes even quite simple changes to a query can cause huge performance differences. How “smart” is Preql in generating reasonably optimised queries?

                                  1. 4

                                    This depends on the dialect, right?

                                    Absolutely right. Which is actually part of the problem. Every dialect has different restrictions, semantics, and syntax, and it’s hard to remember all the little gotchas.

                                    How “smart” is Preql in generating reasonably optimised queries?

                                    Right now Preql mostly relies on the database engine to optimize the query. But, it’s definitely in my future plans. The SQL passes through being an Ast just before the final compilation, which makes at least some class of optimizations relatively easy.

                                    But I think that well-written and performance-aware Preql code currently already generates reasonably efficient SQL statements.

                                  2. 3

                                    I’m not aware of other attempts that took this approach (there is one library for R that does it halfway)

                                    Wanted to ask if you could take a look at this Haskell package [1] OpalEye and see if does something similar (albeit for Postgres only).

                                    With regards to the License. Yes, I think in any corp with staff lawyers, this will not be allowed (because it is different, and it has lots of ambiguity in the intent: * as long as your product doesn’t base its value on exposing the Preql language itself to your users.* ) But smaller shops, devs would look at it, still, I think.

                                    I was trying to think if I could come up with a suggestion for your License (am I not a lawyer).

                                    But this was not easy for me (although I like challenges fitting stuff into legal frameworks) couple of obstacles:

                                    • (a) First I am not sure that a programming language definition (or interface as you imply) – can be copyrighted in all the jurisdictions (EU, USA, etc).

                                    Probably library part (if it exists) could be copyrighted, but I think language is more problematic. see [2], and see [3]

                                    So you can certainly copyright your work doing something with language definition, but not the language definition. Then, the issue with your license is that it relies on a notion of an ‘interface’, which is not copyrightable. And I had rarely seen a license where ‘interface’ was well a well defined terminology.

                                    • (b) Again challenging your inclusion of word ‘Interface’.

                                    If I create a language just like yours, by simply prefixing every keyword in your language (so not parenthesis, commas, semicolons) with ‘vL-’. Then I would expose that to my users, and then strip the ‘vL-’ to passing to your interpreter. Did I just ‘bypass’ a requirement for commercial license ?

                                    • (c) I think your intention is to prevent a Compute resource provider, to integrate your language into their offer (without obtaining commercial license).

                                    For example say, a well known corp A offers a paid ‘lambda function’ service were developers get charged by how often their program is switched from dormant to active state, and that program can be written by a developer in Python or Preql or Python+Preql.

                                    So, you do not want them to offer PreQL without getting a commercial license from you, (let me know if I am interpreting the intent correctly)?

                                    If yes, then may be you can recast your definitions. Do not use ‘interface’. and see if you can fit into software-as-service or language-interpreter-as-service.
                                    Then you can look at CocroachDB license and see if something you can leverage. And that would also solve (b).

                                    Only in that case, remember that people will not be able to build PreQL playground online services (paid or unpaid or ad based) without getting your commercial license…

                                    [1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/master/Doc/Tutorial/TutorialBasic.lhs

                                    [2] https://phys.org/news/2011-11-language-copyrighted-eu-court.html

                                    [3] https://cdt.org/insights/copyright-week-software-interfaces-shouldnt-be-copyrighted/

                                    1. 3

                                      Thank you for your thoughtful response, and sorry for taking a while to reply.

                                      I do see some similarity between OpalEye and Preql. I would say that OpalEye is somewhere on the continuum between ORM and Preql, and I can also see some advantage to the Haskell integration. But personally, I prefer the clean syntax of Preql, and the integration with Python is especially useful for data-science. Also, it is seems to me that my implementation is much more advanced.

                                      With regards to the License

                                      Thanks for the warning. To be honest, I’m not too concerned with corporations. They very rarely, if ever, contribute back to open-source, and my impression is that their developers are usually those who follow trends, rather than set them.

                                      I’m much more interested in attracting use by open-source projects and small startups.

                                      As for your points:

                                      a) I am not copyrighting the language, but the implementation. If anyone wants to write an entirely new implementation of Preql, they can do that. But if they want to use mine, I should be able to determine how it’s used.

                                      b) You are describing an absurd product that no one will ever make or use. I believe that it’s possible to make the wording less ambiguous, if it comes to that.

                                      c) You got the gist of my intention. I also want to prevent database and analytics software from offering up Preql as an interface.

                                      I think that Cockroach’s license doesn’t fit Preql, and they are very different kinds of software, despite both working with databases. But I appreciate your advice.

                                      So far, the main thing that might persuade me to reconsider my license, is just how much attention it gets. I’d much rather that people focus on Preql itself, rather than the license (no matter if negatively or positively), but for some reason it attracts a lot of fascination.

                                      1. 2

                                        thank you for the follow up. Hope to see PreQL grow into a standard tool for data exploration!

                                    2. 1

                                      Another one is type-safety. SQL’s approach to types causes a lot of trouble, for example SELECT “a” + 1; returns 1, which makes no sense, and is hard to debug in a long query.

                                      Definitely depends on the dialect, the statement above will fail with a type mismatch on SQL Server. SQLite does return 1, but it’s typeless by design: https://www.sqlite.org/datatypes.html

                                  3. 3

                                    Very cool project! I‘m definitely gonna follow the development. Sadly I dont have a use case at the moment for Preql.

                                    Just because out of curiosity, not because I know better. What do you have against the AGPL license?

                                    As I see it with your current license you dont want other people (especially large cloud providers) benefitting from your work. This is very relatable, but Id ask myself what license would make Preql really great. If Preql will get great and find large adoption they will be more money for you than you can reasonably spend. What do you think?

                                    On the use cases in data engineering: They already have solutions like Dask or Spark (provided the data already is in a data warehouse which is not Postgres). Where do you see preql fit in? As a glue language between the production SQL server and the data warehouse or as a end-to-end solution. It might be a little early, but Im interested on your take.

                                    PS: You might add the tag „show“.

                                    1. 4

                                      Thank you!

                                      What do you have against the AGPL license?

                                      The AGPL is both too restrictive for my taste, and too lenient at the same time. It allows exploitation by big cloud providers (see: why mongodb changed to SSPL), and at the same time, it restricts companies from using the AGPLed project in their product, or even projects that use that project. So, a project with MIT license can’t use an AGPL library. The current Preql license, in contrast, doesn’t have that problem. As long as you don’t expose the language, you can include it in regular MIT or Apache licensed projects (or commercial ones) without having to make any change to their license or use.

                                      Where do you see preql fit in?

                                      I think that, especially at this early stage, Preql’s main advantage is its agility and convenience for interactive work, for example when exploring a database, playing around with data, or when scripting a data pipeline.

                                      It seems that Dask and Spark are both production-oriented solutions, but they still suffer from the advantages and disadvantages of their core technologies, and so there’s still space to solve problems and improve the user experience. For example, Pandas is nice for self-contained tables, but isn’t adequate for relational data (i.e. that spans across several tables). Dask doesn’t offer a solution for that.

                                      Since Spark is basically just a collection of technologies, perhaps one day it could include Preql as well!

                                      PS: You might add the tag „show“.

                                      Thanks, I did so. And thanks for the great questions!

                                    2. 3

                                      Very cool! Nice work! I think the project would benefit from a home page, with some easily viewable examples, showcasing the various features of the language.

                                      1. 2

                                        Really cool project, thanks for sharing! I’m looking forward to trying it out on one of my projects later this year.

                                        While looking at the documentation, I’ve seen a couple of things I’m curious about:

                                        • Why did you decide to go with /~ for a integer division? Since the host language is Python, wouldn’t it make more consistent to have the same operator?
                                        • Is list application list{ code } structured that way to mimic functions (func foo(x) { code })?
                                        1. 2

                                          I’m glad you like it!

                                          I wanted to the syntax to follow Javascript/Go as much as possible, which means // should signify a comment. I’m open to suggestions, but personally I think /~ is a pretty good symbol for it.

                                          Is list application list{ code } structured that way to mimic functions (func foo(x) { code })?

                                          No, they are different constructs. The first one returns all the fields and only allows expressions, while the second one only returns the last, and allows statements too (like imports, or assigning variables)

                                        2. 2

                                          Hm this looks cool! What does the IR look like?

                                          I was thinking of doing something similar with a common table expression as the base unit of the IR. I don’t know if that’s how ORMs already work, or if that kind of IR is optimized efficiently by engines, etc.

                                          Do ORMs actually have an SQL IR or are they more like text processors?

                                          1. 2

                                            There isn’t exactly an IR. The Preql-Ast is compiled into “interactive” objects that construct the Sql-Ast on the run, by being applied to each other. Then the Sql-Ast is compiled into actual SQL text, according to the target database. That’s basically how SQLAlchemy does it, and I imagine most other ORMs, except that they start with the interactive objects.

                                            Using CTEs is pretty convenient, but it’s actually bad for performance. Many database engines, for example Postgres, optimize the individual queries inside the WITH, but won’t optimize across the WITH. So unless you’re using it to eliminate repetition, your query will optimize better as a single huge select.

                                            1. 5

                                              Since Postgres 12 CTE are not an optimization fence anymore, so it should perform more or less like inlined query.

                                              1. 1

                                                Good to know!

                                                1. 3

                                                  Note that that’s only true for CTEs that are used once, if I recall correctly! Then they will be inlined. Otherwise they’re still an optimization fence.

                                              2. 1

                                                OK interesting, thanks for the info.

                                                I’d be interested if sqlite can optimize across CTE’s… That is my likely target. if anyone knows or has a reference, please share :)

                                            2. 2

                                              Good luck with the project. Hopefully you will change your mind and switch the license to MIT/BSD.

                                              1. 2

                                                Very interesting! Congratulations on getting it this far.

                                                I think there is a clear case for light-weight pre-processor-style SQL wrappers, a bit like Sass for CSS or various templating languages for HTML. This is clearly more ambitious - if I were to use Preql, I would likely output SQL files for manual inspection and then install that code as views or functions in the db itself.

                                                since the functions (and the rest of the code) are part of the client, and not the server, you can use version control, such as git, to keep history, code review, accept pull-requests, and so on.

                                                I think it’s very important to keep SQL code under version control, and I know the (perceived) difficulty of doing so is a common argument against keeping logic in the database - but there is a rather simple and elegant solution, namely that pioneered by subZero/PostgREST: https://docs.subzero.cloud/managing-migrations/

                                                I’m very curious about what the description “relational programming language” means to you. That term is often taken to mean languages in the Prolog tradition (ie logic programming). SQL and relational algebra/calculus is similar, but more restricted. IMO a common failure of many attempts to wrap or improve on SQL is that they discourage “relational thinking”, ie in terms of sets and relations between sets. Any thoughts on this and how Preql fits into the picture?

                                                1. 1

                                                  if I were to use Preql, I would likely output SQL files for manual inspection and then install that code as views or functions in the db itself.

                                                  I understand your reluctance to trust the magic. When I was using C in the early days, I always wanted to see exactly what Assembly code it produced, to make sure that it’s correct and efficient. But as the compilers got better, and I got busier, I stopped that altogether.

                                                  I’m very curious about what the description “relational programming language” means to you

                                                  I think the magic of SQL, which makes many people love it despite all of its downsides, is its heavy reliance relational algebra, i.e. working with sets and using joins. I took care to put this kind of algebra front and center, so there is a special operator for selection (i.e. where), projection (i.e. select), for set operations (&, |), and so on. I also put a lot of effort to make joins convenient, and I plan to keep improving them.

                                                  … Prolog tradition (ie logic programming). SQL and relational algebra/calculus is similar, but more restricted.

                                                  Do you think it’s technically possible to implement real-world logic programming using SQL? Or is it too restricted for it? (of course, anything can be done with a Turing machine, but I mean natively in the language)

                                                  1. 1

                                                    I understand your reluctance to trust the magic.

                                                    Partly that, but mostly because I prefer to store logic in the db (as I hinted above).

                                                    I think the magic of SQL, which makes many people love it despite all of its downsides, is its heavy reliance relational algebra,

                                                    Agree!

                                                    I took care to put this kind of algebra front and center, so there is a special operator for selection

                                                    Would you say then the resulting language is mostly a sugared SQL, or a separate language in the relational model tradition, which “compiles down” to SQL? I think both are interesting and worthwhile! But it seems to me that all the quirks of SQL makes the latter approach pretty hard, if we want to make full use of SQL-as-it-exists. And I understand that’s why you provided an escape hatch to write arbitrary SQL - which seems useful but makes me wonder again if we can actually go beyond “more succinct SQL” as long as we’re working with SQL databases.

                                                    Do you think it’s technically possible to implement real-world logic programming using SQL?

                                                    With recursive CTEs you can at least theoretically see how it could be done, but a) I don’t think it could be made efficient, b) Prolog is not purely declarative, the procedural interpretation is integral to the language, and that would be very hard to model. Datalog on the other hand is precisely a logic language intended for database programming. I have been looking at the Datalog Educational System as a practical way to compare SQL, relational algebra, relational calculus and Datalog. The pyDatalog project also claims to be able to run Datalog queries on SQL databases, but I haven’t tried it or looked closely at it.

                                                    There have also been some database languages based on set/list comprehensions as they are known from Haskell and Python (and before that Miranda etc). The most successful one seems to have been CPL (Collection Programming Language; here’s a paper that describes it). I think set building notation might be the most intuitive entry into all these paradigms, and SQL, Prolog etc could be described in relation to this foundation. Just something I’ve been thinking about lately.

                                                    1. 1

                                                      Would you say then the resulting language is mostly a sugared SQL,

                                                      I guess that depends, would you say that C is mostly sugared Assembly? There is nothing that you can do with C, that you can’t do with an assembler. And yet, the comparatively high-level nature of C, and its expressiveness, end up making C programming better in almost every way.

                                                      For example, here’s code that you can write in Preql:

                                                      func apply_twice(f, x) = f(f(x))
                                                      func add1(x) = x + 1
                                                      print [1, 2, 3]{apply_twice(add1, item)}
                                                      // result is [3, 4, 5]
                                                      

                                                      This type of coding just isn’t possible in SQL, and opens up a lot of options for code-reuse, and having a real standard library, like Python does.

                                                      all the quirks of SQL makes the latter approach pretty hard

                                                      They definitely make it harder than it should be, but don’t entirely prevent it.

                                                      With recursive CTEs you can at least theoretically see how it could be done

                                                      Recursive CTEs have a very glaring deficiency. They can only UNION on the entire tuple, which means that if you want to keep track of depth in your search (and you usually would), you can’t prevent nodes from being visited twice.

                                                      I think most major databases offer alternative ways to do graph search, but CTEs have very limited utility.

                                                      Thanks for the links, and for your thoughtful response!

                                                2. 1

                                                  Interesting idea! How does Preql compare to Datalog? And wouldn’t the Affero GPL protect you from having your software exposed as a commercial service by others?

                                                  1. 1

                                                    Thanks!

                                                    How does Preql compare to Datalog?

                                                    I’m not very familiar with Datalog, but I think it’s a very different language. Preql is built with the idioms of procedural/functional programming, while Datalog is based around logic-programming (iirc).

                                                    And wouldn’t the Affero GPL protect you from having your software exposed as a commercial service by others?

                                                    No, it wouldn’t protect it from being provided as a cloud service, for example. And it would actually be too restrictive for my taste, because I want other projects to be able to use Preql as a library.