This might be the best language rant I’ve ever read. Nothing is wrong, nothing is described in hyperbole, comparisons to the rest of the ecosystem are constant and appropriate, and examples are numerous.
I agree that it’s a quality rant, but there’s a lot of stuff in there that falls into the buckets of:
“I don’t like SQL because it doesn’t have a more advanced type system/general purpose language feature I like.”
“I don’t like SQL because it doesn’t feel like a ‘modern’ ecosystem.”
“I don’t like SQL because I can’t write terse code in it.”
“I don’t like SQL because implementations aren’t compatible.”
All of those are fine, but I can’t help but wonder if this isn’t just a mismatch between what the author is used to and the world that created SQL, plus maybe a bit of blindness to our own modern realities. For example, “modern ecosystems” like JS or Ruby are superfund sites of structural abandonware spanning generations of code features, whereas legacy SQL codebases still by-and-large look like SQL modulo some quirks; would this be an improvement for mission-critical code?
~
Re: hyperbole…I guess it isn’t exaggeration if you’re instead just wrong. To wit:
Design flaws in the SQL language resulted in a language with no library ecosystem and a burdensome spec which limits innovation.
The amount of money and man hours spent, and fortunes made, using the “burdensome spec” of SQL makes me question what exactly “limiting innovation” would look like. Like, sure, it doesn’t have the greatness of an ML or a LISP…but then again Oracle outlasted Symbolics and sqlite is basically everywhere.
Additional design flaws in SQL database interfaces resulted in moving as much logic as possible to the application layer and limiting the use of the most valuable features of the database.
This might be the case in startups, but large businesses most assuredly do have as much logic as possible in the database layer, much to the dismay and distress of DBAs everywhere. Oracle, Microsoft, and other DB vendors exist in spite of free offerings because so many customers will pay for their database features (arguably this is lock-in, but that’s another discussion altogether).
~
It’s a really good read and essay, and I like it as an example of a document that reaches (what I believe to be) incorrect conclusions, but whose journey in arriving at them is well-substantiated and itself capable of giving enlightenment.
I think you’ve mischaracterized the argument. It’s not “I don’t like SQL because it doesn’t have X”. It’s that “lacking X” causes real consequences for both applications and database implementers.
As a simple example, the links on n+1 bugs and feral concurrency are particularly good, but there are lots more:
Converting queries into rest endpoints by hand is a lot of error-prone boilerplate work. Managing cache invalidation by hand leads to a steady supply of bugs. If endpoints are too fine-grained then clients have to make multiple roundtrip calls, but if they’re too coarse then clients waste bandwidth on data they didn’t need. And there is no hope of automatically notifying clients when the result of their query has changed.
I’d go as far as to say that if you don’t know what he’s talking about, then you either haven’t used SQL in a real application, or your code simply has tons of these bugs and you don’t know it :)
On the implementation side, the complexity arguments are very good and borne out by experience. The core logic for queries over relations is small; the SQL part is large.
It totals ~16kloc and was mostly written by a single person. Materialize adds support for SQL and various data sources. To date, that has taken ~128kloc
This is the same argument about UB and memory bugs when using memory-unsafe languages like C: “Mistakes leading to dereferencing a null pointer strike me more as a fault of devs than of C.” Yes, a sufficiently careful dev can avoid the sharp edges of SQL, but the language does not make it easy or provide much (if any) support.
In this case there’s a particular workflow (ActiveRecord/ORM) that’s causing mistakes rather than SQL itself. Quite a lot of folks use raw SQL for many things.
That’s not a great comparison, because these problems are introduced by a higher level of abstraction: the ORM. A better comparison would be blaming a garbage collector for introducing performance problems: a dev who knows his GC implementation well will be able to write code that doesn’t introduce too many slowdowns, just as a dev who knows his ORM implementation well will be able to write code that doesn’t introduce too many slowdowns.
Someone who doesn’t know his GC well will be surprised by difficult-to-predict overhead from extra garbage collections happening, and someone who doesn’t know his ORM well will be surprised by difficult-to-predict overhead from extra roundtrips to the database.
The N+1 query issues can happen without ORMs. It’s very common to abstract SQL queries with functions, and then simply calling those functions in a loop gives rise to the problem.
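To make that concrete, here is a minimal sketch in Python’s sqlite3 (the schema and data are made up): the helper looks innocent on its own, and the loop is where the N+1 appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO books VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")

# An innocent-looking helper that hides a query behind a function call:
def books_for(author_id):
    return conn.execute(
        "SELECT title FROM books WHERE author_id = ?", (author_id,)
    ).fetchall()

# Calling it in a loop issues one query per author: the N+1 problem.
authors = conn.execute("SELECT id, name FROM authors").fetchall()
report = {name: books_for(aid) for aid, name in authors}

# The set-oriented version is a single query with a join:
rows = conn.execute(
    "SELECT a.name, b.title FROM authors a "
    "JOIN books b ON b.author_id = a.id"
).fetchall()
```

No ORM is involved anywhere; the function boundary alone is enough to hide the per-row round trips.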
Of course, but that’s just another example of an abstraction obscuring what’s going on. I’ve seen similar problems with abstractions over other things, so it’s hardly an SQL-specific issue. My point was that blaming undefined behaviour and memory unsafety on the user rather than the language is a bit different: there, abstractions help (you’re working at too low a level when you have those issues), whereas here the abstractions are the thing getting in the way (you’re working at too high a level when you have these issues).
This was an absolutely brilliant article! It was fantastically well researched and written by someone with expert knowledge of the domain. I’m learning so much from reading it.
The argument about representing JSON objects in SQL was not persuasive to me. I do not really understand why this would be desirable. I see the SQL approach as a more static-typed one, where you would process JSON objects and ensure they fit a predefined structure before inserting them into SQL. For a more dynamic approach where you just throw JSON objects into a database, you have MongoDB. On that note, I think the lack of union types in SQL is a feature more than a limitation, isn’t it?
Excellent point about JOIN syntax being verbose, and the lack of sugar or any way to metaprogram and define new syntax. The query language could be so much more expressive and easy to use.
It totals ~16kloc and was mostly written by a single person. Materialize adds support for SQL and various data sources. To date, that has taken ~128kloc (not including dependencies) and I estimate ~15-20 engineer-years
I think these line counts say a lot! The extra work of trying to fulfill all the criteria of the SQL standard isn’t necessary for implementing a database system. A more compact language specification would enable shorter implementations and let people learn the language much more easily.
The overall vibe of the NoSQL years was “relations bad, objects good”.
The whole attitude of the NoSQL movement put me off it a lot. Lacking types and structure never sounded like an improvement to me - more like people just wanted to skip the boring work of declaring tables and such. But this work is a foundation for things to work smoothly so I think the more dynamic approach will often bite you in the end. But then the author explains more about GraphQL and honestly it sold me on GraphQL, I would be very open to using that in future rather than SQL after reading this.
Strategies for actually getting people to use the thing are much harder.
This is a frustrating part of innovation in programming, but honestly I believe the ideas he has presented are too significant an improvement for people not to start using them.
If you have data encoded in a JSON format, it often falls naturally into sets of values with named fields (that’s the beauty of the relational model) so you can convert it into a SQL database more or less painlessly.
On the other hand, if you want to store actual JSON in a SQL database, perhaps to run analytical queries on things like “how often is ‘breakfast’ used as a key rather than as a value”, it’s much more difficult, because “a JSON value” is not a thing with a fixed representation. A JSON number might be stored as eight bytes, but a JSON string could be any length, never mind objects or lists. You could create a bunch of SQL tables for each possible kind of JSON value (numbers, strings, booleans, objects, lists) but if a particular object’s key’s value can be a number or a string, how do you write that foreign key constraint?
Sure, most applications don’t need to query JSON in those ways, but since the relational model is supposed to be able to represent any kind of data, the fact that SQL falls flat on its face when you try to represent one of the most common data formats of the 21st century is a little embarrassing.
That’s what the post means by “union types”. Not in the C/C++ sense of type-punning, but in the sense of “a single data type with a fixed number of variants”.
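For illustration, one common workaround is a single table with a variant tag plus CHECK constraints standing in for the missing union type. A sketch in SQLite (via Python), with made-up names and only the scalar variants:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE json_value (
        id   INTEGER PRIMARY KEY,
        kind TEXT NOT NULL
             CHECK (kind IN ('number', 'string', 'bool', 'null', 'object', 'array')),
        num  REAL,  -- populated only when kind = 'number'
        str  TEXT,  -- populated only when kind = 'string'
        -- poor man's union type: the tag must agree with the populated column
        CHECK ((kind = 'number') = (num IS NOT NULL)),
        CHECK ((kind = 'string') = (str IS NOT NULL))
        -- objects and arrays would still need child tables pointing back here
    );
""")
conn.execute("INSERT INTO json_value (kind, num) VALUES ('number', 1.5)")
conn.execute("INSERT INTO json_value (kind, str) VALUES ('string', 'breakfast')")

# The constraints reject a mistagged row:
try:
    conn.execute("INSERT INTO json_value (kind, str) VALUES ('number', 'oops')")
    mistag_rejected = False
except sqlite3.IntegrityError:
    mistag_rejected = True
```

But, as noted above, a foreign key from another table can only reference the row, not constrain which variant it holds; that is exactly where a real union type is missed.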
Sorry to nitpick, but a JSON number can be of any length. I think what you were thinking of was JavaScript, in which numbers are represented as 64-bit values.
No, the JSON standard provides for a maximum number of digits in numbers. Yes, I know this because of a bug where I assumed JSON numbers could be any length.
Edit: I stand corrected - I’m certain I saw something in the standard about a limit (I was surprised) but it seems there isn’t. That said various implementations are allowed to limit the length they process. https://datatracker.ietf.org/doc/html/rfc7159#section-6
Which standard? ECMA-404 doesn’t appear to have a length limitation on numbers. RFC 8259 says something much more specific:
This specification allows implementations to set limits on the range
and precision of numbers accepted. Since software that implements
IEEE 754 binary64 (double precision) numbers [IEEE754] is generally
available and widely used, good interoperability can be achieved by
implementations that expect no more precision or range than these
provide, in the sense that implementations will approximate JSON
numbers within the expected precision. A JSON number such as 1E400
or 3.141592653589793238462643383279 may indicate potential
interoperability problems, since it suggests that the software that
created it expects receiving software to have greater capabilities
for numeric magnitude and precision than is widely available.
In fewer words, long numbers are syntactically legal but might be incorrectly interpreted depending on which implementation is decoding.
The ECMA-404 standard doesn’t talk about any numerical limits at all, and RFC 7159 talks about implementation-specific limitations, which a) is kinda obvious, because RAM isn’t unlimited in the real world, and b) doesn’t buy you anything if you are implementing a library that needs to deal with JSON as it exists in the wild.
So yes, JSON numbers can be of unlimited magnitude and precision, and any correct parsing library had better deal with this.
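Python’s json module shows both halves of this: the default number path rounds (and overflows), while the parse_float hook lets a caller keep every digit.

```python
import json
from decimal import Decimal

s = "3.141592653589793238462643383279"

# The default number path goes through IEEE 754 binary64 and silently rounds:
as_float = json.loads(s)

# "1E400" is syntactically legal JSON; the default float path overflows it:
overflowed = json.loads("1E400")

# A parser that deals with arbitrary precision can keep every digit:
as_decimal = json.loads(s, parse_float=Decimal)
```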
Lacking types and structure never sounded like an improvement to me - more like people just wanted to skip the boring work of declaring tables and such.
To some degree it’s the same as the arguments in favor of dynamically-typed languages. Just s/tables/variable types/, etc.
Also, remember the recent post which included corbin (?)s quote about “you can’t extend your type system across the network” — that was about RPC but it applies to distributed systems as well, and the big win of NoSQL originally was horizontal scaling, i.e. distributing the database across servers.
[imaginary “has worked at Couchbase for ten years doing document-db stuff” hat]
The whole attitude of the NoSQL movement put me off it a lot. Lacking types and structure never sounded like an improvement to me - more like people just wanted to skip the boring work of declaring tables and such.
I always thought that NoSQL came about because people didn’t feel like dealing with schema migrations. I’ve certainly dreaded any sort of schema migration that did more than just add or remove columns. But I never actually tried using NoSQL “databases” so I can’t speak about whether or not they actually help.
In practice you still need to do migrations, in the form of deploying your code to write the new column in a backwards compatible way and then later removing that backwards compatible layer. The intermediate deployments that allow for the new and old code to live side by side, as well as a safe rollback, are required whether you use sql or not. The only difference is that you don’t have to actually run a schema migration. A downside of this is that it’s much easier to miss what actually turns out to be schema change in a code review, since there are not explicit “migration” files to look for.
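A sketch of that “write both ways, read one” intermediate stage; the field names are hypothetical and a plain dict stands in for the datastore:

```python
# Deployment 1: the code dual-writes the old and new fields and still reads old.
def save_user(db, user):
    db["users"][user["id"]] = {
        "name": user["name"],       # old field, still written for old readers
        "full_name": user["name"],  # new field, dual-written
    }

def load_user(db, user_id):
    rec = db["users"][user_id]
    # Deployment 2: reads prefer the new field but tolerate old rows,
    # which also keeps rollback safe.
    return rec.get("full_name", rec["name"])

# Deployment 3 (after backfilling old rows) drops "name" and this shim.
db = {"users": {}}
save_user(db, {"id": 1, "name": "Ada"})
```

Nothing here is an explicit migration file, which is exactly the review-time visibility problem described above.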
Not if you’re hopping jobs every 1-2 years. If you’re out the door quickly enough, you can honestly claim you’ve never run into any long-term maintainability issues with your choice of technologies.
This was much better than I thought, and worth reading for sql fans.
My main disagreement is that this conflates two things:
Sql being seriously suboptimal at the thing it’s designed for; and distinctly
Sql being bad at things general purpose programming languages are good at.
There’s value in a restricted language with a clearly defined conceptual model that meets well defined design goals. Despite serious flaws sql is quite good at its core mission of declarative relational querying.
In many ways the porosity story is not bad - for example Postgres lets you embed lots of languages. I think a lot of the criticisms here really mean that more than one language is needed, and integrating them smoothly is the issue.
For me, better explicit declaration of what extensions are required for a query to run would make things more maintainable. I think the criticisms in the article around compositionality are in the right area at least - much more clarity would be better here.
In terms of an upgrade path – if we accept that basically sql is pretty sound but too ad hoc – then this is a very similar problem to that of shell programming. I find the “smooth upgrade path” theory of Oil shell plausible (and I’d add that Perl in many ways was a smooth upgrade from shell), although many more people have attempted smooth upgrade paths than have succeeded.
My best guess as to how to do it would be to implement an alternative but similar and principled language on top of at least two popular engines - probably drawn from the set of SQLite, Postgres, and MySQL - that accommodates the different engines being different and allows their differences to be exposed in a convenient way. If you can get the better query language into at least two of those, you’ll be reaching a large audience who are actually trying to do real work. All easier said than done, of course.
Sql being bad at things general purpose programming languages are good at.
I think this (and what follows) is a misinterpretation.
The core idea is not to change things such that SQL is suddenly good a GP tasks, but to adopt the things from GP languages that worked well there, and will also work well in the SQL context; for instance:
Sane scoping rules.
Namespaces.
Imports.
Some kind of generic programming.
These things alone would enable people to write “cross-database SQL standard libraries” that would make it easier to write portable SQL (which the database vendors are obviously not interested in).
Which would then free up resources from people who want to improve communication with databases in other ways¹ – because having to write different translation code for 20 different databases and their individual accumulation of 20 years of “quirks” is a grueling task.
principled language on top of at least two popular engines - probably drawn from the set of SQLite, Postgres, and MySQL - that accommodates the different engines being different and allows their differences to be exposed in a convenient way
I think most of the ecosystem’s weakness comes from any non-trivial SQL code being non-portable. I would neither want “differences exposed in a convenient way”, nor would I call a language that did that “principled”.
¹ E. g. why does shepherding some data from a database into a language’s runtime require half a dozen copies and conversions?
I guess maybe I just disagree on the problem. I don’t think portability is a very important goal, and I would give it up before pretty much anything else.
Portability is not the important goal, it’s simply the requisite to get anything done, including things you may consider an important goal.
Because without it, everyone trying to improve things is isolated into their own database-specific silo, and you have seen the effect of this for the last decades: Little to no fundamental improvements in how we use or interact with databases.
My best guess as to how to do it would be to implement an alternative but similar and principled language on top of at least two popular engines - probably drawn from the set of SQLite, Postgres, and MySQL
Yeah but (a) it’s not available out of the box (b) it’s not obvious there’s a smooth upgrade path here or even that this is the language people want. Which is only somewhat of a criticism- lots of new things are going to have to be tried before one sticks.
Sql being seriously suboptimal at the thing it’s designed for; and distinctly
Sql being bad at things general purpose programming languages are good at.
Excellent point. Bucketing those concerns would make this “rant” even better! I do think that stuff falls into both buckets (though what falls into the first bucket is trivially solvable, especially with frameworks like LINQ or Ecto, or gql overlays). The second category, though, does reflect that people do want optimization for some of those things, and it’s worth thinking about how a “replacement for sql” might want to approach them.
I have a draft here about my misgivings with SQL; I love the RDBMS model, but SQL is not the best we can do. Codd’s RMv2 book is a good read on what’s right and what’s wrong with SQL, though I find the biggest issue is that he’s too grounded in a manual data entry mainframe world.
The headings I had for my draft:
Better syntax
A functional programming language that isn’t hostile to functional programming
Less opaque query planners
Sum types, discriminated unions, and pattern matching
The #1 reason for me would be a foreign key that can match on multiple things, something like (from the draft, very rough, syntax is volatile):
table SoftwarePictures =
{
    // a foreign key is assumed to have the same type as what it relates to
    PictureID: key relates to (Picture.PictureID);
    ObjectID: key relates to (Software.SoftwareID | Version.VersionID | Download.DownloadID);
}
insert into SoftwarePictures (PictureID, ObjectID) values (0x1234, DownloadID { 0x1234 });
Then you could query it with something like:
select ObjectID(DownloadID did), PictureID pid from SoftwarePictures where did = 0x1234;
I have some more complex examples I’d like to polish up and put on the blog.
This is a pretty common pattern I see out in the wild, along with other EAV patterns. It’s simple for storing the data, perhaps, but data retrieval becomes convoluted.
For something like this, with existing tools I’d prefer separate SoftwarePicture, VersionPicture, etc. tables, and use a view to combine them for reads and an updatable view for inserts/updates. This gets you hard FK constraints and simple reads and edits.
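A minimal sketch of that shape in SQLite (via Python), with made-up names: one link table per target gives real foreign key constraints, and a view recombines them for reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE picture  (picture_id  INTEGER PRIMARY KEY);
    CREATE TABLE software (software_id INTEGER PRIMARY KEY);
    CREATE TABLE download (download_id INTEGER PRIMARY KEY);

    -- One link table per target type => a hard FK constraint on each
    CREATE TABLE software_picture (
        picture_id  INTEGER REFERENCES picture,
        software_id INTEGER REFERENCES software
    );
    CREATE TABLE download_picture (
        picture_id  INTEGER REFERENCES picture,
        download_id INTEGER REFERENCES download
    );

    -- A view recombines the link tables for simple reads
    CREATE VIEW object_picture AS
        SELECT picture_id, 'software' AS kind, software_id AS object_id
          FROM software_picture
        UNION ALL
        SELECT picture_id, 'download', download_id FROM download_picture;

    INSERT INTO picture  VALUES (1);
    INSERT INTO software VALUES (10);
    INSERT INTO download VALUES (20);
    INSERT INTO software_picture VALUES (1, 10);
    INSERT INTO download_picture VALUES (1, 20);
""")

rows = conn.execute(
    "SELECT kind, object_id FROM object_picture ORDER BY kind"
).fetchall()
```

Writes go to the concrete link tables, or, in engines that support it, through INSTEAD OF triggers so the view itself is updatable.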
I do like MS SQL management studios visualization of query plans. It would be interesting if you could have live query plan updates while writing queries.
This also has the advantage of producing only one fk column, so you don’t have to decide whether you want a.fk or b.fk or whether that makes a difference to the query planner.
Is it? I’m pretty sure it isn’t available in MS SQL Server. Tried using it and it seemed to think it was a query hint. It could be that they haven’t implemented that part of the standard, but it would be a strange thing to leave out.
In the old old days there was a humorous non-FAQ document for one of the big SGML newsgroups, which included the following[0]:
Q. What’s so great about ISO standardization?
A. It is often said that one of the advantages of SGML over some other, proprietary, generic markup scheme is that “nobody owns the standard”. While this is not strictly true, the ISO’s pricing policy certainly has helped to keep the number of people who do own a copy of the Standard at an absolute minimum.
[0] I have no idea if it’s still online anywhere; the original URL I bookmarked years ago is now a 404. I do have a saved copy, though, because its answer to “what’s new in this year’s edition” was “Save a copy of this article so next year you can run ‘diff’ on it.”
What are some alternatives to SQL that avoid some of these pitfalls?
One really interesting SQL library I’ve seen is opaleye, which bypasses a lot of the incompressibility issues by constructing queries in a Haskell DSL using reusable and composable monadic primitives, and then compiling those down into SQL (and automatically generating the repetitive parts of the query as part of that compilation).
Great article. My eyes glazed over at most of the examples, but I need little convincing that SQL is a terrible language.
I didn’t see any mention of .NET’s LINQ, Language-INtegrated Query, which seems to me like a clever way to express queries directly in C# in a typical FP style that can then be translated by library code to any underlying query implementation. I’ve never used it, though; does it work well in practice? Or does it expose only a limited subset of query types?
My eyes glazed over at most of the examples, but I need little convincing that SQL is a terrible language.
I recommend taking a second look at the “the SQL spec allows a novel form of variable naming - you can refer to a column by using an expression which produces the same parse tree as the one that produced the column”. It improved my lower bound for terribleness (of any language, not just SQL).
I liked that example too! That’s a “new and creative” thing I haven’t seen in any language :)
But I’m biased and I still think that the worst feature I’ve ever encountered is the fact that bash/ksh (yes multiple shells!) will on occasion look inside your strings, parse them as code, and execute them.
x='a[$(echo 42 > PWNED)]=foo' # this is a string, could be user supplied
echo $(( x )) # PWNED created! data executed as code
This has existed for ~25 years in bash/ksh, but not in dash because it doesn’t have arrays. The executable code has to be in the array subscript!
Demos here, and when I rediscovered this I e-mailed a bunch of people, and Stephane Chazelas (person who discovered ShellShock in 2014) had written about it, and there’s a link from the Fedora security team too:
OpenBSD ksh was fixed based on my report, and I’m pretty sure the “shopt -s assoc_expand_once” feature in bash 5.0 was inspired by my e-mails to the maintainer about it.
I didn’t see any mention of .NET’s LINQ, Language-INtegrated Query, which seems to me like a clever way to express queries directly in C# in a typical FP style that can then be translated by library code to any underlying query implementation. I’ve never used it, though; does it work well in practice? Or does it expose only a limited subset of query types?
LINQ works well in practice although some queries will need to be written as parametrized SQL. It’s not SQL specific; you can query any data source that has a query provider or implements the right methods.
All of these criticisms are nice in the abstract, but until there is an alternative syntax being proposed, I don’t find any of it compelling. For all of SQL’s failings, it has the benefit of being implemented and therefore proven in the real world.
Propose a better syntax and I’ll be more interested.
Show me how moderately complex SQL queries (with joins, subqueries, “having”, joined inserts with duplicate key update, window functions) can be represented in your syntax and I’ll be very interested.
Show me how complex queries in your syntax can be parsed into an optimised execution plan, and I’ll be extremely interested.
Generally my experience with diatribes on the problems of complex systems is that they rarely consider whether the proposed alternative isn’t replacing a whole bunch of identified problems with a whole bunch of unidentified problems. Who’s to say that someone following this article won’t end up with the SQL equivalent of XHTML?
SQL is an interesting beast because there’s no disputing how crufty and scattered it is, but there’s some underappreciated beauty in it too. I believe that there’s more viability in incremental improvements than in wholesale invention of an entirely new syntax. But I’d be thrilled to be proven wrong.
Oh I completely agree. Which is why I say there should be a smooth upgrade path. I’d mostly worry about fixing names, some syntax for common sub expressions, some import syntax, and the rules about when select is required, and leave it at that.
It looks like a nicer syntax. More pleasant to work in. But I don’t see much that would have any real dent in productivity. The weird and crufty parts of SQL are cheap enough to learn and remember; if I got proficient in EdgeQL I’d probably do a little bit less typing and save minutes per day at most.
This whole thing reminds me of the ORM argument. A whole bunch of people who don’t understand the nuances of SQL advocating for its replacement. Alternatives were designed to make their database pretend to be something they do understand—in the case of ORMs, a series of wonky, half-baked objects.
The difference vs ORM is that EdgeQL is still built on the full relational model and can do anything and everything SQL can do, it’s not a fig leaf over a single use case, but rather a new language with at least equal expressive power.
It won’t have a huge productivity change at first for anyone used to SQL of course, but if the composability etc. comes out in practice to enable more reusable libraries of functions, or if the more comprehensible wire format for common join cases results in more use of the DB by more users, these sorts of things can improve the ecosystem’s productivity over time.
It won’t have a huge productivity change at first for anyone used to SQL of course
This is why I’m afraid it won’t gain traction. There’s just too much inertia. Now if Postgres upstream would integrate this language, for example, or have a pluggable syntax layer which could be selected at will when making queries, it would be much easier to switch (and even switch over gradually within an application).
This is a very good rant. I really like SQL, but also, these issues the author raises are extremely real.
I would actually regard the incompressible argument as the most substantive from a day-in-the-trenches perspective. Building a DB language such that it could be as terse and type-sane as, say, Rust, would be both monumental and also extremely valuable.
There are a couple big things that keep query languages legitimately different from general-purpose PLs, I think. One, query planners help us out a ton, so you want the language to be transparent to the planner so you can declaratively say what you want simply enough that it can figure out how to get it. You need to be careful inserting wasm blobs or such to not break that. Two, if you move a lot of your app’s computation into the DB, you might pay (already true with piles of stored procs today)–wonky code can overload or crash your DB not just the application instance, and in general you might have to scale your DB deployment for computation needs more which can be suboptimal (especially when DBs were harder to scale period!).
None of that justifies all the quirks of SQL; it wasn’t handed down from on high. More types, more conciseness, and making more imperative-style control a reasonable option all seem interesting. Weird to consider why SQL hasn’t gone anywhere, and even seems to be having a second heyday after NoSQL was the hot thing for a stretch.
Over the last decade or two we’ve had many general-purpose programming languages get to the point where they’re entirely practical to build big new apps in when, in ancient history when I was learning this stuff, it seemed like the implementations and ecosystems were so big and difficult to create we would kind of be stuck with a handful forever. Alternative languages existed, of course; the “practical to work in” part was the hard one. Many things changed, hard to pin down their relative importance–pieces of compilers/runtimes being reusable (LLVM, JVM/CLR, JS backend, etc.), collective learning about how to nurture a language ecosystem (importance of docs/introductions, stdlib, good FFI to fill gaps early on, easy ways to share libs, etc.), better hardware lowering the performance bar for “usable”, more programmers becoming polyglots, just a bigger population of programmers in general to write/use all this stuff?
So why hasn’t there been a similar explosion in query languages? Maybe we’re missing relational backends (with secondary indexes, query planners, etc.) easily separable from the query language? Maybe the risk aversion is greater? (That feels plausible intuitively, but you were also hosed if you wrote your app in a flaky language and we got over that.) Maybe the effort in DBs has gone to building decent distributed database backends, and we’re late to taking a good look at the layer above that?
FWIW, I think there’s another, loosely related, oddity of how we do data now: DBs bundle together lots of functionality that may not have to be bundled. Like, if you have two files that don’t fit in memory, doing a sort join doesn’t inherently require offloading the task to another binary that also has a serialization format, a transaction layer, a query planner, etc.; it just needs an external sort function. I think we’re chipping at it from a few directions–SQLite doesn’t have the server, Trino/Presto/Athena query data in its native format, data science libs do interesting things in an imperative way, there’re plenty of KV libraries. These things are each kind of islands now, and it’s interesting to imagine something more like a proper database that is separable into all these pieces (which, yes, is super hard, but we’re dreamin’ here).
Man, how often did I wish when doing database query generation “please $DB, just let me hand you queries in your own internal format, instead of making me write SQL”.
So I agree with the criticism of the author, but as mentioned in the end of the article … what to do with all the knowledge we have now?
It seems that many previous alternatives were not successful:
ORM and “NoSQL” – junior developer ideas that turned out to be worse than using SQL
GraphQL – lacks joins, so is hardly a credible replacement
Other promising approaches seem to end up getting commercialized, sold and closed down.
So what can “we” do, to improve the state of the art?
In my opinion: demonstrating and specifying a practical, well-designed¹ language that various databases could implement on their own as an alternative to SQL.
A Datalog variant. I had a lot of fun playing with differential-datalog, and there’s Logica, which compiles Datalog to SQL for a variety of SQL dialects.
Relational data maps pretty well to most business domains. NoSQL and ORMs throw out the baby with the bathwater for different reasons (turfing the entire model with NoSQL, trying to force two different views of modelling the domain to kiss with ORMs). Anything that makes a join hard isn’t a good idea when an RDBMS is involved.
I think what might be interesting is instead of contorting the RDBMS model to work with OO languages like ORMs do, do the reverse: a relational programming language. I don’t know what that could look like though.
Agreed with the conclusion and I have nothing good to say about most NoSQL systems other than that rescuing companies from them is a lucrative career, but I think this criticism of ORMs is over-broad.
A good ORM will take the scut-work out of database queries in a clean and standardized-across-codebases way without at all getting in your way when accessing deep database features, doing whatever joins you want, etc. I’d throw modern Rails ActiveRecord (without getting in the weeds on Arel) as a good ORM which automates the pointless work while staying out of your way when you want to do something more complicated.
A bad ORM will definitely try to “hide” the database from you in ways that just make everything way too complicated the second you want to do something as simple as specify a specific type of join. Django’s shockingly awful “QuerySet” ORM definitely falls in this camp, as I’ve recently had the misfortune of trying to make it do fairly simple things.
I’m very surprised to see ActiveRecord used as an example of something which stays out of your way. The amount of time I have spent fighting to get it to generate the SQL I wanted is why I never use it unless I’m being paid a lot to do so.
Really? It’s extremely easy to drop to raw SQL, and to intermix that with generated statements – and I’ve done a lot of really custom heavy lifting with it over the years. Admittedly this may not be well documented and I may just be taking advantage of a lot of deep knowledge of the framework, here.
The contrast is pretty stark to me compared to something like Django, whose devs steadfastly refuse to allow you to specify joins and which, while offering a raw SQL escape hatch, has a different intermediate result type for raw SQL queries (RawQuerySet vs QuerySet) with different methods, meaning details of how you formed a query (raw vs ORM API) leak into all consuming layers and you can’t switch one for the other at the data layer without breaking everything upstream (hilariously the accepted community “solution” to this seems to be to write your raw query, then wrap an ORM API call around it that generates a “select * from (raw query)”??).
ActiveRecord has none of these issues in my experience – joins can be manually specified, raw clauses inserted, raw SQL is transparent and intermixable with ORM statements with no impedance mismatch. Even aggregation/deaggregation approaches like unions, unnest(), etc. that break the table-to-class and column-to-property assumptions can still be made to work cleanly. It’s really night and day.
Not the commenter you’re asking, but they’re both tools that reduce the initial amount of learning at the cost of abandoning features that make complexity easier to handle and code easier to maintain.
I’m not sure that that’s true, though. ORMs make a lot of domain logic easier to maintain—it’s not about reducing initial learning, it’s about shifting where you deal with complexity (is it complexity in your domain or in scaling or ???). Similar with NoSQL—it’s not a monolithic thing at all, and most NoSQL databases require similar upfront learning (document DBs, graph DBs, etc. all take significant study to utilize well). Again, it’s a trade-off of what supports your use case well.
I’m just not sure what the GP meant by “junior developer ideas” (it feels disparaging of these, and those who use them, but I won’t jump to conclusions). They also are by no stretch “worse than using SQL”. They are sometimes worse and sometimes better. Tradeoffs.
I agree with you on the tradeoffs. I’m not sure I agree on the domain logic thing. In my experience orms make things easier until they don’t, in part because you’ve baked your database schema into your code. Sometimes directly generating queries allows changes to happen in the schema without the program needing to change its data model immediately.
This might be the best language rant I’ve ever read. Nothing is wrong, nothing is described in hyperbole, comparisons to the rest of the ecosystem are constant and appropriate, and examples are numerous.
I still quite like string diagrams for relational algebra.
I agree that it’s a quality rant, but there’s a lot of stuff in there that falls into the buckets of:
“I don’t like SQL because it doesn’t have a more advanced type system/general purpose language feature I like.”
“I don’t like SQL because it doesn’t feel like a ‘modern’ ecosystem.”
“I don’t like SQL because I can’t write terse code in it.”
“I don’t like SQL because implementations aren’t compatible”.
All of those are fine, but I can’t help but wonder if this isn’t just a mismatch in what the author is used to versus the world that created SQL–and being maybe a bit blind to our own modern realities (for example, “modern ecosystems” as in JS or Ruby are superfund sites of structural abandonware spanning generations of code features whereas legacy SQL codebases still by-and-large look like SQL modulo some quirks; would this be an improvement for mission-critical code?).
~
Re: hyperbole…I guess it isn’t exaggeration if you’re instead just wrong. To wit:
Design flaws in the SQL language resulted in a language with no library ecosystem and a burdensome spec which limits innovation.

The amount of money and man hours spent, and fortunes made, using the “burdensome spec” of SQL makes me question what exactly “limiting innovation” would look like. Like, sure, it doesn’t have the greatness of an ML or a LISP…but then again Oracle outlasted Symbolics and SQLite is basically everywhere.
Additional design flaws in SQL database interfaces resulted in moving as much logic as possible to the application layer and limiting the use of the most valuable features of the database.

This might be the case in startups, but large businesses most assuredly do have as much logic as possible in the database layer–much to the dismay and distress of DBAs everywhere. Oracle, Microsoft, and other DB vendors exist in spite of free offerings because so many customers will pay for their database features (arguably this is lock-in, but that’s another discussion altogether).
~
It’s a really good read and essay, and I like it as an example of a document which has (what I believe to be) incorrect conclusions but nevertheless the journey of arriving at them is well-substantiated and itself capable of giving enlightenment.
I think you’ve mischaracterized the argument. It’s not “I don’t like SQL because it doesn’t have X”. It’s that “lacking X” causes real consequences for both applications and database implementers.
As a simple example, the links on n+1 bugs and feral concurrency are particularly good, but there are lots more:
I’d go as far as to say that if you don’t know what he’s talking about, then you either haven’t used SQL in a real application, or your code simply has tons of these bugs and you don’t know it :)
On the implementation side, the complexity arguments are very good and borne out of experience. The core logic for queries over relations is small; the SQL part is large.
Mistakes in Active Record and ORM usage leading to N+1 bugs strike me more as the fault of devs than of SQL.
This is the same argument made about UB and memory bugs in memory-unsafe languages like C: “Mistakes leading to dereferencing a null pointer strike me more as a fault of devs than of C.” Yes, a sufficiently careful dev can avoid the sharp edges of SQL, but the language does not make it easy or provide much (if any) support.
In this case there’s a particular workflow (ActiveRecord/ORM) that’s causing mistakes rather than SQL itself. Quite a lot of folks use raw SQL for many things.
That’s not a great comparison, because these problems are introduced by a higher level of abstraction: the ORM. A better comparison would be blaming a garbage collector for introducing performance problems: a dev who knows his GC implementation well will be able to write code that doesn’t introduce too many slowdowns, just as a dev who knows his ORM implementation well will be able to write code that doesn’t introduce too many slowdowns.
Someone who doesn’t know his GC well will be surprised by difficult to predict overhead by extra garbage collections happening, and someone who doesn’t know his ORM well will be surprised by difficult to predict overhead of extra roundtrips to the database.
The N+1 query issues can happen without ORMs. It’s very common to abstract SQL queries with functions, and then simply calling those functions in a loop gives rise to the problem.
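A minimal sketch of that failure mode, with no ORM in sight (table and helper names are made up). The per-row helper looks harmless in isolation; the loop is where one logical lookup becomes N extra queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

def author_name(author_id):
    # A perfectly reasonable abstraction: one round trip per call.
    row = conn.execute("SELECT name FROM authors WHERE id = ?", (author_id,))
    return row.fetchone()[0]

# N+1: one query for the posts, plus one more query per post in the loop.
posts = conn.execute("SELECT author_id, title FROM posts ORDER BY id").fetchall()
names = [author_name(aid) for aid, _ in posts]  # 3 extra round trips

# The same result in a single query with a join.
joined = conn.execute("""
    SELECT a.name, p.title
    FROM posts p JOIN authors a ON a.id = p.author_id
    ORDER BY p.id
""").fetchall()
```

Nothing here is SQL-specific; any function that hides a round trip invites the same loop.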
Of course, but that’s just another example of an abstraction obscuring what’s going on. I’ve seen similar problems with abstractions over other things, so hardly an SQL-specific issue. My point was that blaming UB behaviour and memory unsafety on the user rather than the language is a bit different in that that is something where abstractions help (you’re working at a too low level when you have these issues), whereas here the abstractions are the thing getting in the way (you’re working at a too high level when you have these issues).
This was an absolutely brilliant article! It was fantastically well researched and written by someone with expert knowledge of the domain. I’m learning so much from reading it.
The argument about representing JSON objects in SQL was not persuasive to me. I do not really understand why this would be desirable. I see the SQL approach as a more statically-typed one, where you would process JSON objects and ensure they fit a predefined structure before inserting them into SQL. For a more dynamic approach where you just throw JSON objects into a database you have MongoDB. On that note I think the lack of union types in SQL is a feature more than a limitation, isn’t it?
Excellent point about JOIN syntax being verbose, and the lack of sugar or any way to metaprogram and define new syntax. The query language could be so much more expressive and easy to use.
I think these line counts say a lot! The extra work trying to fulfill all the criteria of the SQL standard isn’t necessary work for the implementation of a database system. A more compact language specification would enable implementations to be shorter and enable people to learn it much more easily.
The whole attitude of the NoSQL movement put me off it a lot. Lacking types and structure never sounded like an improvement to me - more like people just wanted to skip the boring work of declaring tables and such. But this work is a foundation for things to work smoothly so I think the more dynamic approach will often bite you in the end. But then the author explains more about GraphQL and honestly it sold me on GraphQL, I would be very open to using that in future rather than SQL after reading this.
This is a frustrating part of innovation in programming, but honestly I believe the ideas he has presented are too significant an improvement for people not to start using them.
If you have data encoded in a JSON format, it often falls naturally into sets of values with named fields (that’s the beauty of the relational model) so you can convert it into a SQL database more or less painlessly.
On the other hand, if you want to store actual JSON in a SQL database, perhaps to run analytical queries on things like “how often is ‘breakfast’ used as a key rather than as a value”, it’s much more difficult, because “a JSON value” is not a thing with a fixed representation. A JSON number might be stored as eight bytes, but a JSON string could be any length, never mind objects or lists. You could create a bunch of SQL tables for each possible kind of JSON value (numbers, strings, booleans, objects, lists) but if a particular object’s key’s value can be a number or a string, how do you write that foreign key constraint?
Sure, most applications don’t need to query JSON in those ways, but since the relational model is supposed to be able to represent any kind of data, the fact that SQL falls flat on its face when you try to represent one of the most common data formats of the 21st century is a little embarrassing.
That’s what the post means by “union types”. Not in the C/C++ sense of type-punning, but in the sense of “a single data type with a fixed number of variants”.
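Lacking such unions, the usual workaround is a type tag plus one nullable column per variant, with a CHECK constraint doing the work a union type would. A sketch (schema entirely illustrative; objects and lists omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE json_value (
        id      INTEGER PRIMARY KEY,
        kind    TEXT NOT NULL CHECK (kind IN ('number', 'string', 'bool', 'null')),
        num     REAL,
        str     TEXT,
        boolean INTEGER,
        -- Exactly one variant column may be filled, matching the tag.
        CHECK (
            (kind = 'number' AND num IS NOT NULL AND str IS NULL AND boolean IS NULL) OR
            (kind = 'string' AND str IS NOT NULL AND num IS NULL AND boolean IS NULL) OR
            (kind = 'bool'   AND boolean IS NOT NULL AND num IS NULL AND str IS NULL) OR
            (kind = 'null'   AND num IS NULL AND str IS NULL AND boolean IS NULL)
        )
    )
""")
conn.execute("INSERT INTO json_value VALUES (1, 'number', 8.0, NULL, NULL)")
conn.execute("INSERT INTO json_value VALUES (2, 'string', NULL, 'breakfast', NULL)")

# The CHECK rejects ill-formed variants...
try:
    conn.execute("INSERT INTO json_value VALUES (3, 'number', NULL, 'oops', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
# ...but a foreign key can still only target this one table, not
# "number or string", and every query has to dispatch on `kind` by hand.
```

That last comment is the foreign-key problem from the parent: the variants are simulated, so constraints and queries never see them as one type.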
Sorry to nitpick, but a JSON number can be of any length. I think what you were thinking of was JavaScript, in which numbers are represented as 64-bit values.
No, the json standard provides for a maximum number of digits in numbers. Yes I know this because of a bug from where I assumed json numbers could be any length.
Edit: I stand corrected - I’m certain I saw something in the standard about a limit (I was surprised) but it seems there isn’t. That said various implementations are allowed to limit the length they process. https://datatracker.ietf.org/doc/html/rfc7159#section-6
Which standard? ECMA-404 doesn’t appear to have a length limitation on numbers. RFC 8259 says something much more specific:
In fewer words, long numbers are syntactically legal but might be incorrectly interpreted depending on which implementation is decoding.
The ECMA-404 standard doesn’t talk about any numerical limits at all, and RFC 7159 talks about implementation-specific limitations which a) is kinda obvious, because RAM isn’t unlimited in the real world and b) doesn’t buy you anything if you are implementing a library that needs to deal with JSON as it exists in the wild.
So yes, JSON numbers can be of unlimited magnitude and precision and any correct parsing library better deals with this.
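A quick illustration of that implementation-specific caveat, using Python’s json module as one example of a parser with arbitrary-precision integers:

```python
import json

big = "123456789012345678901234567890"  # 30 digits: syntactically valid JSON

# Python's parser maps integer literals to arbitrary-precision int: exact.
assert json.loads(big) == int(big)

# An implementation that decodes every number as an IEEE-754 double
# (as JavaScript does) silently rounds the very same literal.
assert float(big) != int(big)
assert int(float(big)) != int(big)
```

Same legal document, two interoperable-but-disagreeing decodings, exactly as RFC 8259 warns.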
To some degree it’s the same as the arguments in favor of dynamically-typed languages. Just s/tables/variable types/, etc.
Also, remember the recent post which included corbin (?)s quote about “you can’t extend your type system across the network” — that was about RPC but it applies to distributed systems as well, and the big win of NoSQL originally was horizontal scaling, i.e. distributing the database across servers.
[imaginary “has worked at Couchbase for ten years doing document-db stuff” hat]
I always thought that NoSQL came about because people didn’t feel like dealing with schema migrations. I’ve certainly dreaded any sort of schema migration that did more than just add or remove columns. But I never actually tried using NoSQL “databases” so I can’t speak about whether or not they actually help.
In practice you still need to do migrations, in the form of deploying your code to write the new column in a backwards compatible way and then later removing that backwards compatible layer. The intermediate deployments that allow for the new and old code to live side by side, as well as a safe rollback, are required whether you use sql or not. The only difference is that you don’t have to actually run a schema migration. A downside of this is that it’s much easier to miss what actually turns out to be schema change in a code review, since there are not explicit “migration” files to look for.
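The write-both-shapes transition described above can be sketched like this (field names are made up; any document store behaves the same way):

```python
# Old documents store a single `name`; new code wants first/last split.

def read_user(doc):
    # Compatibility shim: tolerate documents written before the change.
    if "first_name" in doc:
        return doc["first_name"], doc["last_name"]
    first, _, last = doc["name"].partition(" ")  # old shape
    return first, last

def write_user(doc, first, last):
    # Transitional writes keep BOTH shapes so old readers and a rollback
    # still work; a later deployment drops the `name` key and this shim.
    doc["first_name"], doc["last_name"] = first, last
    doc["name"] = f"{first} {last}"
    return doc

old = {"name": "Ada Lovelace"}
assert read_user(old) == ("Ada", "Lovelace")
new = write_user({}, "Ada", "Lovelace")
assert read_user(new) == ("Ada", "Lovelace")
```

Note that this is a schema migration in all but name, just living in code review instead of a migrations directory.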
This! you’re basically sweeping dirt under the carpet. One day you’re going to have to deal with it..
In my experience this leads to data inconsistencies and the need to code defensively or otherwise maintain additional application code.
Not if you’re hopping jobs every 1-2 years. If you’re out the door quickly enough, you can honestly claim you’ve never run into any long-term maintainability issues with your choice of technologies.
I think that’s unlikely, most NoSQL people probably have no idea what schema migrations are.
This was much better than I thought, and worth reading for sql fans.
My main disagreement is that this conflates two things:
There’s value in a restricted language with a clearly defined conceptual model that meets well defined design goals. Despite serious flaws sql is quite good at its core mission of declarative relational querying.
In many ways the porosity story is not bad - for example Postgres lets you embed lots of languages. I think a lot of the criticisms here really mean that more than one language is needed, and integrating them smoothly is the issue.
For me, better explicit declaration of what extensions are required for a query to run would make things more maintainable. I think the criticisms in the article around compositionality are in the right area at least - much more clarity would be better here.
In terms of an upgrade path - if we accept that basically sql is pretty sound but too ad hoc - then this is a very similar problem to that of shell programming. I find the “smooth upgrade path” theory of oil shell plausible (and I’d add that Perl in many ways was a smooth upgrade from shell) although many more people have attempted smooth upgrade paths than have succeeded.
My best guess as to how to do it would be to implement an alternative but similar and principled language on top of at least two popular engines - probably drawn from the set of SQLite, Postgres, and MySQL - that accommodates the different engines being different and allows their differences to be exposed in a convenient way. If you can get the better query language into at least two of those, you’ll be reaching a large audience who are actually trying to do real work. All easier said than done, of course.
I think this (and what follows) is a misinterpretation.
The core idea is not to change things such that SQL is suddenly good at GP tasks, but to adopt the things from GP languages that worked well there and will also work well in the SQL context; for instance:
These things alone would enable people to write “cross-database SQL standard libraries” that would make it easier to write portable SQL (which the database vendors are obviously not interested in).
Which would then free up resources from people who want to improve communication with databases in other ways¹ – because having to write different translation code for 20 different databases and their individual accumulation of 20 years of “quirks” is a grueling task.
I think most of the ecosystems weakness comes from any non-trivial SQL code being non-portable. I would neither want “differences exposed in a convenient way”, nor would I call a language that did that “principled”.
¹ E. g. why does shepherding some data from a database into a language’s runtime require half a dozen copies and conversions?
I guess maybe I just disagree on the problem. I don’t think portability is a very important goal, and I would give it up before pretty much anything else.
Portability is not the important goal, it’s simply the requisite to get anything done, including things you may consider an important goal.
Because without it, everyone trying to improve things is isolated into their own database-specific silo, and you have seen the effect of this for the last decades: Little to no fundamental improvements in how we use or interact with databases.
No I don’t think so.
That’s exactly what I did with Preql, by making it compile to SQL (https://github.com/erezsh/Preql)
Still waiting for the audience :P
Yeah but (a) it’s not available out of the box (b) it’s not obvious there’s a smooth upgrade path here or even that this is the language people want. Which is only somewhat of a criticism- lots of new things are going to have to be tried before one sticks.
Excellent point. Bucketing those concerns would make this “rant” even better! I do think that stuff falls into both buckets (though what falls into the first bucket is trivially solvable, especially with frameworks like LINQ or Ecto, or GQL overlays). The second category, though, does reflect that people do want optimization for some of those things, and it’s worth thinking about how a “replacement for SQL” might want to approach them.
I have a draft here about my misgivings with SQL; I love the RDBMS model, but SQL is not the best we can do. Codd’s RMv2 book is a good read on what’s right and what’s wrong with SQL, though I find the biggest issue is that he’s too grounded in a manual data entry mainframe world.
The headings I had for my draft:
What’s the argument for sums and discriminated unions? (these are the same right?)
The #1 reason for me would be a foreign key that can match on multiple things, something like (from the draft, very rough, syntax is volatile):
Then you could query it with something like:
I have some more complex examples I’d like to polish up and put on the blog.
This is a pretty common pattern I see out in the wild along with other EAV patterns. It’s simple for storing the data perhaps but now data retrieval is convoluted.
For something like this with existing tools I’d prefer separate SoftwarePicture, VersionPicture, etc. tables, plus a view to combine them for reads and an updatable view for inserts/updates. This gets you hard FK constraints and simple reads and edits.
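A sketch of that separate-tables-plus-view layout (schemas guessed, since the original draft isn’t shown), using sqlite3; updatable views would additionally need INSTEAD OF triggers in most engines:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE software (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE version  (id INTEGER PRIMARY KEY,
                           software_id INTEGER NOT NULL REFERENCES software(id),
                           tag TEXT);

    -- One picture table per owner kind: each gets a HARD foreign key.
    CREATE TABLE software_picture (id INTEGER PRIMARY KEY,
                                   software_id INTEGER NOT NULL REFERENCES software(id),
                                   url TEXT);
    CREATE TABLE version_picture  (id INTEGER PRIMARY KEY,
                                   version_id INTEGER NOT NULL REFERENCES version(id),
                                   url TEXT);

    -- A view unions them back into one "picture" relation for reads.
    CREATE VIEW picture AS
        SELECT 'software' AS owner_kind, software_id AS owner_id, url FROM software_picture
        UNION ALL
        SELECT 'version', version_id, url FROM version_picture;

    INSERT INTO software VALUES (1, 'sqlite');
    INSERT INTO version  VALUES (1, 1, '3.39');
    INSERT INTO software_picture VALUES (1, 1, 'logo.png');
    INSERT INTO version_picture  VALUES (1, 1, 'shot.png');
""")
rows = conn.execute(
    "SELECT owner_kind, url FROM picture ORDER BY owner_kind"
).fetchall()
```

The trade-off versus a single EAV-ish table is exactly the one described above: real constraints per variant, at the cost of one table per owner kind.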
I do like MS SQL management studios visualization of query plans. It would be interesting if you could have live query plan updates while writing queries.
I will admit to not knowing what is in the SQL standard, but I’m very used to doing foreign keys via
This also has the advantage of producing only one fk column, so you don’t have to decide whether you want a.fk or b.fk or whether that makes a difference to the query planner.
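Presumably the construct meant here is `JOIN … USING (col)` (the snippet itself didn’t survive). SQLite, for one, accepts it; a quick check that the shared column really shows up only once in `SELECT *`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (fk INTEGER, x TEXT);
    CREATE TABLE b (fk INTEGER, y TEXT);
    INSERT INTO a VALUES (1, 'x1');
    INSERT INTO b VALUES (1, 'y1');
""")
cur = conn.execute("SELECT * FROM a JOIN b USING (fk)")
cols = [d[0] for d in cur.description]
row = cur.fetchone()
# cols is ['fk', 'x', 'y']: one coalesced fk column, unlike ON a.fk = b.fk,
# which would yield ['fk', 'x', 'fk', 'y'].
```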
P sure that’s standard.
Is it? I’m pretty sure it isn’t available in MS SQL Server. Tried using it and it seemed to think it was a query hint. It could be that they haven’t implemented that part of the standard, but it would be a strange thing to leave out.
Postgres doesn’t list it as an extension, and they love documenting extensions https://www.postgresql.org/docs/11/sql-select.html#SQL-FROM
I’m not 100% convinced, but MySQL docs say the following:
https://dev.mysql.com/doc/refman/8.0/en/join.html
If it is part of the standard then I really need to know if I can get it to work in MSSQL and if I can’t why it isn’t supported.
*Edit: WHY DOES IT COST $250 TO LOOK AT THE STANDARD?!
https://webstore.ansi.org/Standards/ISO/ISOIEC90752016-1646101?source=blog
In the old old days there was a humorous non-FAQ document for one of the big SGML newsgroups, which included the following[0]:
[0] I have no idea if it’s still online anywhere; the original URL I bookmarked years ago is now a 404. I do have a saved copy, though, because its answer to “what’s new in this year’s edition” was “Save a copy of this article so next year you can run ‘diff’ on it.”
Given that I linked one part of ten(?), I think that rings true. Crazy to think you might need to drop $2500 to get access to the standard.
What are some alternatives to SQL that avoid some of these pitfalls?
One really interesting SQL library I’ve seen is opaleye, which bypasses a lot of the incompressibility issues by constructing queries in a Haskell DSL using reusable and composable monadic primitives, and then compiling those down into SQL (and automatically generating the repetitive parts of the query as part of that compilation).
Thanks for linking to Opaleye, it was interesting to read about!
Great article. My eyes glazed over at most of the examples, but I need little convincing that SQL is a terrible language.
I didn’t see any mention of .NET’s LINQ, Language-INtegrated Query, which seems to me like a clever way to express queries directly in C# in a typical FP style that can then be translated by library code to any underlying query implementation. I’ve never used it, though; does it work well in practice? Or does it expose only a limited subset of query types?
I recommend taking a second look at the “the SQL spec allows a novel form of variable naming - you can refer to a column by using an expression which produces the same parse tree as the one that produced the column”. It improved my lower bound for terribleness (of any language, not just SQL).
I liked that example too! That’s a “new and creative” thing I haven’t seen in any language :)
But I’m biased and I still think that the worst feature I’ve ever encountered is the fact that bash/ksh (yes multiple shells!) will on occasion look inside your strings, parse them as code, and execute them.
This has existed for ~25 years in bash/ksh, but not in dash because it doesn’t have arrays. The executable code has to be in the array subscript!
Demos here, and when I rediscovered this I e-mailed a bunch of people, and Stephane Chazelas (person who discovered ShellShock in 2014) had written about it, and there’s a link from the Fedora security team too:
https://github.com/oilshell/blog-code/tree/master/crazy-old-bug
OpenBSD ksh was fixed based on my report, and I’m pretty sure the “shopt -s assoc_expand_once” feature in bash 5.0 was inspired by my e-mails to the maintainer about it.
LINQ works well in practice although some queries will need to be written as parametrized SQL. It’s not SQL specific; you can query any data source that has a query provider or implements the right methods.
All of these criticisms are nice in the abstract, but until there is an alternative syntax being proposed, I don’t find any of it compelling. For all of SQL’s failings, it has the benefit of being implemented and therefore proven in the real world.
Propose a better syntax and I’ll be more interested.
Show me how moderately complex SQL queries (with joins, subqueries, “having”, joined inserts with duplicate key update, window functions) can be represented in your syntax and I’ll be very interested.
Show me how complex queries in your syntax can be parsed into an optimised execution plan, and I’ll be extremely interested.
I think it makes sense to write articles to set a course and sort out what people think the problem is.
Generally my experience with diatribes on the problems of complex systems is that they rarely consider whether the proposed alternative isn’t replacing a whole bunch of identified problems with a whole bunch of unidentified problems. Who’s to say that someone following this article won’t end up with the SQL equivalent of XHTML?
SQL is an interesting beast because there’s no disputing how crufty and scattered it is, but there’s some underappreciated beauty in it too. I believe that there’s more viability in incremental improvements than in wholesale invention of an entirely new syntax. But I’d be thrilled to be proven wrong.
Oh I completely agree. Which is why I say there should be a smooth upgrade path. I’d mostly worry about fixing names, some syntax for common sub expressions, some import syntax, and the rules about when select is required, and leave it at that.
Have a look at edgedb/EdgeQL linked from the end.
It looks like a nicer syntax. More pleasant to work in. But I don’t see much that would have any real dent in productivity. The weird and crufty parts of SQL are cheap enough to learn and remember; if I got proficient in EdgeQL I’d probably do a little bit less typing and save minutes per day at most.
This whole thing reminds me of the ORM argument. A whole bunch of people who don’t understand the nuances of SQL advocating for its replacement. Alternatives were designed to make their database pretend to be something they do understand—in the case of ORMs, a series of wonky, half-baked objects.
The difference vs ORM is that EdgeQL is still built on the full relational model and can do anything and everything SQL can do, it’s not a fig leaf over a single use case, but rather a new language with at least equal expressive power.
It won’t make a huge productivity difference at first for anyone used to SQL, of course, but if the composability etc. turns out in practice to enable more reusable libraries of functions, or if the more comprehensible wire format for common join cases results in more use of the DB by more users, these sorts of things can improve the ecosystem’s productivity over time.
This is why I’m afraid it won’t gain traction. There’s just too much inertia. Now if Postgres upstream would integrate this language, for example, or have a pluggable syntax layer which could be selected at will when making queries, it would be much easier to switch (and even switch over gradually within an application).
You can run edgedb in front of postgres to do the gradual switchover you describe.
This is a very good rant. I really like SQL, but also, these issues the author raises are extremely real.
I would actually regard the incompressibility argument as the most substantive from a day-in-the-trenches perspective. Building a DB language such that it could be as terse and type-sane as, say, Rust would be both monumental and extremely valuable.
There are a couple big things that keep query languages legitimately different from general-purpose PLs, I think. One, query planners help us out a ton, so you want the language to be transparent to the planner so you can declaratively say what you want simply enough that it can figure out how to get it. You need to be careful inserting wasm blobs or such to not break that. Two, if you move a lot of your app’s computation into the DB, you might pay (already true with piles of stored procs today)–wonky code can overload or crash your DB not just the application instance, and in general you might have to scale your DB deployment for computation needs more which can be suboptimal (especially when DBs were harder to scale period!).
None of that justifies all the quirks of SQL; it wasn’t handed down from on high. More types, more conciseness, and making more imperative-style control a reasonable option all seem interesting. Weird to consider why SQL hasn’t gone anywhere, and even seems to be having a second heyday after NoSQL was the hot thing for a stretch.
Over the last decade or two we’ve had many general-purpose programming languages get to the point where they’re entirely practical to build big new apps in when, in ancient history when I was learning this stuff, it seemed like the implementations and ecosystems were so big and difficult to create we would kind of be stuck with a handful forever. Alternative languages existed, of course; the “practical to work in” part was the hard one. Many things changed, hard to pin down their relative importance–pieces of compilers/runtimes being reusable (LLVM, JVM/CLR, JS backend, etc.), collective learning about how to nurture a language ecosystem (importance of docs/introductions, stdlib, good FFI to fill gaps early on, easy ways to share libs, etc.), better hardware lowering the performance bar for “usable”, more programmers becoming polyglots, just a bigger population of programmers in general to write/use all this stuff?
So why hasn’t there been a similar explosion in query languages? Maybe we’re missing relational backends (with secondary indexes, query planners, etc.) easily separable from the query language? Maybe the risk aversion is greater? (That feels plausible intuitively, but you were also hosed if you wrote your app in a flaky language and we got over that.) Maybe the effort in DBs has gone to building decent distributed database backends, and we’re late to taking a good look at the layer above that?
FWIW, I think there’s another, loosely related, oddity of how we do data now: DBs bundle together lots of functionality that may not have to be bundled. Like, if you have two files that don’t fit in memory, doing a sort join doesn’t inherently require offloading the task to another binary that also has a serialization format, a transaction layer, a query planner, etc.; it just needs an external sort function. I think we’re chipping at it from a few directions–SQLite doesn’t have the server, Trino/Presto/Athena query data in its native format, data science libs do interesting things in an imperative way, there’re plenty of KV libraries. These things are each kind of islands now, and it’s interesting to imagine something more like a proper database that is separable into all these pieces (which, yes, is super hard, but we’re dreamin’ here).
Man, how often did I wish when doing database query generation “please $DB, just let me hand you queries in your own internal format, instead of making me write SQL”.
So I agree with the criticism of the author, but as mentioned in the end of the article … what to do with all the knowledge we have now?
It seems that many previous alternative were not successful:
So what can “we” do, to improve the state of the art?
In my opinion: demonstrating and specifying a practical, well-designed¹ language that various databases could implement on their own as an alternative to SQL.
¹ Not going into that here.
A Datalog variant. I had a lot of fun playing with differential-datalog, and there’s Logica which compiles datalog to SQL for a variety of SQL dialects.
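For a flavor of the correspondence a compiler like Logica exploits: a recursive Datalog rule maps fairly directly onto a recursive CTE. A sketch using SQLite (the Datalog shown is illustrative pseudocode, not Logica syntax, and the mapping here is hand-written, not Logica’s actual output):

```python
# Datalog rules (illustrative):
#   reachable(X, Y) :- edge(X, Y).
#   reachable(X, Z) :- reachable(X, Y), edge(Y, Z).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO edge VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "d")])

# The two rules become the base case and the recursive case of a CTE;
# UNION (not UNION ALL) gives the set semantics Datalog expects.
rows = conn.execute("""
    WITH RECURSIVE reachable(src, dst) AS (
        SELECT src, dst FROM edge
        UNION
        SELECT r.src, e.dst
        FROM reachable r JOIN edge e ON r.dst = e.src
    )
    SELECT src, dst FROM reachable ORDER BY src, dst
""").fetchall()
```

The Datalog version is two short rules; the SQL version works, but you can see why someone might prefer to write the former and generate the latter.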
What do you mean by ORMs and NoSQL being “junior developer ideas”?
Relational data maps pretty well to most business domains. NoSQL and ORMs throw out the baby with the bathwater for different reasons (turfing the entire model with NoSQL; trying to force two different views of modelling the domain to kiss with ORMs). Anything that makes a join hard isn’t a good idea when an RDBMS is involved.
I think what might be interesting is instead of contorting the RDBMS model to work with OO languages like ORMs do, do the reverse: a relational programming language. I don’t know what that could look like though.
Agreed with the conclusion and I have nothing good to say about most NoSQL systems other than that rescuing companies from them is a lucrative career, but I think this criticism of ORMs is over-broad.
A good ORM will take the scut-work out of database queries in a clean and standardized-across-codebases way without at all getting in your way when accessing deep database features, doing whatever joins you want, etc. I’d point to modern Rails ActiveRecord (without getting into the weeds on Arel) as a good ORM which automates the pointless work while staying out of your way when you want to do something more complicated.
A bad ORM will definitely try to “hide” the database from you in ways that just make everything way too complicated the second you want to do something as simple as specify a specific type of join. Django’s shockingly awful “QuerySet” ORM definitely falls in this camp, as I’ve recently had the misfortune of trying to make it do fairly simple things.
I’m very surprised to see ActiveRecord used as an example of something which stays out of your way. The amount of time I have spent fighting to get it to generate the SQL I wanted is why I never use it unless I’m being paid a lot to do so.
Really? It’s extremely easy to drop to raw SQL, and to intermix that with generated statements – and I’ve done a lot of really custom heavy lifting with it over the years. Admittedly this may not be well documented and I may just be taking advantage of a lot of deep knowledge of the framework, here.
The contrast is pretty stark to me compared to something like Django, whose devs steadfastly refuse to let you specify joins. Django does offer a raw SQL escape hatch, but raw queries come back as a different intermediate result type (RawQuerySet vs QuerySet) with different methods, so details of how you formed a query (raw vs the ORM API) leak into all consuming layers, and you can’t swap one for the other at the data layer without breaking everything upstream. (Hilariously, the accepted community “solution” to this seems to be to write your raw query, then wrap an ORM API call around it that generates a “select * from (raw query)”??)
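For what it’s worth, the “wrap it in a subselect” workaround is at least valid SQL, and the shape it produces can be sketched outside Django entirely. This is plain sqlite3, not Django code; the table and data are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "open", 10.0), (2, "closed", 25.0), (3, "open", 40.0)])

# The user's raw query, treated as an opaque string by the ORM layer:
raw = "SELECT id, status, total FROM orders WHERE total > 15"

# The workaround: wrap the raw query in a subselect so ORM-style clauses
# can compose around it without knowing anything about its internals.
composed = f"SELECT * FROM ({raw}) sub WHERE sub.status = ?"
rows = conn.execute(composed, ("open",)).fetchall()
```

It composes, which is the point being made, but the database now plans an extra subquery layer just to paper over a type split in the client library.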
ActiveRecord has none of these issues in my experience – joins can be manually specified, raw clauses inserted, and raw SQL is transparent and intermixable with ORM statements with no impedance mismatch. Even aggregation/deaggregation approaches like unions, unnest(), etc. that break the table-to-class and column-to-property assumptions can still be made to work cleanly. It’s really night and day.
Not the commenter you’re asking, but they’re both tools that reduce the initial amount of learning at the cost of abandoning features that make complexity and maintainability easier to handle.
I’m not sure that’s true, though. ORMs make a lot of domain logic easier to maintain – it’s not about reducing initial learning, it’s about shifting where you deal with complexity (is it complexity in your domain, or in scaling, or ???). Similarly with NoSQL – it’s not a monolithic thing at all, and most NoSQL databases (document DBs, graph DBs, etc.) require significant upfront learning to use well. Again, it’s a tradeoff in what supports your use case.
I’m just not sure what the GP meant by “junior developer ideas” (it feels disparaging of these, and those who use them, but I won’t jump to conclusions). They also are by no stretch “worse than using SQL”. They are sometimes worse and sometimes better. Tradeoffs.
I agree with you on the tradeoffs. I’m not sure I agree on the domain logic thing. In my experience, ORMs make things easier until they don’t, in part because you’ve baked your database schema into your code. Sometimes generating queries directly lets the schema change without the program needing to change its data model immediately.
[Comment removed by author]
JSON? Really?
What do you mean?
This is fantastic. I wonder what the author would think of CruxDB (Datomic-like, while offloading the data storage to your database of choice).
For some reason when I see this headline all I can think of is Adversus Haereses.