I don’t understand this post at all. Who has not been using SQL in some capacity, somewhere? SQL has been pervasive in application development since forever, and is an essential skill, even for non-developers like business analysts. Even when “NoSQL” was the hot new tech everyone talked about, people were still using SQL somewhere in their stack, often for reporting or analytics.
I’m beginning to get the impression that there’s a new generation of developers who actually haven’t been exposed to SQL that much - the last five years have seen plenty of boot camps that teach MongoDB or similar instead of a relational database.
This is correct. I’ve interviewed several people that only know Active Record or Entity Framework and they struggle doing simple joins/aggregations directly in SQL. They know everything about their ORM of choice, migrations, backups and deployments, but that’s all they know. And to be fair, for smaller projects that’s all you need to know.
And I would say that the corollary to that is older developers like myself who got exposed to NoSQL databases in their early days of immaturity and frequent reliability issues. It was an eye opener when I met a group of young developers who perceived MongoDB as this timeless, essential, and totally reliable piece of infrastructure.
This means learning SQL will benefit your career as a programmer—and it’s a fairly intuitive language to pick up.
“It’s taught in colleges and universities, and it’s really easy to learn.”
Basic things in SQL is easy, even straightforward. But anything advanced, is rapidly far from “easy to learn”, especially when you are doing data analysis.
More than once, I made what I could in SQL, then switched to some other language to use my SQL query, and filter/aggregate my results, where it could have been done in pure SQL :x
The thing is that while those things are still not easy in SQL, as a general rule they are vastly easier than how you’d hand-roll these thing if SQL wasn’t a thing in the first place. Obviously, you have certain cases where this hasn’t proven to be the case, but it would be interesting to know what places you hit a wall.
That’s because it’s a leaky DSL, making it a bad DSL.
DSLs need to stay small in scope in order to be useful and not become more complicated than just doing the things they try to abstract away and make easier. (Another example is CSS, where the simple things are simple, but the complex stuff require understanding way too many fundamental rendering/layout concepts.)
So in this case, yeah, SQL is “basically English” for all the small and simple things, and then for everything else, you need to understand why a given sub-select is 10x slower than a join. “No, John, you can’t just sub-select every single select column expression! You need to understand the execution plan is going to do this and that.” And thus, the abstraction of this terrible DSL breaks, and now you’re finding out the right incantations and forms to get the underlying system to do what you want.
So while I personally like relational databases, I can’t stand SQL, and can’t believe we’ve not replaced it yet, or at least an alternative option to use instead.
All abstractions leak. A DSL is an abstraction, so it shouldn’t be a surprise that SQL leaks. That doesn’t make SQL a bad abstraction. What could be said to make SQL a bad abstraction is that it implements a compromised version of relational calculus and has done since the beginning.
What kind of things? SQL’s main expressiveness problem is that subqueries are required to generate new name bindings; the main reason I’ve found to switch out of sql is that sometimes aggregations result in really slow execution which can be performed more quickly by explicitly specifying the algorithm.
Isn’t that dialect the SQL standard? Not all database engines implement all of the standard, but they all implement enough that there’s a useful common subset there.
Maybe someone should make a linter which validates that your queries follow the standard and will work in the major SQL engines, but I don’t see a need for much more than that.
SQL as a language is awesome and I love its declarative nature. But is usage has always suffered from a fatal flaw: there’s no good programming interface for it. For example, how do you programmatically add a WHERE clause to a given base query, say under a certain user-controlled condition? Similarly, how would you condtionally add a JOIN? An ORDER BY? The answer to all of these is string concatenation.
The interface is basically: throw me a string and I’ll tell you if it means anything. It’s awkward to do safely and correctly because, IMO, it exists at the wrong level of abstraction.
There are SQL APIs that are safer and more powerful than string concatenation and that support programmatic query construction. jOOQ is the one I use nowadays but there are plenty of others.
Yeah, but under the hood doesn’t that do string concatenations anyway? The benefit is a better API, but someone still has to make the sausage.
To add on to the OP, some people suggest you do things like
SELECT a, b, c
FROM foo
WHERE %(a)s IS NULL or a = %(a)s;
Aside from the duplication (I think there’s a way to do this with one reference, but I forget), this also tends to be optimized poorly (at least in postgresql). I had several queries like this in my app, but I had to switch to dynamic query construction to avoid pathologic query plans.
The process of constructing a SQL query using a library like jOOQ isn’t fundamentally any different than the process of constructing a human-readable JSON payload using a client library for a network service. Yes, it’s probably doing string concatenation somewhere along the line, but that fact is largely hidden from the calling code.
From a more abstract point of view, at some point, no matter what the in-memory representation is, you have to render your query to a flat sequence of bytes to send it to the database server. You can think of SQL as a human-readable wire format. If you have a safe, feature-rich API to build the query in memory before it gets rendered to the wire format, whether the rendering is done via string concatenation or some other technique shouldn’t be of much concern from the application code’s point of view.
But I suspect you mean something more specific by “string concatenation” that I’m not seeing.
String concatenation isn’t bad so long as you’re building a prepared statement or a client side equivalent that can take care of of the interpolation in a more secure way than the average user of the DB driver can.
I don’t understand this post at all. Who has not been using SQL in some capacity, somewhere? SQL has been pervasive in application development since forever, and is an essential skill, even for non-developers like business analysts. Even when “NoSQL” was the hot new tech everyone talked about, people were still using SQL somewhere in their stack, often for reporting or analytics.
I’m beginning to get the impression that there’s a new generation of developers who actually haven’t been exposed to SQL that much - the last five years have seen plenty of boot camps that teach MongoDB or similar instead of a relational database.
This is correct. I’ve interviewed several people that only know Active Record or Entity Framework and they struggle doing simple joins/aggregations directly in SQL. They know everything about their ORM of choice, migrations, backups and deployments, but that’s all they know. And to be fair, for smaller projects that’s all you need to know.
And I would say that the corollary to that is older developers like myself who got exposed to NoSQL databases in their early days of immaturity and frequent reliability issues. It was an eye opener when I met a group of young developers who perceived MongoDB as this timeless, essential, and totally reliable piece of infrastructure.
I mean, they are not wrong. MongoDB is reliable these days.
Basic things in SQL is easy, even straightforward. But anything advanced, is rapidly far from “easy to learn”, especially when you are doing data analysis. More than once, I made what I could in SQL, then switched to some other language to use my SQL query, and filter/aggregate my results, where it could have been done in pure SQL :x
The thing is that while those things are still not easy in SQL, as a general rule they are vastly easier than how you’d hand-roll these thing if SQL wasn’t a thing in the first place. Obviously, you have certain cases where this hasn’t proven to be the case, but it would be interesting to know what places you hit a wall.
That’s because it’s a leaky DSL, making it a bad DSL.
DSLs need to stay small in scope in order to be useful and not become more complicated than just doing the things they try to abstract away and make easier. (Another example is CSS, where the simple things are simple, but the complex stuff require understanding way too many fundamental rendering/layout concepts.)
So in this case, yeah, SQL is “basically English” for all the small and simple things, and then for everything else, you need to understand why a given sub-select is 10x slower than a join. “No, John, you can’t just sub-select every single select column expression! You need to understand the execution plan is going to do this and that.” And thus, the abstraction of this terrible DSL breaks, and now you’re finding out the right incantations and forms to get the underlying system to do what you want.
So while I personally like relational databases, I can’t stand SQL, and can’t believe we’ve not replaced it yet, or at least an alternative option to use instead.
All abstractions leak. A DSL is an abstraction, so it shouldn’t be a surprise that SQL leaks. That doesn’t make SQL a bad abstraction. What could be said to make SQL a bad abstraction is that it implements a compromised version of relational calculus and has done since the beginning.
What kind of things? SQL’s main expressiveness problem is that subqueries are required to generate new name bindings; the main reason I’ve found to switch out of sql is that sometimes aggregations result in really slow execution which can be performed more quickly by explicitly specifying the algorithm.
I wish there was a single dialect of SQL that could be translated to all other dialects. There’s babelfish, but that seems to only run on postgres
Isn’t that dialect the SQL standard? Not all database engines implement all of the standard, but they all implement enough that there’s a useful common subset there.
Maybe someone should make a linter which validates that your queries follow the standard and will work in the major SQL engines, but I don’t see a need for much more than that.
SQL as a language is awesome and I love its declarative nature. But is usage has always suffered from a fatal flaw: there’s no good programming interface for it. For example, how do you programmatically add a WHERE clause to a given base query, say under a certain user-controlled condition? Similarly, how would you condtionally add a JOIN? An ORDER BY? The answer to all of these is string concatenation.
The interface is basically: throw me a string and I’ll tell you if it means anything. It’s awkward to do safely and correctly because, IMO, it exists at the wrong level of abstraction.
There are SQL APIs that are safer and more powerful than string concatenation and that support programmatic query construction. jOOQ is the one I use nowadays but there are plenty of others.
Yeah, but under the hood doesn’t that do string concatenations anyway? The benefit is a better API, but someone still has to make the sausage.
To add on to the OP, some people suggest you do things like
Aside from the duplication (I think there’s a way to do this with one reference, but I forget), this also tends to be optimized poorly (at least in postgresql). I had several queries like this in my app, but I had to switch to dynamic query construction to avoid pathologic query plans.
The process of constructing a SQL query using a library like jOOQ isn’t fundamentally any different than the process of constructing a human-readable JSON payload using a client library for a network service. Yes, it’s probably doing string concatenation somewhere along the line, but that fact is largely hidden from the calling code.
From a more abstract point of view, at some point, no matter what the in-memory representation is, you have to render your query to a flat sequence of bytes to send it to the database server. You can think of SQL as a human-readable wire format. If you have a safe, feature-rich API to build the query in memory before it gets rendered to the wire format, whether the rendering is done via string concatenation or some other technique shouldn’t be of much concern from the application code’s point of view.
But I suspect you mean something more specific by “string concatenation” that I’m not seeing.
String concatenation isn’t bad so long as you’re building a prepared statement or a client side equivalent that can take care of of the interpolation in a more secure way than the average user of the DB driver can.
A better title for this would be “The Second Coming of SQL”.
People think SQL will die, but no, it never left.
Quite true!