URL has some garbage at the end.
Real url: http://www.craigkerstiens.com/2016/01/08/writing-better-sql/
Sorry. Thanks to Irene.
A colleague recently convinced me to stop upper-casing SQL keywords, and I found her arguments compelling:
Additionally, any reasonable editor will provide syntax highlighting which makes the same distinction clear in a much less aggressive way. I find I prefer my SQL all-lowercase and readability does not suffer.
I agree, I’ve been writing SQL in lowercase for a few years, and it’s far easier to read - it looks more like regular code. Even without syntax highlighting, it can be very readable if it’s formatted sensibly - e.g. I start each major block of a query on a new line (select/from/join/where/order etc.). Syntax highlighting can be quite helpful though.
I jotted down a few more thoughts on readable SQL here: http://korban.net/2017/02/writing-readable-sql/
While we’re on the subject, does anyone have a great recommendation for a book on SQL? I don’t have to write a ton of super complex queries in my job, but once every month or two, some task calls for a good bit of SQL writing, and I’d like to get a better foundation that just “what I’ve picked up over the years plus Google”.
Not a book recommendation but a couple pieces of advice which helped me shift out of the procedural mindset:
Think about the problem in terms of sets and transformations rather than individual instances.
When formulating a query start with the SELECT and write a list of what you want to see in the result set. This is the goal you’re working towards. Add in appropriate joins to build a set of the data you require. This is your starting point. Figure out the intermediate set transformations required to get from start to finish. Coincidentally this made the ordering of SELECT, FROM, and WHERE click. I was previously thinking in terms of FROM, WHERE, and SELECT.
Hopefully that’s not too elementary. Coming from a similar background I’d never really had that spelled out to me.
I came across the advice in Joe Celko’s SQL for Smarties, which I think is probably too basic for your needs. I haven’t read anything else by him so can’t vouch but “Joe Celko’s Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL” might be helpful? I’ve also heard good things about “The Essence of SQL” but it’s out of print so good luck finding a copy!
I find it amazing how differently you approach this than I do, yet I would assume, we would still end up writing very similar queries.
How do you approach it?
I tend to think of the database like a tree or a perhaps a map (as in google not reduce). I look at the fields I know I need to return, and then mentally map them across the database. I start my query by working from the trunk, or root node if you don’t like wood, of my tree and then map out the query from there. The root node isn’t always the same table; so that can vary based upon the required fields. After selecting all the fields I need from the root, I proceed to join the next table required by my SELECT. That isn’t always direct, and many times, there are tables in between. The process repeats till I have all the tables I need.
This line of thinking has lended itself well to pretty much every dataset I’ve encountered. Words like “set”, “transformation”, and “instance” never even crossed my mind.
Now obviously words like “set” and “instance” have a great deal of meaning in database land, but as far as writing queries go, those aren’t words I tend to think of.
I use CTEs a lot in Postgres, so I find that I work towards the final result in a different way, more like I would in code - by treating the query as a series of data transformations. So, for example, if I want to produce a result for a set of users on a range of days, I write a CTE to get the users, then another one to generate the range of dates, then another one to combine them to get all possible “user-days”, then another to retrieve the data for each and process it, and so on.
This results in very readable queries - way better than having subqueries all over the place. There are performance caveats to using CTEs so sometimes I have to structure the query differently, but it works well for a lot of them.
The docs for Postgres are amazing, and a good resource for this. It will call out which parts of SQL it’s explaining are Postgres-specific/not in the SQL standard.
I’m a very from first principles thinker and so I hope this recommendation isn’t off the mark for your needs. I really liked “Relational Theory for Computer Professionals” by C.J. Date. The book is roughly broken up into parts, the first is an introduction to relational theory and what that’s all about. This is the best intro to relational algebra that I’ve ever seen, a close second is the Stanford Database online class (you can just study the SQL part of the course). The second part of the book takes what you now know about relational algebra and shows how it fits with SQL.
This helped me peel away some of the syntactic confusion around SQL syntax and made the whole concept make more sense.
My 2 favorite resources over the years have been: “SQL for web nerds: http://philip.greenspun.com/sql/” and the official postgres docs.
You can also check this book : https://twitter.com/BonesMoses/status/832983048266330113 and his blog how is full of very good sql
I have to disable stylesheets to get the code samples in the article to use a fixed width font, hence lign up nicely.
But, some useful suggestions.
Coming from Rails land, where it’s expected you’ll be doing all of your database stuff through ActiveRecord, I’ve been increasingly fascinated by how people think about SQL. It’s really nice not to splay out SELECT foo FROM bar WHERE baz=1 everywhere in the codebase, but it often goes too far the other way: people assume that you should never use raw SQL in the business logic. That quickly turns optimization problems into architecture problems. Code just can’t give you the same flexibility a database does and past a point your only option is “throw more servers at it”. I worked for one company where the ruby couldn’t handle a 50 meg data set so we started throwing it into a Hadoop cluster. It took embarrassingly long for any of us to realize we could just use window functions.
SELECT foo FROM bar WHERE baz=1
For those unfamiliar CTEs are like a view that exist just for the duration of that query being executed. You can have them reference previous CTEs so you can gradually build on them, much like you would code blocks. I won’t repeat too much of what I’ve already written about them, but if you’re unfamiliar with them or not using them they are a must. CTEs are easily one of the few pieces of SQL that I use on a daily basis.
My work project uses a MySQL database and I really miss both CTEs and window functions :(