begin shameless plug
These objections echo why I wrote dpdb. I wanted to abstract away the trivial details of different database backends (connection strings, the parameter-quoting style, etc) but still be able to write real SQL queries.
A whole lot of research has gone into making databases good at manipulating data in situ. Yes, you can do that with ORMs, but usually one of two things happens (in my experience):
I’m sure things have improved in the ORM world, but I haven’t used an ORM layer since I started using dpdb.
So yeah, if you want to manipulate a bunch of data without a bunch of round trips and battling a very real impedance mismatch, the time to learn SQL will be well spent.
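One concrete "trivial detail" alluded to above is parameter-quoting style: each DB-API driver declares its own paramstyle, so the same query needs different placeholders per backend. I don't know dpdb's actual API, so here is a minimal sketch of the underlying problem using only stdlib sqlite3 and a hypothetical `to_qmark` helper:

```python
import sqlite3

# The DB-API paramstyle differs per driver: sqlite3 uses "qmark" (?),
# psycopg2 uses "format" (%s), some Oracle drivers use "named" (:name).
# A thin wrapper can normalize one style into whatever the driver expects.

def to_qmark(sql: str) -> str:
    """Hypothetical helper: convert %s-style placeholders to qmark style."""
    return sql.replace("%s", "?")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Write queries in one placeholder style; translate at the edge.
conn.execute(to_qmark("INSERT INTO users (name) VALUES (%s)"), ("alice",))
row = conn.execute(to_qmark("SELECT name FROM users WHERE id = %s"), (1,)).fetchone()
print(row[0])  # alice
```

A real abstraction layer would do more than string replacement (e.g. ignore `%s` inside literals), but this is the flavor of boilerplate being abstracted away while the SQL itself stays hand-written.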
For people in .NET land I can offer a neat (not-so-shameless) plug: I have nothing to do with this project, but I do love F#.
Rezoom.SQL offers a unified SQL dialect that can query many different kinds of database (SQLite, PostgreSQL, T-SQL, etc.). That portability is, of course, the main argument for an ORM that I can think of, but it uses F# type providers instead of an ORM.
That looks pretty cool. I think Dapper would probably be the most direct .NET analog to what lorddimwit is suggesting, but they look as if they serve slightly different purposes, similar to the normal use of Ruby’s Sequel library v. directly invoking SQL queries and using Sequel purely for the driver abstractions.
I mostly use the Django ORM, and I like it a lot. I also love writing manual Postgres-flavoured SQL for some queries, but I pretty much only write those to test things. My actual Django code contains no raw SQL queries.
This is because I think the Django ORM is very mature, and quite flexible. It has syntax for writing subqueries, for limiting which columns to fetch, for writing aggregations, for joins… I don’t really feel like I’m missing out on much, and it helps immensely. I am able to reliably predict what SQL a particular Django queryset will produce.
I don’t really feel like any of the objections in this article apply to my usage of the Django ORM.
I often use SQLAlchemy. I don't know whether it shares the properties the article criticises, but what I like about SQLAlchemy is that it lets me use the simple ORM layer for typical queries, and it also allows me to build non-ORM queries with a query builder to perform more advanced stuff closer to the relational nature of the database (like joining data but only selecting a few columns, etc.).
Seconding this. Especially since 1.7/1.8, the Django ORM lets you do almost everything you want to do in SQL fairly easily, especially if the result rows look like any of your models. My biggest gripe is that grouping is not an explicit operation but is derived from combinations of values() and order_by() calls, making it sometimes hard to go from known SQL to the equivalent ORM operation.
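The grouping gripe can be made concrete. In raw SQL the GROUP BY is written out explicitly; in the Django ORM the documented equivalent is roughly `Item.objects.values("category").annotate(n=Count("id"))`, where the grouping key is implied by the values() call. A minimal stdlib sqlite3 sketch of the explicit version (the `items` table is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany("INSERT INTO items (category) VALUES (?)",
                 [("a",), ("a",), ("b",)])

# Explicit GROUP BY, stated once and visible in the query text. In the
# Django ORM the same grouping is implied by values("category") followed
# by an annotate() aggregation rather than written out.
rows = conn.execute(
    "SELECT category, COUNT(id) FROM items GROUP BY category ORDER BY category"
).fetchall()
print(rows)  # [('a', 2), ('b', 1)]
```

Going the other direction, from a known GROUP BY query back to the right values()/annotate() combination, is exactly the translation step described as sometimes hard.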
It’s also extensible in the right way, letting you add custom operations really easily (if you have a stored procedure called Foo you’re a subclass declaration away from using it in your ORM operations). It lets you delegate all the routing of SELECT identifiers to the machine without being forced to follow a specific data storage model.
There are some gotchas, but if you "know" SQL they're not really gotchas, because it's how the underlying DB works. It's more of a problem for people who don't really know how joins work.
In my experience, there has always been more nuance to when to use ORMs, query builders, or raw queries. There's no reason why you can't use an ORM for inserts/updates/simple selects and drop down to raw SQL for more complex queries. Query builders vs. raw SQL is another tradeoff: with query builders you have to learn new syntax, and it makes things more difficult to debug in production (especially when paged at 3AM ;) ). With raw SQL, though, you can write broken/insecure queries.
One thing that happens with ORMs, at least in the 20-odd years I've been grappling with them, is that they encourage object thinking, which is almost without fail a gigantic detriment to the design of the database. ORM thinking is closely correlated with the "nobody will ever access this data without the gigabytes of runtime" school of thought, and tends to impose severely suboptimal decisions onto the database, which is treated as a simple persistence layer.
In my experience that has less to do with ORMs and more to do with people designing their objects before (or instead of) designing their tables. Hand-rolling all of your SQL won't save you if you didn't put the effort into the table design (although it might alert you to how dogshit your tables are to work with; that assumes you didn't farm all of the boilerplate queries out to some junior who doesn't know any better and doesn't complain, and you're still stuck refactoring your objects if you started the wrong way around). It's the upfront table design that's truly the important thing, not the mechanics of how queries get written afterwards.
With proper upfront table design, I've found ORMs useful for eliminating a lot of boilerplate scut-work, while still allowing direct access to SQL whenever you need to get smarter with your queries.
But I don’t deny that actually modelling your data before getting started is critical.
Well, sure. But ORMs push the pain further out until it’s probably too late to fix the catastrophe.
Biggest complaint for me is ORMs for updating information on data objects that are deeply nested.
From a performance standpoint, what should be a one-off update_one or UPDATE turns into the clown car of loading in the object, fetching associations, eventually setting a value, and shoving it all back into the DB.
Granted, I also hate deeply-nested objects, soooooo…
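The performance complaint above boils down to one round trip versus several. A stdlib sqlite3 sketch of the direct version, with the ORM pattern described in comments (the `orders` table is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders (status) VALUES ('pending')")

# The ORM "clown car" version is roughly: SELECT the object (plus any eagerly
# loaded associations), mutate one attribute in memory, then write the whole
# thing back. The direct version is one statement that never materializes
# the row on the client at all:
conn.execute("UPDATE orders SET status = ? WHERE id = ?", ("shipped", 1))

status = conn.execute("SELECT status FROM orders WHERE id = 1").fetchone()[0]
print(status)  # shipped
```

With deep nesting the gap widens, since each level of association can add its own fetches before the eventual save.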
Relatedly, transactionally updating a single field via an ORM is a deadlock footgun (since it’ll fetch without using for update, then try to lock the record).
"Don't use shit libraries" is a little more succinct, methinks. But that's not really enough waffling to justify a blog post, is it.
If your ORM/Query Builder doesn’t give you the option of running your own customised query (either fully hand written or with parts provided by the QB), you should use another one.
I’ve found that ORMs vary a lot. Quite a few produce suboptimal queries and provide no escape hatch to get customized database-engine-independent SQL out. For Python, SQLAlchemy does a nice job of giving you a way to work with objects, to write reasonable queries while referring to those objects, and to drop down to SQL as needed… but most ORMs are not SQLAlchemy.
I think query builders are pretty nifty tech in that their surface area is super small, and they help out a bit (though tbh manual joins when you have a lot of relational data are a pain).
What I still haven’t seen but want is some way for us to have client-side plan builders. Instead of giving Postgres an SQL string and for it to pass through an optimiser, I would love to be able to describe a query plan which sidesteps the planner and gives the executor the “ideal” way of handling things for the shape of my data.
I actually know what my data looks like, and in some cases I know that if I could convince the planner to do a thing, it would go much faster. But this is rarely given to us. Given how many people love dropping down to C/assembler for stuff, I’m surprised to have never seen someone try to circumvent SQL to build nicer plans.
Do you experience a lot of query planner inefficiency? I've noticed that (at least for MySQL/Percona) it provides decent results as long as you don't have any low-cardinality indexes.
So for 95% of stuff this is absolutely no problem.
There are a couple of really tricky queries where I have chosen to optimise for maintainability of the query, or just rely on the ORM, over trying to get the machine to do "what I want". This often ends with me reaching for subqueries that just conceptually can't be aware of the data properties I'm aware of. Stuff like "well, for rows with this property I don't actually need this subquery result". But this stuff is hard.
I am surprised that I made it to the end of this article without ever reading the words “Data Mapper”.