My opinion is that SQL should be for computers to write. This would put it firmly in the same category as LLVM IR.
I dunno man. I've been exposed to ORMs since the Enterprise Java Beans pre-release (around 1997). Hand-writing SQL queries always feels better to me. I think iBATIS (later donated to the Apache Foundation) was on to something back in 2002: write SQL manually, then derive binding types from the queries. Bottom-up instead of top-down.
As I gain more and more experience with SQL, I tend to agree with you: the more I learn, the less I want it abstracted away from me. But I don't think I could have arrived there without an abstraction. Instead of "SQL should be for computers to write" or "hand-writing SQL queries always feels better", the answer, like with most things, probably lies somewhere in the middle.
I am intrigued by your SQL -> bindings idea, though, and I had not heard of iBATIS.
Check out https://sqlc.dev/. I've been using this recently and it seems to be the most natural fit. I've used plenty of ORMs and I've done plenty of "raw" SQL over the years. I love the type safety of the ORM, but I tend to dislike the queries it writes. And I love the flexibility of writing my own queries, but I miss the type safety (and dislike writing the boilerplate that inevitably comes with it). sqlc bridges that nicely.
It sounded similar to JOOQ (https://www.jooq.org/ ), so I looked up how they relate: https://news.ycombinator.com/item?id=41498268
Interesting
Does sqlc only work with Go? It seems the same analysis could yield types for a variety of languages.
It supports several languages and has a plugin API, so Rust could be supported as well (it's not currently).
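For anyone who hasn't seen it, the sqlc workflow is roughly: you write plain SQL plus a name annotation per query, and the generator derives typed bindings from it. A minimal sketch (hypothetical users table; the -- name: comments follow sqlc's annotation convention):

-- Schema file: sqlc reads this to learn the column types.
CREATE TABLE users (
    id    BIGSERIAL PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);

-- Query file: each annotated query becomes a typed function.
-- name: GetUserByEmail :one
SELECT id, email
FROM users
WHERE email = $1;

-- name: ListUsers :many
SELECT id, email
FROM users
ORDER BY id;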
You don't want to write SQL manually. But what you want is a DSL that is as close to SQL as possible while making use of the type system, allowing for type-safe extraction and combination of query parts, and taking care of encoding/decoding results.
In the Java world, JOOQ gets pretty close to that. It lets you build and combine typed query parts, and it ensures that you don't mess up and combine SQL parts that actually cannot work together. I'm not sure what the general term for this type of library is, but it is definitely not ORM.
Just saw this today: this is what I meant, what you really want, right? (but in TypeScript) https://news.ycombinator.com/item?id=42334379
Why do these tools insist on taking over schema definition? This never works well in practice. I want a type-safe ORM that will use standard DDL and generate whatever it needs from that.
I applaud type-safe approaches. It should be possible to statically verify your queries and results.
Why doesn't it work well in practice?
I saw several reasons in past projects:
- Other tools/languages. It is very rare that a particular ORM backend is the only thing that reads/writes the data.
- Multiple environments with different schema versions (or even partially migrated or incompatible ones).
- Coupling of application and data management. This gets especially difficult considering the above.
- Richness of real-world DDL (including platform-specific definitions). ORMs rarely have the capacity to declare everything I can write manually (example below).
- When things hit the fan, it is extremely difficult to reconcile/diff your code against your real production DB backup from three weeks ago.
So insisting on having the full master view of your data schema in code works only for the simplest of projects, imo. Not when you have dozens of developers adding tables/fields daily, somebody trying to run analytics, and someone else trying to migrate this data to/from another store.
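To make the "richness of real-world DDL" point concrete, here is the kind of platform-specific definition that most ORM schema layers cannot express (hypothetical tables, PostgreSQL syntax):

CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Partial unique index: at most one active session per user.
CREATE TABLE sessions (
    user_id    bigint NOT NULL,
    revoked_at timestamptz
);
CREATE UNIQUE INDEX one_active_session
    ON sessions (user_id)
    WHERE revoked_at IS NULL;

-- Exclusion constraint: no overlapping bookings for the same room.
CREATE TABLE bookings (
    room_id bigint NOT NULL,
    during  tstzrange NOT NULL
);
ALTER TABLE bookings
    ADD CONSTRAINT no_overlap
    EXCLUDE USING gist (room_id WITH =, during WITH &&);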
I see. The main one that seems like an issue to me would be the second. Personally I don't really use ORMs ever, so that's why I asked - I don't know what problems they cause because I never saw much value in them anyway.
I forgot a really big one: migrations take time. If the migration is coupled to the application server, your application is down during the migration.
Oh, interesting. How would you avoid downtime without an ORM? I assume you just run migrations independently from the application, so you're not in an "application will start once migrations are done" state?
I have seen one codebase which used named stored procedures for pretty much every query.
It actually worked better than it sounds (they had good tests, etc) - migrations which changed the schema could also replace any affected procedures with a version that used the new tables.
Not sure I'd want to use that approach, but it kept a nice, stable API between the application and the database.
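A sketch of how that can look in PostgreSQL (hypothetical tables and function, not the codebase described above): the application only ever calls the function, and a schema change ships a CREATE OR REPLACE for it in the same migration.

-- Stable API the application calls.
CREATE TABLE orders (
    id           bigint PRIMARY KEY,
    total_amount numeric NOT NULL
);

CREATE FUNCTION get_order_total(p_order_id bigint) RETURNS numeric
LANGUAGE sql STABLE AS $$
    SELECT total_amount FROM orders WHERE id = p_order_id;
$$;

-- A later migration restructures the data and replaces the function
-- in the same change, so callers never notice.
CREATE TABLE order_lines (
    order_id   bigint NOT NULL REFERENCES orders (id),
    line_total numeric NOT NULL
);

CREATE OR REPLACE FUNCTION get_order_total(p_order_id bigint) RETURNS numeric
LANGUAGE sql STABLE AS $$
    SELECT coalesce(sum(line_total), 0) FROM order_lines WHERE order_id = p_order_id;
$$;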
I personally think this is the best general approach for DB interfacing, with versioning applied to the named stored procedures for when their APIs need changing. But avoiding downtime when migrating also means just being really careful about what the migrations are even doing in the first place.
There could be an argument for the idea that an automated migration system could automatically write less intrusive migrations than an average naive developer might, but I haven’t seen this borne out in practice.
How you run migrations isn’t necessarily tied to what ORM you use. Generally speaking, a high-level overview of the approach taken is as follows:
1. Deploy the code changes to some host dedicated to running migrations (this could just be a CI server).
2. Run the migrations.
3. Once done, deploy the code changes to your servers.
4. Optionally, run some post-deployment cleanups.
There have been various ORMs over the decades that provide some sort of "automatically migrate the DB to the current code-based schema" feature. However, I've never seen or heard of one being used at any noteworthy scale, likely because migrating data is a lot more than just "add this new column".
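To make "a lot more than just add this new column" concrete, even a simple column rename usually has to be spread over several deploys so old and new code can run side by side. A rough sketch (hypothetical users table):

-- Starting point.
CREATE TABLE users (
    id   bigint PRIMARY KEY,
    name text NOT NULL
);

-- Deploy 1: add the new column; old code keeps writing "name",
-- new code writes both. Backfill existing rows.
ALTER TABLE users ADD COLUMN full_name text;
UPDATE users SET full_name = name WHERE full_name IS NULL;

-- Deploy 2 (once nothing reads the old column any more): tighten and drop.
ALTER TABLE users ALTER COLUMN full_name SET NOT NULL;
ALTER TABLE users DROP COLUMN name;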
Another limitation of most of these migration tools that I didn't see as problematic before "the trenches" is linearity. Something like this has happened at $work-1:
- Migration 421 has been applied everywhere but becomes problematic in production. Because of a "very important client", the ops team keeps pushing, and the solution always seems just around the corner.
- A couple of days later it becomes apparent that we're hitting some internal Aurora limitation and things need to be reworked drastically.
- Meanwhile, prolific developers have already contributed 20+ migrations that have all been applied to staging and whatnot environments, and are asking when they can run theirs.
So you’re in a hard spot regarding what to do. If I recall we made 421 a no-op and people had to manually clean up the mess.
It's a mess; I believe relational data models and 100% uptime across upgrades are fundamentally incompatible. In general I'm not convinced loose coupling of the schema and the application is even possible to do sustainably at scale. You can try double-writing relational data in both the old and the new schema, but it's not really a scalable approach for larger applications; it's just too hard to do correctly, especially when critical data is simply unrepresentable in the old schema.
I suspect this is a big part of why NoSQL took off. If you store things as JSON objects you at least have a really janky way to do schema evolution. You can also store the events you receive as JSON and reprocess them with both the old and the new application versions, modulo dealing with updates to the JSON schema itself (which might be easier in practice).
In our experience, it takes a strong team, but it can be done. Generally, you consider the DB an application all its own and treat it like it is. You use something like PgTAP for testing, you have DDL checked into a VCS like git along with something like Liquibase to apply and revert. You have dev and prod environments, etc.
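For illustration, "treating the DB as an application of its own" with PgTAP can be as small as this (hypothetical table and columns; assumes the pgtap extension is installed):

BEGIN;
SELECT plan(3);

-- Assertions about the schema itself, run like any other test suite.
SELECT has_table('users');
SELECT has_column('users', 'email');
SELECT col_is_pk('users', 'id');

SELECT * FROM finish();
ROLLBACK;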
To avoid the thrashing of adding and removing columns all the time, we add an "extras" JSONB column to every table where it might remotely make sense (which is most of them). This way apps and users can shove extra stuff in there to their hearts' content. When it's needed outside of that particular app, we can then take the time to migrate it to a proper column. We also use it all the time for end-user data that we don't care about.
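A sketch of the "extras" column idea (hypothetical customers table), including the later promotion of a key into a real column:

CREATE TABLE customers (
    id     bigint PRIMARY KEY,
    extras jsonb NOT NULL DEFAULT '{}'::jsonb
);

-- Apps can stash ad-hoc data without a migration.
UPDATE customers
SET extras = extras || '{"newsletter_opt_in": true}'::jsonb
WHERE id = 42;

-- When the data is needed more widely, promote it to a proper column.
ALTER TABLE customers ADD COLUMN newsletter_opt_in boolean;
UPDATE customers
SET newsletter_opt_in = (extras ->> 'newsletter_opt_in')::boolean;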
Always type- and constraint-check columns at the DB level too. REFERENCES (FKs) are the bare minimum. CHECK() and trigger-function checks are your friends. This forces applications not to be lazy and shove random junk into your DB.
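For instance (hypothetical tables), pushing the junk-rejection into the DB instead of trusting every application to behave:

CREATE TABLE accounts (
    id bigint PRIMARY KEY
);

CREATE TABLE payments (
    id         bigint PRIMARY KEY,
    account_id bigint NOT NULL REFERENCES accounts (id),       -- FK: the bare minimum
    amount     numeric NOT NULL CHECK (amount > 0),             -- no zero or negative payments
    currency   text NOT NULL CHECK (currency ~ '^[A-Z]{3}$'),   -- ISO-4217-shaped codes only
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Trigger-function check: more elaborate validation than CHECK() comfortably allows.
CREATE FUNCTION reject_future_payments() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    IF NEW.created_at > now() THEN
        RAISE EXCEPTION 'payment dated in the future';
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER payments_sanity
    BEFORE INSERT OR UPDATE ON payments
    FOR EACH ROW EXECUTE FUNCTION reject_future_payments();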
We also liberally use views and triggers to keep old schema versions available as needed until a particular app can get updated.
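For example (hypothetical rename), a plain view can keep the old name and shape alive for an app that hasn't been updated yet:

-- The table was renamed and a column changed, but an old app still expects "clients".
CREATE TABLE customer_accounts (
    id        bigint PRIMARY KEY,
    full_name text NOT NULL
);

CREATE VIEW clients AS
    SELECT id, full_name AS name
    FROM customer_accounts;

-- Simple single-table views like this are automatically updatable in PostgreSQL,
-- so the old app can keep inserting and updating through the view.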
Use the built-in permission system. Ideally every app doesn't get its own DB user; instead it logs in to the DB as the end user, so we can do row- and column-level access granularity per user, even through the application.
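A sketch of that idea (hypothetical table and roles), combining column-level grants with row-level security so each end user only sees their own rows:

CREATE ROLE app_users;
CREATE ROLE alice LOGIN IN ROLE app_users;   -- end users get real DB roles

CREATE TABLE documents (
    id     bigint PRIMARY KEY,
    owner  name NOT NULL,   -- matches the DB role name, for simplicity
    title  text NOT NULL,
    secret text
);

-- Column-level granularity: app users may read titles but never the secret column.
GRANT SELECT (id, owner, title) ON documents TO app_users;

-- Row-level granularity: each user only sees rows they own.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY own_rows ON documents
    USING (owner = current_user);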
We also make a _nightly copy of the DB available, which is a restore of last night's backup put into production. This makes sure your backups work and gives devs (of the DB or app variety) a place to test mean stuff against production data without having to actually abuse production. Consider an _hourly too if you need it.
This is mostly PostgreSQL specific, but similar techniques probably exist in other DBs.
Yeah, I tend to use NoSQL, other than for really, really simple stuff in Postgres where migrations are uncommon. But at work we use Rails, so I see tons of model updates, though I haven't done it much myself.
> It is very rare that a particular ORM backend is the only thing that reads/writes the data.
> Multiple environments with different schema versions (or even partially migrated or incompatible ones).
I do understand that if you’re in this situation ORMs are going to mess you up in a lot of cases. But there’s a lot of systems out there with a single database, all written to by a single piece of software (sometimes even just a single instance of that software!) where these issues just don’t show up.
I do think there’s a big gap in what those teams need and what you need.
Hey, nice work and congratulations! Since you're here, I'm wondering how much you've thought about Rust type system scalability and error message issues. One of the things I've realized is that in larger applications, Diesel's aggressive use of Rust's type system leads to issues in two areas: rustc and rust-analyzer performance, and error messages when you get something wrong. A lot of my adventures with Diesel end up being 15 minutes writing the query and 8 hours trying to get rustc to accept it, with lots of inscrutable error messages and zero actual help from the compiler.
When writing queries in rust-query I have run into two types of error messages:
1. Errors that basically say that the column you selected has the wrong type. It is not the prettiest error, but you will recognize it instantly and it is easy to solve. It looks something like this:
type mismatch resolving <Column<'_, Schema, i64> as Typed>::Typ == f64
2. Errors indicating that you used a column in a scope where it is not valid. This happens, for example, when trying to use a column from an outer query scope in an aggregate without using filter_on. It is easy to solve if you know the scoping rules. The error will point out where you made the mistake and says something like:
error[E0521]: borrowed data escapes outside of closure
--> examples/blog.rs:56:13
|
53 | let story = Story::join(rows);
| ----- `story` declared here, outside of the closure body
54 | let avg_rating = aggregate(|rows| {
| ---- `rows` is a reference that is only valid in the closure body
55 | let rating = Rating::join(rows);
56 | rows.filter(rating.story().eq(&story));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `rows` escapes the closure body here
|
This should tell you to use rows.filter_on instead of rows.filter, because filter_on will accept a value from the outer scope.
As for how it scales, I think it scales quite well as the number of generated types is linear with the number of tables you have. It does generate types for every version of your schema so that is something to watch out for. Luckily you only need to include schema versions that you actually want to support migrating from.
Eh, these tools never really save you from knowing and understanding SQL. You just feel like you’re writing SQL with oven mitts on.
Curious, why not call it an ORM? (or something analogous)
Does this hide primary and foreign keys? That’s… Not good?
I would like to know how it compares to Diesel
Check https://www.reddit.com/r/rust/comments/1gzh7w4/announcing_rustquery_making_sqlite_queries_and/ for previous discussion on that line
Just make sure to expose the (inevitably required) escape hatch, for when the ORM abstraction breaks down in some corner case.
Hi, I will probably give access to the underlying rusqlite::Transaction, but hide it behind a feature flag.
See also Squeal for Haskell
Comments also on reddit https://www.reddit.com/r/rust/comments/1gzh7w4/announcing_rustquery_making_sqlite_queries_and/