We put a fair bit of our application in the DB and we like it.
The DB handles all of our authentication/authorization using PG’s access controls. We run web and PyQt UIs plus console scripts against the application. Not to mention we happily hand out PG connection info to power users to go abuse the DB themselves without worry.
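Roughly speaking, that means plain PG roles and grants, plus row-level security where it’s needed. A minimal sketch (role, table, and column names are made up):

```sql
-- made-up role and table names; a minimal sketch of DB-side authorization
CREATE ROLE power_user LOGIN PASSWORD 'changeme';
GRANT SELECT, INSERT, UPDATE ON orders TO power_user;   -- note: no DELETE

-- row-level security narrows it further, per user
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY own_orders ON orders
    USING (owner_name = current_user);   -- assumes a text owner_name column
```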
We do use Python as the DB language, thanks to PG. Testing is a touch more complex, but not remotely hard or anything; the data going in and out is usually easy to reason about, and easy enough to get at when it’s not. It’s pretty easy to build a test harness for it. With tools like Liquibase, updating the schemas and functions is easy.
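For anyone who hasn’t seen it, a PL/Python function body is just ordinary Python. A trivial sketch (assumes the plpython3u extension is installed; the function name is made up):

```sql
-- assumes: CREATE EXTENSION plpython3u;
CREATE OR REPLACE FUNCTION normalize_email(addr text)
RETURNS text AS $$
    # plain Python, running inside the server
    return addr.strip().lower()
$$ LANGUAGE plpython3u;

SELECT normalize_email('  Alice@Example.COM ');  -- alice@example.com
```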
The biggest downsides I’ve come across:
Most tools (especially SQL reporting tools) are not built with the expectation that the DB holds and manages user accounts.
Your DB server tends to want more resources, but scaling for most applications isn’t that big of a deal.
Sometimes tracking down the stupid lock or debugging the troubled SQL statement can get annoying (but this is true of all SQL DBs once you get past treating them as KV stores).
DB upgrades are a bit more troublesome sometimes.
Overall quite happy. This sort of thing doesn’t always work well; it totally depends on the application. It’s not a one-size-fits-all solution, but nothing is.
Out of curiosity, why Python instead of a trusted extension like Perl or v8?
We use PyQt as the front-end for the “native” application, and lots of Python everywhere else, so it’s just easier to keep it all Python.
The biggest performance issues I have had with relational databases seem to be caused by long-lived transactions: applications starting a transaction, acquiring a lock over some data, and then taking their sweet time to commit.
The most problematic examples tend to involve the application making an external service call while holding a transaction open. There are also times when the app is simply performing logic that could be done in a DB function.
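At least tracking the offenders down is straightforward; something like this against the standard catalogs shows old transactions and who is blocking whom (pg_blocking_pids needs PG 9.6+):

```sql
-- transactions that have been open longest
SELECT pid, now() - xact_start AS xact_age, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_age DESC;

-- sessions that are currently blocked, and by whom
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```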
Functions can reduce the number of round trips, and thus the latency, needed to execute certain transactions.
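For example, a read-modify-write that takes several round trips from the app becomes one call; a sketch with a made-up accounts table:

```sql
-- made-up schema; the lock is held for one round trip instead of three
CREATE OR REPLACE FUNCTION transfer(src int, dst int, amount numeric)
RETURNS void AS $$
BEGIN
    UPDATE accounts SET balance = balance - amount WHERE id = src;
    UPDATE accounts SET balance = balance + amount WHERE id = dst;
END;
$$ LANGUAGE plpgsql;

SELECT transfer(1, 2, 100.00);  -- one statement, one implicit transaction
```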
Rather than versioning your stored functions separately, you could create temporary functions each time you want to use one; it might be a bit slower, but it solves some of the problems pointed out.
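Concretely, PG lets you create functions in the session-local pg_temp schema, so the application can ship its own copy at connect time instead of versioning it with the schema (same made-up transfer example as above):

```sql
-- created by the application at connect time; vanishes with the session
CREATE FUNCTION pg_temp.transfer(src int, dst int, amount numeric)
RETURNS void AS $$
BEGIN
    UPDATE accounts SET balance = balance - amount WHERE id = src;
    UPDATE accounts SET balance = balance + amount WHERE id = dst;
END;
$$ LANGUAGE plpgsql;

-- temp functions must be called schema-qualified
SELECT pg_temp.transfer(1, 2, 100.00);
```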
sharing a database between applications is a bad idea for many reasons, and sharing a database where multiple apps may be writing to it is borderline sin (who owns the schema? how do you coordinate app deploys across schema changes?).
In short, you use views to expose different versions of the schema. Just like a REST API server can serve multiple versions of its API, a database can be made to “serve” different versions of the data.
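A sketch of what that can look like, with made-up schema and column names; each client pins a version via search_path:

```sql
-- one base table, two "API versions" exposed as views
CREATE SCHEMA api_v1;
CREATE SCHEMA api_v2;

CREATE VIEW api_v1.users AS          -- old shape: single name column
    SELECT id, first_name || ' ' || last_name AS full_name FROM users;
CREATE VIEW api_v2.users AS          -- new shape: split name columns
    SELECT id, first_name, last_name FROM users;

-- a v1 client runs: SET search_path = api_v1;
```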
Apart from that, it feels to me like most of the “complaints” the author has are about triggers?
Dealing with schema migrations and independent deployment of different apps is the real tricky bit here, IIUC. We have this at our current project at work too: there are two applications which both are allowed to write to the db.
We decided to use a monorepo for the apps, and make them share the schema and migrations (so there’s shared ownership), such that deploying either of the two apps will migrate the schema. This requires great care for backwards compatibility between schema versions. We ensure that each release’s schema changes are compatible with the previous release. This means we temporarily need to add triggers, default values, nullability where we don’t really want it, etc. This is ensured by always running the previous version’s test suite against the current db schema in CI.
But it means we always need to deploy both apps within a certain range of versions. When one of the apps needs a new schema version that doesn’t yet work with the other app, we have to upgrade the other app first as well.
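The usual shape of such a backwards-compatible change, sketched with made-up names (the expand/contract pattern):

```sql
-- expand: add the new column as nullable so old writers keep working
ALTER TABLE users ADD COLUMN email_normalized text;

-- bridge: a trigger keeps the new column filled until both apps write it
-- (assumes an existing email column; EXECUTE FUNCTION is PG 11+)
CREATE FUNCTION sync_email() RETURNS trigger AS $$
BEGIN
    NEW.email_normalized := lower(NEW.email);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_sync_email
    BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE FUNCTION sync_email();

-- contract (a release or two later): drop the trigger and the old column
```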
None of this is really unique to DBs, though. If you have tight integration(s), you have to deal with this regardless of where the tight integration happens.
Someone somewhere has to care about the data; you can have the applications manage it or make the DB do it, where at least the mess is all in one place. With things like check() one can go really far in making sure the data is reasonably sane, most of the time.
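For example (table and columns made up):

```sql
-- a couple of sanity constraints carried by the DB itself
ALTER TABLE orders
    ADD CONSTRAINT positive_quantity CHECK (quantity > 0),
    ADD CONSTRAINT known_status
        CHECK (status IN ('pending', 'shipped', 'cancelled'));
```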
What magJ is saying is you don’t have to work as hard as you were working. Presume you have some shared users table. For application tootie you would have a view tootie_users, and for application tito a view tito_users.
Then you can bang on the schema of either one whenever you want. Occasionally you might screw something up and have to babysit stuff like you mentioned, but if tootie only plays with the tootie view and tito only plays with the tito view, life is generally good.
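In PG terms (column lists and role names made up; simple single-table views like these are auto-updatable, so the apps can write through them too):

```sql
CREATE VIEW tootie_users AS
    SELECT id, name, tootie_prefs FROM users;
CREATE VIEW tito_users AS
    SELECT id, name, tito_prefs FROM users;

-- each app's role only ever touches its own view
GRANT SELECT, INSERT, UPDATE ON tootie_users TO tootie;
GRANT SELECT, INSERT, UPDATE ON tito_users TO tito;
```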
It strikes me that most back office APIs are this but with a heavy layer of REST instead of a simple table view.
LOL. It’s amazing what a DB can do when you learn more about their capabilities!
I don’t have too much to add (will read the article later..), but I want to cross-reference a thread on this topic which got a lot of responses from the Lobsters community: “how much logic should I keep at the database vs application layer?”
This was a good article that touched upon most of the issues I’ve encountered in practice as well. In general I’d advocate leveraging the db (i.e., do not avoid custom SQL where needed) but from the application, unless that would lead to potential inconsistencies in the db. Try to enforce constraints within the db as much as you can, using triggers if needed.
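A trigger can enforce a cross-row invariant that a plain CHECK can’t express; for instance (made-up bookings table with a range column; an exclusion constraint would also work here):

```sql
CREATE FUNCTION forbid_overlap() RETURNS trigger AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM bookings
               WHERE room = NEW.room
                 AND id <> NEW.id
                 AND during && NEW.during) THEN   -- range overlap test
        RAISE EXCEPTION 'room % is double-booked', NEW.room;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bookings_no_overlap
    BEFORE INSERT OR UPDATE ON bookings
    FOR EACH ROW EXECUTE FUNCTION forbid_overlap();
```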
Only when those triggers actually result in bad performance should you move the logic to a consistency-check task that you run nightly, with alerts going out when an inconsistency is detected. Make sure to run the task at the end of all your tests to get the same benefit of erroring out when invariants are violated. Debugging is harder when this happens, because it’s not the instant error you get from a trigger.
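The nightly task then boils down to queries that should return zero rows, and you alert on anything else; e.g. (tables and columns made up):

```sql
-- orders whose stored total disagrees with the sum of their lines
SELECT o.id
FROM orders o
LEFT JOIN order_lines l ON l.order_id = o.id
GROUP BY o.id, o.total
HAVING o.total IS DISTINCT FROM coalesce(sum(l.amount), 0);
```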
This paragraph is odd. It starts out claiming to be about debuggability and then talks about code completion, which isn’t exactly close to the topic. It also talks about print debugging, which is fair, but also seems a bit odd given that one usually looks at the data anyway. Of course you can output messages, but data-related code in the database is usually very different, and it’s not like your goal is to write a web server in the database.
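For what it’s worth, print debugging does exist in the DB; in PL/pgSQL it’s RAISE NOTICE (a trivial sketch, function name made up):

```sql
CREATE OR REPLACE FUNCTION debug_example(v int) RETURNS int AS $$
BEGIN
    RAISE NOTICE 'debug_example got %', v;   -- shows up in the client
    RETURN v * 2;
END;
$$ LANGUAGE plpgsql;
```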
Opaque side effects
While this part is fair enough in mentioning that the same problem exists when using active records, I think one can say that of any code that has any kind of data-changing side effect you intend to have. I don’t really understand where this is supposed to lead. Yes, you are not going to have side effects if there is no code? Am I missing something?
Deployment and versioning
This argument also seems kind of weak. It again starts out mentioning that you have migrations, which certainly is versioning, especially in the case of stored procedures where you have the complete version. It says that you can’t git blame, when yes, you can. Do you want rationales for changes? Put them in git or in a comment. Do you want a diff between versions? Just diff between migrations.
Overall I think the article misses the point of the discussion at times. Bad code and bad practices, as well as bugs, can be created inside or outside of the database. Yes, having code inside or outside the database is a technical decision that should be made in an informed, context-dependent manner.
I am not sure what the author’s experiences were, but it very much sounds like he has just seen the consequences of bad decisions.
The problem with this discussion is that people make it a case of always or never, which is nonsense. For many pro “code in the database” people it’s not even just about stored procedures, functions, or triggers, but about not querying data, doing trivial updates, and then saving it again, when that just makes things harder to read, usually breaks transactions or makes them long-running, and in general is a huge source of errors when it could be one simple, single in-database operation.
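The classic example, sketched with made-up names; the fetch/compute/save cycle from the app versus the single statement it could be:

```sql
-- instead of: SELECT balance; check it in the app; UPDATE balance
-- (three round trips and an explicit transaction), just:
UPDATE accounts
SET balance = balance - 100
WHERE id = 1 AND balance >= 100;   -- atomic, no race window
```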
Regarding the ACID and transactions part: this sounds like it could have been some miscommunication. Yes, you might use triggers for consistency of data under certain circumstances. This isn’t really about ACID, though. Some of this might be covered in the “deep consistency” part.
About multiple applications accessing one database: I mean, it’s kind of what databases were made for. While I understand the sentiment, it very much depends on the circumstances; I really wouldn’t just call it a sin in every scenario. But it also very much depends on what you consider one application. One process? I guess not. One codebase? Maybe. What about your migrations tool? Probably an exception. What about read-only scripts for metrics and reporting? What about the huge amount of software at big companies that doesn’t go through HTTP? What about big data analysis?
I think it’s unintended, because the intro and descriptions, as well as the somewhat opposing-sounding arguments, make it clearer, but the “good arguments” and “bad arguments” also aren’t so clear-cut. They can’t really be applied as general rules either.
While I’m not sure whether I like it or not, one can see this in the PostgREST project, which appears to have the goal of just putting everything into the database, but doesn’t go along with a lot of the bad decisions that even people who barely put code into the database make. On top of that, it leaves quite a few things open, so you can go multiple routes.
Again, I’m not sure how I feel about PostgREST, but it makes it pretty clear that this isn’t some clear-cut thing, even in its pros and cons, because it can depend on the details. I also think the stored-procedures part is only one facet. A much bigger problem is that ORMs make people do five queries with zero guarantees, sometimes even when taking care to explicitly write code for abstractions, instead of writing a simple query that does not need any explicit transaction.
I’ve seen multiple instances where that was argued to be best practice, because ORMs are good practice and code in databases is bad.
So that’s a lot of words to say that we should be careful not to create dogmas that less experienced people might adopt.
… sharing a database between applications is a bad idea for many reasons, and sharing a database where multiple apps may be writing to it is borderline sin…
That may be, but to me the argument is usually that multiple applications use the same database over different time periods. For example, today you may have a Python app using your database; tomorrow you may decide to migrate to Java for various reasons but keep the same SQL database. If the database is handling the data validation and most of the business logic, the migration will be fairly simple, whereas if the application were doing that, you would need to rewrite all of it.