I love PostgreSQL, and I’m really grateful for posts like this to balance my opinion and present a well-argued counterargument. However, when I read posts like this I notice a pattern: all these downsides seem to be relevant to really huge, performance-sensitive projects with enormous loads. The kind of projects where you likely have over a hundred developers, and probably should move away from one master relational database to microservices and message queues as the backbone of your architecture.
What I’m saying is, this kind of criticism just highlights the fact that PostgreSQL is probably the best choice if you’re smaller than that, both in terms of load and team size.
I almost hit the XID wraparound (I was 2-3 days from an outage; it was bad enough that AWS emailed me) when my company only had 4-5 devs and 100 customers. And I’ve hit connection-related performance issues at least four or five times over the last five years, at relatively modest scale (<100 application servers, in an overwhelmingly read-heavy workload). This affected us as recently as yesterday, as we are bringing TimescaleDB (which is a Postgres extension) into production and are faced with the prospect of tiered pgBouncers or breaking the Timescale access out into its own microservice.
I love Postgres, but these problems are real and they can hit earlier than you expect.
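If you want to keep an eye on how close you are, this is the sort of check we could have used (just a sketch; the hard ceiling is about 2.1 billion, and autovacuum should keep the number far below that):

    -- Age of the oldest unfrozen transaction ID per database; if age()
    -- ever approaches ~2.1 billion, Postgres will refuse new writes.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;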
I love Postgres, and I was strongly considering moving to it…but for our use case, it simply requires too much care and feeding. We ship databases to customers in appliance form, meaning all the maintenance has to be invisible and automatic and Postgres simply isn’t there.
Having worked for a company that did that and been responsible for Postgres tuning, I say it can be done. In nearly 15 years of shipping a PostgreSQL database as part of an appliance, I have not seen any of these problems.
Edit: Except for long upgrade times. That one is a PITA.
JADP.
I’d love to hear about your experience if you have time.
Also, I’m drawing a blank on “JADP”…
Just Another Data Point?
Just A Data Point. A weird acronym I picked up from old-timey online fora like USENET, and The Well.
I probably can’t say too much more about my experience with Postgres tuning, as:
It was for a company, and might be considered proprietary information.
It was about 5 years ago and I really don’t recall all that well what I did.
Sorry; just know that these are really rare problems if you’re dealing with the limited scale inherent in incorporating PostgreSQL as part of an appliance. Most of them deal with syndication, or ginormous tables. They’re web-scale problems, not appliance-scale problems.
what do you use instead?
What are you planning on using instead?
I do this as well. There’s definitely a discovery period but I’ve reached the point that I almost never have to check anything on the database side for roughly 200 customers, running varying versions from 9.6 to 11.9.
Definitely echo that these problems (and others) can hit you way before you get to 100 devs. We were running into the pains mentioned in this article (which admittedly is a more general critique of SQL databases, and lands on MySQL over Postgres) at more like 10 developers.
It absolutely isn’t that hard to run into the pitfalls of SQL databases at relatively small scale, especially if you’re using them for OLTP workloads in services where uptime/response times matter.
all these downsides seem to be relevant to really huge, performance-sensitive projects with enormous loads. The kind of projects where you likely have over a hundred developers

A number of these issues affect “normal” users of PostgreSQL as well:
Replication is something you may want even on smaller use cases.
“Wasted” space from the “update is really delete + insert” paradigm can be a problem even on fairly small use cases (e.g. tens of millions of rows). It can make some simple operations rather painful.
Lack of query hints is pretty annoying, especially for smaller users who don’t have a dedicated DBA with a Ph.D. in the PostgreSQL query planner. It’s also a massive help in development; want to try a new index? Now it’s a drop index, create index, wait, analyse, wait some more, run some queries, discover that didn’t do what you expected, drop the index, create a different one, wait, etc. It’s very time-consuming and much of the time is spent waiting.

Nice example: “complaining” that it is hard to tune it for a million concurrent connections.
Haven’t read it to the end yet, almost hoping to see an ending like “of course I’m happy to have a free DB that gets me in trouble for a million concurrent connections instead of breaking my bank at 1000 connections or when somebody touches advanced debugging like Oracle or ”
FYI you should have read it through to the end, as the article does end on that note.
The title is really click-bait. Most points are “This sucks, but there is a workaround/solution for it”.
It sounds like the main point of the author is “but for beginners this can lead to a lot of pain”. But a lot of topics are not beginner-friendly, like replication and performance optimisation.
Experts talking about the flaws in their tools is important.
In fact the ability to readily list a bunch of problems with a thing is a massive tell that someone is a real expert in that thing.
In interviews, I sometimes ask candidates to name a few pros of waterfall methodology and cons of agile. If you can’t name pros and cons of both, you don’t really know either.
I don’t agree. The Postgres replication story just isn’t there. The “workarounds” described boil down to “implement replication yourself for your use case.”
Whatever else you may say about MySQL synchronous replication, it does work. And it’s one of the main reasons large deployments like Google and Facebook used MySQL over Postgres. Vitess relies on MySQL row level replication, for example.
Postgres is better than MySQL in almost every way—XID wraparound and query hints notwithstanding—but synchronous replication is a hard requirement for lots of use cases.
I’ve been using Postgres for about 18 years and have never hit the XID wraparound problem. Autovacuum seems to take care of it, at least for my usage scenarios.
I have hit a problem in one database with MVCC bloat. The solution was to set up partitioning, which was an ordeal at the time (about 10 years ago) but has worked without pain since then.
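Today declarative partitioning (Postgres 10 and later) makes that setup far less of an ordeal. A minimal sketch with a hypothetical events table; the point is that retiring a partition is a metadata operation rather than a bulk DELETE that leaves dead rows behind for vacuum:

    CREATE TABLE events (
        ts      timestamptz NOT NULL,
        payload text
    ) PARTITION BY RANGE (ts);

    CREATE TABLE events_2020 PARTITION OF events
        FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

    -- Retiring old data reclaims its space immediately:
    DROP TABLE events_2020;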
I think replication creates more problems than it solves, so I have never used it. Where we need hot standbys at my work, they are populated via a process that takes messages and loads them into the database, so we just have the hot standby loading the same way as the active machine without involving any Postgres magic, and it works fine. We haven’t lost data during failover for the same reason.
I have run into connection problems, but they haven’t been bad enough to deploy pgBouncer. Older Java apps like we have are pretty leaky about connections; I sometimes manually kill processes that are “idle in transaction.” In theory, every connection could use up to work_mem memory, but in practice most don’t and you can increase the connection count higher than the memory arithmetic would suggest reasonably safely.

Several of these issues are related to space utilization, which is not something Postgres is great about, but it hasn’t been a significant problem for me, and I have several 2 TB databases.
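The manual kills mentioned above can be done from SQL; a sketch of how to find and terminate the worst offenders:

    -- List sessions stuck idle inside a transaction, oldest first.
    SELECT pid, now() - xact_start AS xact_age, left(query, 60) AS last_query
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
    ORDER BY xact_age DESC;

    -- Then, for a chosen pid (12345 is a placeholder):
    SELECT pg_terminate_backend(12345);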
I definitely hate the major version upgrade dance, but it’s way better now than it used to be.
For the query planning issue, is there a reason I shouldn’t be using https://github.com/ossc-db/pg_hint_plan to work around that problem?
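For context, its hints are just specially marked comments at the head of a query. A sketch, assuming the extension is loaded and using a hypothetical table t with an index t_id_idx:

    LOAD 'pg_hint_plan';

    /*+ IndexScan(t t_id_idx) */
    EXPLAIN SELECT * FROM t WHERE id = 42;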
Since most Postgres deployments require pgBouncer, why doesn’t Postgres itself just include it in the default install? I remember I had to rewrite certain parts of our code to keep concurrent connections at a sane level.
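For now it’s a separate daemon with its own config. A minimal pgbouncer.ini sketch (database name, paths, and pool sizes are all hypothetical); transaction pooling is what lets thousands of clients share a few dozen server connections:

    [databases]
    appdb = host=127.0.0.1 port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction
    max_client_conn = 2000
    default_pool_size = 20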
I have experience from working at a cloud provider with a PostgreSQL offering. This article feels pretty close to my own feelings on it. Some of these shortcomings may seem esoteric unless you are at high scale and that is true. If the regular day-to-day workloads were painful then it wouldn’t be a very mature database.
I might have put in something about “idle in transaction” but it probably isn’t top 10 material since 9.6 added idle_in_transaction_session_timeout. Even though the default is off you might only notice it from the effect on vacuuming.

Although PostgreSQL is mature that doesn’t mean it can handle every scenario flawlessly. Plus there are lots of real-world scenarios that could be considered out of scope like client, network, host or storage failures. Apps that have optimised their PostgreSQL usage still have to keep on top of scenarios like the ones in this article as they scale.
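Enabling that timeout is a one-liner if you want it; a sketch, with the five-minute value picked arbitrarily:

    -- Kill any session that sits idle inside a transaction for 5 minutes.
    ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
    SELECT pg_reload_conf();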
Please please please consider using a managed database service for your critical systems. There is so much involved in keeping data safe and available. Unless you have staff committed to your database tech - enough of them to build out a replication strategy, and enough to staff an on-call rotation - it’s just not worth the risk.
PostgreSQL is free “as in libertarian”, not free “as in beer”.