1. 23
  1. 5

    Is the problem just that an upgrade requires downtime? Because I’ve never had an issue: just take down the old cluster, run the upgrade, and bring up the new cluster. Debian basically walks me through it.

    1. 5

      My issue is that the upgrade requires RTFM: my database works, I upgrade the OS or packages, and now my database doesn’t work, and I have to figure out what commands to run to fix it. In my personal projects I don’t have time for this.

      In a professional setting I’ve never seen Postgres upgraded. Projects are born and die with the same version.

      1. 10

        In a professional setting I’ve never seen Postgres upgraded. Projects are born and die with the same version.

        That’s odd, in my career we’ve typically been eager to upgrade, because almost every release offers performance benefits which are always welcome, and the occasional new feature which would be really nice to have.

        1. 1

          This is obviously elitist, but I think that if you’re writing on this site, you’re probably in the top 5% (or higher) of the global sysadmin population. But then again, Postgres is rather good for its long maintenance periods for older major versions, so using one of those is usually just fine – at least if you keep up with the minor versions.

          1. 6

            I think that if you’re writing on this site, you’re probably in the top 5% (or higher) of the global sysadmin population.

            Even if that were true (I’m not even a sysadmin - I’m a backend developer), that should go for @kornel too; I’m responding to his comment, which he also wrote on this site. After some consideration, I think it’s quite possible that the type of projects we’ve worked on are wildly different, or that the teams/companies we’ve been on are of a different culture.

            Most projects I’ve worked on are productivity systems for businesses, where some downtime can usually be coordinated with the customer. The project I’m currently working on is a public system where high availability is so important that it can literally save lives. But that’s also why it’s designed to be fully distributed, so taking down a node for upgrade is “trivial”; when we bring it back up it’ll sync state with its peers. In theory, we can also just completely replace a node with a new one which can then get its state from the other peers.

            1. 2

              Lack of upgrades is “if it ain’t broke, don’t fix it”. New Postgres versions are cool, but upgrade is a non-zero risk. If there are no performance issues, no features that are really required (not merely nice), then there’s no business case for an upgrade.

              I’ve also worked in one place where this hilarious exchange took place:

              – Should we be worried about PHP 5.6 getting EOL-ed?
              – Nah, we use 5.2.

        2. 3

          Well, after the upgrade the database still works; it just keeps running on the old Postgres version until you run the three commands to switch it over.
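
          For the record, on Debian those commands are roughly the following – a sketch assuming a 14 → 15 upgrade of the default “main” cluster, with the version numbers as placeholders:

            pg_lsclusters                        # both clusters listed; the old one is still online
            sudo pg_dropcluster 15 main --stop   # drop the empty cluster the new package created
            sudo pg_upgradecluster 14 main       # migrate the old cluster to the new version
            sudo pg_dropcluster 14 main          # once happy, remove the old cluster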

          1. 1

            is that so? I thought postgres would refuse to start until you run pg_upgrade.

            1. 1

              The new cluster won’t start until the data gets moved, of course, but at least on Debian the old cluster stays running no problem until you do that.

              1. 1

                oh right, we’re in the thread that talks about debian upgrades. Still just a pg_upgrade thingy (upgrade strategy n°2 in the post), but now I see what you mean. that’s not specific to debian: pg_upgrade requires having both versions at hand, so you can decide to run the old binary on the old data dir.
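
                For anyone unfamiliar, a bare pg_upgrade invocation looks something like this – a sketch, with illustrative Debian-style paths and version numbers:

                  # Run as the postgres user, with both sets of binaries installed:
                  /usr/lib/postgresql/15/bin/pg_upgrade \
                    --old-bindir=/usr/lib/postgresql/14/bin \
                    --new-bindir=/usr/lib/postgresql/15/bin \
                    --old-datadir=/var/lib/postgresql/14/main \
                    --new-datadir=/var/lib/postgresql/15/main \
                    --check   # dry-run compatibility check; drop --check for the real upgrade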

          2. 2

            In a professional environment I’d strongly recommend against blindly upgrading an OS. The point of having major versions is that they might break stuff.

            I have never worked in a setting where they stuck with the same version. I’ve just seen things like GCP’s managed offerings, for example, doing a horrible job, because they require huge downtimes on major upgrades by design. ([RANT] Yet another area where clouds hold us back sigh[/RANT])

            1. 4

              The nice thing about Postgres is that the developers maintain official repositories of RPM and Debian packages that are well-tested against existing OS releases. This makes it easy to install different Postgres versions side by side on the same OS.
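
              For example, with the PGDG apt repository two major versions install cleanly next to each other – a sketch for Debian/Ubuntu (check the official install page for current key handling, since apt-key is deprecated on newer releases):

                sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
                wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
                sudo apt-get update
                sudo apt-get install postgresql-14 postgresql-15   # versioned packages coexist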

              1. 2

                Yes, and a lot of other distros simply support that in a different way.

                On top of that, I think there are many situations where your data isn’t that big and you can comfortably do a dump and restore during your maintenance window without having to worry about anything, really. It’s not like you upgrade your DB every day. I’m much happier with good official docs than with having to pray that the upgrade doesn’t break stuff and going through issue trackers, big breaking-changes lists, etc.
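
                A sketch of that path, with placeholder hosts – pg_dumpall carries roles and all databases, while the custom-format variant allows a parallel restore:

                  # Everything at once: roles, databases, data.
                  pg_dumpall -h old-host > dump.sql
                  psql -h new-host -f dump.sql postgres

                  # Or per database with the custom format, restoring with 4 parallel jobs
                  # (assumes mydb already exists on the target):
                  pg_dump -Fc -h old-host mydb > mydb.dump
                  pg_restore -h new-host -d mydb -j 4 mydb.dump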

            2. 2

              In a professional setting I’ve never seen Postgres upgraded. Projects are born and die with the same version.

              I call this the law of the welded monolith. This has also been my experience, and not only with Postgres – other complex systems too.

            3. 1

              Yeah. When I used to run long-term apps, downtime was the enemy: what if a potential customer clicked a banner ad and got a 500?

            4. 3

              From the “Support policy” section:

              This becomes a problem for some users: by the time the product runs out of support, the amount of changes is so big that a major upgrade is no longer feasible, or easy to do.

              I don’t really understand what this is supposed to mean. You’re not anywhere close to likely to run into breaking changes with Postgres, even if you have a big setup and switch from, let’s say, 9.x to 14 – over a decade of releases, and you could probably go further. You get more features and better performance, but good luck finding a breaking change that affects you. I’d maybe recommend using dump and restore, but pg_upgrade might work as well.

              So I’m not sure what the problem is there. You don’t usually have such an easy time switching from a decade-old release with other services, especially not with something that deals with state/data and often has big improvements.

              1. 3

                If you don’t want to do that work, you can just use some hosted PostgreSQL like Amazon RDS. It handles all upgrades automatically.

                1. 2

                  Having recently done an upgrade with a few not-so-usual roadblocks (the migration was also an OS upgrade and a server switch, and there were extensions like PostGIS involved), I must say that logical replication is a really nice way to get around them. I crafted a command to dump out the precise commands that need to be run after the upgrade to sync sequences (which, by the way, weren’t just +1, but functions), and it worked flawlessly, jumping ahead from 11 to 14 on about a TB of data with no more than two minutes of downtime (which certainly could be decreased further).
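
                  A rough sketch of that kind of migration (hosts, names, and the replicator role are illustrative; logical replication doesn’t carry sequence state, hence the generated setval() calls):

                    # On the old (v11) primary: publish all tables.
                    psql -h old-host -d mydb -c "CREATE PUBLICATION migration FOR ALL TABLES;"

                    # On the new (v14) server, after loading the schema (pg_dump --schema-only):
                    psql -h new-host -d mydb -c "CREATE SUBSCRIPTION migration
                      CONNECTION 'host=old-host dbname=mydb user=replicator' PUBLICATION migration;"

                    # At cutover: generate setval() calls on the old server, apply them on the new one.
                    # (pg_sequences exists since v10; last_value is NULL for never-read sequences.)
                    psql -h old-host -d mydb -At -c "SELECT format('SELECT setval(%L, %s);',
                        schemaname || '.' || sequencename, last_value)
                      FROM pg_sequences WHERE last_value IS NOT NULL;" | psql -h new-host -d mydb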

                  If I find the time one day, I’d like to make a tool that does that for you, working around the caveats of logical replication. While those caveats matter for day-to-day use, logical replication is fine for major version upgrades if your only goal is to migrate a DB. The main thing that might bite people is sequences, which are often used for IDs – though you might want to consider not doing that and instead using a better identifier anyway, a UUID for example, which in many situations makes logical replication just fine.

                  That’s also my main criticism of such articles. If you are fine with the same limitations as other systems, Postgres in many situations allows you to do something similar or identical in a superior way – by superior I mean things like stronger integrity guarantees, better performance, more functionality, or more flexibility. The default configs are largely either standard/compatibility-related or there to prevent you from shooting yourself in the foot.

                  1. 2

                    One problem with pg_upgrade is that not all distros play nicely with it. It requires a copy of the binaries from both the new and the old versions, and package managers fully remove the old version before installing the new one.

                    On FreeBSD, the solution is to spin up a new jail with an install of the old version, then use pg_upgrade from the host. You might (will) run into library mismatch issues between the host and the jail for any non-trivial installation, and it’s a huge hassle. It’s so much easier (though depending on the size of the database, potentially cost prohibitive in terms of downtime) to use pg_dump and pg_restore.

                    1. 1

                      package managers fully remove the old version before installing the new one.

                      Sounds like a broken package setup…

                      1. 1

                        That’s definitely one way of looking at it! It doesn’t have to be the case, but in practice the old version is always marked as “conflicts” with the new version, necessitating the full uninstall before the new install.

                        1. 1

                          I think this is worked around by the official Debian and RedHat packages by including the major version in the package name – a bit like having the ability to run both Python 2 and Python 3 side by side.

                          1. 1

                            It’s not the package name that conflicts (FreeBSD has separate packages for 13 and 14 because each continues to be supported with its own minor updates and because you don’t want it to invisibly upgrade to a new major just because upstream changed). The problem is things like userland commands (which version does pg_upgrade resolve to?), the system service itself (it’s called postgres for all versions), and the service-related configuration files/knobs (apart from the pg config itself). Even though it goes into its own directory (e.g. /var/db/pg14), the service name in rc.conf doesn’t include the version (otherwise you have to manually upgrade disparate references across the fs for each major version).

                            I’m sure there’s a nicer way, but there are enough roadblocks that simply treating them as mutually exclusive looks very appealing.

                      2. 1

                        The newer FreeBSD packages use different data directories for different versions, so they should be able to support having both versions installed simultaneously (though I think they’re marked as conflicting, possibly because they provide an rc script with the same name?). I found this out recently because a pkg upgrade left me with a Postgres install that didn’t think it had any users or a database configured. I have a very small install, so I could do the traditional thing of downgrading the package, pg_dumpall > {temp file}, upgrading the package, and psql < {temp file}, but it’s annoying. pg_upgrade would be much nicer and I’d love to see the packages support it automatically.

                        Unfortunately, as I recall, pkg doesn’t really have a concept of major version upgrades. postgresql14-server and postgresql15-server are different packages, and it just sees you uninstall one and install the other, so there’s no hook to run an upgrade command when you try to do this.

                      3. 2

                        Anecdata: I once decided to move my personal Nextcloud instance from MariaDB to PostgreSQL to get acquainted with that other DB everyone seems to praise. It stopped working after I upgraded PostgreSQL to what was then the new major version. I promptly migrated back to MariaDB when I looked up why and saw what I had to do to perform the upgrade.

                        1. 7

                          MySQL/MariaDB is certainly much easier to upgrade, but the way you do that gives me the willies - basically, just replace the binaries with a newer version, then start the server with the old files. This server will happily run with the old tables and maybe even produce correct results, but they are considered “broken” until you run mysql_upgrade which then “repairs” them for you when needed.
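
                          For contrast, that flow is roughly the following – a sketch for a Debian-style system (package and service names vary, and newer MariaDB releases call the tool mariadb-upgrade):

                            sudo systemctl stop mariadb
                            sudo apt-get install mariadb-server   # pulls in the new major version
                            sudo systemctl start mariadb          # starts happily on the old data files
                            sudo mysql_upgrade                    # then "repairs" the system tables in place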

                          1. 4

                            Postgres pg_upgrade does the same thing and works just fine. For the past few years the breaking on-disk changes have been to indexes, and the server ships both the old and new code for working with them. You only get the new data structures on disk if you do one of the time-consuming updates like REINDEX, pg_dump/pg_restore, or replication; those do full rewrites of everything, which is equivalent to the mysql_upgrade rewrite.

                            1. 2

                              pg_upgrade is also very fast because, depending on configuration, it doesn’t have all that much to do.

                              From the official docs:

                              Major PostgreSQL releases regularly add new features that often change the layout of the system tables, but the internal data storage format rarely changes. pg_upgrade uses this fact to perform rapid upgrades by creating new system tables and simply reusing the old user data files. If a future major release ever changes the data storage format in a way that makes the old data format unreadable, pg_upgrade will not be usable for such upgrades. (The community will attempt to avoid such situations.)

                            2. 1

                              thanks for the warning.

                            3. 2

                              I would love for SREs or SWEs to be able to operate a database without a DBA. Needing to know Patroni, PGBackrest, PGBouncer, and all of the upgrade ceremony is a really steep ask. Yeah, it’s great that there are managed offerings on cloud providers that abstract all of this, but it would be even better if this abstraction were built into Postgres directly.

                              1. 11

                                I would love for SREs or SWEs to be able to operate a database without a DBA.

                                Uh…we can? I’ve never in my career worked anywhere that employed a DBA or even idly discussed attempting to hire one – maintaining a production PG cluster is more or less firmly just part of the “backend software engineer” or at best SRE job description, as a rule.

                                I was rather under the impression that outside of a few dusty IBM DB2 or w/e installations, “DBA” as a career was utterly and completely dead.

                                1. 3

                                  The only times I’ve worked at a company without dedicated DBA(s) are when the company was too small to afford one.

                                  1. 3

                                    I think this has much more to do with engineering culture than size. Our database management transitioned over time from backend engineers to a proper SRE team, but there are still no DBAs in the company and we definitely are more than large enough to afford one.

                                    The way in which you are getting Postgres also matters. A managed cloud service is a lot easier to upgrade than a self-hosted DB.

                                    1. 2

                                      Admittedly some of these have been quite small (early stage startups aren’t going to splurge on someone just to babysit the DB) – but still, small-mid-size companies (profitable, growing, $10s mil revenue, 20-some people in the engineering dept, certainly not incapable of hiring someone if there’d been interest in it, but there was none) and a couple truly large orgs (10k employees, both centralized and with some embedded IT depts), zero DBAs at any of them (the largest org had shitcanned all of the Oracle DBAs a few years before I started my career as part of a transition to open-source DBs – kind of set the tone for my career). All of these were self-hosting PG (and MySQL) on-prem (large orgs) or in something like Linode or (later) AWS.

                                      The reality is I struggle to see the justification for a non-code-contributing role like this. While it might be nice (as a dev) to imagine that I could offload all responsibility for thinking about the DB to someone else, as a backend engineer I still need so much visibility into the DB that paying someone just to wrangle config and servers doesn’t make sense to me – I’d need to be in the loop on those kinds of changes regardless, because they feed back into the app architecture and performance.

                                      Maybe it’s just cultural, but it feels like I’d be abdicating important responsibilities (and the ability to do my job as well as I can) if I weren’t capable of tuning work_mem and temp_buffers etc. to be synergistic with the kinds of data I’m working with and the queries I’m writing against it, or if I weren’t making the calls on the cluster architecture and how the rest of the app connects to it, fails over, etc.

                                      1. 2

                                        Yep, same here. I also wonder how a DBA would even be able to get anything done – on the one hand you have the backend engineers who need to optimize their queries in a way that the application still works properly (and grok all the interactions between different sessions), and on the other hand you have the sysadmins whom you need to bug to get the Postgres settings changed (what with fully automated declarative deployments and such).

                                        Between the two roles (backend dev and sysadmin), that should cover all the tasks you’d typically hire a DBA for.

                                        1. 1

                                          Also in my experience: DBAs are subject-matter experts for databases. They spend a lot of time reading application code with an eye towards either changing the database to handle the application better, or changing the application to perform better on the database.

                                          Do you really only think of DBAs as Oracle-programmed configure-monkeys?

                                          1. 1

                                            Do you really only think of DBAs as Oracle-programmed configure-monkeys?

                                            No, but at the same time, I wouldn’t call someone who’s equipped to actually change the application code a DBA, either. You can be a backend dev with a heavy DB focus, but a DBA is typically non-code-contributing.

                                            1. 1

                                              This is the sort of tiny box that makes talking about organizations so painful.

                                              A (junior, intermediate, senior) DBA is a person developing expertise in databases. A (junior, intermediate, senior) sysadmin is developing expertise in deployment and operations. The senior folks in these specialties always end up being good in a few languages. Are they particularly focused on web technologies or UX or QA or business logic? No, those are different specializations. But it is entirely reasonable to ask your DBA to consult on programming problems and expect that the answer might be to change your program just as much as it might be to change your database.

                                              1. 1

                                                But it is entirely reasonable to ask your DBA to consult on programming problems and expect that the answer might be to change your program just as much as it might be to change your database.

                                                Really don’t know what it is you’re trying to argue with here, but I’d simply say that you’d be better off spending that salary on someone who can lead (and share, and help juniors develop) that knowledge within the dev team, make those changes directly in the codebase on a daily basis, and make those recommendations directly in code reviews – rather than on some guru external to the developers whom devs periodically visit for “change the program” answers; answers that come from someone without the requisite skills to change the codebase directly, and therefore without an understanding of the actual constraints on the changes they’re recommending in a vacuum.

                                      2. 2

                                        I was rather under the impression that outside of a few dusty IBM DB2 or w/e installations, “DBA” as a career was utterly and completely dead

                                        Definitely not. Not all backend engineers can analyze the queries they write and their impact on the system. Nor SREs. At certain amounts of data you need someone dedicated to the system internals to make it work for you.

                                        1. 1

                                          I don’t know. I’ve never worked anywhere that operated its own Postgres clusters rather than using RDS, Cloud SQL, etc.

                                          I was rather under the impression that outside of a few dusty IBM DB2 or w/e installations, “DBA” as a career was utterly and completely dead.

                                          Me too, but I thought that was because everyone else was running RDS, etc.

                                          1. 1

                                            Most of those are too new for a lot of businesses to be using them (RDS PostgreSQL only came out in 2017) - if you’ve already got a well-tuned, stable PG setup, switching it over to Amazon RDS just to pay them a lot more money (not to mention the loss of control over things like maintenance windows and version upgrade timing) to be where you already are isn’t all that appealing.

                                            Those services are fine, generally, but if you let them convince you that you can’t do it yourself, you’re missing out on a lot of knowledge about the internals of your tools, which, at least for a backend dev, will come back to bite you in the ass in the long run.

                                            1. 1

                                              I mean, new businesses start new projects all the time (such that one project could be on a manually-managed Postgres cluster and another on RDS), and many have moved from their own data centers to cloud providers in that span of time. But I agree with you that there are probably a lot of manually managed Postgres clusters out there too; however, it’s not evident to me that these are run by SREs or SWEs rather than DBAs or DBEs or other database specialists.

                                              1. 2

                                                A lot of shops run their own Postgres. In many cases, performance isn’t that much of a concern (think your average CMS or CRM solution that uses Postgres for its backend), and in cases where it is, often the developers or sysadmins learn on the job and make it scale (because they have to).

                                                At least in the companies where I’ve worked we made bespoke business software which always ran on our own VPSes, and maintenance and tuning of the Postgres database was a shared responsibility between the backend devs (i.e. the team I was in) and the sysadmin. So the only bit that’s “cloud” would be the plain VPS (which sometimes would run on our own bare metal as well). The actual “cloud” offerings usually are too rigid in what you can install/control (custom db extensions? hell no!) or too damned expensive compared to just running it yourself (especially for small projects).

                                                1. 2

                                                  I mean, new businesses start new projects all the time (such that one project could be on a manually-managed Postgres cluster and another on RDS),

                                                  Yeah that does happen. It also goes the opposite way; we had an old MySQL-using app on RDS but they discontinued the version it was using and testing turned up some weirdness on the next supported version. Wasn’t worth the manpower to fix so we just set it up with MySQL on a regular VM. Like I said, sometimes control over upgrade timing is more important.

                                                  But I agree with you that there are probably a lot of manually managed Postgres clusters out there too; however, it’s not evident to me that these are run by SREs or SWEs rather than DBAs or DBEs or other database specialists.

                                                  Well, I’m mostly telling you this as a senior backend dev who’s worked in a bunch of places where we backend devs hosted our own PG cluster with nary a DBA in sight; I’m friends with many other backend devs with similar responsibilities, and I’ve never even heard tell of a shop that employed a DBA.

                                                  Maybe these places exist, but I’d say there are probably 50 “maintain the database? That’s your job as a backend dev” places for every one “yeah, we have a DBA” shop (which, again, I’ve never actually heard of outside this thread).

                                                  The reality is that unless your PG cluster is absolutely gargantuan, it doesn’t really take much work. I’m genuinely not sure what they’d do all day.

                                          2. 2

                                            Yeah, it’s great that there are managed offerings on cloud providers that abstract all of this

                                            There are, but not really from the big ones. Look up their upgrade guides: they either make this complicated, require you to do essentially the same steps, or simply require you to dump and restore – but then don’t even allow you to do that with users and such, making it more complicated and requiring a lot more downtime than it usually would (GCP).

                                          3. 2

                                            what if Citus was used to move shards from older-version nodes to newer-version nodes? it uses logical replication in a non-blocking way now. IDK, just random thoughts

                                            1. 1

                                              However PostgreSQL by itself is not aware of the cluster, and does not support this out of the box.

                                              I think that’s an exaggeration. You have multiple ways to do replication out of the box, depending on your requirements, and they’re extremely flexible themselves. Because of this, a lot of people have built solutions that fit their use cases well. This can be really nice for fitting replication into your disaster recovery plans so you don’t end up at an impasse. You can use the mentioned tools if they fit your setup, but you are in no way required to.
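
                                              For instance, a bare-bones physical streaming replica needs nothing beyond what ships in core – a sketch, where the hosts, the replicator role, and the data directory are placeholders:

                                                # On the primary: create a replication role (and allow it in pg_hba.conf).
                                                psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';"

                                                # On the standby: clone the primary; -R writes primary_conninfo and standby.signal.
                                                pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/15/main -R -P

                                                # Start the standby; it comes up in recovery and streams from the primary.
                                                sudo systemctl start postgresql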