1. 42
  1.  

  2. 8

    yet in many respects, it is the most modern database management system there is

    It’s not though. No disrespect to PostgreSQL, but it just isn’t. In the world of free and open source databases it’s quite advanced, but commercial databases blow it out of the water.

    PostgreSQL shines by providing high quality implementations of relatively modest features, not highly advanced state of the art database tech. And it really does have loads of useful features, the author has only touched on a small fraction of them. Almost all those features exist in some other system. But not necessarily one single neatly integrated system.

    PostgreSQL isn’t great because it’s the most advanced database, it’s great because if you don’t need anything state of the art or extremely specialized, you can just use PostgreSQL for everything and it’ll do a solid job.

    1. 13

      but commercial databases blow it out of the water

      Can you provide some specific examples?

      1. 16

        Oracle has RAC, which is a basic install step for any Oracle DBA. Most Postgres users can’t implement something similar, and those that can appreciate it’s a significant undertaking that will lock you into a specific workflow so get it right.

        Oracle and MS-SQL also have clustered indexes. Not what Postgres has, but where updates are clustered as well. Getting Pg to perform sensibly in this situation is so painful, it’s worth spending a few grand to simply not worry about it.

        Ever run Postgres on a machine with over 100 cores? It’s not much faster than 2 cores without a lot of planning and partitioning, and even then, it’s got nothing on Oracle and MS-SQL: Open checkbook and it’s faster might sound like a lose, but programmers and sysadmins cost money too! Having them research how to get your “free” database to perform like a proper database isn’t cost effective for a lot of people.

        How about big tables. Try to update just one column, and Postgres still copies the whole row. Madness. This turns something that’s got to be a 100GB of IO into 10s of TBs of IO. Restructuring this into separate partitions would’ve been the smart thing to do if you’d remembered to do it a few months ago, but this is a surprise coming from commercial databases which haven’t had this problem for twenty years. Seriously! And don’t even try to VACUUM anything.

        MS-SQL also has some really great tools. Visual Studio actually understands the database, and its role in development and release. You can point it at two tables and it can build ALTER statements for you and help script up migrations that you can package up. Your autocomplete can recognise what version you’re pointing at. And so on.

        …and so on, and so on…

        1. 3

          Thanks for the detailed response. Not everyone has money to throw at a “real” enterprise DB solution, but (having never worked with Oracle and having only administered small MSSQL setups) I did wonder what some of the specific benefits that make a DBA’s life easier were.

          Of course, lots of the open source tools used for web development and such these days seem to prefer Postgres (and sometimes MySQL), and developers like Postgres’ APIs. With postgres-compatible databases like EnterpriseDB and redshift out there, my guess is we’ll see a Postgres-compatible Oracle offering at some point.

          1. 7

            Not everyone has money to throw at a “real” enterprise DB solution

            I work for a commercial database company, so I expect I see a lot more company-databases than you and most other crustaceans: Most companies have a strong preference to rely on an expert who will give them a fixed cost (even if it’s “money”) to implement their database, instead of trying to hire and build a team to do it open-source. Because it’s cheaper. Usually a lot cheaper.

            Part of the reason why: An expert can give them an SLA and has PI insurance, and the solution generally includes all costs. Building a engineering+sysadmin team is a big unknown for every company, and they usually need some kind of business analyst too (often a contractor anyway; more £££) to get the right schemas figured out.

            Professional opinion: Business logic may actually be some of the least logical stuff in the world.

            lots of the open source tools used for web development and such these days seem to prefer Postgres

            This is true, and if you’re building an application, I’d say Postgres wins big. Optimising queries for dbmail’s postgres queries was hands down much easier than any other database (including commercial ones!).

            But databases are used for a lot more than just applications, and companies who use databases don’t always (or even often) build all (or even much) of the software that interacts with the database. This should not be surprising.

            With postgres-compatible databases like EnterpriseDB and redshift out there, my guess is we’ll see a Postgres-compatible Oracle offering at some point.

            I’m not sure I disagree, but I don’t think this is a good thing. EnterpriseDB isn’t Postgres. Neither is redshift. Queries that work fine in a local Pg installation run like shit in redshift, and queries that are built for EnterpriseDB won’t work at all if you ever try and leave. These kinds of “hybrid open source” offerings are an anathema, often sold below a sustainable price (and much less than what a proper expert would charge), leaving uncertainty in the SLA, and with none of the benefits of owning your own stack that doing it on plain postgres would give you. I just don’t see the point.

            1. 3

              Professional opinion: Business logic may actually be some of the least logical stuff in the world.

              No kidding. Nice summary also.

              1. 0

                Queries that work fine in a local Pg installation run like shit in redshift

                Not necessarily true, when building your redshift schema you optimize for certain queries (like your old pg queries).

            2. 4

              And yet the cost of putting your data into a proprietary database format is enough to make people find other solutions when limitations are reached.

              Don’t forget great database conversion stories like WI Circuit Courts system or Yandex where the conversion to Postgres from proprietary databases saved millions of dollars and improved performance…

              1. 2

                Links to those stories?

                1. 1

                  That Yandex can implement clickhouse doesn’t mean everyone else can (or should). How many $100k developers do they employ to save a few $10k database cores?

                  1. 2

                    ClickHouse has nothing to do with Postgres, it’s a custom column oriented database for analytics. Yandex Mail actually migrated to Postgres. Just Postgres.

                2. 2

                  You’re right about RAC but over last couple of major releases Postgres has gotten alot better about using multiple cores and modifying big tables. Maybe not at the Oracle level yet bit its catching up quickly in my opinion.

                  1. 3

                    Not Oracle-related, but a friend of mine tried to replace a disk-based kdb+ with Postgres, and it was something like 1000x slower. This isn’t even a RAC situation, this is one kdb+ core, versus a 32-core server with Postgresql on it (no failover even!).

                    Postgres is getting better. It may even be closing the gap. But gosh, what a gap…

                    1. 1

                      Not to be that guy, but when tossing around claims of 1000x, please back that up with actual data/blogpost or something..

                      1. 6

                        You remember Mark’s benchmarks.

                        kdb doing 0.051sec what postgres was taking 152sec to complete.

                        1000x is nothing.

                        Nobody should be surprised by that. It just means you’re asking the computer to do the wrong thing.

                        Btw, starting a sentence with “not to be that guy” means you’re that guy. There’s a completely normal way to express curiosity in what my friend was doing (he’s also on lobsters), or to start a conversation about why it was so much easier to get right in kdb+. Both could be interesting, but I don’t owe you anything, and you owe me an apology.

                        1. 2

                          Thanks for sharing the source, that helps in understanding.

                          That’s a benchmark comparing a server grade setup vs essentially laptop grade hardware (quad-core i5), running the default configuration right out of the sample file from the Git repo, with a query that reads a single small column out of a very wide dataset without using an index. I don’t doubt these numbers, but they aren’t terribly exciting/relevant to compare.

                          Also, there was no disrespect intended, not being a native english speaker I may have come off clumsy though.

                          1. 1

                            kdb doing 0.051sec what postgres was taking 152sec to complete.

                            That benchmarks summary points to https://tech.marksblogg.com/billion-nyc-taxi-rides-postgresql.html which was testing first a pre-9.6 master and then a PG 9.5 with cstore_fdw. Seems to me that neither was fair and I’d like to do it myself, but I don’t have the resources.

                            1. 1

                              If you think a substantially different disk layout of Pg, and/or substantially different queries would be more appropriate, I think I’d find that interesting.

                              I wouldn’t like to see a tuning exercise including a post-query exercise looking for the best indexes to install for these queries though: The real world rarely has an opportunity to do that outside of applications (i.e. Enterprise).

                        2. 1

                          Isn’t kdb+ really good at stuff that postgres (and other RDBMS) is bad at? So not that surprising.

                          1. 1

                            Sort of? Kdb+ isn’t a big program, and most of what it does is the sort of thing you’d do in C anyway (if you liked writing databases in C): Got some tall skinny table? Try mmaping as much as possible. That’s basically what kdb does.

                            What was surprising was just how difficult it was to get that in Pg. I think we expected, with more cores and more disks it’d be fast enough? But this was pretty demoralising! I think the fantasy was that by switching the application to Postgres it’d be possible to get access to the Pg tooling (which is much bigger than kdb!), and we massively underestimated how expensive Pg is/can be.

                            1. 3

                              Kdb+ isn’t a big program, and most of what it does is the sort of thing you’d do in C anyway (if you liked writing databases in C)

                              Well, kdb+ is columnar, which is pretty different than how most people approach naive database implementation. That makes it very good for some things, but really rough for others. Notably, columnar storage is doesn’t deal with update statements very well at all (to the degree that some columnar DBs simply don’t allow them).

                              Even on reads, though, I’ve definitely seen postgres beat it on a queries that work better on a row-based system.

                              But, yes, if your primary use cases favor a columnar approach, kdb+ will outperform vanilla postgres (as will monetdb, clickhouse, and wrappers around parquet files).

                              You can get the best of both worlds You can get decent chunks of both worlds by using either the cstore_fdw or imcs extensions to postgres.

                              1. 1

                                which is pretty different than how most people approach naive database implementation.

                                I blame foolish CS professors emphasising linked lists and binary trees.

                                If you simply count cycles, it’s exactly how you should approach database implementation.

                                Notably, columnar storage is doesn’t deal with update statements very well at all (to the degree that some columnar DBs simply don’t allow them).

                                So I haven’t done that kind of UPDATE in any production work, but I also don’t need it: Every customer always wants an audit trail which means my database builds are INSERT+some materialised view, so that’s exactly what kdb+ does. If you can build the view fast enough, you don’t need UPDATE.

                                Even on reads, though, I’ve definitely seen postgres beat it on a queries that work better on a row-based system.

                                If I have data that I need horizontal grabs from, I arrange it that way in memory. I don’t make my life harder by putting it on the disk in the wrong shape, and if I do run into an application like that, I don’t think gosh using postgres would really speed this part up.

                    2. 3

                      Spanner provides globally consistent transactions even across multiple data centers.

                      Disclosure: I work for Google. I am speaking only for myself in this matter and my views do not represent the views of Google. I have tried my best to make this description factually accurate. It’s a short description because doing that is hard. The disclosure is long because disclaimers are easier to write than useful information is. ;)

                      1. 2

                        @geocar covered most of what I wanted to say. I also have worked for a commercial database company, and same as @geocar I expect I have seen a lot more database use cases deployed at various companies.

                        The opinions stated here are my own, not those of my former or current company.

                        To put it bluntly, if you’re building a Rails app, PostgreSQL is a solid choice. But if you’ve just bought a petabyte of PCIe SSDs for your 2000 core rack of servers, you might want to buy a commercial database that’s a bit more heavy duty.

                        I worked at MemSQL, and nearly every deployment I worked with would have murdered PostgreSQL on performance requirements alone. Compared to PostgreSQL, MemSQL has more advanced query planning, query execution, replication, data storage, and so on and so forth. It has state of the art features like Pipelines. It has crucial-at-scale features like Workload Profiling. MemSQL’s competitors obviously have their own distinguishing features and qualities that make them worth money. @geocar mentioned some.

                        PostgreSQL works great at smaller scale. It has loads useful features for small scale application development. The original post talks about how Arcentry uses NOTIFY to great effect, facilitating their realtime collaboration functionality. This already tells us something about their scale: PostgreSQL uses a fairly heavyweight process-per-connection model, meaning they can’t have a huge number of concurrent connections participating in this notification layer. We can conclude Arcentry deployments using this strategy probably don’t have a massive number of concurrent users. Thus they probably don’t need a state of the art commercial database.

                        There are great counterexamples where specific applications need to scale in a very particular way, and some clever engineers made a free database work for them. One of my favorites is Expensify running 4 million queries per second on SQLite. SQLite can only perform nested loop joins using 1 index per table, making it a non-starter for applications that require any kind of sophisticated queries. But if you think about Expensify, its workload is mostly point look ups and simple joins on single indexes. Perfect for SQLite!

                        1. 1

                          But MemSQL is a distributed in-memory database? Aren’t you comparing apples and oranges?

                          I also highly recommend reading the post about Expensify usage of SQLite: it’s a great example of thinking out of the box.

                          1. 1

                            No. The author’s claims “Postgres might just be the most advanced database yet.” MemSQL is a database. If you think they’re apples and oranges different, might that be because MemSQL is substantially more advanced? And I used MemSQL as one example of a commercial database. For a more apples-to-apples comparison, I also think MSSQL more advanced than PostgreSQL, which geocar covered.

                            And MemSQL’s in-memory rowstore serves the same purpose as PostgreSQL’s native storage format. It stores rows. It’s persistent. It’s transactional. It’s indexed. It does all the same things PostgreSQL does.

                            And MemSQL isn’t only in-memory, it also has an advanced on-disk column store.

                    3. 4

                      The pedant in me cries: Age doesn’t equal wisdom, or advancedness.

                      That said, postgres is pretty heckin great.

                      1. 4

                        What I got from the title was that even though it isn’t brand new it’s still advanced.

                        New !== Cutting Edge

                        1. 6

                          Google docs proves this point well.

                          I miss WordPerfect.

                      2. 2

                        As a sysadmin I prefer MySQL since the high availability is way easier to setup. Either master master or master slave, and if more than two nodes, galera.

                        Postgres is a box of Lego, some tools (barman, repmgr, pgpool, pgbouncer) and wal replication, but nothing cohesive or easy “just works”.

                        Cannot say anything about its capabilities as a database but my developers like it, and as long as they’re happy I’m happy.

                        1. 9

                          As a DBA, I prefer postgres, because postgres hasn’t ever lost data for me, while mysql has. As a dev, I prefer postgres, since it has far better support for advanced datatypes and advanced querying.

                          But, yes, MySQL is still slightly easier to set up out of the box, especially in HA setups.

                          1. 0

                            As someone who cares about the consistency and availability of their data, I wouldn’t wish MySQL on a dog I don’t like.