1. 59

  2. 16

    Excellent post, and I respect the author’s experience and reasons for avoiding Postgres.

    From my own experience, I would avoid MySQL like the plague, not because of its bad history but because of the footguns it has and by how often I’ve been burned by them. I’ve seen to many things to count, and I’m sure experienced users would be able to avoid most of them instinctively as well, but I just can’t seem to get my head around things like silently truncating data, allowing for invalid or nonsensical data (like empty strings in non-NULLable enum fields when an unknown value gets inserted, dates that don’t exist like 0000-00-00, accepting invalid UTF-8 bytes in UTF-8 columns, not accepting the full set of UTF-8 in the default UTF-8 column type and so on and so forth), not allowing indexes on text fields without a size limitation etc.

    The list is endless, and for me personally, not being able to trust your database to keep your data safe is much worse than a few performance pitfalls which are generally solvable with enough time. When you discover your data is mangled, sometimes it’s too late and you can’t recover that data at all anymore.

    1. 9

      Related good thread from a month ago: Why is SQL so successful?

      1. 7

        The ask for pragma to give full control seems at odds with the later ask for declarative schema changes. If we had declarative schema changes, someone would be writing a post about how the database does the changes right 99% of the time and fails one some special use case 1% of the time.

        I do agree with the SQL syntax point. I would prefer a more structured approach to queries.

        1. 3

          They address that to a point though, by pointing out that the scheme is much better suited to a declarative style than data modification. If you had the choice between imperative schema with declarative data mutation, or declarative schema with imperative data mutation, you could make a pretty solid argument that the latter is preferable.

        2. 7

          Always nice to read a differing perspective. I can’t disagree with the points in the article but I kind of like most of the things the author dislikes https://benjiweber.co.uk/blog/2021/03/21/thinking-in-questions-with-sql/

          1. 5

            This mostly jives with my own opinions. Using MySQL more as a mostly-typed KV store while having humans whose responsibility it is to prevent engineers from expressing non-performant SQL queries can be a really beautiful thing - but it requires a balancing act and access to rare talent to scale the data team to work with the engineers who need to be prevented from releasing unacceptable queries. There will be a tendency for the engineers to try to deploy alternative data platforms that allow them to interact less with the “grumpy database gatekeepers” and at some point you’ll have a system failure because an engineer pushed code that used the KEYS * command on a redis instance that worked great in test but caused everything to freeze in prod. Having self-serve data platforms with misuse-resistant API’s is the dream. Memcached, FoundationDB and SQLite all stand out to me as things that tend to push users to avoid a few footguns over time compared to more fully featured systems. Even though SQLite supports a decently expressive subset of SQL, it seems like engineers kind of still think of it like a small under-powered thing that shouldn’t be overworked compared to a beefy MySQL system that engineers perceive to be a behemoth that will chew through anything.

            1. 4

              It can make sense to use a SQL database even with a trivial (id int primary key, data blob) key-value schema, just to get access to the performance and durability of the underlying storage layer.

              (Couchbase architect here!) That’s exactly what I did in designing Couchbase Lite. It didn’t make sense to port the existing server storage engine to mobile, since it’s tuned for very different workloads. SQLite turns out to be a very efficient, durable k/v engine.

              1. 3

                Excellent post. So basically the author wants a relational database with a query language more suited to programmatically building queries (like MongoDB which is mentioned), table definitions based on Protocol Buffers (this is how Google F1 works - they have a few great papers about it), query planner hints to avoid performance cliffs, and automated zero-downtime migrations (again Google F1 has an interesting strategy). I’d say this is an ambitious program but nothing than can’t be done with an evolution of MySQL and PostgreSQL. I pretty much with all the concerns, but I don’t think they are bad enough to make me abandon relational databases and use a KV or document database instead.

                1. 3

                  (w.r.t. migrations)

                  This is the approach taken, in some fashion, by most ORMs, which is part of why it feels so clearly right to me.

                  This is why I prefer Ecto to SQLAlchemy or whatever–I like that explicit handling of migrations. I like being very verbose and detailed in how I interact with all of my business data.

                  1. 1

                    SQLAlchemy does not support migrations at all. Alembic is a separate tool built upon SQLAlchemy that implements schema versioning and migration. It allows you the choice of the declarative or imperative approach. If you want to simply make changes to the declarative data model, Alembic can introspect the running database and generate the migration script. Or if you prefer you can write the imperative migration script with explicit SQL.

                    1. 1

                      Ah, thank you for the correction!

                      I got them mixed up–I don’t really use often Python if I can avoid it.

                  2. 2

                    there is a postgres extension to force usage of indices (mysql has one too AFAIR). Even though mysql is battle tested, i wouldn’t say it’s the most well-engineered. Its codebase is known to be whacky compared to postgres. for everything else, I totally agree! I’d love to see truely declarative schema, but I’m afraid it would have to embeed a lot of decisions concerning migrations (performance, ordering, atomicity, …). cool post!

                    1. 1

                      I’ve always liked the way “modern” / “nosql” systems seem to so often offer their underlying storage API as a first-class option as well, with it being well-documented and stable enough to target it instead of whatever document model they’ve built atop it.

                      1. 2

                        Which systems are that? I thought people used dedicated key-value stores if they didn’t need the document model. Like Redis or LevelDB.

                        1. 1

                          Couchbase Server is oriented toward JSON documents, but lets you store arbitrary binary data in a document if you want; you just lose some of the higher-level features like querying. At that point you have pretty much a raw k/v store.