1. 22

  2. 7

    Excellent post, I love the thoroughness! Units especially seem really useful in catching subtle correctness issues. But I do have one minor gripe with one thing you said:

    For instance, a person’s given name can be long, but it’s never a gigabyte long. It’s sensible to pick an upper limit of fifty Unicode characters.

    50 characters is not a sensible maximum for names. This has been a highly obnoxious problem for people with long names, simply because programmers make faulty assumptions like this this one. And these people aren’t even terribly obscure, for example the singer Israel Ka’ano’i Kamakawiwoʻole, who most of you will know (by name or not) for his famous rendition of “Over The Rainbow”. His name, aside from being 30 characters long, includes multiple apostrophes. More restrictive assumptions could easily clobber his name. It’s not even hard to imagine “conventional” names that come even closer to this limit, like Jacqueline Alexandra Cunningham-Patterson, 41 chars. And best not make a single middle name field, lest you encounter people with more than one, like George Herbert Walker Bush. Or worse, Avul Pakir Jainilabdeen Abdul Kalam Manakkaya. At 45 chars, the former president of India is dangerously close to a constraint violation in your database.

    When it comes to things like this, it’s better to just set a limit for the safety of your software. A gigabyte is unreasonable, but 64k probably won’t hurt you. I doubt anyone has a 65,536 character name, or a 16,384 character name of astral characters. Instead of a hard constraint, I would log long names as warnings and manually check if those cases are caused by bugs.

    1. 3

      Having a an unhyphenated two-word last name has broken systems in a mostly-amusing variety of ways. And non-technically, humans simply fail to alphabetize properly, or even consistently.

      1. 1

        The Spanish also have a naming system prone to long names.


        A German minister of some nobility had the full name:

        Karl-Theodor Maria Nikolaus Johann Jacob Philipp Franz Joseph Sylvester Buhl-Freiherr von und zu Guttenberg (108 chars)

        A little fun thing aside: when he was chosen as a minister, someone ninja-edited the Wikipedia to add a name, which lead to newspapers misreporting it :).

        To the defense of the blog post, they speak about the “given name”, which would be “Karl-Theodor” in that case.

        1. 1

          My favourite so far is Alexander Siddig, known to governments as Ṣiddīq aṭ-Ṭāhir al-Fāḍil aṣ-Ṣiddīq ʿAbd ar-Raḥman Muḥammad ʾAḥmad ʿAbd al-Karīm al-Mahdī.

        2. 3

          Slightly off-topic moan (who, me?): I for one am sick to death of people insisting that RDBMSes don’t work—of course they don’t fare well when an application’s every request-response cycle involves making half a dozen SELECT * queries on denormalized data stored in crude data types, parsed by client-side regexes that may destroy the fabric of reality.

          Posts like these remind us that, if you’re going to use a huge complex technology, you should put in enough effort to determine whether you’re incompetent before you judge it. Hopefully we’ll be seeing more like this: perhaps something on NOTIFY/LISTEN? {dark muttering about message queues}

          1. 2

            I had a coworker whose reaction to our (admittedly very shitty) Oracle DB was to suggest we go all NoSQL on it (this was in 2006 or so).

            Once the company actually shelled out the $$$ for someone who knew Oracle things got much better.

          2. 1

            Love the post. As a Rails dev, it makes me painfully aware about how little I know about databases. One thing I noticed:

            Time is within business hours. CHECK ('8:00 am'::time <= VALUE AND VALUE <= '5:00 pm'::time)

            Time assumptions sound dangerous here.

            1. 3

              Time is the worst. I work hard to only involve myself in applications that can safely assume everyone lives in London.

              1. 1

                Time is definitely awful, but I’ve found if you offload all time processing into PG, and use the PG date/time tools it’s not abysmal.