1. 7
  1.  

  2. 10

    “Never” excludes a useful practice of embedding the ID’s type.

    Given the UUID 946f7674-2693-4e3b-b44c-92fb88a20f3e, what can you say about it? If it shows up in logs, or in an unexpected part of the UI, where would you begin trying to figure out what it represents? With a bare UUID, you’ve lost the metadata of what entity space that ID references.

    Now, let’s apply two transformations – base62 it for readability, and prefix it with {type}_. You might end up with user_4W5nmi8kGhBq9rdYbLtRvC. This is shorter than the raw UUID, double-clickable, and you know what it identifies.
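A minimal sketch of the two transformations described above, assuming a digits-then-uppercase-then-lowercase base62 alphabet and a fixed width of 22 characters. The comment does not say which alphabet ordering it used, so the output here will not necessarily match `user_4W5nmi8kGhBq9rdYbLtRvC`; `encode_id` and `decode_id` are hypothetical helper names.

```python
import uuid

# Assumed alphabet ordering; other base62 variants put lowercase first.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
WIDTH = 22  # 62**22 > 2**128, so 22 characters can hold any 128-bit value


def encode_id(kind: str, value: uuid.UUID) -> str:
    """Return '{kind}_{base62}' with a fixed-width, zero-padded body."""
    n = value.int
    chars = []
    for _ in range(WIDTH):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return f"{kind}_{''.join(reversed(chars))}"


def decode_id(text: str) -> tuple[str, uuid.UUID]:
    """Split off the type prefix and turn the base62 body back into a UUID."""
    # The body never contains '_', so splitting on the last one is unambiguous.
    kind, _, body = text.rpartition("_")
    if not kind or len(body) != WIDTH:
        raise ValueError(f"not a typed ID: {text!r}")
    n = 0
    for ch in body:
        n = n * 62 + ALPHABET.index(ch)  # ValueError on characters outside the alphabet
    return kind, uuid.UUID(int=n)  # ValueError if the value exceeds 128 bits


if __name__ == "__main__":
    original = uuid.UUID("946f7674-2693-4e3b-b44c-92fb88a20f3e")
    typed = encode_id("user", original)
    print(typed, len(typed), "characters vs", len(str(original)))
    assert decode_id(typed) == ("user", original)
```

The encoded form is 27 characters including the `user_` prefix, against 36 for the hyphenated hex form, and it contains no characters that stop a double-click selection.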

    1. 2

      I’m not convinced converting it to base 62 makes it more readable. Depending on your native language, it could be argued it makes it less readable.

      In fact, when it comes to UUIDs, I don’t like the fact that hexadecimal values A through F leak through at all. Hexadecimal is a low level hardware detail leaking through to the user layer.

      One thing all of humanity seems to have in common is the digits 0 through 9. Those, and only those, should be used for unique IDs - in my opinion.

      1. 5

        You’re missing jmillikin’s primary point, which was the utility of embedding type information in the ID.

        As for base64, the point isn’t really readability, rather compactness. Base64 encodes 6 bits per character instead of 4, so it’s ⅓ smaller [maybe; I suck at mental arithmetic.] Decimal is even worse, only 3-point-something bits per character. This matters a lot in a database that might store billions of these and which is constantly shuffling them around when doing queries.
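The density arithmetic above can be checked with a short sketch. It computes the bits carried per character for each base and the number of characters needed to represent any 128-bit value; `chars_needed` is a hypothetical helper name, and padding or separators (such as the hyphens in the canonical UUID form) are ignored.

```python
import math

BITS = 128  # size of a UUID in bits


def chars_needed(base: int) -> int:
    """Smallest number of base-`base` digits that can represent any 128-bit value."""
    n = 1
    while base ** n < 2 ** BITS:
        n += 1
    return n


if __name__ == "__main__":
    for name, base in [("decimal", 10), ("hex", 16), ("base58", 58),
                       ("base62", 62), ("base64", 64)]:
        print(f"{name:8} {math.log2(base):.2f} bits/char, {chars_needed(base)} chars")
```

Hex needs 32 characters and base64 needs 22, so "⅓ smaller" is about right (10 of 32 characters saved). Decimal needs 39.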

        Your stance on decimal makes zero sense to me. This isn’t the user layer; database IDs aren’t meant to be human-readable. And a UUID is an uninterpreted string of 128 random bits. It’s not a number, and certainly not one humans would ever want to do arithmetic on. Of course it can be interpreted as a number, but so what?

        (There are legitimate concerns about the ability to transcribe an ID by voice, if it’s something a human might see, like a bank account. That’s why there are encodings like base58 that avoid some easily-confused characters like “l”, without sacrificing much density.)

        1. 2

          You’re missing jmillikin’s primary point, which was the utility of embedding type information in the ID.

          I didn’t miss that point, I just didn’t address it. I think the idea has some merit.

          Your stance on decimal makes zero sense to me. This isn’t the user layer; database IDs aren’t meant to be human-readable.

          Wait a minute, the GP said:

          Now, let’s apply two transformations – base62 it for readability

          Surely “readability” refers to humans reading it, right?

          1. 1

            Agreed, “readability” is a strange thing to say about a random bit-string.

        2. 2

          I think the “readability” might come from it just being more dense, and having the “user_” prefix. A higher base is a great way to make things denser.

          Also, digits are definitely not common to all of humanity - see a few common examples other than Arabic numerals here: https://en.wikipedia.org/wiki/Numeral_system. You might argue that Arabic numerals are “the most common”, but that also holds for the English alphabet.

          1. 2

            I think the “readability” might come from it just being more dense, and having the “user_” prefix. A higher base is a great way to make things denser.

            If density is the goal, applying base 62 seems appropriate. However, the GP said:

            Now, let’s apply two transformations – base62 it for readability

            I do not think using base 62 makes the blob more readable for a lot of humanity. I think sticking to decimal digits would achieve the goal of being more readable.

            Also, digits are definitely not common to all of humanity - see a few common examples other than Arabic numerals here: https://en.wikipedia.org/wiki/Numeral_system. You might argue that Arabic numerals are “the most common”, but that also holds for the English alphabet.

            Decimal digits are pervasive virtually everywhere; the English alphabet much less so.

            I have co-workers that have families in countries around the world, and they can attest to the fact that hexadecimal digits A-F leaking into the user layer makes their lives harder. It’s not clear to me why UI people think exposing everyone to hexadecimal digits is acceptable when decimal digits are far more friendly to a much wider range of humanity.

            1. 1

              Well, it’s not going to become “readable” in the sense that you’re going to be able to derive meaning from the blob. I considered the base62 more readable because the density makes it so that it interrupts the flow of a sentence less.

              I don’t personally see much of a difference between reading the English alphabet and Arabic digits, and my native language is Hindi, which uses neither. I suppose if you had a string of digits of equivalent length, just as short, like user_3290483902, I would consider that equally “readable”, though of course you have a smaller space of valid user IDs this way. Probably not a problem in practice?

              1. 2

                Well, it’s not going to become “readable” in the sense that you’re going to be able to derive meaning from the blob. I considered the base62 more readable because the density makes it so that it interrupts the flow of a sentence less.

                I have to sheepishly admit I probably got too hung up on the word “readability” there and was applying the wrong kind of context to the word.

                I agree that if it’s just an opaque blob of characters not meant to be “directly read,” being shorter does make it more readable because it consumes less display real estate.

                Mea culpa.

                This probably spills over from my annoyance when I occasionally have to type in a bunch of hexadecimal digits (for whatever reason). A numpad (keyboard) or a nice big decimal digit pad (phone) would make it so easy!

                And my (older) eyes also hate, “Wait, is that an 8? Or a B? Is that a zero, or an O?” etc. I’d rather just type in the additional decimal digits! </rant> ;-)

                1. 1

                  Ah, yeah, that’s fair. Having to key hex in like that could definitely get annoying. 😅 Not looking forward to being able to see your last point, but I suppose I’ll get there eventually. :’)

      2. 3

        Putting personally identifiable information (PII) in identifiers seems like a recipe for a world of hurt when GDPR requests start coming in. You’re likely to have these identifiers in all sorts of logs that might otherwise not have been in scope for scrubbing. (HTTP access logs, for example.)

        The problem becomes even worse if you use these ids as keys in a db and you can’t simply delete the whole record. Maybe you have to clear the PII portion but retain the rest. I imagine this could get really tricky if there are foreign key constraints.

        Moreover, UUIDs, when stored in a suitably typed UUID column, take up only 16 bytes (128 bits) and, with the new UUID formats coming out, should index quite well. This matters if you have a lot of data and lots of indices.

        Monotonically increasing integers are even shorter, but it can be hard to generate unique and unguessable ids quickly in a distributed system.

        You may want to base6X encode ids, and maybe give them type-prefixes, for the double-clickyness when presenting a slice of the data to users, but I wouldn’t store them in the DB that way.
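One way to read the split described above is: keep the raw 16 bytes in the database and apply the prefix and encoding only when an ID is shown to someone. The sketch below assumes SQLite with a `BLOB` column as a stand-in for a native UUID column type, a hypothetical `users` table, and a hypothetical `present` helper using an assumed base62 alphabet.

```python
import sqlite3
import uuid

# Assumed base62 alphabet; the ordering is a choice, not a standard.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def present(kind: str, raw: bytes) -> str:
    """Format a stored 16-byte ID for display, e.g. in a UI or a log line."""
    n = int.from_bytes(raw, "big")
    out = ""
    for _ in range(22):  # 22 base62 characters cover any 128-bit value
        n, rem = divmod(n, 62)
        out = ALPHABET[rem] + out
    return f"{kind}_{out}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id BLOB PRIMARY KEY, email TEXT NOT NULL)")

# Storage side: the key is the bare 16 bytes, with no prefix and no text encoding.
user_id = uuid.uuid4()
conn.execute("INSERT INTO users (id, email) VALUES (?, ?)",
             (user_id.bytes, "a@example.com"))

# Presentation side: add the type prefix and compact encoding at the edge.
(stored,) = conn.execute("SELECT id FROM users").fetchone()
shown = present("user", stored)
print(len(stored), "bytes stored; shown as", shown)
```

The type prefix here comes from knowing which table the row was read from, so nothing about the entity type has to be stored in the key itself.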

        I guess that’s a long-winded way to say I agree with using UUIDs. But perhaps not for the same reason as the poster.

        1. 2

          But another more subtle was the consequence for the project code itself, since some of the information now did not need a query to the database, then came the REGEXES, to parse and extract this meta information from the IDs. This generates a lot implicit and embedded logic, what if the ID needs to be changed? then entire parts of the system didn’t work properly anymore.

          I do think UUIDs are great, but this argument just sounds like regex-phobia. “Don’t pack information into IDs because regexes are scary and the format might eventually change” doesn’t make sense at all. Examples of IDs which contain information which are widely used:

          1. 1

            Still to this day, my favorite pattern for UUIDs is to serialize a small amount of relevant metadata (enrollment year, name, student id, in this case) and encrypt / pad that byte string.

            It looks like any-ole uuid at first glance, you can pass it around safely, and you can retrieve that data quickly without DB round trips or regex. Great for request validation too!

            It still suffers from the ‘embedded logic’ argument, but I feel like changing uuids is a no-no anyway.

            1. 3

              A name is something that can change. The ID is probably something you want to stay the same.

              1. 3

                If any of that changes for any reason, your ID has to change if you rely on it anywhere.

                I can see fat-fingered entry of all of those fields causing you heartache.

                1. 1

                  Be careful about trusting data like year read from this, because encrypted data can still be manipulated. In a UUID you most likely won’t have enough bits left to add a proper HMAC.