1. 13

A follow-up to a previous post, and its Lobsters discussion.


  2. 2

    I’ve been in that kind of circumstance with MySQL, but I don’t see why you would tell it the column was “latin-1”[1]? If what you want is to store 100 bytes, make the column VARBINARY(100); you can stick UTF-8 bytes in there, you avoid issues with backup/restore, your indices work correctly, and as long as your connection charset is set correctly you can still do string operations. No?

    [1] Fun fact: “latin-1” properly refers to ISO 8859-1, but fortunately(?) when MySQL says “LATIN1” it actually means “cp1252 with 8 codepoints swapped around at random”
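
    Something like this, as a rough sketch of the idea (the table and column names are made up):

    -- hypothetical table: the column just stores up to 100 bytes, whatever they are
    CREATE TABLE users (
      name VARBINARY(100) NOT NULL,
      KEY (name)  -- index is over the raw bytes, no charset-dependent length accounting
    );
    -- with the connection charset set to utf8mb4, the literal arrives as UTF-8 bytes
    INSERT INTO users (name) VALUES ('héllo');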

    1. 1

      We also supported other databases without such limits on indices, and didn’t want to use different column types here and there. The issue with index size only showed up some time later, as I recall, since most databases defaulted to latin1 for some time. The problem had also been “solved” before I arrived on the scene, and I only had to write the double decode layer.

      I’m not familiar with VARBINARY. Does it otherwise allow string-like operations? I think in some cases we were concatenating columns? (We had other issues with other databases being very particular about which queries were allowed to operate on blob types.)

      1. 2

        I implemented the same switch to VARBINARY that @lmm mentioned.

        The only impact I recall compared to regular VARCHAR is that for case-insensitive operations such as LIKE you need to cast the column to utf8mb4, so instead of:

        col_name LIKE '%foo%'
        

        you’d write:

        CONVERT(col_name USING `utf8mb4`) LIKE '%foo%'
        

        The same goes for any sorting logic where you want to rely on a sensible collation: replace col_name with CONVERT(...) as above.
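
        For example, a sort would look something like this (same hypothetical col_name; the explicit COLLATE is only needed if the charset’s default collation isn’t what you want):

        ORDER BY CONVERT(col_name USING `utf8mb4`) COLLATE utf8mb4_unicode_ci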

        1. 1

          ok, yeah, we definitely had some LIKE and sorting queries. useful to know, god forbid i ever find myself doing this again.