I thought Unicode was supposed to just encode existing writing systems, but going by this article, it appears that the Unicode Consortium is actually defining a new written language.
It’s more that written language is evolving in front of us as people come up with ever more use cases for emoji, and the Unicode Consortium is part of the process.
Emoji were a mistake, for many reasons. Anyone else remember when the gun emoji changed from rendering (generally) as revolvers to rayguns or squirt guns because something something violence? That was an attempt to retroactively alter people’s messages, and nobody involved should be allowed anywhere near other people’s writing ever again.
Okay, I can see the emoji thing taking the same approach as diacritics, where ü can be written either as the precomposed ü (U+00FC) or as a u (U+0075) followed by a combining diaeresis (U+0308).
So, are we having emoji normalization forms too? Because I really don’t want to be the guy tracking sorting and filtering bugs that failed to match bear + snowflake sequences when searching for polar bears.
It’s not really the same thing. Normalization happens because both the composed and decomposed versions result in, logically, the same string. There is no ‘composed’ equivalent for these sequences; the only way to represent a polar bear is to put multiple codepoints together.
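To make the distinction concrete, here is a minimal Python sketch using the standard `unicodedata` module. It assumes the polar bear is encoded as bear, U+200D, snowflake, variation selector-16; the helper names `nfc` and `nfd` are made up for illustration.

```python
import unicodedata

composed = "\u00fc"      # ü as a single precomposed code point
decomposed = "u\u0308"   # u followed by COMBINING DIAERESIS

# Polar bear as a sequence: bear, U+200D joiner, snowflake, variation selector-16
polar_bear = "\U0001F43B\u200D\u2744\uFE0F"


def nfc(s: str) -> str:
    """Normalize to the composed form (NFC)."""
    return unicodedata.normalize("NFC", s)


def nfd(s: str) -> str:
    """Normalize to the decomposed form (NFD)."""
    return unicodedata.normalize("NFD", s)


# The two spellings of ü differ as raw strings but normalize to each other.
assert composed != decomposed
assert nfc(decomposed) == composed
assert nfd(composed) == decomposed

# The emoji sequence has no single-code-point form, so normalization leaves it alone.
assert nfc(polar_bear) == polar_bear
assert nfd(polar_bear) == polar_bear
assert len(polar_bear) == 4
```

So a search that normalizes both sides before comparing handles ü, and the emoji sequence passes through untouched.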
No, but imagine you have a commutative case. E.g., what if snowflake+bear = bear+snowflake = polar bear? Then what if a webpage contains snowflake+bear, but you search for bear+snowflake? You have to match the two somehow.
Good news: modifiers and JWZ sequences aren’t commutative. [Snowflake, JWZ, bear] just looks like ❄️🐻.
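A quick way to check the ordering point, as a sketch in Python: the two orderings are simply different code point sequences, and none of the four normalization forms maps one onto the other. The constant names and the `code_points` helper are invented for this example.

```python
import unicodedata

JOINER = "\u200D"           # the joiner code point, U+200D
BEAR = "\U0001F43B"         # bear
SNOWFLAKE = "\u2744\uFE0F"  # snowflake + variation selector-16

bear_first = BEAR + JOINER + SNOWFLAKE   # the sequence intended as a polar bear
snow_first = SNOWFLAKE + JOINER + BEAR   # the same pieces in the other order


def code_points(s: str) -> list[str]:
    """List the code points of s in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


print(code_points(bear_first))
print(code_points(snow_first))

# Plain string comparison already tells them apart...
assert bear_first != snow_first

# ...and no normalization form reorders one into the other.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, bear_first) != unicodedata.normalize(form, snow_first)
```

How the second sequence is drawn is up to the font and renderer, but as text it never compares equal to the first.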
You mean ZWJ right?
No, no, Jamie Zawinski has been really busy lately.
Seriously. There were enough search results for “jwz unicode” that the “did you mean” thing did not kick in. It took me a bit to figure out that the initialism had probably been accidentally transposed.
You’ve correctly guessed why I transposed it too.
I, for one, am in favor of a JWZ unicode character. It would probably be disappointment/rage with technology.
The fun thing is that “are these two sequences the same emoji” becomes undecidable then :D
How can I see all the combinations that a font supports?
If alien archeologists ever unearth Unicode conversational text such as twitter feeds or something along those lines, they are going to be so confused.
It’s basically hanzi/kanji all over again. They’ll figure it out.
If folks are interested in how UTF-8 was designed, this thread is a fun read: it was worked out over dinner on a placemat! https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt