More interesting are the Unicode characters that are not combining characters, but compose in some way in practice anyway. The flag emoji, for example, don’t actually exist in Unicode. The Unicode Consortium didn’t want to be constantly amending a list of national flags as countries popped in and out of existence, so instead they cheated. They added a set of 26 regional indicator symbols, one for each letter of the English alphabet, and to encode a country’s flag you write its two-letter ISO country code with those symbols. So the Canadian flag, 🇨🇦, is actually the two characters U+1F1E8 REGIONAL INDICATOR SYMBOL LETTER C and U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A. But if you put a bogus combination together, you probably won’t get a flag glyph; you’ll get stand-ins for the characters instead. (For example, ??.) So the “length” of a pair of these characters depends both on the display font (which may not support all flags), and on the current geopolitical state of the world. How’s that for depending on global mutable state?
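To make the mapping concrete, here is a minimal Python sketch of how a two-letter code turns into a pair of regional indicator symbols. The helper name `flag_sequence` is made up for illustration; the only fact it relies on is that the 26 symbols are laid out contiguously starting at U+1F1E6 for the letter A.

```python
# U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A; the other 25 follow in order.
REGIONAL_INDICATOR_A = 0x1F1E6


def flag_sequence(country_code: str) -> str:
    """Map two ASCII letters to the matching pair of regional indicator symbols.

    Hypothetical helper: it does not check that the code is a real ISO
    country code, because Unicode itself does not care either.
    """
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError("expected exactly two ASCII letters")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code)


canada = flag_sequence("CA")
print([f"U+{ord(ch):04X}" for ch in canada])   # ['U+1F1E8', 'U+1F1E6']
print(len(canada))                             # 2 code points
print(len(canada.encode("utf-16-le")) // 2)    # 4 UTF-16 code units
print(len(canada.encode("utf-8")))             # 8 UTF-8 bytes
```

Note that nothing here can tell you whether the result will actually render as a single flag. That decision belongs to the font and the text renderer, which is exactly why the "length" a user perceives can't be computed from the string alone.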
I did a bunch of programming tutoring / mentoring sessions this last summer with a few college kids.
I spent an entire 2+ hour session explaining Unicode, character encoding, collation, etc.
My favorite way of engaging them with the content was by talking about all the security vulnerabilities that the inherent complexity has exposed, and all of the real-world fallout from misinterpreting or mis-implementing the various standards. Not only did they leave shocked at the complexity of getting “Iñtërnâtiônàlizætiøn” to be parsed and displayed, it was a great way to tie in data-structure design and algorithm implementation. Everybody had a blast during that session.
Is it just Windows 7, or are like 60% of those characters not supported by the font?