1. 31
  1.  

    1. 10

      I understand the rationale for including the original emoji (unicode wants to be a superset of existing character sets) but they should have been put in a code space reserved for backwards compatibility with bad ideas, not made such a big part of unicode.

      At this point, there’s a strong argument for a new character set that is a subset of unicode that removes all of the things that are not text. We already have mechanisms for embedding images in text. Even in the ‘90s, instant messaging systems were able to avoid sending common images by having a pre-defined set of pictures that they referenced with short identifiers. This was a solved problem before Unicode got involved and it’s made text processing an increasingly complicated mess, shoving image rendering into text pipelines for no good reason.

      The web could have defined a URL encoding scheme for emoji from an agreed set, or even a shorthand tag with graceful fallback (e.g. <emoji img="gb-flag;flag">Union Flag</emoji>, which would render a British flag if you have an image for gb-flag, a generic flag if you don’t, and use ‘Union Flag’ as the alt text or as the fallback if you don’t support emoji). With the explicit description and fallback, you avoid things like ‘I’m going to shoot you with a 🔫’ being rendered as ‘I’m going to shoot you with a {image of a gun}’ or ‘I’m going to shoot you with a {image of a water pistol}’ depending on the platform: if you didn’t have the water-pistol image, you’d fall back to the text, not show the pistol image.
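
      The fallback chain for such a tag can be sketched in a few lines. This is only an illustration of the hypothetical tag: the function name `resolve_emoji`, the semicolon-separated identifier list, the `[image:...]` placeholder, and the set of locally available images are all assumptions, not part of any real standard.

      ```python
      def resolve_emoji(identifiers: str, alt_text: str, available: set[str]) -> str:
          """Return a placeholder for the first identifier we have an image for.

          Falls back to the alt text when no listed image is available.
          """
          for name in identifiers.split(";"):
              name = name.strip()
              if name and name in available:
                  return f"[image:{name}]"
          return alt_text


      # A client with both images picks the most specific one.
      print(resolve_emoji("gb-flag;flag", "Union Flag", {"gb-flag", "flag"}))
      # A client with only the generic flag falls back to it.
      print(resolve_emoji("gb-flag;flag", "Union Flag", {"flag"}))
      # A client with neither image shows the text.
      print(resolve_emoji("gb-flag;flag", "Union Flag", set()))
      ```

      The point of the design is that a client without the water-pistol image would never silently substitute a different picture; it would show the author's text instead.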

      1. 24

        Like it or not, emoji are a big part of culture now. They genuinely help convey emotion in a fairly intuitive manner through text, way better than obscure tone indicators. I mean, what’s more understandable?

        “Are you going to leave me stranded? 😛”

        “Are you going to leave me stranded? [/j]”

        It definitely changes the meaning of the text. They’re here to stay, and being in Unicode means they got standardized, and it wouldn’t have happened otherwise.

        Of course there are issues with different icon sets having different designs (like how Samsung’s 😬 was completely different from everyone else’s), but those tend to get resolved eventually.

        1. 4

          Like it or not, emoji are a big part of culture now. They genuinely help convey emotion in a fairly intuitive manner through text, way better than obscure tone indicators.

          Except they don’t. Different in-groups assign different meanings to different ones. Try asking someone for an aubergine using emoji some time and see what happens.

          “Are you going to leave me stranded? 😛”

          This is culturally specific. It’s an extra set of things that people learning English need to learn. This meaning for sticking out your tongue is not even universal across European cultures. And that’s one of the top ten most common reaction emoji, once you get deeper into the hundreds of others the meaning is even further removed. How would you interpret the difference between 🐶 and 🐕 in a sentence?

          Of course there are issues with different icon sets having different designs (like how Samsung’s 😬 was completely different from everyone else’s), but those tend to get resolved eventually.

          That’s an intrinsic property of using unicode code points. They are abstract identifiers that tell you how to find a glyph. The glyphs can be different. A Chalkboard A and a Times A are totally different pictures because that’s an intrinsic property of text. If Android has a gun and iOS has a waterpistol for their pistol emoji, that’s totally fine for characters but a problem for images.

          1. 16

            😱 Sure emojis are ambiguous . And different groups can use them differently. But that doesn’t mean they don’t convey meaning? The fact that they are so widely used should point towards them being useful no? 😉

            1. 7

              I never said that embedding images in text is not useful. I said that they are not text, do not have the properties of text, and treating them as text causes more problems than it solves.

              1. 3

                Emoji are not alphabets, syllabaries, abugidas, or abjads. But they are ideograms, which qualifies them as a written script.

                1. 1

                  I disagree. At best, they are precursors of an ideographic script. For a writing system, there has to be some kind of broad consensus on semantics and there isn’t for most emoji beyond ‘that is a picture of X’.

                  1. 3

                    For a writing system, there has to be some kind of broad consensus on semantics

                    Please describe to me the semantics of the letter “р”.

                    1. 1

                      Please describe to me the semantics of the letter “р”.

                      For alphabetic writing systems, the semantics of individual letters is defined by their use in words. The letter ‘p’ is a component in many of the words in this post and yours.

                      1. 5

                        Thank you! (That was actually U+0440 CYRILLIC SMALL LETTER ER, which only featured once in both posts, but no matter.)

                        the semantics of individual letters is defined by their use in words

                        The thing is, I disagree. “e” as a letter itself doesn’t have ‘semantics’, only the words including it do[1]. What’s the semantics of the letter “e” in “lobster”? An answer to this question isn’t even wrong. It gets worse when different writing systems interpret the same characters differently: if I write “CCP”, am I referring to the games company CCP Games? Or was I abbreviating сoветская социалистическая республика? What is the semantics of a letter you cannot even identify the system of?

                        Emoji are given meaning of different complexity by their use in a way that begins to qualify them as logographic. Most other writing systems didn’t start out this way, but that doesn’t make them necessarily more valid.

                        [1]: The claim doesn’t even hold in traditional logographic writing systems which by all rights should favor your argument. What is the semantics of the character 湯? Of the second stroke of that character? Again, answers aren’t even wrong unless you defer to the writing system to begin with, in which case there’s no argument about (in)validity.

          2. 13

            Except they don’t. Different in-groups assign different meanings to different ones.

            This is true of words as well.

            1. 3

              Yes, but their original point is that we should be able to compose emojis like we compose words, as in the old days of phpBB and instant messaging. :mrgreen:

              1. 13

                Just a nit: people do compose emojis - I see sequences of emojis all the time. People send messages entirely of emojis that other people (not necessarily me) understand.

                1. 9

                  The fact that an in-group can construct a shared language using emoji that’s basically opaque to outsiders is probably a big part of their appeal.

                  1. 8

                    Yeah, and also there’s nothing wrong with that, it’s something any group can and should be able to do. I have no entitlement to be able to understand what other people say to each other (you didn’t claim that, so this isn’t an attack on you. I am just opposed to the “I don’t like / use / understand emojis how other people use them therefore they are bad” sentiment that surfaces periodically).

              2. 4

                That’s fair, I’m just nitpicking a specific point (that happens to be a pet peeve of mine).

            2. 2

              This is true of words as well.

              But not of characters and rarely true even of ideographs in languages that use them (there are exceptions but a language is not useful unless there is broad agreement on meaning). It’s not even true of most words, for the same reason: you can’t use a language for communication unless people ascribe the same meaning to words. Things like slang and jargon rarely change more than a small fraction of the common vocabulary (Clockwork Orange aside).

              1. 6

                Without getting into the philosophy of what language is, I think this skit best illustrates what I mean (as an added bonus, emoji would have resolved the ambiguities in the text).

                Note I’m not arguing for emoji to be in Unicode, I’m just nitpicking the idea that the problem with them is ambiguity.

              2. 2

                Socrates would like to have a chat with you. I won’t go through the philosophical tooth-pulling that he would have enjoyed, but suffice it to say that most people are talking past each other and that most social constructions are not well-founded.

                I suspect that this is a matter of perspective; try formalizing something the size of English Prime (or, in my case, Lojban) and see how quickly your intuitions fail.

      2. 15

        I understand the rationale for including the original emoji (unicode wants to be a superset of existing character sets) but they should have been put in a code space reserved for backwards compatibility with bad ideas, not made such a big part of unicode.

        Except emoji have been absolutely stellar for Unicode: not only are they a huge driver of adoption of unicode (and, through it, UTF8) because they’re actively desirable to a large proportion of the population, they’ve also been a huge driver of improvements to all sorts of useful unicode features which renderers otherwise tend to ignore despite their usefulness to the rendering of actual text, again because they’re highly desirable and platforms which did not support them got complaints. I fully credit emoji with mysql finally getting their heads out of their ass and releasing a non-broken UTF8 (in 2010 or so). That’s why said unicode consortium has been actively leveraging emoji to force support for more complex compositions.

        And the reality is there ain’t that much difference between “image rendering” and “text pipeline”. Rendering “an image” is much easier than properly rendering complex scripts like arabic, devanagari, or burmese (or Z̸̠̽a̷͍̟̱͔͛͘̚ĺ̸͎̌̄̌g̷͓͈̗̓͌̏̉o̴̢̺̹̕), even ignoring that you can use text presentation if you don’t feel like adding colors to your pipeline.

        Even in the ‘90s, instant messaging systems were able to avoid sending common images by having a pre-defined set of pictures that they referenced with short identifiers.

        After all what’s better than one standard if not fifteen?

        This was a solved problem before Unicode got involved and it’s made text processing an increasingly complicated mess, shoving image rendering into text pipelines for no good reason.

        This problem was solved by adding icons in text. Dingbats are as old as printing, and the Zapf Dingbats which unicode inherited date back to the late 70s.

        The web

        Because nobody could ever want icons outside the web, obviously. As demonstrated by Lucida Icons having never existed.

      3. 10

        subset of unicode that removes all of the things that are not text

        It sounds like you disagree solidly with some of Unicode’s practices so maybe this is not so appealing, but FWIW the Unicode character properties would be very handy for defining the subset you’d like to include or exclude. Most languages seem to have a stdlib interface to them, so you could pretty easily promote an ideal of how user input like comment boxes should be sanitized and offer your ideal code for devs to pick up and reuse.
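
        As a rough sketch of what that could look like with Python’s standard `unicodedata` module: the filter below drops every code point whose general category is `So` (Symbol, other), which is where most emoji live. Treating `So` as “not text” is an assumption made purely for illustration. It is a blunt rule that also removes characters like © and °, and it leaves behind joiners and variation selectors, so a real policy would need a more careful property list.

        ```python
        import unicodedata


        def strip_symbols(text: str) -> str:
            """Drop every code point whose general category is So (Symbol, other)."""
            return "".join(ch for ch in text if unicodedata.category(ch) != "So")


        # U+1F61B is the tongue-out face used earlier in the thread.
        print(strip_symbols("Are you going to leave me stranded? \U0001F61B"))
        # Accented letters are letters, not symbols, so they survive.
        print(strip_symbols("na\u00efve caf\u00e9"))
        ```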

      4. 8

        new character set that is a subset of unicode that removes all of the things that are not text

        and who’d be the gatekeeper on what the text is and isn’t? What would they say about the ancient Egyptian hieroglyphs? Are they text? If yes, why, they are pictures. If no, why, they encode a language.

        It might be a shallow dissimilar, but people trying to tell others what forms of writing text are worthy of being supported by text rendering pipelines gets me going.

        If the implementation is really so problematic, treat emojis as complicated ligatures and render them black and white.

        1. 3

          and who’d be the gatekeeper on what the text is and isn’t? What would they say about the ancient Egyptian hieroglyphs? Are they text? If yes, why, they are pictures. If no, why, they encode a language.

          Hieroglyphics encode a (dead) language. There are different variations on the glyphs depending on who drew them (and what century they lived in) and so they share the property that there is a tight(ish, modulo a few thousand years of drift) coupling between an abstract hieroglyph and meaning and a loose coupling between that abstract hieroglyph and a concrete image that represents it. Recording them as text is useful for processing them because you want to extract the abstract characters and process them.

          The same is true of Chinese (though traditional vs simplified made this a bit more complex and the unicode decisions to represent Kanji and Chinese text using the same code points has complicated things somewhat): you can draw the individual characters in different ways (within certain constraints) and convey the same meaning.

          In contrast, emoji do not convey abstract meaning, they are tightly coupled to the images that are used to represent them. This was demonstrated very clearly by the pistol debacle. Apple decided that a real pistol image was bad because it was used in harassment and decided to replace the image that they rendered with a water pistol. This led to the exact same string being represented by glyphs that conveyed totally different meaning. This is because the glyph not the character encodes meaning for emoji. If you parsed the string as text, there is no possible way of extracting meaning without also knowing the font that is used.

          Since the image is the meaningful bit, not the character, we should store these things as images and use any of the hundreds of images-and-text formats that we already have.

          More pragmatically: unicode represents writing schemes. If a set of images have acquired a significant semantic meaning over time, then they may count as a writing system and so can be included. Instead, things are being added in the emoji space as new things that no one is using yet, to try to define a writing scheme (largely for marketing reasons, so that ‘100 new emoji!’ can be a bullet point on new Android or iOS releases).

          It might be a shallow dissimilar, but people trying to tell others what forms of writing text are worthy of being supported by text rendering pipelines gets me going.

          It’s not just (or even mostly) about the rendering pipelines (though it is annoying there because emoji are totally unlike anything else and have required entirely new features to be added to font formats to support them), it’s about all of the other things that process text. A core idea of unicode is that text has meaningful semantics distinct from the glyphs that represent it. Text is a serialisation of language and can be used to process that language in a somewhat abstract representation. What, aside from rendering, can you do with processing of emoji as text that is useful? Can you sort them according to the current locale meaningfully, for example (seriously, how should 🐕 and 🍆 be sorted - they’re in Unicode and so that has to be specified for every locale)? Can you translate them into a different language? Can you extract phonemes from them? Can you, in fact, do anything useful with them that you couldn’t do if you embedded them as images with alt text?
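
          To make the sorting question concrete, here is a minimal sketch of what a sort with no locale knowledge does: Python’s built-in `sorted` compares strings by code point, so the order below is an accident of where the characters were allocated. The helper name `code_point_order` is made up for this example, and nothing here is locale-aware collation.

          ```python
          def code_point_order(items: list[str]) -> list[str]:
              """Sort strings by raw code point, which is Python's default str ordering."""
              return sorted(items)


          dog = "\U0001F415"        # the dog emoji from the question above
          aubergine = "\U0001F346"  # the aubergine emoji

          for ch in code_point_order([dog, aubergine]):
              print(f"U+{ord(ch):04X}")
          # U+1F346 (aubergine) sorts before U+1F415 (dog)
          ```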

          1. 11

            Statistically, no-one cares about hieroglyphics, but lots of people care about being able to preserve emojis intact. So text rendering pipelines need to deal with emojis, which means we get proper hieroglyphics (and other Unicode) “for free”.

            Plus, being responsible for emoji gives the Unicode Consortium the sort of PR coverage most organizations spend billions to achieve. If this helps them get even more ancient writing systems implemented, it’s a net good.

          2. 2

            What, aside from rendering, can you do with processing of emoji as text that is useful?

            Today, I s/☑️/✅/g a text file.
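
            The same substitution as a small Python sketch. It assumes the checked box is stored as U+2611 followed by the emoji variation selector U+FE0F, and it also handles the bare U+2611 in case the selector is missing; the function name `swap_checkmarks` is made up for this example.

            ```python
            BALLOT_BOX_WITH_CHECK = "\u2611"   # the checked box
            VARIATION_SELECTOR_16 = "\ufe0f"   # requests emoji presentation
            WHITE_HEAVY_CHECK_MARK = "\u2705"  # the green check mark


            def swap_checkmarks(text: str) -> str:
                """Replace the checked box (with or without U+FE0F) with U+2705."""
                # Replace the two-code-point form first so no stray selector is left over.
                text = text.replace(
                    BALLOT_BOX_WITH_CHECK + VARIATION_SELECTOR_16, WHITE_HEAVY_CHECK_MARK
                )
                return text.replace(BALLOT_BOX_WITH_CHECK, WHITE_HEAVY_CHECK_MARK)


            print(swap_checkmarks("\u2611\ufe0f buy milk\n\u2611 call mum"))
            ```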

            Can you sort them according to the current locale meaningfully, for example (seriously, how should 🐕 and 🍆 be sorted - they’re in Unicode and so that has to be specified for every locale)?

            Do I have the book for you!

            Can you translate them into a different language? Can you extract phonemes from them?

            We can’t even do that with a lot of text! 😝

      5. 8

        At this point, there’s a strong argument for a new character set that is a subset of unicode that removes all of the things that are not text.

        All that’s missing from this sentence to set off all the 🚩 🚩 🚩 is a word like “just” or “simply”.

        Others have started poking at your definition of “text”, and are correct to do so – are hieroglyphs “text”? how about ideograms? logograms? – but really the problem is that while you may feel you have a consistent rule for demarcating “text” from “images” (or any other “not text” things), standards require getting a bunch of other people to agree with your rule. And that’s going to be difficult, because any such rule will be arbitrary. Yours, for example, mostly seem to count certain very-image-like things as “text” if they’ve been around long enough (Chinese logograms, Egyptian hieroglyphics) while counting other newer ones as “not text” (emoji). So one might reasonably ask you where the line is: how old does the usage have to be in order to make the jump from “image” to “text”? And since you seem to be fixated on a requirement that emoji should always render the same on every platform, what are you going to do about all the variant letter and letter-like characters that are already in Unicode? Do we really need both U+03A9 GREEK CAPITAL LETTER OMEGA and U+2126 OHM SIGN?

        etc.
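
        For the omega example specifically, a short sketch with Python’s standard `unicodedata` module shows how the two code points relate: they compare as different strings, but canonical normalization (NFC here) folds the ohm sign into the Greek letter.

        ```python
        import unicodedata

        OHM_SIGN = "\u2126"
        GREEK_CAPITAL_OMEGA = "\u03a9"

        for ch in (GREEK_CAPITAL_OMEGA, OHM_SIGN):
            print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

        # The raw code points differ...
        print(OHM_SIGN == GREEK_CAPITAL_OMEGA)  # False
        # ...but NFC maps the ohm sign to the Greek letter.
        print(unicodedata.normalize("NFC", OHM_SIGN) == GREEK_CAPITAL_OMEGA)  # True
        ```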

        1. 1

          So one might reasonably ask you where the line is: how old does the usage have to be in order to make the jump from “image” to “text”?

          Do they serialise language? They’re text. Emoji are not a writing system. They might be a precursor to a writing system (most ideographic writing systems started with pictures and were then formalised) but that doesn’t happen until people ascribe common meaning to them beyond ‘this is a picture of X’.

          And since you seem to be fixated on a requirement that emoji should always render the same on every platform, what are you going to do about all the variant letter and letter-like characters that are already in Unicode?

          That’s the opposite of my point. Unicode code points represent an abstraction. They are not supposed to require an exact glyph. There are some things in Unicode to allow lossless round tripping through existing character encodings that could be represented as sequences of combining diacritics. They’re not ideal in a pure-Unicode world but they are essential for Unicode’s purpose: being able to represent all text in a form amenable to processing.

          For each character, there is a large space of possible glyphs that a reader will recognise. The letter A might be anything from a monospaced block character to a curved illustrated drop character from an illuminated manuscript. The picture is not closely coupled to the meaning and changing the picture within that space does not alter the semantics. Emoji do not have that property. They cause confusion when slightly different glyphs are used. Buzzfeed and similar places are full of ‘funny’ exchanges from people interpreting emoji differently, often because they see slightly different glyphs.

          The way that emoji are used assumes that the receiver of a message will see exactly the same glyph that the sender sends. That isn’t necessary for any writing system. If I send Unicode of English, Greek, Icelandic, Chinese, or ancient Egyptian, the reader’s understanding will not change if they change fonts (as long as the fonts don’t omit glyphs for characters in that space). If someone sends a Unicode message containing emoji, they don’t have that guarantee because there is no abstract semantics associated with them. I send a picture of a dog, you see a different dog, I make a reference to a feature of that dog and that feature isn’t present in your font, you are confused. Non-geeks in my acquaintance refer to them as ‘little pictures’ and think of them in the same way as embedded GIFs. Treating them as characters causes problems but does not solve any problems.

          1. 2

            Do they serialise language? They’re text. Emoji are not a writing system. They might be a precursor to a writing system (most ideographic writing systems started with pictures and were then formalised) but that doesn’t happen until people ascribe common meaning to them beyond ‘this is a picture of X’.

            I think this is going to end up being a far deeper and more complex rabbit hole than the tone of your comment anticipates. Plenty of things that are in Unicode today, and that you undoubtedly would consider to be “text”, do not hold up to this criterion.

            For example, any character that has regional/dialect/language-specific variations in pronunciation seems to be right out by your rules. So consider, say, Spanish, where in some dialects the sound of something like the “c” in “Barcelona” is /s/ and in others it’s /θ/. It seems hard to say that speakers of different dialects agree on what that character stands for.

      6. 4

        At this point, I feel like the cat is out of the bag; people are used to being able to use emoji in almost any text-entry context. Text rendering pipelines are now stuck supporting these icons. With that being the case, wouldn’t it be way more complexity to layer another parsing scheme on top of Unicode in order to represent emoji? I can see the argument that they shouldn’t have been put in there in the first place, but it doesn’t seem like it would be worth it to try to remove them now that they’re already there.

    2. 10

      Tangential reminder: flags do not represent languages, but nations.

      1. 3

        Don’t like how often people forget this. I do wish there were some sort of emojis to represent languages without nations like Malayalam or the many North American indigenous languages.

        1. 4

          The ISO codes can sometimes suffice, but the best badge seems to be the native language’s name in its native script (or both). It could be cute to have a single codepoint symbol but the labeling with the script is clear to its readers.

        2. 1

          I haven’t looked into this, but surely the various federally recognized tribes in the US have their own flags? I take your point though that it doesn’t solve the overall problem.

      2. 1

        🏴‍☠️

        1. 1

          Aye. Speaking pirate is an exception, matey.

      3. 1

        On the other hand, an interface that displays language names raises questions about which language’s name takes precedence, and then when you settle on “display each language as it names itself” raises further questions about how to handle the fact that not all those languages read left-to-right (hello Arabic, Hebrew, etc.) or even horizontally (hello Mongolian). And that’s before getting into how to select particular dialects or variants like en-US versus en-GB, or zh-TW, etc.

        Flags or other “graphical” representations can at least push the issues out of the technical and into the political realm.

        1. 5

          Yes, let’s casually marginalize swaths of folks so we can say “not our problem” /s. Wikipedia, et al. seem just fine with their language pickers, etc. without needing to piss a bunch of folks off for software makers’ naïve/insensitive decisions since they can often afford to skip the political implications as “doesn’t affect me”.

          1. 1

            Yes, let’s casually marginalize swaths of folks so we can say “not our problem”

            Yes, let’s just casually sneer at people and accuse them of evil, that’ll certainly win them over to doing what you want.

            Like it or not, using a graphical language picker does avoid some rendering issues, and those rendering issues may not be under the control of the programmer writing the language picker, which can sometimes make it a least-worst-option approach.

            1. 2

              Good job. You invented an exception that might happen in some rare cases (‘rare’ because the web & other places don’t seem to have issues in practice at present), but you haven’t said anything that refutes the main point that, as a heuristic, you should avoid flags as the symbol for languages since a flag is literally the wrong symbol for the idea, and often has negative side effects on users such as being reminded they don’t fit in as immigrants/expats to their new home, leaving minority languages out as unimportant, reinforcing some dialect as more correct/prestigious, plastering a colonizer’s banner in your UI, etc.

              You could get away with ignorance, but reminding others that in almost all cases it is the wrong thing to do means that if you know but are casually doing it anyway, it is malice. A PSA to help folks avoid inflaming issues in the “political realm” is a positive move, as many developers come from a background of privilege that makes them accidentally miss the implications of their choices, and I don’t see the value in your devil’s advocacy.

              1. 1

                I mean, just be concrete. I’m in the US. Most of the posts on the site I manage are in English. Some are translated into Spanish. What flags should we use? The flag of the UK has nothing to do with our content (we’re US-focused). The flag of Spain doesn’t really have anything to do with most of our Spanish readership, who are immigrants or children of immigrants from Latin America. We could do the flags of the US and Mexico, but that would be annoying to people from other Latin American countries, and for that matter a lot of Mexicans grow up speaking indigenous languages, which aren’t represented by the flag and we don’t have support for.

                There are cases where flags are appropriate, like picking your Amazon store based on what items are available and shippable, and cases where it’s not, like picking the language for site chrome.

                1. 1

                  I mean think about the UK flag… does Wales speak English or Welsh? Scotland English or Scots? Northern Ireland speak Gaelic dialects? Why not use the English red+white flag? What? …you say many folks don’t recognize it & others confuse it with the Georgian flag. The UK is #6 in the world for number of English speakers. There’s no language governing body. It just doesn’t make a lot of sense when you break it down.

                  picking your Amazon store based

                  That is a geographical region, not a language, and taxes are usually tied to nations, so the flag makes sense.

    3. 6

      They do maintain a list of flag emoji and how they look

      Wow, that is the slowest loading page without javascript that I have ever seen. Is someone literally typing out the table as I request it? Curl is saying that it is downloading at 7 KiB/s.

      I was wondering if Firefox was struggling to render the long table, but it seems that it is in fact doing an excellent job rendering an incredibly slowly downloading HTML file in a streaming fashion.

      Update: It completed in 24 min 27 s averaging 5.34 KiB/s. The page has a nice footer!

    4. 3

      Unicode sprung from the discovery that after the invention of computers, letting them talk to each other was the second-biggest mistake