Not sure I get the point here. The division of vocabulary into "content" and "grammar" words is non-scientific, to say the least; the assertion that this is what makes a language would be surprising to linguists. Then again, has anybody ever argued that emoji is a language?
I think the Unicode consortium made a huge mistake giving in to adding emojis to Unicode. It's a bottomless pit, very politically charged and definitely ambiguous (compare for example the different emoji-styles across operating systems/fonts).
It severely complicates most of the Unicode algorithms (grapheme cluster detection, word/sentence/line-segmentation, etc.) and, compared to dead and alive languages, feels very short-lived, like a fashion.
How will emojis be seen in 50 years? I can already feel the second-hand embarrassment.
Every thread about emoji has a "Unicode shouldn't have added them" comment (or several), and I feel like I then always step in to remind those commenters that basically every single chat/message system humans have built in the internet era has reinvented emoticons in some form or another, whether purely textual (":-)" and ":/" and friends) or custom graphics, or a mix of text abbreviations that get replaced by graphics.
This suggests that they are a non-negotiable part of how humans conduct written communications in this era. Which means Unicode must find a way to capture them, by the nature of Unicode itself.
You might as well use the same argument to claim that Unicode should capture all words, too.
Doesn't it try? Morally, is there any difference between a code sequence of letters representing a word, and a code sequence of letters and combining characters that come together to create a single glyph?
This is solved well with ligatures at the font level.
Solving it at the font level has the additional benefit of not blocking the addition of new emoji on a standards body, as well as allowing graceful degradation to character sequences that anyone, including those on older software, can view.
Ligatures can't and don't solve all the traditional emoticons, let alone emoji.
Why not? This approach is more or less used for flags, where flag emoji are (for political reasons like "TW") ligatures of country codes in a special Unicode range. If you happen to put "Flag{T}" beside "Flag{W}", you may get the letters "TW", or you may get a flag that enrages China, depending on your font.
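For anyone who hasn't looked at how flags are encoded, here is a minimal Python sketch of that mechanism. The helper name `flag_sequence` is made up for illustration; the only fact it relies on is that the regional indicator symbols occupy U+1F1E6 through U+1F1FF, one per letter A to Z. Whether the resulting pair shows up as a flag or as two letters is decided by the font and renderer, not by the encoding.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag_sequence(country_code: str) -> str:
    """Map a two-letter ASCII country code to its regional indicator pair."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two ASCII letters")
    # Each letter is shifted into the regional indicator range.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


tw = flag_sequence("TW")
# Two ordinary code points; there is no dedicated code point for the flag itself.
print([f"U+{ord(c):04X}" for c in tw])
```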
If you want to prevent ASCII "bar:(foo)" from being interpreted as a smiley emoji, maybe Unicode could standardize non-rendering "emoji brackets" as a way of hinting to a font system that it could render a sequence of characters as an emoji ligature.
There's no need to restrict emoji to the slow pace of the Unicode Consortium, when dropping in a new font will get you the new hotness, especially since using text sequences will render legibly for everyone not using that font.
This is win/win. It makes things more usable for those that dislike emoji, and it makes more emoji available to those that like emoji.
Because fonts cannot change emoticons into images? They have different meaning, so font ligature processing, which is essentially replaceAll(characters/glyphs/whatever, graphic), does not work.
No one can adopt a system font that magically turns one set of characters into another. Because it can't be adopted as the system font, no apps get emoji. A person can't simply change the default, for the same reason the system couldn't: you made ligatures that potentially change the meaning of bytes.
As far as a font is concerned, there is no difference between the :) in "see you :)" and the one in "(: I've seen this comment format somewhere :)", but your ligature "solution" makes the latter nonsense.
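To make the objection concrete, here is a rough Python sketch of what a context-free substitution does. `naive_ligature` is a hypothetical stand-in for font-level ligature substitution, not a model of how any real shaping engine works; it only mimics the "replace this character sequence wherever it appears" behavior. The emoji used is an arbitrary choice for the demo.

```python
SMILEY = "\U0001F642"  # SLIGHTLY SMILING FACE, picked arbitrarily for the demo


def naive_ligature(text: str) -> str:
    """Replace every ':)' with an emoji, with no knowledge of context."""
    return text.replace(":)", SMILEY)


# Intended as a smiley:
print(naive_ligature("see you :)"))
# Not intended as a smiley, but substituted anyway:
print(naive_ligature("(: I've seen this comment format somewhere :)"))
```

Both strings get the substitution, because the function (like a font) only sees the character sequence and not what the writer meant by it.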
Emoji also include characters that have no equivalent emoticon, either due to the number of characters needed or the lack of color.
Now, you may not like emoji, but arguing "we didn't need it before" is pretty weak sauce: we didn't have it. The goal of text is to communicate, and it is clear that a vast proportion of all people alive use emojis in their communication. So computers should be able to facilitate that communication rather than requiring workarounds.
The use of semagrams in alphabetic languages is nothing new - even hieroglyphics used semagrams.
Because fonts cannot change emoticons into images?
That's... just untrue.
You ignored the entire paragraph where I pointed out that flags ALREADY work this way. Then, you ignored the second paragraph which addresses the problem you mentioned in the third paragraph, where something like an RTL marker could mark emoji. Then you invented me saying "we didn't need it before".
In fact, you seem to have ignored everything I wrote.
It would be nice if you responded to what I said, rather than what you imagined I said.
The simple counterpoint to this is to imagine the Unicode Consortium declaring that all the writing systems and characters which ever will be needed have been invented already: anything new will just be a variant or a ligature of something existing!
That would be dangerously incorrect, and would not work at all.
So, look. I get that some people really really really really don't like emoji and wish they didn't exist. But they do exist and they are a perfectly valid form of written communication and they are not sufficiently captured by ligatures or other attempts to layer on top of ASCII emoticons, any more than an early-2000s forum would have been happy with just the ASCII forms. For decades we've been used to a richer set of these, and it is right and proper for Unicode to include them. Complaints about them, to me, feel like ranting that kids these days say "lol" instead of typing out the fully-punctuated-and-capitalized sentence "That is funny!"
and they are not sufficiently captured by ligatures or other attempts to layer on top of ASCII emoticons, any more than an early-2000s forum would have been happy with just the ASCII forms.
So far in this thread, I've seen this asserted, but I don't see why flags are appropriately captured by ligatures, while emoticons are not. What is the technical difference that allows one to work while the other does not?
Again, I'm arguing that for emoji lovers, ligatures are BETTER and MORE FUNCTIONAL than encoding emoji individually into Unicode. That this would be an improvement in availability and usability, not a regression.
We already have messaging programs ignoring the emoji range and adding their own custom :emoji: sequences because Unicode moves too slowly for them. We can wait years for Unicode to standardize animated party parrots, or we can add :party_parrot: as text that gets interpreted by our application. Slack, and most other programs, chose the latter. Not to mention adding stickers, which arguably need the same position in Unicode as emoji.
Unicode's charter is to standardize existing practice. Why not let Unicode standardize the way that emoji ranges are worked around in practice today, with standardized "emoji brackets" that allow clients to mark any text sequence as an emoji ligature? This matches the way things actually work, and fills the need for custom emoji (and stickers) that the Unicode Consortium is not serving.
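As a sketch of the application-level approach described above, here is roughly what shortcode expansion looks like in Python. Everything named here is an assumption for illustration: the `SHORTCODES` table, the `expand_shortcodes` helper, and the exact shortcode syntax are not taken from Slack or any other real client. Unknown names are left as plain text, which is the graceful-degradation behavior being argued for.

```python
import re

# Hypothetical shortcode table; a real client would ship a much larger one.
SHORTCODES = {
    "smile": "\U0001F642",     # SLIGHTLY SMILING FACE
    "thumbsup": "\U0001F44D",  # THUMBS UP SIGN
}

# Assumed syntax: lowercase letters, digits, '_', '+', '-' between two colons.
SHORTCODE_RE = re.compile(r":([a-z0-9_+-]+):")


def expand_shortcodes(text: str) -> str:
    """Replace known :name: sequences; leave unknown ones as plain text."""
    return SHORTCODE_RE.sub(
        lambda m: SHORTCODES.get(m.group(1), m.group(0)), text
    )


# :party_parrot: is not in the table, so it stays readable as text.
print(expand_shortcodes("nice :thumbsup: :party_parrot:"))
```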
I offer the following counter proposal: since you seem to think it's at least possible and perhaps even easy, I challenge you to pick, say, 20 code points at random from among the emoji and come up with distinct, memorable ASCII sequences you think would suffice to be ligature'd into those emoji. I think that this will help you to understand why I don't think "just ligature them" is going to work.
This is solved well with ligatures at the font level.
Demonstrably false, judging by the number of systems that screw up trying to auto-detect smileys from colons and parentheses. 🙂 is unambiguous semantically; ":)" is not.
I actually feel the opposite. ":)" is unambiguously a smiling face, and is mostly uniform in appearance across system UI fonts. The icon "🙂" is rendered differently depending on not only the operating system but also the specific app being used. The recipient of my message may see a completely different image than I intend for them to see. Even worse, the meaning and tone of my past emoji messages can completely change whenever Apple or Google or Telegram decides to redesign their emoji.
Too many apps have no way to disable auto-replacement of ascii faces.
I was going to mock your post by pointing out all of the other stuff in Unicode which is "politically charged", from Tibetan to Han unification to the Hangul do-over to that time that a single character was added just for Japan's government. But this is a grand understatement of exactly how political and pervasive the Consortium's work is. Peruse the list of versions of Unicode and you'll see that we already have a "bottomless pit" of natural writing systems to catalogue.
I think that the most inaccurate part of your claim is that emoji are "like a fashion". Ideograms are millennia old and have been continuously used for communicating mathematics.
I think the Unicode consortium made a huge mistake giving in to adding emojis to Unicode. It's a bottomless pit, very politically charged and definitely ambiguous (compare for example the different emoji-styles across operating systems/fonts).
Also, any kind of character system is politically charged. An interesting read here is: https://www.hastingsresearch.com/net/04-unicode-limitations.shtml (I do not agree with the points here and history has proven the author wrong, but it's a good specimen to look at political Unicode arguments pre-emoji)
It severely complicates most of the Unicode algorithms (grapheme cluster detection, word/sentence/line-segmentation, etc.)
If there were no emojis in Unicode, but everything else remained, would any of these things really be simpler? The impression I get is there are corner cases across the languages Unicode covers for all of the complexity, independent of emoji; emoji just exposes them to westerners more.
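A small Python illustration of that point, offered as a sketch rather than a proper segmentation demo: each sample below is normally perceived as a single character, yet consists of several code points, and only the last one involves emoji. The sample choices and the `code_point_names` helper are my own; the standard library has no grapheme cluster segmentation, so this only lists code points.

```python
import unicodedata

# Each value is usually read as ONE character by a human.
samples = {
    "e with combining acute": "e\u0301",
    "Devanagari ki": "\u0915\u093F",
    "woman teacher (ZWJ sequence)": "\U0001F469\u200D\U0001F3EB",
}


def code_point_names(text: str) -> list:
    """Return the Unicode name of every code point in the string."""
    return [unicodedata.name(ch) for ch in text]


for label, text in samples.items():
    print(label, len(text), code_point_names(text))
```

The first two cases need cluster-aware handling with or without emoji in the standard; the ZWJ sequence is one more instance of the same kind of problem.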
Emoji isn't a language, it's a writing system. And as much as I despise it, the fact that it is at least to some extent universally comprehensible regardless of one's language background might well mean it's here to stay. I wouldn't be surprised if it or some evolution of it becomes the dominant writing system in 100 years or so.
universally comprehensible regardless of one's language
It's not as much as you think. It's culturally influenced, and even age-influenced (like vocabulary).
Examples are the infamous "face with tears of joy", which is understood as a sad face by many older people. Or the "skull" emoji, which is understood as "dead laughing" by the youth, who frown upon "face with tears of joy", while you just understand "death". Or the "eggplant", which means a vegetable to many people but a penis to others.
Emojis are really useful for text communication, when you want to convey feelings or emotions as a side channel to text. Besides, they're single-handedly forcing anglo-saxon programmers to fix their Unicode handling programs 🙂
šš©āšš«š©āš
ššš¤£
I agree, but the content (lexical) and grammar (function) word distinction is sometimes used by people in the domain of semantics.
It looks like people were already using emoji, and Unicode had to add them for compatibility. https://unicode.org/emoji/principles.html
Emoji are a part of written communication, no matter how much someone might personally dislike them, and as such belong in Unicode.
This applies to other planes in Unicode, due to https://en.wikipedia.org/wiki/Han_unification
In chapter 5 of Because Internet, linguist Gretchen McCulloch argues that emoji are analogous to gestures, not a language.
The constructed language toki pona has an emoji orthography: https://sites.google.com/view/sitelenemoji
So does Lojban (Reddit; click the image links). In the Lojban case, the encoding is almost completely inscrutable.
š£āļø(toki a)
šš§ āš§ ā©š (sina sona ala sona e ni?)
ššā¶ļøš£ā (lipu ni li toki ala)
ššā¶ļøš£šš»ā¹š (lipu ni li toki pona pi lipu)
šā¶ļøš£ā (lipu li toki ala)
šā¶ļøš§š£ā©š£ (lipu li ken toki e toki)
šā¶ļøš£ā (lipu li toki ala)
agreed. emoji should not be in unicode. it's prescriptive rather than descriptive.
How do you mean it's prescriptive?
I direct you to my reply to the other commenter.