1. 19
  1. 6

    My one disagreement with how this article is written is that it presents the complexity as being part of Unicode, as if Unicode could have been simpler and saved us all the trouble. The fact is, Unicode is merely representing reality: It could have been simpler, but there’s some irreducible complexity here because of how real-world natural languages work, and Unicode can’t do away with that while still allowing speakers of those languages to have text which works the way they’re accustomed to.

    1. 3

      (I am the author of the article)

      There is some truth in your argument, but also something I want to disagree with.

      I think there have been several significant issues over the history of Unicode’s design and development, and one of them was its initial Eurocentrism. The complexity in Unicode tends to grow the further you get from Western European scripts and the assumptions that are valid for those scripts.

      For example: some of the assumptions Unicode makes about case, and which require special handling to work around, come from the fact that the early design of Unicode didn’t look enough beyond Western Europe. Even just digging more deeply into Greek or the Turkic languages would have exposed the trouble with Unicode’s early assumptions about case.

      So I think at least some of the complexity in Unicode could have been avoided, or at the very least mitigated, with a less Eurocentric initial approach.

      1. 2
      2. 3

        I hope this becomes a trend; it is much more useful than ‘falsehoods programmers believe about X’!

        1. 1

          Unicode: ruining logic since 1987.

          1. 0

            Reference to how European languages use case as a signifier and not even a single mention of how the German language uses case as a signifier?

            1. 2

              I’m not sure I understand the complaint. There are many languages the author could have included a single mention of, but didn’t.

              1. 1

                Well, for one thing I did say it wasn’t exhaustive.

                For another, the topic was more specifically how Unicode models and handles case. And I pointed out that Unicode explicitly handles only a very few locale-sensitive rules for casing, with “occurs initially in a German noun” not being one of the ones it handles.
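
                To make that concrete, here is a rough sketch in Python (my choice for illustration; nothing above depends on it). Python's built-in `str.upper()` and `str.lower()` apply the locale-independent mappings only, so the Turkish dotted/dotless i rule has to be layered on by hand. The helper `turkish_lower` is hypothetical, not a standard library function, and it covers only that one rule.

```python
# Python's str.upper()/str.lower() use Unicode's locale-independent case mappings.

def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase text using the Turkish dotted/dotless i rule.

    Only the I/ı and İ/i pairs are special-cased; everything else falls back
    to the default mapping.
    """
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()

# German sharp s uppercases to two characters, so the string grows by one.
german_upper = "straße".upper()

# Greek capital sigma lowercases differently at the end of a word (final sigma).
greek_lower = "ΟΔΟΣ".lower()

# Default lowercasing maps I to dotted i, which is wrong for Turkish text.
default_lower = "IŞIK".lower()
turkish_result = turkish_lower("IŞIK")

# Capital dotted İ lowercases to "i" plus a combining dot above by default.
dotted_capital_lower = "\u0130".lower()

print(german_upper, greek_lower, default_lower, turkish_result)
print([hex(ord(ch)) for ch in dotted_capital_lower])
```

                Note that the final-sigma case is context-sensitive but not locale-sensitive, which is why the default mapping gets it right, while the Turkish case needs knowledge the string itself does not carry.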