1. 9

    "String.length() is to return the number of characters in the string"

    There is no such thing as “number of characters in the string”. https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/

    1. 1

      https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#length()

      This does say

      "Returns the length of this string. The length is equal to the number of Unicode code units in the string."

      and

      "the length of the sequence of characters represented by this object."
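      To make the "code units" wording concrete, here is a small sketch (the class and helper names are mine, purely for illustration). It compares `String.length()` with `String.codePointCount()` on a string that contains a code point outside the Basic Multilingual Plane:

```java
final class StringLengthDemo {
    // Number of UTF-16 code units; this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "abc";
        // U+1F600, a single code point stored as a surrogate pair in UTF-16.
        String emoji = "\uD83D\uDE00";

        System.out.println(codeUnits(ascii) + " " + codePoints(ascii)); // 3 3
        System.out.println(codeUnits(emoji) + " " + codePoints(emoji)); // 2 1
    }
}
```

      So the two quoted sentences only agree if "character" is read as "UTF-16 code unit", which is how Java's `char` type is defined.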

      1. 2

        Descriptively, many have used and continue to use the word “character” to refer to a single grapheme in a piece of text. The point I take @singpolyma and Manish (the person who wrote the linked blog post) to be making is that they shouldn’t.

        In short: the word “character” makes sense under ASCII, but doesn’t make sense under Unicode (whatever encoding you’re using). Unicode code points are definitely not “characters,” and grapheme clusters are only sort-of “characters.”
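        A rough illustration of the code point versus grapheme cluster distinction, as a sketch only: the class name is hypothetical, and `java.text.BreakIterator` is just an approximation of grapheme cluster boundaries whose behaviour on newer scripts and emoji depends on the JDK version. The same visible "é" can be one code point or two, yet both forms are a single grapheme cluster:

```java
import java.text.BreakIterator;
import java.util.Locale;

final class GraphemeDemo {
    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of character boundaries reported by BreakIterator,
    // used here as an approximation of grapheme clusters.
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // precomposed: LATIN SMALL LETTER E WITH ACUTE
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT

        System.out.println(codePoints(composed) + " " + graphemes(composed));     // 1 1
        System.out.println(codePoints(decomposed) + " " + graphemes(decomposed)); // 2 1
    }
}
```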

        More broadly, adhering to out-of-date ASCII-centric wording helps to perpetuate the exclusion of people who use languages whose scripts are non-Latin (most of the world), and can lead to incorrect thinking around the representation and manipulation of text exactly in the manner the top blog post in this discussion describes.

        1. 1

          Can you clarify the three terms for us, then: character, code point, grapheme?

    1. 3

      20 hour workweeks FTW!

      1. 2

        I would settle for even the normal 40-45 hours per week and be glad :-/

        1. 0

          I fully support this!