If the author’s here, grapheme clusters are an important concept to know on top of UTF-8:
It is important to recognize that what the user thinks of as a “character”—a basic unit of a writing system for a language—may not be just a single Unicode code point. Instead, that basic unit may be made up of multiple Unicode code points.
I actually really like that they included a Hangul example as the second example, because despite looking like Chinese logograms, the Korean alphabet actually works very similarly to unicode grapheme clusters in that each morpho-syllabic block is a bunch of jamo glued together:
If the author’s here, grapheme clusters are an important concept to know on top of UTF-8:
One example from the unicode.org page on Grapheme Cluster Boundaries is g̈, which is two unicode code points:
I actually really like that they included a Hangul example as the second example, because despite looking like Chinese logograms, the Korean alphabet actually works very similarly to unicode grapheme clusters in that each morpho-syllabic block is a bunch of jamo glued together: