I like how clear yet technically accurate and demonstrative this article is. I’ve read a few explanations of Unicode, UTF-8 vs UTF-16BE vs UTF-16LE, and this article just clicked very easily for me. Also great use of binary to highlight how the encoding works.
I’ve been very fortunate that I’ve never professionally had to think about Unicode. A while back my wife worked as an executive assistant and was doing something manual with Qualtrics (survey software) that I thought I could automate for her. It turned out it exported CSV files as UTF-16LE, and Python’s csv module couldn’t parse them because of the leading BOM bytes and other weirdness in the files. I blindly hacked together my own CSV parser from scratch to work around it. But I recall not really understanding the point of the BOM character, nor how to detect it, and now I get it!
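For anyone who hits the same thing, here is a minimal sketch of BOM detection in Python, assuming the whole file fits in memory and that the BOM is the only obstacle (it does nothing about the "other weirdness"). The helper names `detect_bom` and `read_csv_bytes` are made up for illustration; the `codecs.BOM_*` constants and the `utf-16`, `utf-32`, and `utf-8-sig` codecs are from the standard library, and each of those codecs consumes the BOM while decoding.

```python
import codecs
import csv
import io

def detect_bom(raw: bytes):
    """Return a codec name if raw starts with a known BOM, else None."""
    # Check the UTF-32 BOMs first: the UTF-32LE BOM (FF FE 00 00)
    # begins with the same two bytes as the UTF-16LE BOM (FF FE).
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, codec_name in candidates:
        if raw.startswith(bom):
            return codec_name
    return None

def read_csv_bytes(raw: bytes, default: str = "utf-8"):
    """Decode raw using its BOM (or a fallback encoding) and parse it as CSV."""
    encoding = detect_bom(raw) or default
    text = raw.decode(encoding)  # these codecs strip the BOM themselves
    # newline="" lets the csv module handle line endings on its own
    return list(csv.reader(io.StringIO(text, newline="")))

# A tiny UTF-16LE file with a BOM, built in memory for the example
sample = codecs.BOM_UTF16_LE + "name,score\r\nAda,42\r\n".encode("utf-16-le")
rows = read_csv_bytes(sample)
```

When the encoding is known up front, `open(path, encoding="utf-16", newline="")` passed straight to `csv.reader` should do the same job without any manual detection.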