I don’t like how it claims the filesystem “lies” about its encoding. It’s like you think “this italian file server will never unzip an archive that came from Korea”. The rules (at least on unices) tend to be everything except NUL and slashes, and even if line-feed, ESC or BEL might be hard to generate, they are still valid parts of potential filenames and pretending they are not will make you program worse. A filesystem that says “nopes” to writing a file not compliant with the current locale setting would be poor indeed.
Arguably if anyone were lying it would be sys.getfilesystemencoding, which promises to tell you what the filesystem’s encoding is, even when the filesystem makes no claims to even having an encoding. But arguably it’s not lying either, per the documentation:
Return the name of the encoding used to convert Unicode filenames into system file names, or None if the system default encoding is used. The result value depends on the operating system:
Note that it only talks about “Unicode filenames into system file names” (aside: don’t miss the “filename” vs “file name” here) but says nothing about the going the other way. It can’t.
Knowing too little Python to be sure where the mistake is, I do not trust the article on the correctness or the necessity of the solution it outlines. I have seen developers in other languages be gravely mistaken about their language’s string model, and this article feels iffy in a way that is reminiscent of such misconceptions to me; however, I’ve just as well seen languages screw up their string model, so if I knew more Python I might well be nodding along.
I don’t like how it claims the filesystem “lies” about its encoding. It’s like you think “this italian file server will never unzip an archive that came from Korea”. The rules (at least on unices) tend to be everything except NUL and slashes, and even if line-feed, ESC or BEL might be hard to generate, they are still valid parts of potential filenames and pretending they are not will make you program worse. A filesystem that says “nopes” to writing a file not compliant with the current locale setting would be poor indeed.
Arguably if anyone were lying it would be
sys.getfilesystemencoding, which promises to tell you what the filesystem’s encoding is, even when the filesystem makes no claims to even having an encoding. But arguably it’s not lying either, per the documentation:Note that it only talks about “Unicode filenames into system file names” (aside: don’t miss the “filename” vs “file name” here) but says nothing about the going the other way. It can’t.
Knowing too little Python to be sure where the mistake is, I do not trust the article on the correctness or the necessity of the solution it outlines. I have seen developers in other languages be gravely mistaken about their language’s string model, and this article feels iffy in a way that is reminiscent of such misconceptions to me; however, I’ve just as well seen languages screw up their string model, so if I knew more Python I might well be nodding along.
Interesting, I have been playing with an even more restrictive approach in Rust for managing config files.