I hadn’t seen the additional tricks that Apple did with indirection. I wonder how much performance they gain from them. In GNUstep, we actually implemented the tiny string optimisation back in 2011, quite a few years before Apple. We only stored strings of up to eight ASCII characters, though, which covers most path components and dictionary keys. In the 2.0 ABI, I also modified clang to generate these for string literals, which saves a surprising amount of binary size.
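As a rough illustration of the idea (the names and bit layout here are invented; GNUstep's real encoding reserves its tag bits differently and assumes a 64-bit word), packing up to eight 7-bit ASCII characters into a pointer-sized word looks something like this:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a tiny-string encoding.  The low bit marks the
 * word as a tiny string rather than a real pointer; up to eight 7-bit
 * ASCII characters are packed into the remaining bits. */
#define TINY_TAG ((uint64_t)1)

/* Returns 0 if the string does not fit (too long or non-ASCII). */
static uint64_t tiny_encode(const char *s)
{
    size_t len = strlen(s);
    if (len > 8)
        return 0;
    uint64_t word = TINY_TAG;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c > 127)
            return 0;
        word |= (uint64_t)c << (1 + 7 * i);
    }
    return word;
}

/* Unpack into a caller-supplied buffer of at least 9 bytes. */
static size_t tiny_decode(uint64_t word, char out[9])
{
    size_t len = 0;
    while (len < 8) {
        char c = (char)((word >> (1 + 7 * len)) & 0x7f);
        if (c == '\0')
            break;
        out[len++] = c;
    }
    out[len] = '\0';
    return len;
}
```

Because the bytes live in the word itself, creating such a string involves no allocation at all.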
Equality comparison is one of the big wins for the tiny string encoding. When both operands are tiny strings, equality comparison is a straight pointer comparison, so using tiny strings as dictionary keys gives you some nice wins there. Apple’s tricks work well if they can guarantee that all of the strings in the smaller character set will always be in that form, but that unfortunately precludes using them for string literals (unless something rewrites the literals, though that means you need to load them from a writeable page and can’t materialise them directly in the instruction stream).
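To make that concrete (using an invented minimal packing; any deterministic encoding behaves the same way): equal tiny strings encode to identical words, so both the equality check and the hash needed for dictionary keys become single integer operations with no memory access:

```c
#include <stdint.h>
#include <stddef.h>

/* Invented minimal packing of up to eight ASCII bytes into a tagged
 * 64-bit word; any deterministic encoding has the same property. */
static uint64_t tiny(const char *s)
{
    uint64_t w = 1; /* tag bit */
    for (size_t i = 0; i < 8 && s[i] != '\0'; i++)
        w |= (uint64_t)(unsigned char)s[i] << (1 + 7 * i);
    return w;
}

/* Equality: one integer compare, no pointer dereference. */
static int tiny_equal(uint64_t a, uint64_t b) { return a == b; }

/* Hash for dictionary keys: mix the word itself; again no dereference. */
static uint64_t tiny_hash(uint64_t w) { return w * 0x9E3779B97F4A7C15ull; }
```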
One of the most interesting things for me about this work was that it made -UTF8String much more expensive. For normal ASCII NSStrings, that method just retains and autoreleases self and returns a pointer to the internal buffer. With the tiny string work, it had to allocate a new autoreleased NSData to hold the buffer. A surprising amount of code has -UTF8String on its fast path and it took quite a bit of work to fix all of that code.
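A sketch of why (with invented names; the real method also deals with non-ASCII encodings and hands the copy to an autoreleased object rather than making the caller free it): a heap-allocated ASCII string can return a pointer into its own buffer for free, but a tiny string's bytes exist only inside the pointer value, so they must first be copied into memory that outlives the call:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Invented representation: a word is either a tagged tiny string (low
 * bit set, bytes packed above it in 7-bit units) or a pointer to a heap
 * string with an inline NUL-terminated buffer. */
struct heap_string {
    size_t length;
    char bytes[];
};

static int is_tiny(uint64_t w) { return (w & 1) != 0; }

/* Sketch of -UTF8String.  For a heap ASCII string this is nearly free:
 * return the internal buffer.  For a tiny string we must allocate and
 * copy the bytes out; *to_free tells the caller what to release. */
static const char *utf8_string(uint64_t w, char **to_free)
{
    if (!is_tiny(w)) {
        *to_free = NULL;
        return ((struct heap_string *)(uintptr_t)w)->bytes;
    }
    char *buf = malloc(9);
    size_t len = 0;
    while (len < 8) {
        char c = (char)((w >> (1 + 7 * len)) & 0x7f);
        if (c == '\0')
            break;
        buf[len++] = c;
    }
    buf[len] = '\0';
    *to_free = buf;
    return buf;
}
```

The allocation on the tiny-string path is exactly the cost that surprised code with -UTF8String on its fast path.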
The thing I’m really curious about is how the short-string table is computed. Presumably Apple can dump all of the binaries on the App Store and find the most common string literals (and possibly even collect telemetry from NSString for dynamically created strings), but that won’t necessarily give the ideal result for any given program. It would be really interesting to build this table dynamically as a program runs.