Interesting. I believe the reason for not doing this originally was that it baked the object representation into the binary format, which means that you can’t modify the implementation. I wonder to what extent Apple’s LLVM IR requirements for the App Store have relaxed that concern: it’s easy for them to rewrite the object representation in the IR for older apps if they want to change it.
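To make the "baked into the binary" concern concrete, here is a minimal C sketch with two hypothetical layouts (`const_string_v1` and `const_string_v2` are illustrative, not any real runtime's structures). Code compiled against the first layout has field offsets embedded as constants, so if a later runtime adds a hash field, `length` moves and already-shipped binaries would read the wrong bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical v1 layout that a compiler emitted into an app's data section. */
struct const_string_v1 {
    const void *isa;          /* class pointer */
    unsigned int length;
    const char *data;
};

/* Hypothetical v2 layout after the runtime adds a cached hash. */
struct const_string_v2 {
    const void *isa;
    unsigned int hash;        /* new field inserted before length */
    unsigned int length;
    const char *data;
};

/* Code compiled against v1 uses these offsets as fixed constants. */
enum {
    V1_LENGTH_OFFSET = offsetof(struct const_string_v1, length),
    V2_LENGTH_OFFSET = offsetof(struct const_string_v2, length),
};

/* Adding the field moves length, so old binaries and the new runtime disagree. */
static_assert(V1_LENGTH_OFFSET != V2_LENGTH_OFFSET, "adding a field moves length");
```

This is why a rewrite step (such as re-generating code from IR) matters: the offsets only stay correct if everything that embeds them is rebuilt against the new layout.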
Even baking constant strings into the binary has caused problems in the past. The original NeXT implementation represented constant strings as an object with an isa pointer, a length, and a pointer to the character data. This caused several problems. First, there was no space for a hash, and constant strings are most commonly used as keys in a dictionary. This led to folks initialising globals with constant strings and then, at load time, replacing them with dynamically allocated NSString instances so that the hash was stored in the object rather than recalculated on every call. Then Unicode came along, and NSConstantString ended up carrying UTF-8 data. Now there was only space for the number of bytes or the number of characters in the string, not both, and you want both: the OpenStep APIs want the number of characters, while a load of functions you’ll call on the data want the length of the buffer.
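A minimal C sketch of the two problems, assuming a hypothetical `legacy_const_string` that follows the isa/length/data description above (field order here follows the prose, not necessarily the real NeXT ABI) and a hypothetical `richer_const_string` with room for a hash and both lengths. The hash is 32-bit FNV-1a as a stand-in, not the actual NSString hash function, and `utf16_length` assumes valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the class pointer; in a real runtime isa points at a class. */
typedef const void *class_ref;

/* Sketch of the legacy layout: one length, no hash slot. */
struct legacy_const_string {
    class_ref isa;
    unsigned int length;      /* bytes OR characters, not both */
    const char *data;         /* UTF-8 bytes in the Unicode-era variant */
};

/* Hypothetical richer layout: cached hash plus both lengths. */
struct richer_const_string {
    class_ref isa;
    uint32_t hash;            /* could be computed once, ahead of time */
    uint32_t byte_length;     /* what C functions over the buffer want */
    uint32_t char_count;      /* what OpenStep APIs (UTF-16 units) want */
    const char *data;
};

/* Stand-in hash (32-bit FNV-1a over the bytes); NOT the real NSString hash. */
static uint32_t fnv1a(const char *bytes, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)bytes[i];
        h *= 16777619u;
    }
    return h;
}

/* With the legacy layout, every -hash call walks the whole buffer. */
static uint32_t legacy_hash(const struct legacy_const_string *s) {
    assert(s != NULL);
    return fnv1a(s->data, s->length);  /* assumes length holds the byte count */
}

/* Count UTF-16 code units in valid UTF-8: one per lead byte, plus one more
   for each 4-byte sequence, which becomes a surrogate pair. */
static uint32_t utf16_length(const char *bytes, size_t n) {
    uint32_t units = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char b = (unsigned char)bytes[i];
        if ((b & 0xC0) != 0x80) units++;  /* not a continuation byte */
        if (b >= 0xF0) units++;           /* lead byte of a 4-byte sequence */
    }
    return units;
}
```

For example, "h\xc3\xa9llo" ("héllo") is 6 bytes of UTF-8 but 5 UTF-16 code units, so whichever single length the legacy layout stores, the other one has to be recomputed by scanning the buffer, and `legacy_hash` walks every byte on every call.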
This is interesting. Which instances in ObjC cache their hash?
I don’t know about Cocoa. In GNUstep, all of the concrete classes that implement the immutable NSString interface do. I think most of the NSMutableString implementations do as well, though they lazily calculate it the first time -hash is called.
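A minimal sketch of lazy hash caching in C, assuming a hypothetical `cached_string` type (not GNUstep's actual ivar layout) and FNV-1a as a stand-in hash. For a mutable string, the important detail is that any mutation has to invalidate the cached value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string object; fixed capacity keeps the sketch short. */
struct cached_string {
    char buf[64];
    size_t len;
    uint32_t hash;               /* valid only when hash_valid is true */
    bool hash_valid;
    unsigned hash_computations;  /* instrumentation for the example */
};

/* Stand-in hash: 32-bit FNV-1a over the bytes. */
static uint32_t compute_hash(const char *bytes, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)bytes[i];
        h *= 16777619u;
    }
    return h;
}

static void cs_init(struct cached_string *s, const char *text) {
    size_t n = strlen(text);
    assert(n < sizeof s->buf);
    memcpy(s->buf, text, n + 1);
    s->len = n;
    s->hash = 0;
    s->hash_valid = false;
    s->hash_computations = 0;
}

/* Equivalent of -hash: compute on the first call, then return the cache. */
static uint32_t cs_hash(struct cached_string *s) {
    if (!s->hash_valid) {
        s->hash = compute_hash(s->buf, s->len);
        s->hash_valid = true;
        s->hash_computations++;
    }
    return s->hash;
}

/* Mutation must drop the cached hash, or lookups would use a stale value. */
static void cs_append(struct cached_string *s, const char *text) {
    size_t n = strlen(text);
    assert(s->len + n < sizeof s->buf);
    memcpy(s->buf + s->len, text, n + 1);
    s->len += n;
    s->hash_valid = false;
}
```

Calling `cs_hash` twice on the same string computes the hash once; after `cs_append`, the next call recomputes it. An immutable string never needs the invalidation path, which makes caching there simpler.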
It doesn’t look as if the CoreFoundation constant string type contains a hash, which seems very odd to me. Profiling on GNUstep showed that a lot of time was spent recalculating string hashes. That said, a lot of that was on server workloads (particularly high-volume SMS processing and SS7 handling), so it may not be such a problem on desktop workloads.
I think the GNUstep strategy makes a lot of sense. Thank you, David.