I’ve always loved this technique. There was a modification to the standard Lua interpreter that used this and it got something like a 20% speed up with no other changes.
It does remind me of the old practice of using high/low bits of pointers for metainformation, and then as operating systems and hardware start using more and more of the pointers’ bits, you run into problems.
For example, early MacOS used the high bits of pointers for various flags. This worked because the original hardware (M68k) had 32-bit pointers but only 24 address lines, so masking off the top bits didn’t matter. As later processors in the family used more bits, this suddenly became an issue and software had to be rewritten.
The technique discussed here assumes that 64-bit pointers will fit into 48 bits. This is currently true, but all of the major 64-bit architectures reserve the right to extend how many address bits are actually used, regardless of how many are in use today; IIRC, IBM z doesn’t make any such distinction and states that all 64 bits may be used today.
When I was first getting into Lua circa 2005, I found out about how it represents values and felt like there had to be some way to scrunch them smaller than 16 bytes. I was envisioning something like NaN tagging, but I didn’t know enough about floating point. It was fun to run into it the other day and find that people had gotten it working!
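For anyone who hasn’t seen it spelled out, the core of NaN tagging can be sketched in a few lines of C. The tag layout and helper names here are my own illustration, not Lua’s actual scheme: a quiet NaN pins the exponent and top mantissa bit, leaving ~51 free bits, and the sign bit distinguishes “real double” from “boxed pointer”.

```c
#include <stdint.h>
#include <string.h>

/* A quiet NaN sets all exponent bits plus the top mantissa bit
 * (0x7ff8...), leaving 51 low bits free -- room for a 48-bit pointer.
 * The sign bit is reused here as a "this is a pointer" flag. */
#define QNAN    UINT64_C(0x7ff8000000000000)
#define TAG_PTR UINT64_C(0x8000000000000000)

typedef uint64_t Value;

static Value from_double(double d) {
    Value v;
    memcpy(&v, &d, sizeof v);  /* type-pun via memcpy to avoid aliasing UB */
    return v;
}

static double to_double(Value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

static int is_ptr(Value v) {
    return (v & (QNAN | TAG_PTR)) == (QNAN | TAG_PTR);
}

static Value from_ptr(void *p) {
    /* Assumes the pointer fits in the low 48 bits -- the whole premise. */
    return QNAN | TAG_PTR | (uint64_t)(uintptr_t)p;
}

static void *to_ptr(Value v) {
    return (void *)(uintptr_t)(v & ~(QNAN | TAG_PTR));
}
```

Ordinary doubles pass through untouched; only the bit patterns inside the quiet-NaN space carry tags, which is why every value still fits in 8 bytes.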
I think tagged pointers originated with early LISP implementations. The various LISP machines that were built over the years sometimes had custom memory architectures with extra bits reserved for tags!
Wait, I might be dense here, but… if a program requires/requests a clean 8 bytes of stack/heap memory, and gets it to use however it pleases, surely the actual meaning of those 8 bytes (address value or other content) cannot possibly be messed with by the OS and/or the architecture. Are you talking only about pointer values retrieved from OS/kernel calls? Yeah, well, maybe those shouldn’t be stuffed with custom cookie crumbs. But own pointers to own locations? Who or what, outside the program’s code, will mess with any of those bits?!
You’re right: if you’re using a custom allocator, you’ll be fine for most cases if you can control the upper bits of the allocation range. For example, if you use a custom allocator based on an mmap’d heap and request that the address the new heap gets mapped to is guaranteed to have all its top n bits be zero (which necessarily limits the size of your heap), then you can use the trick from TFA to treat those bits however you want, and life’s good so long as your heap stays addressable in 48 bits.
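A minimal sketch of that high-bits trick, assuming every pointer handed out stays below 2^48 (the helper names are made up for illustration, and the assert encodes exactly the assumption that can break out from under you):

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical sketch: pack a 16-bit tag into the top bits of a pointer,
 * assuming the allocation itself never uses more than 48 address bits. */
#define ADDR_MASK ((UINT64_C(1) << 48) - 1)

static uintptr_t tag_ptr(void *p, uint16_t tag) {
    uintptr_t u = (uintptr_t)p;
    assert((u & ~ADDR_MASK) == 0);  /* the assumption the whole trick rests on */
    return u | ((uintptr_t)tag << 48);
}

static void *untag_ptr(uintptr_t v)  { return (void *)(v & ADDR_MASK); }
static uint16_t get_tag(uintptr_t v) { return (uint16_t)(v >> 48); }
```

The untag step is the part you must never forget before handing the pointer to anything that doesn’t know your convention — which is exactly the interop hazard discussed below.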
But if you interface with other libraries or things allocated by the system, this is not technically future-proof. Of course, it’s not going to be a problem for a long time if ever for most platforms (but not all…). It’s just reminiscent of the old MacOS issue.
(Note that this technique technically won’t work even now on certain platforms: I don’t think it would work consistently on IBM i or z, though that’s just a guess, and SAOS’s on SPARC or POWER might also do something funky… but for mainstream systems on popular hardware this works fine.)
Ah yeah, of course, all heap addrs come from the system in the final analysis, one way or the other. Yeah, I can see the issue more clearly now.
If you align everything to 8 bytes, as you say, you get 8 combinations or 3 bits to fill with whatever information you please.
As you also pointed out, when you interface with something that doesn’t know about your tagging convention, you have to strip away the tagging bits before passing it to that something.
Apart from that I don’t see how the OS or your architecture might mess with your pointers.
What might be an issue, however, is fighting the compiler. While sound from a processor/architecture perspective, the tagging is undefined behavior according to C/C++ standards. You have to compile with flags to get the desired behavior with certainty.
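Concretely, the 3-bit low-tagging looks something like this (names are mine). Routing everything through uintptr_t keeps the compiler happy in practice, though strictly speaking the pointer-to-integer round-trip is implementation-defined rather than guaranteed, which is the standards issue mentioned above:

```c
#include <stdint.h>
#include <assert.h>

/* With allocations aligned to at least 8 bytes, the low 3 bits of a
 * pointer are always zero and can carry one of 8 tag values. */
#define LOW_TAG_MASK ((uintptr_t)0x7)

static uintptr_t tag_low(void *p, unsigned tag) {
    uintptr_t u = (uintptr_t)p;
    assert((u & LOW_TAG_MASK) == 0);  /* alignment assumption */
    assert(tag <= 7);
    return u | tag;
}

static void *strip_low(uintptr_t v)  { return (void *)(v & ~LOW_TAG_MASK); }
static unsigned low_tag(uintptr_t v) { return (unsigned)(v & LOW_TAG_MASK); }
```

Stripping before dereferencing (or before passing the pointer to foreign code) is the non-negotiable step.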
The problem comes up when you try to store additional data types without extra storage: you assume that all pointers have their top x bits unused and set to 0. Those bits then give you 2^x namespaces of the same size (one of which is the pointers themselves) without taking extra space.
For example, some Smalltalk VMs use a bit to identify whether a word contains a pointer or a number, reducing both the address space and the native number range to one bit less than what the architecture supports natively. That works as long as you can control the address space (these VMs tend to work on a big pre-allocated slab of memory). In some scenarios, like m68k MacOS, the limit was lifted underneath the program (thanks to newer processors), and the trick failed spectacularly because there were now pointers that looked to the program like something else.
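The one-bit Smalltalk-style scheme is easy to sketch in C (this is my own illustration of the idea, not any particular VM’s code): the low bit says “small integer”, the payload lives in the remaining bits, and 8-byte-aligned pointers pass through unchanged because their low bit is already zero.

```c
#include <stdint.h>

/* Smalltalk-style one-bit tagging: low bit 1 = small integer (payload in
 * the upper bits), low bit 0 = aligned pointer stored as-is.
 * Both ranges shrink by one bit, as described above. */
typedef uintptr_t Oop;

static Oop int_to_oop(intptr_t n) { return ((uintptr_t)n << 1) | 1; }

static intptr_t oop_to_int(Oop o) {
    /* Arithmetic right shift restores the sign; this is
     * implementation-defined in C but holds on common compilers. */
    return (intptr_t)o >> 1;
}

static int oop_is_int(Oop o)   { return (int)(o & 1); }
static Oop ptr_to_oop(void *p) { return (uintptr_t)p; }  /* low bit already 0 */
static void *oop_to_ptr(Oop o) { return (void *)o; }
```

The failure mode described above is then obvious: if the system ever hands you a pointer whose low bit (or, for high-bit schemes, a high bit) isn’t clear, it decodes as the wrong kind of value.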
NaN tagging is a bit of a step further, but this post reminds me of the table in the Ruby interpreter here, listing everything that may be encoded in a pointer.
Famously, nil, true, and false had to move one bit up with Ruby 2.0, when a new numeric encoding was introduced that needed the rightmost bit. As the object_id is the pointer value in Ruby, that was an observable behaviour change.
The information when presented in this fashion seems so intuitive, I wonder how long they had to think about this to get it to its final state. Overall a pretty good read!