This is a technique that should be used more! A related trick is relative pointers, where instead of a pointer you store an offset from the address where the offset is stored. These can be byte offsets, or larger scale depending on your data alignment. If you combine this with a custom allocator using a contiguous heap, you can replace pointers with 32-, 24- or even 16-bit offsets. This is a big win for locality of reference. Unlike the V8 method it doesn’t require accessing an external base pointer.
(In C++ or other languages that support smart pointers, this can all be done with little effect on the source code. You just create a RelativePointer class that stores an offset but converts to/from a real pointer.)
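To make that concrete, here is a minimal sketch of such a RelativePointer class (the name, the signed 32-bit offset, and the null encoding are illustrative choices, not from any particular library): it stores a byte offset from its own address and converts to and from a raw pointer.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a self-relative pointer: stores a signed 32-bit byte offset
// from its own address. Offset 0 encodes null (in this scheme an object
// can never live at the offset field's own address).
template <typename T>
class RelativePointer {
public:
    RelativePointer() : offset_(0) {}

    // Copies must re-encode: the stored offset is relative to *this* field's
    // address, so blindly copying the bits would point somewhere else.
    RelativePointer(const RelativePointer& other) { *this = other.get(); }
    RelativePointer& operator=(const RelativePointer& other) {
        return *this = other.get();
    }

    // Assumes the target lies within +/-2 GiB of the field, which a
    // contiguous custom heap can guarantee.
    RelativePointer& operator=(T* p) {
        offset_ = p ? static_cast<int32_t>(reinterpret_cast<intptr_t>(p) -
                                           reinterpret_cast<intptr_t>(this))
                    : 0;
        return *this;
    }

    T* get() const {
        return offset_ ? reinterpret_cast<T*>(
                             reinterpret_cast<intptr_t>(this) + offset_)
                       : nullptr;
    }

    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }
    explicit operator bool() const { return offset_ != 0; }

private:
    int32_t offset_;  // shrink to int16_t plus a scale shift for tighter heaps
};
```

Note the copy operations: because the offset is relative to the field's own address, copying and moving must re-encode rather than copy bits, which is also why a memcpy-style relocation of a whole arena "just works" while copying individual fields does not.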
I’d like to see a comparison between a 32-bit native build, a 64-bit build, and a 64-bit build with pointer compression.
This reminds me of real mode on x86, where memory is addressed through segments (20-bit addresses).
A similar mechanism is available in HotSpot: compressed oops. Because of object alignment, they can address a 32 GB heap, which is a bit less stifling; 4 GB is completely adequate for JS, but is a harder sell for Java in many contexts.
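The arithmetic behind compressed oops can be sketched as follows (illustrative only, not HotSpot's actual code): with 8-byte object alignment the low three bits of every object address are zero, so a 32-bit compressed value shifted left by 3 spans 2^35 bytes, i.e. 32 GB.

```cpp
#include <cassert>
#include <cstdint>

// log2 of the 8-byte object alignment; the low 3 bits of any aligned
// address are always zero, so they need not be stored.
constexpr unsigned kAlignmentShift = 3;

// Decode: full address = heap base + (compressed << 3).
inline uint64_t decode(uint64_t heap_base, uint32_t compressed) {
    return heap_base + (static_cast<uint64_t>(compressed) << kAlignmentShift);
}

// Encode: compressed = (address - heap base) >> 3. Requires the address
// to be 8-byte aligned and within 32 GB of the base.
inline uint32_t encode(uint64_t heap_base, uint64_t address) {
    return static_cast<uint32_t>((address - heap_base) >> kAlignmentShift);
}
```

The trade-off versus V8's scheme is visible in the shift: V8 keeps byte offsets (4 GB reach, no shift on decode), while HotSpot spends a shift per access to multiply the reach by the alignment.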
It depends a bit on how you handle arrays. Often, a large proportion of the heap (50-80%) is used by large arrays. With pointer compression, you can trade some maximum heap size for indirection. If pointers to arrays are actually pointers to small objects that contain a 64-bit pointer, then each array consumes a few words of the main heap and can be separately allocated. This is especially useful for Java arrays of primitives and JavaScript TypedArray objects, which are always leaf nodes in the GC graph and so don’t need scanning, so putting them on a separate memory region improves locality during a GC scan.
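A rough sketch of the indirection described above (names and layout are illustrative): the compressed heap holds only a small handle per array, while the pointer-free payload lives in a separately allocated region the GC never has to scan.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// A small object living in the compressed heap. Other objects refer to it
// with a 32-bit compressed pointer; only this handle pays the 64-bit cost.
struct ArrayHandle {
    uint64_t payload;  // full 64-bit pointer to out-of-heap element storage
    uint32_t length;   // element count
};

// Allocate the elements in a separate region (plain calloc here as a stand-in
// for a dedicated large-object allocator). The payload contains no pointers,
// so the GC can skip it entirely during scanning.
ArrayHandle make_f64_array(uint32_t n) {
    ArrayHandle h;
    h.payload = reinterpret_cast<uint64_t>(std::calloc(n, sizeof(double)));
    h.length = n;
    return h;
}

double* elements(const ArrayHandle& h) {
    return reinterpret_cast<double*>(h.payload);
}
```

Each array thus costs the compressed heap only a couple of words, regardless of how large its payload grows.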
Java is in a much better position than JavaScript to implement graceful fallback because of the strong typing. If you make Object and interface references 64 bits, but everything else 32 bits, then you can profile, find objects that are consuming a large amount of heap space, and promote these to having full 64-bit pointers, rewriting objects that hold pointers to them in the copy phase of a GC pass. If you have a 32 GiB heap and start doing this once the size passes 8 GiB, then you have a lot of time to relocate things before you hit the limit.
All of that said, GC complexity scales with the size of the heap and so if you have a single 32 GiB heap for a Java program, you’d almost certainly improve performance by splitting it into a load of smaller Java processes that communicated explicitly.
I don’t quite follow. If the majority of the memory is taken up by large arrays of primitives, then it shouldn’t matter how much space you need to point to them; the majority of the space will still be taken up by their contents. (Unless you are doing aggressive structure-sharing—of contiguous subslices—but that has a tendency to leak and needs to be done with care.)
The V8 blog is so good. I need to spend more time reading back through it. Great for language hackers and web developers.