I once looked at the Python VM to see how hard it would be to convert the code to use Smalltalk-style smallints or use float packing (which would limit it to systems using IEEE floating point with pointers that fit into 48 bits, which is true of all currently supported platforms) to avoid heap-allocated numbers.
It would not be worth the effort, IMHO. It would break just about every extension module too, and I don’t know how much of a speedup you’d actually get.
Is that kinda like tagged pointers, where you borrow a few bits in the pointer to store type info?
So NaN boxing (the floating point method I was talking about) is where you take advantage of the fact that IEEE NaN values have an enormous number of representations but only one or a few are ever used in most common situations, so you can use the others to represent anything, like pointers or whatever (so long as your pointers fit into 52 bits).
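To make that concrete, here is a rough sketch in Python that treats a 64-bit integer as the VM's value word. The tag value, the 48-bit payload (matching the 48-bit pointer figure mentioned earlier), and all the function names are assumptions made up for this illustration, not how any particular VM lays things out.

```python
import struct

# Assumed layout for this sketch only: the top 16 bits 0x7FFC mark a boxed
# pointer (exponent all ones, quiet bit set, plus one extra mantissa bit),
# and the low 48 bits hold the pointer itself.
POINTER_TAG = 0x7FFC_0000_0000_0000
TAG_MASK = 0xFFFF_0000_0000_0000
PAYLOAD_MASK = 0x0000_FFFF_FFFF_FFFF
CANONICAL_NAN = 0x7FF8_0000_0000_0000  # the one NaN we let real doubles use


def float_to_bits(x: float) -> int:
    """Reinterpret a double as its 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit pattern as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def box_double(x: float) -> int:
    # Collapse every NaN to one canonical pattern so a genuine NaN can
    # never be mistaken for a boxed pointer.
    if x != x:
        return CANONICAL_NAN
    return float_to_bits(x)


def box_pointer(addr: int) -> int:
    if addr < 0 or addr & ~PAYLOAD_MASK:
        raise ValueError("pointer does not fit in the 48-bit payload")
    return POINTER_TAG | addr


def is_boxed_pointer(word: int) -> bool:
    return (word & TAG_MASK) == POINTER_TAG


def unbox(word: int):
    """Return ("pointer", addr) or ("double", value) for a value word."""
    if is_boxed_pointer(word):
        return ("pointer", word & PAYLOAD_MASK)
    return ("double", bits_to_float(word))
```

Every ordinary double passes through untouched, which is the appeal: floats need no allocation at all, and only the otherwise unused NaN patterns are repurposed.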
The other technique I was talking about is what classic Smalltalk implementations used: any number under a certain limit is an integer, everything above that threshold is an index into the object memory. This is equivalent to saying some bits on the top are used as a tag.
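A toy version of that threshold scheme, just to show why it is the same thing as a tag bit. The 16-bit word size, the threshold, and the function names are assumptions for the sketch (and it only handles non-negative integers); it is not the actual layout of any Smalltalk implementation.

```python
WORD_BITS = 16                      # assumed word size for this sketch
THRESHOLD = 1 << (WORD_BITS - 1)    # 0x8000: words below this are integers


def encode_int(n: int) -> int:
    """Store a small non-negative integer directly in the word."""
    if not 0 <= n < THRESHOLD:
        # A real VM would fall back to a heap-allocated number here.
        raise OverflowError("integer does not fit in an immediate word")
    return n


def encode_ref(index: int) -> int:
    """Store an index into the object memory, offset past the threshold."""
    if not 0 <= index < THRESHOLD:
        raise ValueError("object index out of range")
    return THRESHOLD + index


def decode(word: int):
    """Return ("int", value) or ("ref", object_index)."""
    if word < THRESHOLD:
        return ("int", word)
    return ("ref", word - THRESHOLD)


def top_bit_is_clear(word: int) -> bool:
    """The same test as `word < THRESHOLD`, written as a tag-bit check."""
    return (word >> (WORD_BITS - 1)) == 0
```

Comparing against the threshold and inspecting the top bit give the same answer for every possible word, which is the equivalence described above.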
A lot of the ML systems use a single tag bit to distinguish between garbage collected and value types, IIRC (in OCaml it's the lowest bit of the word rather than the top one). And something similar was done in a lot of LISP implementations as well.
There’s numba, which does JIT on numeric code: https://numba.pydata.org/
I had a similar problem when loading a large text file into memory with Python (the 550 MB WikiText-103 dataset), which IIRC took 5+ GB of RAM. I haven’t tried it yet, but the Apache Arrow project looks pretty interesting for serializing and loading large objects or files quickly, since it uses some sort of memory-mapped serialization format. The huggingface/nlp library uses this and can iterate over a 17 GB text file at 2+ GB/s (https://twitter.com/thom_wolf/status/1272512974935203841).
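For what it's worth, the basic memory-mapping idea can be tried with nothing but the standard library. This is not Arrow and not what huggingface/nlp does internally; it is only a sketch of walking a big file through `mmap` without first turning the whole thing into Python objects. The function name is made up for the example.

```python
import mmap


def count_lines_mmap(path: str) -> int:
    """Count lines in a file by scanning a read-only memory map.

    The OS pages the file in on demand, so the file is never copied
    wholesale into a Python string. Note that mmap refuses to map an
    empty file, so this sketch assumes the file has at least one byte.
    """
    count = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mm.readline() returns b"" once the end of the map is reached.
            for _line in iter(mm.readline, b""):
                count += 1
    return count
```

Each line is still materialized as a short-lived `bytes` object here, so this shows the memory behaviour rather than the throughput a columnar format could reach.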