It’s really interesting to think of operations occurring next to the memory.
Rather than a small set of bitwise ops, if there were a more capable but still specialised processor embedded in each bank of RAM, it starts to look like a heterogeneous multiprocessor system. It might even look like a “central CPU” + “simple GPU per bank of RAM” model (essentially supporting only vectorised ops).
A more aggressive rearchitecting would do away with the central CPU entirely and have a collection of RAM+CPU banks serve as a replacement for both today’s CPU cores + cache and clusters of compute + RAM nodes.
How do we find the best place to draw the line between “a bank of RAM that knows a few tricks” and “16 x (1GB of RAM+CPU)” as a model?
RAM cloning seems like it would provide a major boost to copy-on-write operations. I wonder what the performance improvement would be like for forking processes.
Is there a recording of this anywhere? I find it much easier to listen to a talk than to read the slides of it.
The Game Boy Advance had special DMA operations for copying and zeroing memory without involving the CPU. It also exposed the CPU’s SRAM as an addressable region of memory instead of using it as an L1 cache, so you could manage it yourself. So ahead of its time!
That sounds like scratchpad memory. That goes way back. It has power and performance advantages if managed correctly. The thing is, you have to manage it. I guess scratchpad vs cache is the memory version of coding in C vs a GC’d language.
3D ICs (“chip stacking”) are also promising for reducing memory power usage. Since the existing manufacturing processes for commodity DRAM and for logic are largely incompatible, making separate dies and stacking/interconnecting them (compare AMD’s chiplet approach in Ryzen) is the approach some companies are taking.
Maybe one day we will just have combined processor + RAM sockets on motherboards?