    The Z80 (used in the contemporaneous ZX Spectrum, amongst others) has a block-copy instruction called LDIR, but it turns out that wasn’t the fastest way to copy memory.

    A game (whose name escapes me, but I think it was a wireframe 3D game with a bunch of spheres, possibly set on Mars?) came out with a higher framerate than should have been possible. It used something I thought was a pretty amazing technique. The stack access instructions on the Z80 were faster: POP and PUSH each move two bytes, in 10 and 11 T-states respectively, against 21 T-states per byte for LDIR. So to move a bunch of memory you could:

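    • disable interrupts (an interrupt would push its return address through the hijacked stack pointer and corrupt the data)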
    • move the stack pointer to src
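    • pop all the 16-bit registers you can (the 4 main pairs AF, BC, DE and HL; IX, IY and the shadow registers can carry more)
    • move the stack pointer to the end of the matching dst chunk (PUSH writes downwards)
    • push your registers in reverse order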
    • repeat until blit finished
    • put the stack pointer back
    • enable interrupts again
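
The steps above can be modelled in a few lines of Python, both to check the byte ordering and to compare rough cycle counts. This is only a sketch under stated assumptions: memory is a flat `bytearray`, the regions do not overlap, the length is a multiple of the chunk size, the copy is fully unrolled with `LD SP,nn` (10 T-states) used to repoint the stack, and the cost of disabling interrupts and restoring SP is ignored. The function names are made up for illustration; the T-state figures are the standard documented Z80 timings.

```python
# T-states per instruction, from the standard Z80 timing tables.
LDIR_REPEAT = 21   # LDIR iteration while BC != 0
LDIR_FINAL = 16    # last LDIR iteration (BC reaches 0)
LDI = 16           # single LDI
POP_RR = 10        # POP of a main register pair
PUSH_RR = 11       # PUSH of a main register pair
LD_SP_NN = 10      # LD SP,nn (assumed way of repointing the stack)


def stack_copy(mem, src, dst, n, regs=4):
    """Copy n bytes from src to dst by simulating POP/PUSH on `mem`."""
    chunk = 2 * regs
    assert n % chunk == 0, "sketch only handles whole chunks"
    for off in range(0, n, chunk):
        sp = src + off                      # LD SP,src
        pairs = []
        for _ in range(regs):               # POP: read low, high; SP += 2
            pairs.append((mem[sp], mem[sp + 1]))
            sp += 2
        sp = dst + off + chunk              # LD SP,(end of dst chunk)
        for lo, hi in reversed(pairs):      # PUSH in reverse: SP -= 2
            sp -= 2
            mem[sp], mem[sp + 1] = lo, hi


def ldir_cost(n):
    """T-states for one LDIR copying n bytes."""
    return (n - 1) * LDIR_REPEAT + LDIR_FINAL


def unrolled_ldi_cost(n):
    """T-states for n back-to-back LDI instructions."""
    return n * LDI


def stack_copy_cost(n, regs=4):
    """T-states for the unrolled POP/PUSH copy, excluding setup."""
    chunk = 2 * regs
    assert n % chunk == 0
    per_chunk = 2 * LD_SP_NN + regs * (POP_RR + PUSH_RR)
    return (n // chunk) * per_chunk
```

For a 6144-byte copy (the size of the Spectrum's screen bitmap) this model gives 129019 T-states for LDIR, 98304 for unrolled LDI, and 79872 for the stack trick with four register pairs, so roughly 21, 16 and 13 T-states per byte. Carrying more register pairs per chunk spreads the two `LD SP,nn` instructions over more bytes and pushes the figure towards the 10.5 T-states per byte floor of POP plus PUSH.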

    This link describes it in more detail and suggests that unrolling a run of LDI instructions is also faster than LDIR: