The Z80 (used in the contemporaneous ZX Spectrum, amongst others) has a block-copy instruction called LDIR. But it turns out that wasn't the fastest way to copy memory.
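A game (whose name escapes me, but I think it was a wireframe 3D game with a bunch of spheres, possibly set on Mars?) came out with a higher frame rate than seemed possible. It used what I thought was a pretty amazing technique: the Z80's stack instructions moved data faster than LDIR. So to move a bunch of memory you could:
1. Disable interrupts and save SP, since an interrupt would otherwise push its return address into your data.
2. Point SP at the source and POP a few register pairs (BC, DE, HL, plus more via EXX), reading 2 bytes per instruction.
3. Point SP just past the end of the matching destination block and PUSH the registers back in reverse order (PUSH decrements SP before writing).
4. Repeat for the next block, then restore SP and re-enable interrupts.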
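Here is a small Python model of the stack trick. It is only a sketch: `ToyZ80`, `pop`, `push`, and `stack_copy` are hypothetical names, it models just how POP and PUSH move bytes (little-endian, with PUSH decrementing SP before each write), and it ignores timing and the DI/EI a real routine needs.

```python
# Toy model of the Z80 stack-copy trick: 64 KiB of RAM and a 16-bit SP.
# Models only POP/PUSH byte movement, not timing or real hardware.

MEM_SIZE = 0x10000


class ToyZ80:
    def __init__(self) -> None:
        self.mem = bytearray(MEM_SIZE)
        self.sp = 0xFFFF

    def pop(self) -> int:
        # POP rr: low byte from (SP), high byte from (SP+1), then SP += 2.
        lo = self.mem[self.sp]
        hi = self.mem[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return (hi << 8) | lo

    def push(self, value: int) -> None:
        # PUSH rr: SP -= 1, write high byte; SP -= 1, write low byte.
        self.sp = (self.sp - 1) & 0xFFFF
        self.mem[self.sp] = (value >> 8) & 0xFF
        self.sp = (self.sp - 1) & 0xFFFF
        self.mem[self.sp] = value & 0xFF


def stack_copy(cpu: ToyZ80, src: int, dst: int, length: int, pairs: int = 3) -> None:
    """Copy `length` bytes from src to dst, `pairs` register pairs at a time."""
    chunk = 2 * pairs
    assert length % chunk == 0, "sketch assumes length is a multiple of the chunk size"
    saved_sp = cpu.sp  # real code would also DI here
    for offset in range(0, length, chunk):
        cpu.sp = (src + offset) & 0xFFFF                 # LD SP,source
        regs = [cpu.pop() for _ in range(pairs)]         # POP BC / POP DE / POP HL
        cpu.sp = (dst + offset + chunk) & 0xFFFF         # LD SP,end of dest chunk
        for value in reversed(regs):                     # PUSH HL / PUSH DE / PUSH BC
            cpu.push(value)
    cpu.sp = saved_sp  # restore SP (and EI) afterwards


# Usage: copy 24 bytes from 0x8000 to 0x4000.
cpu = ToyZ80()
data = bytes(range(1, 25))
cpu.mem[0x8000:0x8000 + len(data)] = data
stack_copy(cpu, 0x8000, 0x4000, len(data))
assert cpu.mem[0x4000:0x4000 + len(data)] == data
```

The reversed PUSH order is the key detail: because the stack grows downwards, the last pair popped has to be the first one pushed for the bytes to land in their original order.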
This link describes it in more detail, and also suggests that a long unrolled run of LDI instructions beats LDIR, since LDI takes 16 T-states per byte against LDIR's 21:
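To put rough numbers on the three approaches, here is a back-of-the-envelope T-state count in Python. It assumes the standard Z80 timings (LDIR 21 per repeating byte and 16 for the last, LDI 16, POP rr 10, PUSH rr 11, LD SP,nn 10), a fully unrolled stack copy using three register pairs per chunk, and no memory contention or setup cost. On a real Spectrum, contended RAM changes the absolute figures. The function names are hypothetical.

```python
# Standard Z80 T-state counts (no wait states or memory contention).
LDIR_REPEAT = 21  # LDIR when BC != 0 after decrementing (instruction repeats)
LDIR_LAST = 16    # final LDIR iteration
LDI = 16
POP_RR = 10       # POP BC / DE / HL
PUSH_RR = 11      # PUSH BC / DE / HL
LD_SP_NN = 10     # LD SP,nn


def ldir_tstates(n: int) -> int:
    """A single LDIR copying n bytes."""
    return (n - 1) * LDIR_REPEAT + LDIR_LAST if n > 0 else 0


def unrolled_ldi_tstates(n: int) -> int:
    """n consecutive LDI instructions, no loop overhead."""
    return n * LDI


def stack_copy_tstates(n: int, pairs: int = 3) -> int:
    """Unrolled POP/PUSH copy: per chunk, two LD SP,nn plus `pairs` POPs and PUSHes."""
    chunk = 2 * pairs
    assert n % chunk == 0, "sketch assumes n is a multiple of the chunk size"
    per_chunk = 2 * LD_SP_NN + pairs * (POP_RR + PUSH_RR)
    return (n // chunk) * per_chunk


# Example: the 6144-byte Spectrum display bitmap (256 x 192 pixels / 8).
n = 6144
for name, fn in [("LDIR", ldir_tstates),
                 ("unrolled LDI", unrolled_ldi_tstates),
                 ("stack copy", stack_copy_tstates)]:
    t = fn(n)
    print(f"{name:>12}: {t:6d} T-states, {t / n:.2f} per byte")
```

With these assumptions the stack copy comes out at roughly 13.8 T-states per byte, against 16 for unrolled LDI and about 21 for LDIR. Using more register pairs per chunk (via EXX, at 4 T-states each) amortises the two `LD SP,nn` loads further.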