    So, can we do better than CRC32X? The M1 can run eight instructions per cycle, and our best idea so far only runs at one instruction per cycle, so maybe we can.

    It seems that you can build a sequence of assembly instructions that computes CRC32 faster than the dedicated built-in CRC32X instruction!

    Is there any reason why this would not be fixed in a future version of the chip? For instance, could CRC32X be more energy-efficient or something like that?

      It’s funny to think of CRC32X as a CISC instruction added to a RISC design, which was later beaten by composing RISC instructions. Kind of a fresh validation of RISC!