    Arm no longer uses RISC to describe its architectures, since they don’t quite fit the description. It instead uses ‘load-store architecture’. Great overview aside from that.

      I’m unconvinced any 80s-style RISC exists at all in any modern “RISC”. A core idea there was that the compiler would do instruction scheduling for the CPU, but that turned out to be exceedingly brittle, and meant that later generations of the same ISAs had to support exposed implementation details (hello, unnecessary branch delay slots!) and do their own instruction scheduling anyway. If you exclude the very first few generations, I’d argue that for the majority of RISC history the difference between CISC and RISC has just been the load/store separation.

      The idea of RISC meaning “not many instructions” has always been kind of iffy, because a lot of the claimed CISC instructions come from counting an instruction as different depending on whether it reads from a register or from memory. So a RISC processor might have one integer add instruction, while x86 has many: add two registers, add register and memory, add register and memory+index, etc. Many of the “complex” instructions added to performance CISC chips after the 80s were also added to performance RISC ones. The only stable difference between CISC and RISC in practice has been “no intermingling of processing and loading data”.
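      To make the counting point concrete, here is a minimal C sketch (the name `add_indexed` is just for illustration). The comments describe typical x86-64 code generation at `-O2`; the exact instructions depend on the compiler, flags and calling convention.

```c
#include <stddef.h>
#include <stdint.h>

/* acc + base[i]: one C expression.
   On x86-64 (System V ABI) a compiler can typically fold the load into the add
   as a memory operand, roughly `mov eax, edi; add eax, [rsi + rdx*4]`, which is
   one of the many register/memory forms of "add" that inflate CISC counts.
   On a load-store ISA the same work is a separate load followed by a
   register-register add: the data is never loaded and processed in one step. */
int32_t add_indexed(int32_t acc, const int32_t *base, size_t i) {
    return acc + base[i];
}
```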

        Reduced didn’t necessarily mean few; it meant orthogonal. The early RISC ISAs, for example, had very limited addressing modes because you could implement the same things with adds, shifts, and then loads or stores. This often gave you more efficient code because the compiler could CSE the address computation and reuse it. This became a bad idea in the mid ’90s for a few reasons:

        • Even in-cache memory ops took multiple cycles, so an extra add and shift added very little.
        • Pipelines were long enough that the shift and add could be hidden quite easily.
        • Register rename became the most costly part of a core and having an extra live rename register was far more expensive than doing some cheap arithmetic.

        RISC-V failed to learn that lesson.
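        A rough C illustration of the address-computation point, written out by hand the way an ISA without indexed addressing modes forces it to be done. `increment_element` is a hypothetical name; in real code the compiler does this scaling and reuse for you.

```c
#include <stddef.h>
#include <stdint.h>

/* Increment a[i] with the address computation spelled out as a shift and an
   add, as a load-store ISA with only base+offset addressing has to do it.
   The address is computed once and reused for both the load and the store,
   i.e. the common-subexpression reuse of the address computation. */
void increment_element(int32_t *a, size_t i) {
    /* i << 2 scales the index by sizeof(int32_t), which is 4 bytes. */
    int32_t *p = (int32_t *)((char *)a + (i << 2));
    int32_t v = *p; /* load through the computed address */
    *p = v + 1;     /* store reuses the same computed address */
}
```

        On base RV32I/RV64I this comes out as roughly a shift, an add, a load, an add-immediate and a store. The shift/add pair is the cheap arithmetic the bullets describe, and the computed address is the extra live value that has to sit in a register (and, on an out-of-order core, a rename register) between the load and the store.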

        Early RISC ISAs also lacked hardware multiply and divide. Arm has both (though I think at least divide is optional in the M profile). Divide was particularly bad because a new technique for hardware division was invented a year or two after the first RISC chips shipped, and any ISA with a divide instruction could then outperform the expanded sequence that RISC chips had. That’s very workload-dependent, though, and RISC-V makes these optional for a good reason. In CHERIoT RTOS we don’t need a hardware divide: the only place we do a divide that can’t be trivially converted to a shift is a divide by 10 for pretty-printing numbers for debugging (not used in release builds), and this is a divide-by-constant that the compiler can expand. For small embedded cores, hardware division is a big area cost for little benefit.
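        A minimal C sketch of both sides of this: a shift-and-subtract loop standing in for the kind of expanded sequence a core without a divide instruction has to run, and a multiply-and-shift that a compiler can use for a divide by the constant 10, applied to debug-style number printing. The function names are hypothetical, and the `div10` expansion assumes a 32×32→64-bit multiply is available; it is one common expansion, not necessarily what any particular compiler emits for CHERIoT.

```c
#include <stddef.h>
#include <stdint.h>

/* Restoring (shift-and-subtract) division: one quotient bit per iteration,
   32 iterations for a 32-bit dividend. Assumes d != 0. */
uint32_t soft_udiv(uint32_t n, uint32_t d) {
    uint32_t q = 0;
    uint64_t r = 0; /* 64-bit so the shifted remainder cannot overflow */
    for (int bit = 31; bit >= 0; --bit) {
        r = (r << 1) | ((n >> bit) & 1u); /* bring down the next dividend bit */
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << bit;
        }
    }
    return q;
}

/* Unsigned divide by the constant 10 as a multiply and a shift.
   0xCCCCCCCD is ceil(2^35 / 10); the result is exact for every uint32_t. */
uint32_t div10(uint32_t n) {
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Hypothetical debug helper: write n in decimal into out (11 bytes covers
   the 10 digits of UINT32_MAX plus the NUL) and return the digit count. */
size_t format_u32(uint32_t n, char out[11]) {
    char tmp[10];
    size_t len = 0;
    do {
        uint32_t q = div10(n);
        tmp[len++] = (char)('0' + (n - q * 10)); /* remainder is the low digit */
        n = q;
    } while (n != 0);
    for (size_t i = 0; i < len; ++i) {
        out[i] = tmp[len - 1 - i]; /* digits were produced least significant first */
    }
    out[len] = '\0';
    return len;
}
```

        With a hardware multiplier, `div10` is a multiply and a shift, compared with 32 iterations of the loop in `soft_udiv`.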

        The key philosophical difference between early RISC and CISC was that RISC ISAs were created to be targets for compilers, whereas CISC ISAs were intended to be programmed directly (compilers were a nice optional extra). In this regard, even the first ARM core was more CISCy than RISCy. With load and store multiple instructions giving a single instruction for prologues and epilogues, rich addressing modes, and predicated instructions, ARM assembly was a nicer programming language than many compiled languages. The RISC OS source code (ARM assembly) is far more readable than most contemporary C UNIX kernels.