1. 16
  1.  

  2. 6

    Extremely sophisticated clickbait. Rarely have I so wanted my 10 minutes back.

    1. 2

      So it’s the CISC of clickbait?

      1. 2

        No, it’s just CISC. Complex Instruction Set Clickbait.

      2. 2

        How so?

        1. 2

          I’m not sure. Hence ‘sophisticated’. I don’t usually post comments to complain about stuff on the internet. So it wasn’t intended as a complaint, more reflective. Checking if others felt the same way. Though I guess my annoyance did come through.

          My background is in microarchitecture, so perhaps I’m not the target audience. But this post starts out summarizing the best past research in the first page, which leads me to not filter it as something introductory. And then there’s thousands of words that didn’t actually say anything more than that past literature. I guess the title led me to expect a new, better lens on x86 vs ARM.

          1. 2

            That’s fair. I feel like I learned a lot from it, so maybe it was aimed at an audience with less expertise than you, or maybe it’s clickbait that seems insightful to people with less expertise.

      3. 4

        What I got out of the article:

        • Nowadays, the ISA is quite cleanly separated from the implementation: it’s not just decoded then executed, the stream of instructions is chopped (sometimes fused!) into micro-operations, whose characteristics are quite different from any instruction set: they’re wider (most likely to make decoding easier or even trivial), and generally do less work. Then there’s a second stage where the micro operations are actually executed (out of order). It’s almost as if the processor had some internal ISA focused on implementation instead of external constraints like code density, ease of programming, or plain backward compatibility. Anyway, this separation makes the RISC vs CISC debate a bit moot, because the ISA doesn’t affect the whole processor. Internally, processors do what they gotta do.

        • On the other hand, the ISA does matter. The x86 line of instruction sets is remarkably hard to decode, to the point where decoding has become a bottleneck, forcing CPU vendors to add micro-op caches and other complications so instruction decoding could be fast enough (a problem that ARM is mostly free from). So the separation of the ISA and the implementation isn’t quite complete, and having a complex, hard-to-decode ISA not only has direct costs at the decoding step, it has indirect costs where the rest of the CPU has to compensate for slower decoding. It’s not just a CISC vs RISC thing: it’s also about legacy, and adding instructions to an instruction set that didn’t leave room for new instructions (quite unlike RISC-V). What we may call the “CISC tax” really is a “legacy tax”.

        • One important point where the ISA does matter, though, is backward compatibility. Especially through the ’90s and early 2000s, cross-compilation was even harder than it is now (what with every compiler and environment assuming they were compiling for the local machine), and distribution media were more limited: imagine shipping 3 versions of your productivity suite on store shelves (I believe this also explains the Windows hegemony to some extent). Now, however, we can just download the right executable for our platform. Anyway, this backward-compatibility advantage, as well as the quality of Intel’s foundries, skewed the game to the point where it isn’t clear whether the x86 tax will really matter in the long run. Now though, we stand to learn more as Apple deploys its new ARM processors (I for one was quite surprised: I thought x86-64 would be impossible to displace when Apple ditched their PowerPC processors. But I guess the reason Apple can switch away from x86-64 is the same reason it could switch to it: they have such control over their ecosystem that they can force everyone to recompile everything).
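        To make the decoding point in the second bullet concrete, here’s a deliberately toy C++ sketch (not real decoder hardware, and the numbers are invented for illustration): with fixed-length instructions, every decode slot can compute its instruction’s start address independently, while with variable-length instructions each start address depends on the lengths of everything before it — the serial chain that wide x86 front ends have to break with predecode bits or a µop cache.

        ```cpp
        #include <cstdio>
        #include <vector>

        // Toy model only. With fixed 4-byte instructions, decoder i knows its
        // start address (4 * i) immediately and all decoders can work in
        // parallel.
        std::vector<size_t> fixed_starts(size_t n) {
            std::vector<size_t> s;
            for (size_t i = 0; i < n; ++i) s.push_back(4 * i);  // independent
            return s;
        }

        // With variable-length instructions, each start address depends on the
        // sum of all previous lengths: a serial dependency chain.
        std::vector<size_t> variable_starts(const std::vector<size_t>& lengths) {
            std::vector<size_t> s;
            size_t pc = 0;
            for (size_t len : lengths) { s.push_back(pc); pc += len; }  // serial
            return s;
        }

        int main() {
            // Three instructions of lengths 1, 5, and 3 bytes start at 0, 1, 6.
            std::printf("%zu\n", variable_starts({1, 5, 3}).back());  // prints 6
        }
        ```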

        1. 4

          The first point is both true and misleading. It’s true that there’s little complexity in the majority of the pipeline(s) from some instructions being a compressed representation. For example, an x86 register-memory add instruction may be cracked into a load and an ALU op in the front end, and has little impact on the rest of the pipeline. This, however, ignores the massive impact of microcoded instructions. Microcode is very cheap to implement if the instructions implemented in microcode are rarely used: when you hit a microcoded instruction, you stall the pipeline and just push in the instructions from the microcode. If you have microcoded instructions that are performance critical, then there’s a huge power / complexity cost to implementing them.
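          As a toy illustration of that cracking (the µop format, names, and register numbers here are invented for illustration, not any real core’s internal representation):

          ```cpp
          #include <cstdio>
          #include <vector>

          // Hypothetical micro-ops; real internal formats are wider and proprietary.
          enum class UopKind { Load, AluAdd };

          struct Uop {
              UopKind kind;
              int dst, src;  // register numbers (toy encoding)
          };

          // Crack an x86-style register-memory add, e.g. `add eax, [rbx]`,
          // into a load micro-op followed by a register-register ALU micro-op.
          std::vector<Uop> crack_add_reg_mem(int dst_reg, int addr_reg) {
              int tmp = 100;  // stands in for a renamed temporary register
              return {
                  {UopKind::Load,   tmp,     addr_reg},  // tmp <- mem[addr_reg]
                  {UopKind::AluAdd, dst_reg, tmp},       // dst <- dst + tmp
              };
          }

          int main() {
              auto uops = crack_add_reg_mem(/*eax=*/0, /*rbx=*/3);
              std::printf("%zu uops\n", uops.size());  // prints "2 uops"
          }
          ```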

          The really interesting place where the architecture leaks into the microarchitecture is the memory model. TSO requires a lot more hardware resources (longer reorder buffers, more complex cache coherency) than a weak memory model. The really interesting thing about the Apple M1 is that they implement TSO for x86 emulation, so they’re paying that cost. I believe it’s configurable via an MSR: I’d love to know what they turn off in their core when it’s disabled.
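          For anyone wanting to see the memory-model difference at the source level, here’s a standard C++ message-passing sketch; the codegen claims in the comments are the usual ones for x86 vs AArch64:

          ```cpp
          #include <atomic>
          #include <cstdio>
          #include <thread>

          // Message passing with release/acquire. On x86 (TSO) the release store
          // compiles to an ordinary mov: the hardware already keeps stores in
          // program order. On a weakly ordered ISA (e.g. AArch64) the compiler
          // must emit an ordering instruction such as stlr to give the same
          // guarantee — that ordering machinery is part of the TSO cost above.
          int run_message_pass() {
              std::atomic<int> data{0};
              std::atomic<bool> ready{false};
              int seen = 0;
              std::thread producer([&] {
                  data.store(42, std::memory_order_relaxed);
                  ready.store(true, std::memory_order_release);  // plain mov on x86
              });
              std::thread consumer([&] {
                  while (!ready.load(std::memory_order_acquire)) { /* spin */ }
                  seen = data.load(std::memory_order_relaxed);   // guaranteed 42
              });
              producer.join();
              consumer.join();
              return seen;
          }

          int main() { std::printf("%d\n", run_message_pass()); }  // prints 42
          ```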

          The third point is missing the ecosystem cost. You can just download an AArch64 binary quite easily now, because there are AArch64 ports of gcc, clang, V8, and so on (including debugging tools and other bits of critical developer infrastructure). The value of this ecosystem is estimated to be around $2bn. Arm is able to play in this space because, between Android and iOS, they have a huge number of developers to amortise the cost of this ecosystem over. In theory, it’s just as easy to download a MIPS binary, but there are so many holes in the developer ecosystem for MIPS that the binary is unlikely to exist. This is the big struggle for RISC-V: it’s hard to bootstrap a software ecosystem if there aren’t any devices, and it’s hard to get people to invest in the hardware if there’s no software ecosystem.

          1. 1

            That was insightful, thanks.

            What are TSO and MSR? Three-letter acronyms are not easy to search for… I believe I can guess from context that TSO is some kind of strong memory model where everything seems to happen in order (from the programmer’s point of view), but I’m not sure exactly. I’m also not sure exactly how weak the weaker memory models you’re referring to are. I have no idea what MSR might mean, though.

            I did miss the fact that Apple could lean on the iOS ecosystem (and Arm in general) to bootstrap its M1 change, though. I wonder whether that’s why they could switch to Intel as well: there was a huge user base and expertise to begin with.

            I would love to see competitive processor designs with RISC-V. I’ve seen both praise and criticism of what this ISA may or may not enable, but without some big company putting it to use in a seriously high-performance (or low-power) environment, as a layman I can’t know for sure, and that’s bloody frustrating. I want to simplify computing at every level (and did my little contribution to that end), so I want to know: does the RISC-V ISA (or family of ISAs, really) have a comparative advantage in the performance or power efficiency areas? Or does its simplicity have a cost?

            1. 4

              What are TSO and MSR?

              TSO is Total Store Ordering. It’s the x86 strong memory model.

              MSR is machine-specific register (sometimes model-specific register). RISC-V calls them CSRs (control and status registers). Basically, these are registers that aren’t used as normal operands for instructions and may have side effects when read from or written to.

              I did miss the fact that Apple could lean on the iOS ecosystem (and Arm in general) to bootstrap its M1 change, though. I wonder whether that’s why they could switch to Intel as well

              Yes. For OS X on PowerPC, Apple was mostly maintaining the GCC port, for example. A huge number of other folks were working on the GCC, GDB, and Binutils x86 versions. Apple had to add Mach-O support and any Darwin-specific bits, but those are tiny in comparison. IBM was mostly investing in XLC back then, so was happy for Apple to do all of the GCC work, Freescale also had their own proprietary compilers for embedded, so didn’t care much. Apple was basically maintaining the PowerPC software ecosystem.

              does the RISC-V ISA (or family of ISA, really) have a comparative advantage in the performance or power efficiency areas? Or does its simplicity has a cost?

              My personal view is that RISC-V went too far in terms of simplicity and will have to back-pedal. For example, RISC-V doesn’t (didn’t?) have a broadcast i-cache invalidate. This means that if you want to avoid stale instruction fetches, every time you map a page with execute permission you need to IPI all of the cores and do a local i-cache invalidate. SPARC did this and found that, on large systems (which, back then, meant >8 cores), the process-creation cost was too high. There are a lot of things like this, where you get a small simplicity benefit, but it turns out that the extra hardware complexity in Arm leads to much better whole-system performance.

              I think this kind of thing is going to be the big problem with RISC-V. A lot of these things are easy to fix, and if you fix them then you can get a big competitive advantage. This means that the incentive for any RISC-V vendor is to add a bunch of custom extensions and show that (their branch of) Linux runs much faster on their hardware than on any other RISC-V vendor’s hardware. This kind of fragmentation is exactly what killed MIPS: every MIPS vendor did something custom and clever. Every vendor had their own Linux / GCC fork. No one got any ecosystem benefits, and anything not from a given vendor used MIPS III as the baseline, because that’s the most that everyone supported.

              The only way out of this path that I see is for a large player (for example, Google or Huawei) to capture enough of a market that they define the de-facto RISC-V standard. If Google defined a RISC-V Android Extension, for example, and promised to support it for AOSP and the Play Store, then they’d probably get a number of CPU vendors to adopt it. At that point, however, it’s a royalty-free ISA, but not an openly developed one.

              1. 1

                My personal view is that RISC-V went too far in terms of simplicity and will have to back-pedal.

                Okay, thanks. I have the feeling this is to be expected from a design that originated in a teaching environment. Also, if I understand correctly, this can mostly be solved by adding the relevant extensions.

                Speaking of which, I’m not sure we should be that worried about fragmentation: we’ve had the example of Intel and AMD implementing compatible multimedia and SIMD extensions, for instance. OpenGL also had vendor-specific extensions that were eventually standardised (possibly with a slightly different API, but standardised nonetheless). The same could happen to RISC-V: if I recall correctly, they have a way to standardise extensions. That said, a larger player throwing its weight behind it would definitely help (with the drawback you mentioned).

                it’s a royalty-free ISA

                Something that puzzles me: is it even possible for an ISA, meaning the interface of a CPU, to be anything but royalty free? It’s been long established that making stuff that’s compatible with a competitor’s is not illegal, at least in many Western countries (see third party printer cartridges). The only way to compete with an existing CPU with an existing software ecosystem is to implement the same ISA. It ought to be permitted. Of course I expect established players will try to bully newcomers with lawsuits (as Intel have done to AMD), and may even succeed. I’m just not sure they would actually have a case.

                1. 4

                  Okay thanks. I have the feeling this would be expected from a design that originated from a teaching environment. Also, if I understand correctly, this can mostly be solved by adding relevant extensions.

                  In theory, yes, but RISC-V’s encoding design means that there’s very little 16-bit instruction space (and it’s all used by the C extension) and most of the 32-bit encoding space is gone too. Extensions are soon going to start being pushed out into the 48-bit space. The problem is, no one wants that for their own extension and so vendors all use overlapping bits of the 32-bit encoding space.
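                  For reference, a RISC-V instruction’s length is determined by the low bits of its first 16-bit parcel, which is exactly where the space runs out; a minimal sketch of the base spec’s scheme (ignoring the longer reserved encodings):

                  ```cpp
                  #include <cstdint>
                  #include <cstdio>

                  // Instruction length in bytes from the low bits of the first
                  // 16-bit parcel, following the variable-length encoding scheme
                  // in the RISC-V base spec. Sketch only: longer/reserved
                  // encodings are lumped together as -1.
                  int riscv_insn_length(uint16_t parcel) {
                      if ((parcel & 0x3) != 0x3)   return 2;  // compressed (C extension)
                      if ((parcel & 0x1c) != 0x1c) return 4;  // standard 32-bit encoding
                      if ((parcel & 0x3f) == 0x1f) return 6;  // 48-bit encoding space
                      if ((parcel & 0x7f) == 0x3f) return 8;  // 64-bit encoding space
                      return -1;                              // longer / reserved
                  }

                  int main() {
                      std::printf("%d\n", riscv_insn_length(0x0001));  // c.nop: prints 2
                      std::printf("%d\n", riscv_insn_length(0x0013));  // addi parcel: prints 4
                  }
                  ```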

                  Speaking of which, I’m not sure we should be that worried about fragmentation: we’ve had the example of Intel and AMD implementing compatible multimedia and SIMD extensions for instance

                  Intel implemented things, AMD copied them. AMD’s 3DNow was not widely deployed in software. Similarly, x86-64 was an AMD feature that Intel copied. In each case this worked because one company gained significant market share and ecosystem buy-in, and a patent cross-licensing agreement allowed the other to implement the same extension.

                  OpenGL also had vendor-specific extensions, that were eventually standardised (possibly with a slightly different API, but standardised nonetheless)

                  A few did. The fragmentation in the OpenGL ecosystem was a big reason for game developers moving to DirectX, which defined a set of features for each level and then required everyone to implement the same thing if they wanted to claim compliance. This was possible because Microsoft had enough market share that the DirectX team was able to act as a standards group and push consensus. Google may manage to be this central clearing point, but then it’s a Google ISA.

                  Something that puzzles me: is it even possible for an ISA, meaning the interface of a CPU, to be anything but royalty free?

                  Yes. There’s some awful precedent in MIPS, where they patented the technique for implementing their LWL / LWR instructions and then managed to sue a company that provided a fast software implementation, in the illegal-instruction trap handler, on a CPU that implemented only the unpatented MIPS instructions.

                  In practice, there are two ways that you make an ISA not royalty-free. The simplest is that you patent a technique that is required to implement some instructions such that any vaguely efficient implementation is covered by a patent. Intel claims to have over 200 patents on SSE / AVX of this nature.

                  The second is to trademark the name and charge for certification of compliance. This is the main way that Arm protects their ISA. They do have some patents, but it’s almost impossible to implement a fully compliant core for any non-trivial ISA without a good conformance test suite and Arm controls the only one for their ISA (and their architecture docs are released under a license that prevents using them to develop a separate one). This means that you can possibly produce a knock-off Arm core without trampling any patents, but in practice it’s likely that it would have bugs that would break compatibility for some software.

                  The internal x86 specs at companies like Centaur and AMD are some of the most valuable assets those companies own. There’s a lot of undocumented / unspecified behaviour that comes from errata in older CPUs and that software depends on; just implementing an x86 core from the Intel reference would probably not give you something that worked.

                  1. 1

                    Something that puzzles me: is it even possible for an ISA, meaning the interface of a CPU, to be anything but royalty free?

                    Some ISAs have patented instructions or patents that cover all implementations; the Cray vector register patent and the MIPS patent on LWL/LWR are two historical examples of this.

                    1. 1

                      Some ISAs have patented instructions or patents that cover all implementations

                      Oh dear. I suspected something similar, but if it’s that bad, I bet that if I started to design my own CPU for fun I’d be likely to infringe some random patent. I wonder how many hoops RISC-V ended up jumping through just to avoid patents.

                      1. 3

                        Most of a conventional RISC ISA should be fine, e.g. load/store, logical operations, branches, etc. There are many 20+ year old designs you can copy instructions from. SIMD, bitfield, or string instructions are the main areas I would be concerned about.