1. 42

  2. 24

    “I think somebody inside of Intel needs to really take a long hard look at their CPU’s, …”

    He needs to take a long look at the security techniques used in his own kernel while he’s saying that. Didn’t Linus and others gripe to Intel about the first protection mechanism (segments) they added to x86, after it had been successful in security kernels that resisted NSA pentesters? Actually, they resisted about everything security engineers were pushing. He can stay off his high horse. And if you think segments were a small investment, consider that Intel has tried to do [more] secure CPU’s three times:

    1. iAPX 432, which ran at 25% of its competitors’ speed. Market failure.

    2. i960, in the BiiN project (“Billions invested in Nothing”).

    3. Itanium, with use in at least one OS; similarly a loss with 8-9 zeros on it.

    Now, I notice that with each big loss they broke too much backward compatibility with the ISA, the OS, and/or the ecosystem’s favorite language. CompSci work, plus their own, that fixes things up within their ecosystem might not see such losses, especially if performance is decent. Worst case, they sell it as a separate line where the protections are there for companies who pay extra to turn them on. They can blame that on demand, with numbers to back the claim.

    So, Intel already bet huge on hardware, OS’s, and languages with better security or maintainability. The results were billions lost. That probably reinforced an organizational instinct to avoid doing too much to piss off their locked-in, x86, C/Windows/Linux/Mac-using customers. Their market mostly doesn’t care about security enough to make sacrifices to achieve it. That includes really using the CPU features they already have to best effect. Intel’s best bet, like the software market’s, is to just do a small fix, crank out another chip with good performance, absorb any problems, fix what they can enough to get the next buying cycle going, and rinse, repeat. Their only competitor in the performance space, AMD, is similarly apathetic. That helps.

    1. 5

      FWIW: a possible hint at Intel’s future direction (or maybe just a temporary mitigation?) is in the IBRS patchset for Linux at https://lkml.org/lkml/2018/1/4/615: one mode helps keep userspace from messing with kernel indirect-call speculation, and another helps erase any history the kernel left in the BTB. I bet both of these are blunt hammers on current CPUs (a microcode update can only do so much: turn a feature off, overwrite the BTB, or whatever), but they’re defining an interface they want to make work more cheaply on future CPUs. It also seems to be enabled in Windows under the name “speculation control” (https://twitter.com/aionescu/status/948753795105697793).
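
      For concreteness, the interface boils down to a pair of model-specific registers. A minimal sketch, assuming the MSR numbers and bits Intel documented for speculation control (IA32_SPEC_CTRL/IBRS to restrict speculation, IA32_PRED_CMD/IBPB to flush predictor state); the helper names are mine, not the patchset’s:

      ```c
      /* Kernel context: wrmsrl() is the Linux helper from <asm/msr.h>.
       * MSR numbers and bit positions per Intel's speculation-control
       * documentation; the function names are illustrative only. */
      #include <asm/msr.h>

      #define MSR_IA32_SPEC_CTRL  0x00000048
      #define SPEC_CTRL_IBRS      (1UL << 0)  /* restrict indirect-branch speculation */

      #define MSR_IA32_PRED_CMD   0x00000049
      #define PRED_CMD_IBPB       (1UL << 0)  /* flush indirect-branch predictor state */

      /* On kernel entry: keep user-trained predictions out of kernel indirect calls. */
      static inline void spec_ctrl_enter_kernel(void)
      {
              wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);
      }

      /* On context switch: erase predictor history the previous task left behind. */
      static inline void spec_ctrl_switch_task(void)
      {
              wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
      }
      ```

      On current CPUs those writes are exactly the blunt hammers described above; the point of defining them as an interface is that future CPUs can make them cheap.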

      ARM says in their whitepaper that most ARM implementations have some way to turn off branch prediction or invalidate branch-predictor state in kernel/exception-handler code, which sounds about in line with what Intel’s talking about. The whitepaper also talks about barriers to stop out-of-bounds speculative reads. The language is a bit vague, but I think they’re saying an existing conditional move/select works on current chips, and a new barrier instruction, CSDB, for future chips provides just the minimum you need to avoid the cache side-channel attack.
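
      The conditional move/select trick amounts to clamping the index with a data dependency instead of a branch, so even a mispredicted path can’t read out of bounds. A rough C rendering of the idea (names are mine; a real implementation pins the mask down with inline asm or the new barrier so the compiler can’t turn it back into a branch):

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Branch-free bounds clamp: returns idx when idx < size, else 0.
       * idx - size sets the sign bit only when idx < size; the arithmetic
       * shift smears it into an all-ones or all-zeroes mask. */
      static inline size_t clamp_index_nospec(size_t idx, size_t size)
      {
              size_t mask = (size_t)((intptr_t)(idx - size) >> (sizeof(size_t) * 8 - 1));
              return idx & mask;
      }

      uint8_t table[256];

      uint8_t load_element(size_t idx)
      {
              /* stays in bounds even if a bounds-check branch was mispredicted */
              return table[clamp_index_nospec(idx, sizeof(table))];
      }
      ```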

      1. 2

        A competent CPU engineer would fix this by making sure speculation doesn’t happen across protection domains. Maybe even a L1 I$ that is keyed by CPL.

        I feel like Linus of all people should be experienced enough to know that you shouldn’t be making assumptions about complex fields you’re not an expert in.

        1. 22

          To be fair, Linus worked at a CPU company, Transmeta, from about ’96 to ’03 (??) and reportedly worked on, drumroll, the Crusoe’s code-morphing software, which speculatively morphs code written for other CPUs, live, into the Crusoe instruction set.

          1. 4

            My original statement is pretty darn wrong then!

            1. 13

              You were just speculating. No harm in that.

          2. 15

            To be fair to him, he’s describing the reason AMD processors aren’t vulnerable to the same kernel attacks.

            1. 1

              I thought AMD were found to be vulnerable to the same attacks. Where did you read they weren’t?

              1. 17

                AMD processors have the same flaw (that speculative execution can lead to information leakage through cache timings) but the impact is way less severe because the cache is protection-level-aware. On AMD, you can use Spectre to read any memory in your own process, which is still bad for things like web browsers (now javascript can bust through its sandbox) but you can’t read from kernel memory, because of the mitigation that Linus is describing. On Intel processors, you can read from both your memory and the kernel’s memory using this attack.
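
                The in-process case is the classic Spectre variant-1 gadget. A minimal sketch, condensed from the example in the Spectre paper (temp is just a sink so the compiler keeps the loads):

                ```c
                #include <stddef.h>
                #include <stdint.h>

                size_t  array1_size = 16;
                uint8_t array1[16];
                uint8_t array2[256 * 512];  /* probe buffer: one cache line per byte value */
                uint8_t temp;

                /* Train the branch with in-bounds x, then call with an out-of-bounds x:
                 * the CPU speculatively reads array1[x], and the dependent load pulls
                 * in a line of array2 indexed by the secret byte. Timing reads of
                 * array2 afterwards recovers that byte. */
                void victim_function(size_t x)
                {
                        if (x < array1_size)
                                temp &= array2[array1[x] * 512];
                }
                ```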

                1. 0

                  Basically, both will need the patch, which I presume will lead to the same slowdown.

                  1. 9

                    I don’t think AMD needs the separate-kernel-address-space patch (KAISER), which is responsible for the slowdown.

            2. 12

              Linus worked for a CPU manufacturer (Transmeta). He also writes an operating system that interfaces with multiple chips. He is pretty darn close to an expert in this complex field.

              1. 3

                I think this statement is correct. As I understand it, part of the problem in Meltdown is that a transient code path can load a page into the cache before page-access permissions are checked. See the Meltdown paper.
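
                For concreteness, the transient sequence looks roughly like this; it’s a sketch of the idea, not a working exploit (a real PoC flushes probe[] first, catches the fault with a signal handler or TSX, and recovers the byte with cache timing):

                ```c
                #include <stdint.h>

                extern volatile uint8_t probe[256 * 4096];  /* one page per byte value */

                void transient_read(const volatile uint8_t *kernel_addr)
                {
                        /* Architecturally this load faults, but on affected CPUs the
                         * value is forwarded to dependent instructions transiently,
                         * before the permission check takes effect. */
                        uint8_t secret = *kernel_addr;

                        /* The dependent load caches one page of probe[], indexed by
                         * the secret byte; that footprint survives the fault. */
                        (void)probe[secret * 4096];
                }
                ```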

                1. 3

                  The fact that he is correct doesn’t prove that a competent CPU engineer would agree. I mean, Linus is (to the best of my knowledge) not a CPU engineer, so he probably doesn’t get all the constraints of the field.

                  1. 4

                    So? This problem is not quantum physics; it has to do with a well-known mechanism in CPU design that is understood by good kernel engineers - and it is a problem that AMD and Via both avoided with the same instruction set.

                    1. 3

                      Not a CPU engineer, but see my direct response to the OP, which shows that Linus has direct experience with CPUs, from his tenure at Transmeta, a defunct CPU company.

                      1. 5

                        from his tenure at Transmeta, a defunct CPU company.

                        Exactly. A company whose innovative CPU’s didn’t meet the market’s needs and were shelved on acquisition. What he learned at a company making unmarketable, lower-performance products might not tell him much about constraints Intel faces.

                        1. 11

                          What he learned at a company making unmarketable, lower-performance products might not tell him much about constraints Intel faces.

                          This is a bit of a logical stretch. Quite frankly, Intel took a gamble with speculative execution and lost. The first several years were full of errata for genuine bugs, and now we finally have a userland-exploitable issue with it. Security and performance are often at odds. Security engineers often examine / fuzz interfaces looking for things that cause state changes. While the instruction execution state was not committed, the cache state change was. I truly hope Intel engineers will now question all the state changes that happen due to speculative execution. This is Linus’ bluntly worded point.

                          1. 3

                            (At @apg too)

                            My main comment shows consumers didn’t pay for more secure CPU’s. So that’s not really a market requirement, even if it might prevent costly mistakes later. Their goal was making things go faster over time, at acceptable watts, despite poorly-written code from humans or compilers, while remaining backwards compatible with locked-in customers running worse, weirder code. That’s what they thought would maximize profit, and that’s what they executed on.

                            We can test whether they made a mistake by getting a list of x86 vendors sorted by revenue and market share. (Looks.) Intel is still a mega-corporation dominating x86. They achieved their primary goal. A secondary goal is having no liabilities dislodge them from that. These attacks will only be a failure for them if AMD takes a huge chunk of their market, like it did by beating them to proper 64-bit when Intel/HP made the Itanium mistake.

                            Bad security is only a mistake for these companies when it severely disrupts their business objectives. In the past, bad security was a great idea. Right now it mostly works, with the equation maybe shifting a bit in the future as breakers start focusing on hardware flaws. For these recent flaws it’s sort of an unknown; it all depends on the mitigations and on how many of the people replacing CPU’s stop buying Intel.

                          2. 3

                            A company whose innovative CPU’s didn’t meet the market’s needs and were shelved on acquisition.

                            Tons of products over the years have failed simply on timing. So, yeah, it didn’t meet market demand then. I’m curious what they could have done in the 10+ years after they called it quits.

                            might not tell him much about constraints Intel faces.

                            I haven’t seen confirmation of this, but there’s speculation that these bugs could affect CPUs as far back as the Pentium II from the ’90s…

                        2. 1

                          The fact that he is correct doesn’t prove that a competent CPU engineer would agree.

                          Can you expand on this? I’m having trouble making sense of it. Agree with what?