1. 40
  1.  

  2. 24

    So, I saw two well-known professionals in security falsely claiming cache-based timing channels were discovered in 2005. They were actually discovered in the early-to-mid 1990s, along with most of x86’s problems, following TCSEC methods one of them calls useless red tape. Summarizing some of my recent comments, I laid out the history of what was done, when, by whom, with what mitigations, and how actually reading prior work would’ve found the new attacks sooner if not prevented them outright. The biggest reason readers see me constantly dropping CompSci submissions is a disturbing trend (esp in INFOSEC) of ignoring prior work to only rediscover what was in it the hard way after it does a lot of damage. They usually follow by patting themselves on the back for their “discoveries.” It seems to be a cultural thing motivated by social signalling in groups, since most will ignore the work after a popular member claims it has no value. Eliminating this in favor of reading and building on prior work in INFOSEC or programming would’ve drastically accelerated development of good solutions.

    There’s a related note to this: security certifications. Anyone reading HN, Lobsters, or other forums with submissions about this will see security professionals often recommend against them. They often talk like it’s never happened, focus on some failed regulations, or speak speculatively. The work I cite in my submission was prior work by professionals following the TCSEC’s requirements. The systems those methods produced were highly secure when analyzed or pentested, spotting threats such as cache-based timing channels a decade in advance of popular folks and conferences. The methods clearly worked at improving the security of commercial products. So, if you see the topic again, remind them of that so they start with a more honest and informed position: regulation of computer security worked before with great results but had (specific problems here) we need to fix if we do it again. That would be true.

    1. 8

      I can’t imagine how frustrating the past week must have been for you to watch; thanks for sharing all of these.

      1. 9

        Yeah, good guess. I paused to chill out before checking responses. The irritating part, as I told Colin Percival just now, is that we’ve been telling everyone from security folks to VMM builders about this specific work for over twenty years. I personally bring it up on every VMM thread I can if it’s security-focused, since most of the problems and techniques are still true. The rest can be improved on. It’s extremely irritating just how systematically projects or people claiming to be focused on improving security avoid the prior work in CompSci or the high-security field that did exactly that. We see the old problems they ignore reappear after the tech is used by millions of people or for critical functions where the damage will be high. It will happen again before 2019.

      2. 1

        I’m confused: is the purpose of this thread to stroke your own ego and claim you are smarter than Tom Ptacek and Colin Percival?

        …ignoring prior work to only rediscover what was in it the hard way after it does a lot of damage. They usually follow by patting themselves on the back for their “discoveries.”

        By saying things like this you are downplaying one of the most impressively sophisticated attacks in the history of information security.

      3. 11

        I think Colin makes a good point about the distinction between covert channels and side channels. There’s certainly a relationship, but I think you’re overreaching a bit to say all of this was known. The covert channel research was about ensuring that two cooperating processes could not pass information. In high-trust environments, that’s certainly important. But side channels involve a non-cooperating victim. My kernel isn’t trying to bleed secrets to userland; it’s trying to do the opposite, in fact. I think these side channels were less studied because the threat model assumes the kernel won’t be working against you.

        I generally consider my browser process to be hostile for instance. But I’m not equally suspicious of ssh-agent. I certainly don’t think the two are working in concert to pass secrets. So I don’t need a CPU that’s verifiably secure against that. Thus, I didn’t buy one. I surely would like a CPU that’s secure against my browser attacking a victim ssh-agent, though.

        (This is nevertheless a good opportunity to review existing literature in the space. We can have a new problem that’s solved by old methods.)

        1. 5

          Like I told him, I’m not sure: it can go either way. As I quoted and cited, they did describe the covert channels as an incidental effect of the design of hardware and software components where they leak secrets via storage or timing. They also focused on how malicious parties might drive that with a Sender, but some, like one I quoted, had models where only a Receiver was necessary. The two are equivalent in a model if you describe what they do in terms of process/shared-resource/process rather than why they do it. So, I’m not sure if they didn’t make the distinction because they didn’t know or because it didn’t matter. They’d have to do something about every interaction with shared resources that has a dynamic element to stop leaks, whether a malicious process was sending them or not.

          They were also worried about everything from CPU caches to memory buses to transmission wires leaking electromagnetic emanations, which sounds like side-channel research. I have to call them all side channels now, though, due to what happens when a term gets popular and takes over a segment of the field. So, I usually describe covert channels as “covert or side channels” to new people anyway. Helps them avoid confusion.

          “But I’m not equally suspicious of ssh-agent.”

          Why not? Let me give you one of the simplest rules I learned from MLS-style systems: any program acting on malicious input can become malicious. By mixing input that’s low integrity (malicious) and high integrity (saintly), SSH gets classified low integrity (malicious) by default since it could become malicious at any time. By TCSEC rules, you have to rigorously analyze SSH to prove it won’t become malicious on arbitrary inputs. If you can’t, or for extra defense, you have to analyze the underlying TCB to show it will contain SSH’s malice. That malice ranged from access attempts to covert channels in the 1980s, with cache channels added in the early 1990s. Repeat this analysis or protection strategy for any component that can change state in response to potentially malicious input. You eventually get to one or more components you can’t mediate that absolutely have to work. That was the reference monitor in old systems, often a security kernel but could be a CPU, that was done with high-assurance methods.
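
          To make that rule concrete, here’s a minimal sketch in Python. It’s purely illustrative and my own construction, not code from any of the systems above: a two-level integrity lattice with a low-water-mark style downgrade, where anything that touches untrusted input loses the right to modify trusted state until a reference monitor or re-verification says otherwise.

          ```python
          # Illustrative sketch only: two integrity levels and a low-water-mark
          # rule. A subject that accepts LOW (potentially malicious) input is
          # downgraded to LOW and loses write access to HIGH-integrity objects.
          HIGH, LOW = "high", "low"

          class Subject:
              def __init__(self, name, integrity=HIGH):
                  self.name = name
                  self.integrity = integrity

              def accept_input(self, source_integrity):
                  # Integrity can only go down when mixing inputs.
                  if source_integrity == LOW:
                      self.integrity = LOW

              def can_write(self, object_integrity):
                  # No "write up": a LOW subject may not modify HIGH objects.
                  return not (self.integrity == LOW and object_integrity == HIGH)

          ssh = Subject("sshd")           # starts out trusted
          ssh.accept_input(LOW)           # parses attacker-controlled network data
          assert ssh.integrity == LOW     # now treated as potentially malicious
          assert not ssh.can_write(HIGH)  # the reference monitor must block this
          ```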

          Far as a CPU, it might be better at stopping the cache/memory attacks, at isolating different domains, or at both. The first one might keep SSH trustworthy if it was at least designed right: the malicious inputs should have reduced or no effect. The second might contain it. The third will do both. Hardware faults, transient or logical, become the next problem, but I’m keeping the scope simple. You benefit from a CPU that does both unless you can vet every bit of code to not mess anything low-level up when parsing a file or receiving protocol I/O that’s malicious, immediately or many steps down the line. The partitioning CPUs meet your basic requirement, though. They just don’t perform like Core 2 Duos.

          “This is nevertheless a good opportunity to review existing literature in the space. We can have a new problem that’s solved by old methods.”

          Yep, yep. That’s the most important lesson. The good news is a few parts of CompSci were improving on the old work instead of ignoring it. The information-flow control, security-typed language, and covert-channel segments have all been making tools to help with a range of automation and practicality. So, the old methods can work, plus there are new ones that might be better. My recent idea is combining the language-oriented methods with models of hardware as interacting state machines to see if their automation would spot all of that. Keeping those things in the back of my mind in case I see opportunities.
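
          As a toy of what I mean by checking models of interacting state machines, here’s a hedged sketch (my own construction, not from any of the cited tooling): model the shared resource, then exhaustively check a noninterference-style property, i.e., that nothing the low-level observer can see varies with the high-level secret.

          ```python
          # Illustrative only: brute-force noninterference check on a toy model.
          # A victim touches a cache set chosen by a secret bit; an observer then
          # checks whether its own primed line survived. If the observation varies
          # with the secret, the shared cache is a (covert or side) channel.
          def run(secret_bit, cache_sets=2):
              cache = [None] * cache_sets                 # shared resource
              cache[0] = "observer"                       # observer primes set 0
              cache[secret_bit % cache_sets] = "victim"   # victim access depends on secret
              return cache[0] == "observer"               # low-observable output (hit/miss)

          def noninterference(model):
              # Property: the low-observable output is identical for every secret.
              return len({model(secret) for secret in (0, 1)}) == 1

          print(noninterference(run))   # False -> this model leaks the secret bit
          ```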

          1. 2

            For me, ssh-agent is part of the (largish) TCB. If it’s under adversarial control, all is lost.

            1. 1

              Well, that makes sense for quite a few use cases. It doesn’t have to be that way, though: even SSH might not be allowed to overwrite system files, backups, logs, and so on. There are scenarios where one doesn’t want just one admin doing that, or where the admin would prefer doing a few things locally.

              So, SSH could be disruptive quite a bit, but most of the important stuff is read-only. That’s along with the logs that might show how it went down.

        2. 10

          For anyone who avoids HN, here’s cperciva’s response and my follow-up:

          “If you read my 2005 paper, you’ll see that I devoted a section to providing the background on covert channels, dating back to Lampson’s 1973 paper on the topic. I was very much aware of that earlier work. My paper was the first to demonstrate that microarchitectural side channels could be used to steal cryptologically significant information from another process, as opposed to using a covert channel to deliberately transmit information.” (cperciva)

          My response:

          Hmm. It’s possible you made a previously-unknown distinction but I’m not sure. The Ware Report that started the INFOSEC field in 1970 put vulnerabilities in three categories: “accidental disclosures, deliberate penetration, and physical attack.” The diagram in Figure 3 (p 6), with its radiation and crosstalk risks, shows they were definitely considering hardware problems and side channels, at least for EMSEC. When talking about that stuff, they usually treat it as a side effect of program design rather than something deliberate.

          https://csrc.nist.gov/csrc/media/publications/conference-paper/1998/10/08/proceedings-of-the-21st-nissc-1998/documents/early-cs-papers/ware70.pdf

          Prior and current work usually models secure operation as a superset of safe/correct operation. Schell, Karger, and others prioritized defeating deliberate penetration with their mechanisms since (a) you had to design for malice from the beginning and (b) defeating one takes care of the other as a side effect. They’d consider the ability for any Sender to leak to any Receiver to be a vulnerability if that flow violates the security policy. That’s something they might not have spelled out since they habitually avoided accidental leaks with mechanisms. Then again, you might be right that they never thought of it while working on the superset model. It’s possible. I’m leaning toward the view that they already considered side channels to be covert channels, given descriptions from the time:

          “A covert channel is typically a side effect of the proper functioning of software in the trusted computing base (TCB) of a multilevel system… Also, as we explain later, malicious users can exploit some special kinds of covert channels directly without using any Trojan horse at all.”

          “Avoiding all covert channels in multilevel processors would require static, delayed, or manual allocation of all the following resources: processor time, space in physical memory, service time from the memory bus, kernel service time, service time from all multilevel processes, and all storage within the address spaces of the kernel and the multilevel processes. We doubt that this can be achieved in a practical, general purpose processor.”

          https://csrc.nist.gov/CSRC/media/Publications/conference-paper/1992/10/13/proceedings-15th-national-computer-security-conference-1992/documents/1992-15th-NCSC-proceedings-vol-1.pdf

          The description is that it’s an incidental problem from normal software functioning that can be maliciously exploited with or without a Trojan horse. They focus on penetration attempts since that was the culture of the time (rightly so!) but know it can be incidental. They also know, in the second quote, just how bad the problem is, with later work finding covert channels in all of that. Hu did the timing channels in caches that same year. Wray made an SRM replacement for timing channels the year before. They were all over this area but without a clear solution that wouldn’t kill the performance or pricing. We may never find one if we’re talking timing channels or just secure sharing of physical resources.

          Now, as far as your work, I just read it for a refresher. It seems to assume, not prove, that the prior research never considered incidental disclosure. Past that, you do a great job identifying and demonstrating the problem. I want to be extra clear here that I’m not claiming you didn’t independently discover this or do something of value: I give researchers like you plenty of credit elsewhere for researching practical problems, identifying solutions, and sharing them. I’m also grateful for those like you who deploy alternatives to common tech, like scrypt and tarsnap. Much respect.

          My counter is directed at the misinformation rather than at you personally. My usual activity. I’m showing this was a well-known problem with potential mitigations presented at security conferences, that one product was actually built to avoid it, that it was highly cited with subsequent work in high-security imitating some of its ideas, that this prior work/research is not getting to new people concerned about similar problems, that some people in the security field are also discouraging or misrepresenting it on top of that, and that I’m giving the forerunners their due credit plus raising awareness of that research to potentially speed up development of the next, new ideas. My theory is people like you might build even greater things if you know about prior discoveries in problems and solutions, esp on root causes behind multiple problems. That I keep seeing prior problems re-identified makes me think it’s true.

          So, I just wanted to make that clear, as I was mainly debunking this recent myth of cache-based timing channels being a 2005 problem. It was rediscovered in 2005, perhaps under a new focus on incidental leaks, in a field where the majority of breakers or professionals either didn’t read much prior work or went out of their way to avoid it, depending on who they are. Others studying such work and I have also posted that specific project on many forums for around a decade. You’d think people would’ve checked out or tried to imitate something in the early secure VMMs or OSs by now when trying to figure out how to secure VMMs or OSs. For some reason, the majority of industry and FOSS don’t. Your own conclusion echoes that problem of apathy:

          “Sadly, in the six months since this work was first quietly circulated within the operating system security community, and the four months since it was first publicly disclosed, some vendors failed to provide any response.”

          In case you wondered, that was also true in the past. Only the vendors intending to certify under the higher levels of TCSEC looked for or mitigated covert channels. The general market didn’t care. There’s a reason: the regulations for acquisition said they wouldn’t get paid their five-to-six-digit licensing fees unless they proved to evaluators they applied the security techniques (eg covert-channel analysis). They also knew the evaluators would re-run what they could of the analyses and tests to look for bullshit. It’s why I’m in favor of security regulations and certifications, since they worked under TCSEC. Just gotta keep what worked while ditching bullshit like excess paperwork, overly prescriptive requirements, and so on. DO-178B/DO-178C has been really good, too.

          Whereas why FOSS doesn’t give a shit, I’m not sure. My hypothesis is cultural attitudes, how security knowledge disseminates in the groups, and rigorous analysis of simplified software not being fun to most developers versus piles of features they can quickly throw together in a favorite language. Curious what your thoughts are on the FOSS side of it, given the FOSS model always had the highest potential for high security given its labor advantage. Far as high security, FOSS never delivered it even once, with all the strong FOSS made by private parties (esp in academia) or companies that open-sourced it after the fact. Proprietary has them beat, from kernels to usable languages, several to nothing.

          1. 2

            I think at least in FOSS people were discussing how timing attacks make certain protection models unfeasible, e.g. KASLR can’t do anything to protect from local users who can already run code, because the hardware is leaking timing information. JavaScript is an inherently bad idea. The discussion was not about desperately trying to combat the leaks.

          2. 2

            The title of this thread belongs in a tabloid.

            We have absolutely known there are cache-timing sidechannels for an awfully long time. Here, I wrote a blog post about them in 2014:

            https://tonyarcieri.com/cream-the-scary-ssl-attack-youve-probably-never-heard-of

            But… there is simply no comparison. This attack is pretty much hands down the most sophisticated attack I’ve ever seen in my life, and I’ve seen people break into TrustZone with a single null byte overflow.

            The complexity and sophistication of this attack greatly outclasses any previous cache timing sidechannel. Period. End of story.

            1. 3

              “I’m confused: is the purpose of this thread to stroke your own ego and claim you are smarter than Tom Ptacek and Colin Percival?”

              You’d be better off asking whether the purpose of Thomas’s dismissals of work that spotted and mitigated problems a decade or two ahead of him was about ego, or about some value to society in ignoring such work. He consistently dismisses any work I bring up in high-assurance security, about the TCSEC that produced highly-secure systems, and so on. Newcomers reading comments of highly-regarded people in security saying that TCSEC or A1-class techniques were useless or “just red tape” would (do) think those security professionals assessed that prior work, saw nothing secure came of it, and are now recommending against it. The truth is those “experts” have usually not read that prior work, are misrepresenting it (i.e. slandering good researchers), and/or their dismissals or misrepresentations contribute to prior problems re-appearing down the road with all the damage that brings.

              Those same people are often hypocritically saying, like Thomas in a recent thread on email encryption, that their security advice is about prioritizing avoiding damage to innocent parties. Yet, they’re willing to cause it by suppressing known-good techniques for… ego or social standing? That does piss me off when it happens in any field, esp INFOSEC. I do call it out. Instead of pure flamewars, I try to do so with clear arguments citing hard evidence that what I say is true, like landmark works talking about cache channels in the 1990s. Interestingly enough, knocking out his and other people’s bullshit that way got me my early karma on Hacker News. People kept thanking me in email saying they’d do it, but downvote mobs hit them after every dissenting comment. Happened to me, too, sometimes in seconds, but usually reversed later in the day. So, I stayed on it (still do) for various myths or fads on these forums needing peer review.

              Far as Colin, I told him I respected him, considered his an independent rediscovery of the problem, thanked him for his FOSS work, and was clear I was knocking out misinformation (the 2005 claim) about when we could’ve spotted cache-based issues. Colin probably just didn’t read or see the early work since the community he entered collectively ignores it. They’re not ignoring it for scientific reasons, given the prior work was the strongest INFOSEC ever produced, with methods that consistently outperformed the ad hoc stuff the mainstream security community pushes. After seeing that evidence, the only reason they’d collectively ignore or suppress it without qualifiers is social: politics, dominant egos, tribal signaling… something other than making us secure. I told Colin that people like him having access to that information earlier might help them solve problems earlier with even more effective designs than if they don’t have access to good prior work. It makes good researchers like him even better than they already are.

              So, I keep reinforcing the good, calling out the bad, and generally posting every piece of obscure research I can to folks that might benefit from it, to facilitate serendipitous discovery across social or knowledge silos. I’m still working on better solutions to that problem, but there are group politics behind most of it that vary by group. Work in progress…

              “But… there is simply no comparison. This attack is pretty much hands down the most sophisticated attack I’ve ever seen in my life, and I’ve seen people break into TrustZone with a single null byte overflow. The complexity and sophistication of this attack greatly outclasses any previous cache timing sidechannel. Period. End of story.”

              I think you’re focusing too much on the developing-attacks part of my comment (or these events) rather than the part about how prior work would help spot attacks and defend against them. The people I told cperciva about, who identified the side channels, came up with methods that year to find them in models of software and hardware. There was even tooling for doing it with formal specs, eg the Gypsy Verification Environment, on the software side that might have been extended for hardware: the software method modeled state transitions, and hardware developers often used state models, too. The researchers were saying in 1992 that about every component leaked in those systems, with a need to develop leak-resistant versions of them (aka MLS-capable versions in their jargon). They said this was necessary so their VMM project running untrusted workloads side-by-side with secret ones wouldn’t allow secrets to leak to malicious apps using any of those components. Any of that sound familiar?
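
              For a flavor of what those methods look like in miniature, here’s a hedged sketch of a shared-resource-matrix style pass. The resources, attributes, and levels are mine for illustration, not from the cited tooling: you flag every attribute of a shared resource that a higher-classification subject can modify and a lower-classification subject can observe.

              ```python
              # Toy shared-resource-matrix pass (illustrative only): a potential
              # covert/side channel exists wherever a "high" subject can modify an
              # attribute that a "low" subject can observe.
              matrix = {
                  # attribute:              ({who can modify},       {who can observe})
                  "cache_line_state":       ({"victim", "attacker"}, {"attacker"}),
                  "branch_predictor_state": ({"victim", "attacker"}, {"attacker"}),
                  "file_lock_status":       ({"victim"},             {"victim"}),
              }
              level = {"victim": "high", "attacker": "low"}

              def find_channels(matrix, level):
                  channels = []
                  for attr, (modifiers, observers) in matrix.items():
                      for sender in modifiers:
                          for receiver in observers:
                              if level[sender] == "high" and level[receiver] == "low":
                                  channels.append((attr, sender, receiver))
                  return channels

              for attr, sender, receiver in find_channels(matrix, level):
                  print(f"potential channel via {attr}: {sender} -> {receiver}")
              ```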

              So, while high-security and CompSci celebrated those works, mainstream security ignored all that. Many even mocked how “impractical” they were for daring to find and fix root causes if there was a performance hit or you’d have to buy a non-Intel chip. Then, they rediscovered timing channels in caches, talking big, like you are now, about how smart the attacks or discoveries were that nobody or few had thought about. All the security forums were talking about it. Most still ignored the prior work and tech we were posting on the same problems to mitigate it at every level in hardware. Mainstream folks were making mitigations that are very tactically focused on each individual instance of leaks instead of root causes at the whole-system level. As they did that, some in CompSci started building on that prior work for hardware and software mixes.

              Since the mainstream researches piecemeal instead of whole-system, more of the same stuff is found in other components that a basic covert-channel analysis would’ve found quickly. They still refuse to learn from or use those techniques, as seen by the fact that most people following DEFCON, etc. have often never heard of them. People pushing the old methods are pariahs of sorts. And now, another big problem is found that an info-flow analysis on a hardware model might have found (esp model-checking) or prevented by default, simply from B3/A1/EAL6/7 designs avoiding constructions they can’t exhaustively analyze (eg AAMP7G versus ARM micros). That last part was and still is a rule in high-assurance development that keeps paying off: we assume it will screw up until we can rigorously show it can’t in all situations with as good a model as we can use. Security professionals often argued with me about that, too, with many justifications for not analyzing or containing stuff with high-assurance methods.

              In summary, you’re amazed by the fact that people avoiding proven techniques for analyzing hardware interactions for information-flow leaks later discovered, in performance-focused, info-sharing-oriented CPUs, that…

              1. A shared component known to be a source of leaks without leak mitigation…

              2. One or more other components with little to no leak analysis that also have no mitigations…

              3. Somehow interact to create a leak someone didn’t see coming. Mitigations might even be costly since 1 and 2 weren’t designed for this from the beginning, against recommendations from the 1980s for high-assurance design of any kind. “Can’t retrofit strong security,” we say.

              Simplified like I did with Ted’s example, a known-insecure component mixing with maybe-insecure components had an insecure result, against user expectations that it would be secure. You see why I’m not surprised even if I don’t deny the attack itself is clever? I mean, who saw problems like that coming past the half a dozen people publishing in 1990-1995 saying that we need covert-channel mitigations in the CPU, memory, I/O, kernels, networking, filesystems, and multi-user apps!?

              Mainstream security’s focus on highlighting clever attacks instead of root-cause defense is much like those who insist on using C, instead of memory-safe systems languages, in apps that didn’t need it, then marvel at a dozen clever ways their systems are broken by people manipulating known-bad components into new constructions that break security. We might not have anticipated the specifics of the attacks, but they leverage known-unsafe primitives instead of known-good (or worth-trying) techniques. Developers ignored those for “performance,” “popularity,” “we hate FFIs,” etc. Leaks in hardware/software are like C exploits, with most builders telling us for decades it’s not worth their time to look for or prevent them. Then, they make exceptions for each individual attack that becomes massively popular or damaging, like the recent ones. Then, they fix that attack, earn creds in blog posts or conferences, and go back to whatever they were doing, ignoring root causes or whole systems. In rare, rare instances we do see something like Rust get popular addressing big root causes. Most aren’t addressed, though.

              And I bet most of them are still not pulling up the prior work on spotting all leakage in hardware, even now that I brought it up. They sure are talking a ton about how clever the new attack is, though, taking turns showing how much better each of them understands the recent reports, what they might do about just that attack while ignoring analyses of the whole chip, how many systems will fall, how much money will be lost, and so on. There’s your comments motivated by ego rather than security, since most aren’t contributing info to stop the next leaks.

              Those of us in high-assurance security are motivated to make secure, correct-by-construction systems, with ego coming in when we do it right, esp avoiding problems. It’s so hard to prove a negative against smart attackers (even temporarily) that it’s worth being proud of. That’s it. We’re obviously not trying to win popularity contests if we were droning on about covert channels since the 1980s and CPU channels since the 1990s to a crowd whose majority dismissed us every year for it, made exceptions for some on occasion (eg 2005), and then continued dismissing. If I wanted popularity or ego, I’d be talking… whatever’s popular at DEFCON, Black Hat, or (for money) cryptocurrencies.

              There is a minority that does listen. Even on recent posts, the votes indicate a lot of appreciation for sharing the prior work. I also put together the link above with some of the recent work on finding leaks in hardware. I also tell people regularly about simple, FOSS processors they can build and analyze themselves if Intel/AMD won’t make something to their standards. I enjoy providing people what they need to solve problems once and for all wherever I can. It helps some of them avoid harm. I’m definitely proud and happy with doing that. I plan to do it better this year, though, since my prior style definitely needs improvement. The focus will be on marketing high security.