This seems to be largely missing the point of a TPM. The lack of programmability is not a bug, it is a feature. The device provides a set of interfaces that are generic, so the same TPM can be used with multiple operating systems easily and so that the behaviour of those interfaces can be relied on. I can have a system that dual boots Windows and Linux and will allow Linux to unlock my LUKS-encrypted root volume and Windows to unlock my BitLocker-encrypted root volume, and allow Linux to expose my SSH keys and Windows to expose my Windows Hello credentials, but without allowing either operating system to access the other’s state and, crucially, guaranteeing that compromising either does not allow the attacker to exfiltrate keys, only to perform online attacks.
Relying on DICE just punts the problem to another hardware root of trust. You can’t rely on an inductive system to prove your base case. If every machine comes with its own programmable device then I need to have some mechanism for my OS to trust ones that have been validated and, most importantly, when a bug is found in a particular version, I need a mechanism for anti-rollback so that I can guarantee that an attacker can’t bypass my security by just installing an old version of the firmware. I can’t do that validation in my OS because it relies on secure boot for integrity and a secure boot chain gives me an attestation that the only things that can compromise my boot are things earlier in the chain, so the thing I’m trying to validate is in the set of things that can compromise my chain. So now I need a root of trust that can boot my TPM replacement and can load its firmware.
It’s very easy to design a simpler alternative to a TPM if you avoid solving any of the hard problems (it’s almost certainly possible to design something better that does solve the hard problems, because TPM is, uh, not the best spec in the world). Many of the techniques in the article actually describe how TPMs are implemented, but providing direct access to the keys to the programmable bit of the RoT is something that no reputable RoT has done for a good ten years because it is terrible for security. A device built along these lines would be vulnerable to a load of power and probably timing side channels and likely to glitch injection attacks and a load of other things that are in scope for the TPM’s threat model.
The lack of programmability is not a bug, it is a feature.
As far as I understand it’s only a feature for Treacherous Computing, which requires a Root of Trust that is outside the control of the end user. If there’s one feature of the TPM I really really don’t care about, it’s this one.
Besides, you can achieve that with DICE anyway: the manufacturer can just provision a TPM-like firmware at the factory and issue a certificate for that. Users could change the firmware, but only factory-approved firmware would enjoy a factory-issued certificate.
The device provides a set of interfaces that are generic
“Generic” is not quite the right word. “Exhaustive” comes closer. The TPM for instance doesn’t have generic support for arbitrary signature algorithms. It has specific support for RSA, ECDSA, Ed25519… That’s just hard coding a gazillion use cases and hoping we covered enough to call that “generic”.
[…] and, crucially, guaranteeing that compromising either does not allow the attacker to exfiltrate keys, only to perform online attacks.
This has nothing to do with DICE. Just write firmware that doesn’t leak its own CDI and you’ll be okay. Which is pretty easy to do when each piece of firmware you write is focused on one single use case. Besides, manufacturers today write firmware hoping it won’t leak the root secret that is stored in fuses. They have the exact same problem you seem to criticise DICE for, only worse: if the root secret is leaked and it can’t be re-fused, the chip is scrap.
Relying on DICE just punts the problem to another hardware root of trust. You can’t rely on an inductive system to prove your base case.
But we do have a base case: it’s the bootloader, which derives the CDI from the root UDS and the firmware it loads. That bootloader is tiny and provided by the manufacturer. Surely they can shake out all the bugs from a C program that is hardly a couple hundred lines? At these sizes even the compiled binary could be audited.
Firmware with actual functionality is another matter, but as I said in the article, such firmware is easier to make than one generic firmware that addresses all use cases. And if one has a bug that leaks its own CDI, we can correct the bug and the new version will have its own, uncompromised CDI.
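For concreteness, here is roughly what that base case looks like, as a minimal Python sketch (HMAC-SHA-256 stands in for whatever KDF the bootloader actually uses, and the 32-byte UDS is an assumption, not anything DICE mandates):

```python
# Conceptual sketch of the DICE base case: the fixed bootloader derives the
# CDI from the device's Unique Device Secret (UDS) and a measurement of the
# firmware it is about to hand control to. Nothing else ever sees the UDS.
import hashlib
import hmac

def derive_cdi(uds: bytes, firmware_image: bytes) -> bytes:
    """CDI = KDF(UDS, H(firmware)); HMAC-SHA-256 stands in for the KDF."""
    firmware_measurement = hashlib.sha256(firmware_image).digest()
    return hmac.new(uds, firmware_measurement, hashlib.sha256).digest()

# A different firmware image yields an unrelated CDI, which is the whole point:
uds = bytes(32)  # stand-in for the fused secret
assert derive_cdi(uds, b"firmware v1") != derive_cdi(uds, b"firmware v2")
```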
If every machine comes with its own programmable device then I need to have some mechanism for my OS to trust ones that have been validated
Ah, Treacherous Computing. As I’ve said, just add that capability to the manufacturer’s default firmware. That way the OS can check the firmware’s key against the manufacturer’s certificate.
Well, strictly speaking the OS is a bit late to check anything, and Secure Boot doesn’t work to begin with. If however the hard drive is encrypted by keys derived from the CDI, decryption is only possible when the approved firmware runs (else we get useless keys). Then the OS running at all is proof that we were running the correct firmware, and if the DICE chip is integrated into the main CPU (so MitM is not possible), its manufacturer issued firmware can refuse to give up the encryption keys if the bootloader isn’t right.
most importantly, when a bug is found in a particular version, I need a mechanism for anti-rollback so that I can guarantee that an attacker can’t bypass my security by just installing an old version of the firmware.
Assuming updates are even a thing… old firmware lets the new firmware in, checks it is signed by the manufacturer, checks that it is a newer version, new firmware is swapped in, new public key (yeah, the CDI changed, so…) is sent to the manufacturer (on a secure channel of course), and the manufacturer issues a new certificate.
To prevent rollbacks my first thought would be a persistent increment-only hardware counter, which would be hashed together with the CDI instead of using the CDI directly. If every firmware version does this, incrementing the counter instantly changes the keys of older versions. With the wrong keys they’re no longer approved, and rollback attempts are detected as soon as we check the certificate. We don’t even have to explicitly revoke old keys with that approach.
That was the first idea that popped up, we can probably do better.
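A rough sketch of that counter idea, assuming the counter value is readable by the firmware and mixed in before any key is ever used (names and the 8-byte counter encoding are illustrative):

```python
# Sketch of the anti-rollback idea above: every firmware version mixes a
# persistent, increment-only counter into its working key instead of using
# the CDI directly. Bumping the counter silently changes the keys that any
# older (or rolled-back) firmware would re-derive.
import hashlib
import hmac

def working_key(cdi: bytes, rollback_counter: int) -> bytes:
    return hmac.new(cdi, rollback_counter.to_bytes(8, "big"), hashlib.sha256).digest()

cdi = bytes(32)                    # whatever the bootloader derived for this firmware
key_before = working_key(cdi, 7)
key_after = working_key(cdi, 8)    # after the counter is bumped
assert key_before != key_after     # rolled-back firmware now holds useless keys
```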
It’s very easy to design a simpler alternative to a TPM if you avoid solving any of the hard problems
The problems the TPM solves aren’t hard, they’re many.
Hardware-wise, manufacturers of current secure elements can easily add DICE capability at no loss of security. They have the hardware secret sauce to do it. Unlike Tillitis, they don’t have to use an FPGA whose only threat model was making a token effort to protect their customers’ proprietary bitstream from IP theft.
Software-wise, DICE automatically makes things easier by (i) allowing us to only address the use cases we care about, and (ii) addressing them separately. Even if end users are derps, the manufacturer can publish a bunch of official firmware for the most popular use cases. They’re already doing that after all.
providing direct access to the keys to the programmable bit of the RoT is something that no reputable RoT has done for a good ten years because it is terrible for security
Which is why DICE does not do this. DICE firmware does not, I repeat, does not access the root key of the device. Only the tiny bootloader does. The main firmware only gets access to a derived key that is specific to it, and it alone. Compromising the root key of the device from the main firmware is flat out impossible.
Sorry for re-stating the obvious, but it’s hard to interpret what you just wrote under the assumption that you understood that; I’m not sure what your point is here.
A device built along these lines would be vulnerable to a load of power and probably timing side channels and likely to glitch injection attacks and a load of other things that are in scope for the TPM’s threat model.
No, you’re just talking out of your ass here. You need to explain how data can flow from secrets to the side channels with the DICE approach, in a way that they do not with the regular approach. You also need to be aware of the actual threat model: the timing side channel is almost always relevant, but in practice easy to address — even Daniel J. Bernstein said so. Glitch attacks and power analysis are harder, but they require physical access and as such are out of many threat models. They’re important when the user is the enemy, but you know what I think of Treacherous Computing.
For instance, it’s reasonable to assume that under the normal operation of the HSM, its immediate environment is trusted enough not to purposefully inject glitches or analyse power. It’s not always a valid assumption, but not everybody is making a credit card to be plugged into many untrusted terminals.
Then there’s the question of what can be extracted at which point. The bootloader is fixed, and hashes the program and secret together. We already know a way to mitigate the power side channel: hash the UDS in its own block, and hash the rest of the program starting on the next block. (Ideally we’d want a constant-power hash, but for ARX designs constant-power addition is very expensive, so let’s settle for the mitigation instead.)
Glitching the loading of the program would just yield a different CDI, not sure we can do much there. And once the program is loaded, access to the UDS is closed off, so no amount of power analysis and glitching can shake that off (assuming adequate protection of the latch, but that should be easy, compared to protecting the integrity of the entire program’s runtime state.)
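Here is what that block-alignment mitigation might look like, assuming SHA-256 with its 64-byte blocks and a UDS that fits in a single block (a conceptual sketch, not any particular bootloader’s code):

```python
# Sketch of the mitigation described above: the UDS is absorbed in its own
# compression-function call, padded out to a full block, and the
# (attacker-chosen) program bytes only start on the next block, so no single
# compression call mixes secret and attacker-controlled data.
import hashlib

BLOCK = 64  # SHA-256 block size in bytes

def measure(uds: bytes, program: bytes) -> bytes:
    h = hashlib.sha256()
    h.update(uds.ljust(BLOCK, b"\x00"))  # secret alone in block 0
    h.update(program)                    # program starts at block 1
    return h.digest()
```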
As far as I understand it’s only a feature for Treacherous Computing, which requires a Root of Trust that is outside the control of the end user.
No, any form of security requires the root of trust to be out of control of the attacker. The attacker is assumed to be able to compromise the later stages of execution because the DICE model is based on inductive proofs and gives you guarantees only about the base state of a system that has unbounded states later. Each step is, effectively, promising not to do certain bad things (e.g. GRUB promises not to lie about the kernel that it has loaded) but once you get to a general purpose OS you can run arbitrary code. Once you get to this point, your lower layers are expected to lock down the system such that an attacker cannot do specified bad things. For example, the TPM may have unlocked certain keys (often implicitly by using a KDF over a secret mixed with some PCR values) but does not permit that key to be exfiltrated, so an attacker who compromises my OS cannot steal my SSH keys or disk encryption keys (which means that they can only do online attacks during their compromise).
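As an illustration of the KDF-over-PCRs point: in sketch form, the sealed key only reproduces if the measured boot state is exactly what it was when the key was created (a conceptual Python sketch; the function names are mine, not anything from the TPM specification):

```python
# Illustration: a key "unlocked" implicitly by deriving it from a device
# secret mixed with PCR values. If anything earlier in the boot chain
# changes, the PCRs change and the derived key is useless.
import hashlib
import hmac

def sealed_key(device_secret: bytes, pcr_values: list[bytes]) -> bytes:
    pcr_digest = hashlib.sha256(b"".join(pcr_values)).digest()
    return hmac.new(device_secret, pcr_digest, hashlib.sha256).digest()
```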
This has nothing to do with DICE. Just write firmware that doesn’t leak its own CDI and you’ll be okay. Which is pretty easy to do when each piece of firmware you write is focused on one single use case. Besides, manufacturers today write firmware hoping it won’t leak the root secret that is stored in fuses.
That is absolutely untrue for the hardware RoTs that I’ve worked with. They have fixed function hardware that stores keys and expose a set of operations on them (including ACLs that authorise what operations a key can be used for and what derived keys can be used for). They run the TPM stack on the programmable core, but they assume that this can be compromised. An attacker who compromises the TPM stack has very little more access than someone who compromises the first-stage boot loader on the host.
Assuming updates are even a thing… old firmware lets the new firmware in, checks it is signed by the manufacturer, checks that it is a newer version, new firmware is swapped in, new public key (yeah, the CDI changed, so…) is sent to the manufacturer (on a secure channel of course), and the manufacturer issues a new certificate.
Okay, so now you need your device to be able to make network connections or you need to rely on the OS to provide that channel. Now you have a very wide attack surface. Being able to support this use case was actually one of the motivations for CHERIoT because it isn’t feasible with existing hardware within the threat models of people who rely on TPMs.
Glitch attacks and power analysis are harder, but they require physical access and as such are out of many threat models. They’re important when the user is the enemy, but you know what I think of Treacherous Computing.
Okay, I think this is where we disagree. The primary use case for a TPM, for me, is protecting my data if my machine is stolen. There’s no point using disk encryption if an attacker also has access to the key. You seem to think that DRM (repeating the phrase ‘Treacherous Computing’ does not inspire confidence; you seem to use the phrase to dismiss all of the things that people have done to improve security by assuming malicious intent) is the only threat model where attackers have physical access. I think that attackers with physical access (whether that’s via an ‘evil maid’ attack or outright theft) or who have compromised my OS are the only use cases where I want a hardware root of trust. If a system isn’t robust in those two cases, then it fails my threat model. By moving access to keys and crypto algorithms out of fixed-function units and by moving more of the code on the programmable core into the TCB, you make it harder to defend against this threat model. I’m not sure what your threat model actually is, you might have a good solution for it, but it isn’t the one that I care about.
any form of security requires the root of trust to be out of control of the attacker.
Okay.
The attacker is assumed to be able to compromise the later stages of execution because the DICE model is based on inductive proofs and gives you guarantees only about the base state of a system that has unbounded states later.
I don’t like this induction analogy, but even then we do have a base case: the DICE bootloader that is a fixed function. The firmware on top has its own key and therefore can be integrated in the root of trust (one with actual functionality this time), and, well… just like a TPM it’s not supposed to reveal its own secrets.
If we can’t do bug-free firmware, we’re kinda doomed anyway. Speaking of which…
That is absolutely untrue for the hardware RoTs that I’ve worked with. They have fixed function hardware that stores keys and expose a set of operations on them (including ACLs that authorise what operations a key can be used for and what derived keys can be used for).
You feel like you’re contradicting me, but I feel like you’re making my case for me: those fixed functions they wrote, they have to make sure they don’t have any bug that would expose the secrets, right? This is exactly analogous to the firmware I spoke of. Nothing stops this firmware from providing fixed functions to the untrusted host…
…unless you need the variable functions on top to be trusted anyway, kind of security by layers. We’d need something like a 2-stage execution environment, where the first can set up some functions, and the second is denied access to the CDI of the first stage, but can issue commands to it nonetheless. I guess DICE isn’t enough for that. Whether this complexity is worth the trouble is another question though.
The primary use case for a TPM, for me, is protecting my data if my machine is stolen.
I use a password for that. It’s easier for me to trust a secret that’s not even on my machine. (Though ideally I’d use a password and a secure element.)
I think that attackers with physical access (whether that’s via an ‘evil maid’ attack or outright theft) or who have compromised my OS are the only use cases where I want a hardware root of trust.
Protection against theft should be straightforward. Evil Maids however are very powerful: a key logger, something that intercepts the video signals… I’m afraid those would be hard to detect, and mitigating them would pretty much require every major component of the laptop to contain a secure element so all communications between components can be encrypted and authenticated. This is no picnic.
My, if I ever come to actually fear an Evil Maid, there’s no way I’m leaving my laptop at the hotel.
By moving access to keys and crypto algorithms out of fixed-function units and by moving more of the code on the programmable core into the TCB, you make it harder to defend against this threat model.
Only to the extent the firmware is more likely to have bugs than fixed-function units. Keeping the firmware small and specialised improves our odds dramatically. Because of course, the attacker can’t change DICE firmware without changing the keys. If they do, the disk simply won’t decrypt. Their only chance is to exploit a bug in the firmware.
I don’t like this induction analogy, but even then we do have a base case:
Then you probably don’t like DICE, since the entire model is based on inductive proofs of security.
the DICE bootloader that is a fixed function. The firmware on top has its own key and therefore can be integrated in the root of trust (one with actual functionality this time), and, well… just like a TPM it’s not supposed to reveal its own secrets.
I am really struggling to understand how the security claims that you’re making map back to things that the security proofs about DICE give you.
You feel like you’re contradicting me, but I feel like you’re making my case for me: those fixed functions they wrote, they have to make sure they don’t have any bug that would expose the secrets, right?
Right, and they do that by being implemented in a substrate that is amenable to formal verification, including side-channel resistance. In particular, it has no arbitrary control flow (everything is dataflow down wires, control signals are data), and it has almost no dynamic dataflow (values can flow only to places where there are wires). Doing the same thing on a general-purpose programmable core that does not provide hardware memory safety is vastly harder. It is easy to validate that keys cannot be used for the wrong purpose because those accesses have values coming from the relevant bit in the ACL and are anded with that value. There is no way of getting a key except via a wire from the register file that contains the key.
…unless you need the variable functions on top to be trusted anyway, kind of security by layers. We’d need something like a 2-stage execution environment, where the first can set up some functions, and the second is denied access to the CDI of the first stage, but can issue commands to it nonetheless. I guess DICE isn’t enough for that. Whether this complexity is worth the trouble is another question though.
And then you decide that you need to protect two different things at the higher level from each other. And two different things at the lower level. And now you have TrustZone and two privilege modes. Only now you have shared resources between different trust domains, so you have side channels. Again, there’s a reason people stopped building hardware RoTs like this! The TPM is intentionally inflexible because as soon as you start allowing programmable functionality you need multiple trust domains, as soon as you have multiple trust domains you need both fine-grained sharing and strong isolation and those are both hard problems.
I use a password for that. It’s easier for me to trust a secret that’s not even on my machine.
So you enter the password into your bootloader and now it’s in memory. Anyone who manages to compromise your OS once can exfiltrate it. The entire point of a security coprocessor is to ensure that keys are not accessible to someone who compromises your OS.
But, again, you haven’t given me a threat model. It seems like you’re happy with the level of security that you get from no hardware and you’re unhappy with the level of security that you get from a TPM, but somehow want something different. You’ve written a crypto library, so I presume you know what a threat model looks like: what capabilities do you assume your attacker has? What are your attackers’ goals? What tools will they be able to construct from their base capabilities towards that goal? How do you prevent them from doing so? If you start from a threat model, we can have a reasonable discussion about whether your approach addresses it, but it seems like you’re starting from a bunch of technologies that people building secure elements stopped using ages ago because they aren’t sufficient for the threat models for these devices, and saying that they’re a better way of building secure elements.
it seems like you’re starting from a bunch of technologies that people building secure elements stopped using ages ago
To be honest I don’t believe you. Please give me evidence.
I am really struggling to understand how the security claims that you’re making map back to things that the security proofs about DICE give you.
Okay, it’s simple: you start with two things:
Trusted DICE hardware/bootloader
Trusted firmware to load on top
You load the firmware on the hardware, and this hardware+firmware pair constitutes an HSM. That’s a bit strange, but strictly speaking DICE hardware is not by itself an HSM. It only becomes so when loaded with a particular piece of firmware. (In practice the bootloader alone doesn’t really count, since it doesn’t provide any functionality users want. Users want actual cryptographic services, loading firmware is only a means to that end.) Anyway, with trusted hardware and trusted firmware, you now have a trusted HSM. Ask for its public key, and register that as “trusted”.
Now let’s say you have those 3 things:
A trusted public key
An untrusted piece of DICE-looking hardware
An untrusted piece of firmware
Testing whether you can trust this hardware+firmware pair is simple:
load the firmware into the hardware (you now have an HSM).
Demand proof from your HSM that it has the private key matching the trusted public key.
And that’s it. Notice the similarity between that and a fixed-function HSM: it’s the same, you just need to start by loading firmware you don’t trust yet. The actual verification after that is exactly the same as for a classic HSM.
Of course this all hinges on the guarantee that changing the firmware unpredictably changes the CDI. That we trust the hardware+bootloader to provide that guarantee. If you have reasons to distrust this guarantee I’d like to know why.
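To make that concrete, here is a minimal sketch of the check, with Ed25519 (via the `cryptography` package) standing in for whatever signature scheme the firmware actually exposes; the identity key is derived from the CDI, so only the approved firmware on the genuine hardware can pass it. All names here are illustrative:

```python
# Sketch of the trust check described above: the firmware's key pair is
# derived from its CDI, so a different firmware (or an emulator without the
# UDS) cannot produce the registered public key.
import os
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)

def firmware_identity_key(cdi: bytes) -> Ed25519PrivateKey:
    return Ed25519PrivateKey.from_private_bytes(cdi[:32])

def check_hsm(trusted_public_key: Ed25519PublicKey, hsm_key: Ed25519PrivateKey) -> bool:
    challenge = os.urandom(32)            # fresh nonce from the verifier
    signature = hsm_key.sign(challenge)   # HSM proves possession of the private key
    try:
        trusted_public_key.verify(signature, challenge)
        return True
    except InvalidSignature:
        return False
```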
Now what about Dragons?
Right, and they do that by being implemented in a substrate that is amenable to formal verification, including side-channel resistance.
You’re clearly assuming that firmware is not amenable to formal verification. This is blatantly false: even regular cryptographic libraries can be formally analysed, and some have been. Including side channel resistance. And since firmware runs on a specified piece of hardware, that verification is even easier.
In particular, it has no arbitrary control flow (everything is dataflow down wires, control signals are data), and it has almost no dynamic dataflow (values can flow only to places where there are wires).
No, you’re making a category error there. Programs (barring the self modifying relics) have a fixed control flow, just like hardware. When we talk about arbitrary control flow we don’t talk about programs having it, we’re talking about our ability to make or load arbitrary programs. Once the program is chosen the control flow is frozen.
Same thing for the dynamic flow: cryptographic libraries have almost no dynamic flow, and that’s what makes them secure against the most important side channels. So while some pieces of firmware can have dynamic data flow, not all of them have, and obviously a sane cryptographic engineer would only trust the ones that have as little of it as is reasonable.
Doing the same thing on a general-purpose programmable core that does not provide hardware memory safety is vastly harder.
Again a category error, I believe. This is not about guaranteeing anything about arbitrary firmware, we want to guarantee that a particular piece of firmware is void of memory errors (among other errors). Hardware memory safety tricks like ASLR or executable XOR writeable help, but you can completely sidestep the problem and write your firmware in Rust.
What you call “vastly harder” is at worst a speed bump.
To be honest I don’t believe you. Please give me evidence.
Most of the details of these things are protected by NDAs, but if you let me know which manufacturers you’ve talked to I can point you in the right direction in their designs, if we’ve worked with any of the same people.
Demand proof from your HSM that it has the private key matching the trusted public key.
That works but only because you’re sidestepping all of the hard problems. As a user, I don’t care about the code identity, I care about the guarantees. That’s always the hard bit in any attestation system and if you tie software on the host to specific versions of the RoT then you end up with something very fragile. In the RoT designs I’m familiar with, this is sidestepped entirely as a problem. In a TPM, the useful guarantees I get are all about negative properties: it’s not programmable and so I know you cannot do things that are not exposed, I don’t have to prove a completeness property over some general-purpose code (ignoring firmware TPMs, which were the worst idea ever and were all broken by Spectre if not before).
You’re clearly assuming that firmware is not amenable to formal verification. This is blatantly false: even regular cryptographic libraries can be formally analysed, and some have been.
Yes and no. I’ve worked with the EverCrypt team and they got further than most. When we tried to use their code in production, we discovered that their proofs of temporal safety were holding only because they never freed memory (not great for a resource-constrained environment). Those proofs also depend on compiling with CompCert and may or may not hold with other compilers. Most importantly, they hold only if no code in the system that they are linked with ever hits undefined behaviour. If, for example, you have a memory safety bug in the parser for mailbox messages from your host, none of the proofs in EverCrypt hold because you have just violated one of their axioms.
With the Low* work, they are able to prove that the implementations are constant time, in conjunction with a modest microarchitectural model. They are not able to prove anything about power because most ALUs have data-dependent power consumption. The techniques used to harden against these in hardware (e.g. running identical pipelines in parallel with XORed inputs so that the power is always uniform) are simply not expressible in instruction sets. The Low*-based proofs also depend on there being no speculative execution (if I can train a branch predictor to go to the wrong place, it doesn’t matter that there are no timing leaks in the correct path) but that’s probably fine.
Programs (barring the self modifying relics) have a fixed control flow, just like hardware. When we talk about arbitrary control flow we don’t talk about programs having it, we’re talking about our ability to make or load arbitrary programs. Once the program is chosen the control flow is frozen.
No, pretty much every program will use the branch and link register instruction. Proving anything about control flow is now a data flow problem: you have to prove that the set of values that will reach the input of that instruction is constrained. This is possible only by assuming that you have constrained memory access, so you’re returning to the ‘memory safety is an axiom’ world. Now, with CHERIoT, we can make that guarantee, but you won’t find it in any other embedded ISA.
So while some pieces of firmware can have dynamic data flow, not all of them have, and obviously a sane cryptographic engineer would only trust the ones that have as little of it as is reasonable.
This only helps the verification if you have strong compartmentalisation. Again, not something provided by most non-CHERIoT embedded systems.
Again a category error, I believe. This is not about guaranteeing anything about arbitrary firmware, we want to guarantee that a particular piece of firmware is void of memory errors (among other errors). Hardware memory safety tricks like ASLR or executable XOR writeable help, but you can completely sidestep the problem and write your firmware in Rust.
ASLR is totally inappropriate, it barely works when you have a full 32-bit address space, you have no chance of getting useful levels of entropy doing it without an MMU. Even if it did work, it’s a probabilistic defence and so is inappropriate for building a foundation for formal verification. Immutable code is the default on embedded devices (many are Harvard architecture) but you’re still vulnerable to code reuse. You can write in Rust, but you can’t verify anything about timing in Rust because timing isn’t part of the abstract machine. Anything involving your assembly routines and anything involving I/O will be in unsafe Rust code, so there’s still scope for bugs even if you verify all of the safe Rust code (Rust verification tools are improving but they’re nowhere near the level that you’d require for a system like this).
I say vastly harder based on experiences of teams that I have worked with. They have shipped hardware RoTs and formally verified crypto implementations. The software verification effort took more people and took longer. The hardware RoT was deployed, at scale, to people with physical access some of whom had a large financial incentive to break it. It is still doing fine with no compromises. The verified crypto implementation was broken within a week of internal testing by finding bugs in the integration with the surrounding system. I consider the thing that multiple vendors have done and that has stood up to attacks in production vastly easier than the thing that hypothetically could be done but has not actually worked any time someone has tried it, but maybe that’s just me. If you want to prove me wrong and ship a formally verified firmware stack for a RoT that is resistant to timing and power side channels and glitching attacks, I’d be very happy to see it: even if I never use it, the tools that you’d have to build along the way would be game changers for the industry.
But you still haven’t told me what problem you’re trying to solve or what your threat model is.
Most of the details of these things are protected by NDAs
Fuck, I forgot about that. Evidence not available, then. Fuck. Still, I care less about implementation details than I care about the user-visible interface. Do those poor sods also put these under NDA? Like, they sell a chip with an ISA, and then they tell you “this ISA is a corporate/state secret, sign the NDA please”?
I bet they do. Fuck them.
But you still haven’t told me what problem you’re trying to solve or what your threat model is.
Same threats as anyone else. I’m just trying to have something simpler achieve the same goals. Because as a user, I just can’t stand these huge TPM specs. Specs that apparently had a critical vulnerability, discovered March of this year. Not in the hardware, not in the software. In the spec.
I say vastly harder based on experiences of teams that I have worked with.
I have a hypothesis: software teams are drooling incompetents compared to hardware teams. Maybe it’s less about the actual difficulty of software, and more about how hardware teams understand the stakes of what they’re doing and are trained (and tooled) accordingly, while software teams simply don’t and aren’t. I still remember this blog post by someone who worked with hardware teams, and then noticed how software teams just didn’t know how to test their stuff.
you can’t verify anything about timing in Rust
I’m aware of at least one language that can (forgot the name, it’s designed specifically for cryptography), and I believe cryptographic libraries have been written in it. Good point about having to verify the compilers as well though. That obviously has been done, though whether it was done with the relevant verified language(s) I don’t know.
No, pretty much every program will use the branch and link register instruction.
Jump and link to a hard coded constant. Unless indirect calls are involved, but that would typically mean the main program calls arbitrary code… Could be used internally of course, and we’d find we could still account for the jump and link destination… or just avoid indirect calls entirely, if that makes verification easier.
Now if it’s something as trivial as “call foo if condition1, call bar if condition 2”, well… first of all that’s not constant time, but let’s say we’re doing signature verification and don’t care about leaking data just yet: a fixed function hardware equivalent would do exactly the same. How could it not, at some point there is stuff to do.
The techniques used to harden against these in hardware (e.g. running identical pipelines in parallel with XORed inputs so that the power is always uniform) are simply not expressible in instruction sets.
Which means constant power can’t be achieved in software, I’m aware. This is a definite disadvantage, and it does exclude some use cases (like credit cards). I maintain that you don’t always need to address the energy side channel. Or even electromagnetic emissions.
That works but only because you’re sidestepping all of the hard problems. As a user, I don’t care about the code identity, I care about the guarantees. That’s always the hard bit in any attestation system and if you tie software on the host to specific versions of the RoT then you end up with something very fragile. In the RoT designs I’m familiar with, this is sidestepped entirely as a problem.
How as a user would you have any guarantee? There’s only one way I’m aware of, it goes in 2 steps:
Trust the manufacturer to give you a secure HSM (any guarantee you may hope for is entirely dependent on how trustworthy the manufacturer actually is).
Use public key cryptography to verify the identity of that HSM.
That’s true of any HSM, DICE or not. And what are you on about with “tying to a specific version of the RoT”? Isn’t that what happens anyway? Doesn’t the manufacturer have to sign the entirety of the chip, firmware included, regardless of the approach taken? The only additional difficulty I see with DICE is that since changing the firmware changes the keys, updates to the firmware are harder. But that’s quite moot if the alternative is fixed functions etched in immutable hardware: can’t update those at all.
But then DICE has an advantage: someone other than the manufacturer could write the firmware and vouch for it. A downstream user (or trusted secondary shop) can then load the firmware and register its keys.
In a TPM, the useful guarantees I get are all about negative properties: it’s not programmable […]
Neither is DICE firmware. A DICE HSM is not programmable by default, it has a “fixed-function” firmware, identical except in implementation details to fixed-function hardware.
One could load arbitrary firmware onto their DICE hardware, but only one such firmware has been signed and can be checked for identity. It can’t be swapped for anything else. Programs aren’t general purpose, the CPU running them is. Freeze the program and you get a special purpose appliance.
(ignoring firmware TPMs, which were the worst idea ever and were all broken by Spectre if not before).
Perhaps, but since fusing the TPM to the main execution chip is the only way to have actual hardware security… that can be better than a discrete chip, even one with magical levels of hardware security.
Fuck, I forgot about that. Evidence not available, then. Fuck. Still, I care less about implementation details than I care about the user-visible interface. Do those poor sods also put these under NDA? Like, they sell a chip with an ISA, and then they tell you “this ISA is a corporate/state secret, sign the NDA please”?
The ISA for the programmable part is often open but that’s not the interesting bit. The interfaces to the fixed-function units and the protection models are often under NDA. Some of the security also depends on exactly what you do in your EDA tools. Apparently if you try to fab Pluton with the default Cadence config it happily optimises away a load of mitigations. RoT vendors don’t like talking about their security features in public because it paints a target on them, they’re happy to talk about them to (potential) customers.
That’s less true with things like OpenTitan or Caliptra, but I haven’t looked at either in detail. I am pushing for both to adopt the CHERIoT Ibex as the programmable component because it enables some quite interesting things. We can statically verify (in the binary) which compartments in a firmware image have access to which bits of MMIO space, so we can allow (for example) an Intel-provided compartment to have access to back-door interfaces to the CPU that let it attest to CPU state, but not have access to the key storage, and that no component except the Intel-provided compartment has access to this bit of the state, so only Intel can compromise the security of the Intel processor using the RoT.
Same threats as anyone else.
So, to be clear, the attacker capabilities are:
Has physical access.
Can control power to the device.
Can retry things an unlimited number of times and measure timing.
Can retry things an unlimited number of times and measure power.
Can try installing other firmware versions and then run anything that the device does not prevent.
The attacker’s goal is to exfiltrate a key that will allow them to produce an emulation of the device that is indistinguishable from the real device, from the perspective of software.
The requirements for software are such that I must be able to:
Identify that the device is the device that it thinks it is.
Perform signing and encryption operations from a host without access to keys held in the device.
Restrict what keys can be used for which purposes.
Upgrade the firmware of the device without losing access to keys.
Share the device between mutually distrusting hosts without granting any of them the ability to impersonate another (including accessing keys that are private to that host but sharing ones that are private to the device owner).
Upgrade the firmware of the device on one system without another system losing access to its keys.
I probably missed something.
But it sounds like you also want to be able to run arbitrary code on the device to perform OS-specific functionality (again, this is actually one of the target use cases for CHERIoT because it’s really hard).
I’m just trying to have something simpler achieve the same goals. Because as a user, I just can’t stand these huge TPM specs. Specs that apparently had a critical vulnerability, discovered March of this year. Not in the hardware, not in the software. In the spec.
I don’t think we disagree that TPM is a clusterfuck, but I think it’s an interface that you can run on a secure system (we’ve actually run the TPM reference stack on a CHERIoT implementation), I just don’t think that you’re solving most of the problems that I want a RoT to solve.
I have a hypothesis: software teams are drooling incompetents compared to hardware teams
Absolutely not in this case. The folks on the EverCrypt project were some of the smartest people I’ve met (the kind of people I’d trust to build the theorem prover that I depend on, not just the kind that I’d trust to use it correctly).
AMD and Intel routinely mess up security critical things.
I’m aware of at least one language that can (forgot the name, it’s designed specifically for cryptography), and I believe cryptographic libraries have been written in it.
F* / Low*, which is what EverCrypt used.
Good point about having to verify the compilers as well though. That obviously has been done, though whether it was done with the relevant verified language(s) I don’t know.
Kind of. Formally verified compilers come with a lot of caveats. CompCert, for example, does not guarantee anything if your input code contains any undefined behaviour. This is fine for F*, which generates C code from an ML dialect that guarantees no UB, but makes CompCert pretty useless for C code written by humans. Even then, it’s not completely clear how well this works because both F* and CompCert have a formal model of C semantics but it might not be the same formal model of C semantics and any mismatch can invalidate the proofs.
Formal verification for hardware also comes with some caveats. My favourite hardware security PoC was a TrustZone exploit in a verified Arm core. They ran two wires close together so that if you rapidly toggled the value in a register you’d induce a current in a wire that led to the S state bit and would let you enter S state from unprivileged code. The core was correct at the RTL layer but not correct with respect to analogue effects. RoT designs typically also include mitigations at these layers but they have to be designed based on the specific circuits and it’s really hard to put them in general-purpose cores. The threat model for an AES engine is much easier to express than the threat model for an add instruction (are either of the operands of the add secret? It depends on the surrounding code. Is it preferable for a glitch to give a lower or a higher value for the result of an add? It depends on the surrounding code).
This is why you typically put the programmable logic outside of the trust. The bit that runs the TPM stack is not part of the TCB for key confidentiality or integrity. It is responsible for some bits of PCR state manipulation and command parsing, but if you compromise it then you still can’t exfiltrate keys. You might be able to add different things to PCR state, but that’s about it.
Jump and link to a hard coded constant. Unless indirect calls are involved, but that would typically mean the main program calls arbitrary code
Return instructions are also computed jumps (jump to the link register). A single stack bug can hijack control flow this way. Hence memory safety being a prerequisite for CFI.
a fixed function hardware equivalent would do exactly the same. How could it not, at some point there is stuff to do.
It’s about the negative cases. It’s easy to verify that a key stored in a key register never travels somewhere else: don’t put wires anywhere other than the input to the crypto engines. You can statically enumerate all of the possible dataflow paths. It’s much harder to verify that a value stored at some location in memory is never read by anything that can lead to I/O because you have to ensure that no load instruction that leads to I/O can read that address. That’s a global alias analysis problem.
How as a user would you have any guarantee? There’s only one way I’m aware of, it goes in 2 steps:
It’s a hard problem. It’s simpler if you can enforce constraints. If the TCB logic is fixed function, your proof obligations on the programmable bit are much less. Ideally, the programmable bit is in the TCB for availability and not confidentiality or integrity, so you just punt on it entirely. Now you have a much simpler problem of getting an attestation over the device, rather than the device plus a software stack. The attestation is a claim from the manufacturer that it provides some security guarantees.
That’s what I mean about the guarantees. Identity is a building block for attestation, it’s not sufficient. The important thing is that manufacturer X makes claims about software Y running on hardware Z. If these claims are untrue, you can sue them. That’s what you’re getting from attestation: accountability that you can use to build legal liability. In the worst case, you need to universally quantify these claims over all values of Y because the OS doesn’t want to carry a list of valid firmware versions. Often, you make slightly weaker claims that rely on the ability to have monotonic versions of Y (e.g. ‘I promise that this gives security guarantees as long as you’re running the latest firmware and, if there are security bugs in the firmware I will fix them within a week’), which is where you get the requirements to be able to do secure update with anti-rollback protection.
But then DICE has an advantage: someone other than the manufacturer could write the firmware and vouch for it. A downstream user (or trusted secondary shop) can then load the firmware and register its keys.
DICE (with slight tweaks) is used in most of these things already. The problem is not the firmware, it’s the space of things that a firmware image can possibly do and how you get a guarantee that your firmware is secure in the presence of all of the capabilities listed for attackers above. And it’s about ensuring that things like:
If I find a bug in firmware version X, I can’t then deploy firmware version X to get the keys from someone who wants to be running version X+1.
If I find a bug in firmware version X, I can’t extract all of the keys before the user has a chance to upgrade to X+1.
Those are the hard problems. DICE is a small part of solving them, it is not sufficient by itself.
Imagine I have one of these devices and I have an emulator for one that gives me complete visibility into all of the secret keys. How do you enable me to move the keys from firmware X on the real device to firmware X+N on the real device but not to:
Firmware X-1 (which has a known vulnerability that leaks keys) on the device.
Firmware X on the emulator.
Firmware X+N on the emulator.
DICE is an important building block for enforcing this kind of policy, but it’s nowhere near sufficient by itself, and without being able to enforce that policy you have no security for your secrets and may as well just store them in the kernel.
This gets even harder when you’re using it as part of your secure boot chain (which is what I want from a root of trust) because any change to the DICE signature for the RoT’s firmware will change the PCR values for the next steps of the boot. I can’t rely on code that loads after the RoT’s firmware to verify its attestation because it is able to tamper with any of that code, so I must be able to provide an attestation over trust properties from the RoT initialisation that I can use for the PCR values, not an attestation simply of identity.
Perhaps, but since fusing the TPM to the main execution chip is the only way to have actual hardware security… that can be better than a discrete chip, even one with magical levels of hardware security.
This is why the good designs have a separate core in the same package as the main chip. If you want to see it done really well, take a look at the Xbox One. There’s not much public, but you can probably find people who have reverse engineered bits of it.
Looks like we’re converging. Thanks for the detailed reply.
So, to be clear, the attacker capabilities are:
Has physical access.
Can control power to the device.
Can retry things an unlimited number of times and measure timing.
Can retry things an unlimited number of times and measure power.
Can try installing other firmware versions and then run anything that the device does not prevent.
Well if the NSA gets a hold of Snowden’s computer I guess that’s what we have. I do reckon under this model though that there’s no escaping the energy side channel. Which kind of means masking everything that ever accesses the secrets, and… my that’s expensive. And we can probably forget about ARX designs (I’m told masking addition is expensive), and stick to stuff like AES and SHA-3 instead.
One thing that seems impossible to mitigate though is bignum arithmetic. How can we deal with elliptic curves or RSA when everything must draw constant power? Do we switch to curves with binary fields? I’m told the security literature is less sold on their mathematical security than prime fields, but that’s the only way I can think of to avoid multiplying stuff.
Or we bite the bullet and make a general purpose CPU where every operation that is naturally constant time, is masked to be constant energy as well. Addition, multiplication, bit manipulation… all constant energy, guaranteed by the CPU. And while we’re at it stay in-order and remove speculative execution, cryptographic code is unlikely to benefit from out of order speculative cores anyway. Making a CPU like this is probably a bear, but this could have the tremendous advantage of making constant time code automatically constant energy.
I won’t dispute that this is a hard problem, and I understand the appeal of limiting constant-energy hardware to specialised operations instead.
The requirements for software are such that I must be able to:
Identify that the device is the device that it thinks it is.
Perform signing and encryption operations from a host without access to keys held in the device.
Restrict what keys can be used for which purposes.
Upgrade the firmware of the device without losing access to keys.
Share the device between mutually distrusting hosts without granting any of them the ability to impersonate another (including accessing keys that are private to that host but sharing ones that are private to the device owner).
Upgrade the firmware of the device on one system without another system losing access to its keys.
I’m not sold on everything here. Specifically:
Restrict what keys can be used for which purposes.
If the device derives one key pair for encryption, and another key pair for signature I kind of get your wish. But if I want to perform signatures for Debian packages and personal emails, I could allow users to use different keys if I accept a domain separation string, but I can’t prevent them from using the same domain separation string for two different purposes.
Also, isn’t the guarantee that different domain separation strings cause us to use unrelated keys enough?
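In sketch form, that guarantee is just a KDF call away (the prefix and function names here are illustrative, not any particular firmware’s API):

```python
# Different domain-separation strings give unrelated keys, even though they
# all come from the same CDI. The firmware hands out derived keys (or
# signatures made with them), never the CDI itself.
import hashlib
import hmac

def purpose_key(cdi: bytes, domain: bytes) -> bytes:
    return hmac.new(cdi, b"purpose-key:" + domain, hashlib.sha256).digest()

cdi = bytes(32)
debian_key = purpose_key(cdi, b"debian-packages")
email_key = purpose_key(cdi, b"personal-email")
assert debian_key != email_key  # nothing links the two keys except the device
```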
Upgrade the firmware of the device without losing access to keys.
Upgrade the firmware of the device on one system without another system losing access to its keys.
It’s a nice-to-have if firmware can’t be expected to be bug free, but I’m not entirely sure which use case requires that not only the keys are preserved, but the upgrade can happen in a hostile and offline environment. For instance, if we have encrypted secrets and need to change the firmware, we could decrypt the thing, change the firmware, then encrypt the thing back. That simple procedure only works in a trusted environment, but I’m not sure how much of a show stopper this really is.
The ISA for the programmable part is often open but that’s not the interesting bit. The interfaces to the fixed-function units and the protection models are often under NDA.
Protection models are implementation details, so that’s not too bad. An open ISA is good. The interface to the fixed-function units however is part of the ISA: the firmware needs to call it one way or another.
I would almost forgive the vendor for not disclosing the exact algorithm used under the hood. Is it AES-GCM? ChaPoly? I don’t really care if I know where to put the plaintext, where to retrieve the ciphertext, and how big the authentication tag is. But I do need to know how to interact with the unit, so I hope this particular part is not under NDA.
But it sounds like you also want to be able to run arbitrary code on the device to perform OS-specific functionality
Yeah, I want the end user to have a choice what firmware they use. The primary use case would be the ability to use different cryptographic primitives (the old one is broken, or the new one is faster, whatever).
I also want to enable end users to load arbitrary firmware they would then trust on first use. Yes, this means their entire computer, including the insecure parts, is assumed trusted as well. But there’s a difference between trusting your computer now, and trusting it for 2 years straight working abroad and leaving your laptop at the hotel. It’s a freedom/security trade-off mostly.
I don’t think we disagree that TPM is a clusterfuck
Thanks. But there’s something more fundamental about it, I think: the TPM specifically supports a lot of cryptographic primitives, and a lot of use cases. Because it kinda has to: too many users to satisfy there. Doing something similar for a single customer having a precise use case in mind would automatically divide the size of the specs by a couple orders of magnitude.
At the same time though, the generality (if not genericness) of the TPM is desirable, and DICE seems to be a nice way to keep that generality while keeping things simple. Well, simple if you punt on the firmware; at some point someone does have to write bug-free code. Hence the “which is harder, software or hardware?”. I’ve always thought the two were comparable, and bet on tailored software being much smaller than kitchen-sink hardware, and as such, proportionally easier.
If I find a bug in firmware version X, I can’t then deploy firmware version X to get the keys from someone who wants to be running version X+1.
It’s kind of cheating, but DICE gives you that out of the box: firmware X and X+1 are different, so they get different keys. One could load firmware X and exploit a bug to get its keys, but this wouldn’t reveal the keys of X+1.
If I find a bug in firmware version X, I can’t extract all of the keys before the user has a chance to upgrade to X+1.
Depends on what you mean. Keys accessible from X are toast, and there’s no way the user can perform an upgrade from X to X+1 in an untrusted environment. But if the user can consider themselves “safe enough” to trust the environment around the chip, they can perform the upgrade and just assume no MitM screws up the update process.
How do you enable me to move the keys from firmware X on the real device to firmware X+N on the real device but not to:
Firmware X-1 (which has a known vulnerability that leaks keys) on the device.
Firmware X on the emulator.
Firmware X+N on the emulator.
We may have a solution:
Run old firmware.
Load new firmware in memory, and any associated certificate.
Have the old firmware check the new firmware’s certificate and version number.
If it all checks out (new firmware is newer and certificate is valid), then:
Put the CDI in a special persistent-ish region (it only needs to survive one single reboot).
Lock that special region with the new firmware’s hash.
Load new firmware
If the hash of the new firmware matches the lock, give it read access. Otherwise just wipe the special region.
New firmware does whatever it needs to migrate from the old CDI to its own new CDI.
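A minimal model of that handoff, with the special region represented as a plain dictionary and the hardware lock/wipe behaviour reduced to a couple of checks (all names here are illustrative, not a real design):

```python
# Minimal model of the handoff described above. The "special region" would in
# hardware be a latch-protected buffer that survives exactly one reboot.
import hashlib

def stage_handoff(region: dict, old_cdi: bytes, new_firmware: bytes,
                  version_ok: bool, signature_ok: bool) -> None:
    # Old firmware: only stage the CDI if the new image checks out.
    if version_ok and signature_ok:
        region["lock"] = hashlib.sha256(new_firmware).digest()
        region["old_cdi"] = old_cdi

def claim_handoff(region: dict, loaded_firmware: bytes) -> bytes | None:
    # After reboot: release the staged CDI only to the exact image the old
    # firmware approved; in every case the region is emptied (one-shot).
    old_cdi = region.pop("old_cdi", None)
    lock = region.pop("lock", None)
    if old_cdi is None or lock != hashlib.sha256(loaded_firmware).digest():
        return None  # wrong firmware booted: the staged CDI is discarded
    return old_cdi   # new firmware migrates secrets, then uses its own CDI
```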
Maybe we could use fixed-function units instead, but I’m deliberately avoiding those to minimise hardware requirements. In any case, I agree DICE is not enough. Though if my special region trick works, it’s pretty close. Now, assuming no exploitable bug in the relevant firmware:
Can’t downgrade, the old firmware will prevent that.
Can’t move to the emulator, that would mean extracting the CDI, and the CDI doesn’t leave the device to begin with.
If I’m using version X and want to upgrade to version X+2, skipping X+1 because that one has a critical key extraction vulnerability, I’m probably screwed if I can’t get access to a revocation list (and most likely a reliable clock as well): I’m liable to be tricked into upgrading to X+1. The only mitigation I see here is upgrading regularly, hoping exploits for X+1 don’t have time to take effect before I upgrade to X+2.
Oh, and one huge red flag about my approach: the CDI is transferred from one firmware to the next, so the very mechanism we use to mitigate vulnerabilities, is itself a source of vulnerabilities! I really don’t like that kind of… security/security trade-off.
Sorry if I missed something, my screen is no longer big enough to fit a useful subset of your comment and my reply on them and so you’re relying on my attention span, which is severely impacted by the nice weather.
Well if the NSA gets a hold of Snowden’s computer I guess that’s what we have. I do reckon under this model though that there’s no escaping the energy side channel. Which kind of means masking everything that ever accesses the secrets, and… my that’s expensive. And we can probably forget about ARX designs (I’m told masking addition is expensive), and stick to stuff like AES and SHA-3 instead.
These attacks are now feasible with hardware that costs a couple of thousand dollars. Five years ago it cost tens of thousands of dollars. Within the lifetime of the device, I expect it to be hundreds of dollars.
One thing that seems impossible to mitigate though is bignum arithmetic. How can we deal with elliptic curves or RSA when everything must draw constant power? Do we switch to curves with binary fields? I’m told the security literature is less sold on their mathematical security than prime fields, but that’s the only way I can think of to avoid multiplying stuff.
It’s non-trivial but there are techniques for building constant-power large multipliers. I expect to see some of these things exposed in a slightly more generic way as people start caring more about post-quantum security (for devices with a 10+ year lifetime, support for post-quantum encryption is now a requirement, but no one knows what the right algorithm is).
Or we bite the bullet and make a general purpose CPU where every operation that is naturally constant time, is masked to be constant energy as well. Addition, multiplication, bit manipulation… all constant energy, guaranteed by the CPU. And while we’re at it stay in-order and remove speculative execution, cryptographic code is unlikely to benefit from out of order speculative cores anyway. Making a CPU like this is probably a bear, but this could have the tremendous advantage of making constant time code automatically constant energy.
It’s not just that it’s hard, the problem is that it impacts power / clock frequency for everything. With a clean split, you don’t care too much about leaks from the general-purpose core because that’s outside your TCB for confidentiality and integrity and so you can make it fast / efficient and you can make the fixed-function bits fast and secure but less power efficient.
If the device derives one key pair for encryption, and another key pair for signatures, I kind of get your wish. But if I want to perform signatures for Debian packages and personal emails, I could allow users to use different keys if I accept a domain separation string, but I can’t prevent them from using the same domain separation string for two different purposes.
It’s mostly about defence in depth (which is a good principle for the whole system). For example, for WebAuthn, you really want to have a single secret that’s used with a KDF and some other data to generate a key that’s used for signing. You want to enforce the policy that the secret used with the KDF never leaves the device and is not used except as input to a KDF. You also want to enforce a policy on the derived keys that they also never leave the device and are used only for signing. This makes it harder for a compromised OS to leak the key (especially if you also throw in some rate limiting).
It’s a nice-to-have if firmware can’t be expected to be bug-free, but I’m not entirely sure which use case requires not only that the keys are preserved, but that the upgrade can happen in a hostile and offline environment.
My original post had one: I am dual-booting Linux and Windows, using BitLocker for encrypting my NTFS partition and LUKS2 for my ext4 one. Linux doesn’t trust Windows, Windows doesn’t trust Linux. With the TPM today, both can hold disk encryption keys (which can be further protected by a PIN / password) that the other can’t use. If an attacker installs malware that compromises the NT kernel and gets full access to the TPM interface, they still can’t decrypt my Linux partition.
From this use case, it follows that either Linux or Windows should be able to upgrade the firmware on the device, without destroying the utility for the other. If everything that’s using the device needs to cooperate in updates then I have a difficult operational problem and the likely outcome is people don’t update the firmware and keep running vulnerable versions.
Given that even formally verified software isn’t bug free (formal verification aims to ensure that all of your bugs exist in the spec, sometimes it just guarantees that they exist outside of your abstract machine), I think assuming that the firmware contains bugs is a safe axiom.
Protection models are implementation details, so that’s not too bad. An open ISA is good. The interface to the fixed-function units however is part of the ISA: the firmware needs to call it one way or another.
Probably quibbling about semantics, but to me the ISA is the set of instructions that run. Interfaces to things that run outside of the main pipeline may be architectural but they’re not part of the instruction set architecture.
I would almost forgive the vendor for not disclosing the exact algorithm used under the hood. Is it AES-GCM? ChaPoly? I don’t really care if I know where to put the plaintext, where to retrieve the ciphertext, and how big the authentication tag is. But I do need to know how to interact with the unit, so I hope this particular part is not under NDA.
Typically you don’t even get the datasheets for these things without an NDA. The algorithms used may be in marketing material (because people looking for FIPS compliance read those before bothering to sign the NDA). How keys are protected and the security model is definitely NDA’d in most cases. In a few cases because the offerings are utter crap and everyone would point and laugh, others because they’re doing quite clever things that they don’t want competitors to copy.
Thanks. But there’s something more fundamental about it, I think: the TPM specifically supports a lot of cryptographic primitives, and a lot of use cases. Because it kinda has to: too many users to satisfy there. Doing something similar for a single customer having a precise use case in mind would automatically divide the size of the specs by a couple orders of magnitude.
Maybe. On the other hand, if you get a chance to look at what Apple does in their Secure Element, you might long for the days of something as simple as a TPM. Feature creep is really easy. A couple of requests that I’ve seen for people to run on the RoT:
OS-free firmware updates. Fetch firmware for all devices, validate them, and install them. Requires an SR-IOV VF from the NIC assigned to the RoT and requires the RoT to run a full network stack. If you can do it, it’s really nice because it lets you sidestep things like the NVIDIA vulnerability from yesterday.
Fingerprint recognition logic for unlocking keys without the OS being involved. Requires a connection to the fingerprint reader (oh, that’s a USB device, so now we need a USB stack) and to run some small ML model (so might also need to be control plane for an ML accelerator).
Once you have a secure box, everything wants to live in the secure box. See also: TrustZone.
At the same time though, the generality (if not genericness) of the TPM is desirable, and DICE seems to be a nice way to keep that generality while keeping things simple. Well, simple if you punt on the firmware; at some point someone does have to write bug-free code. Hence the “which is harder, software or hardware?”. I’ve always thought the two were comparable, and bet on tailored software being much smaller than kitchen-sink hardware, and as such, proportionally easier.
I still don’t think that DICE gives you enough unless you can ensure that your firmware is bug free. You need some signing infrastructure, attestations, and a trust model on top. And that’s complicated.
It’s kind of cheating, but DICE gives you that out of the box: firmware X and X+1 are different, so they get different keys. One could load firmware X and exploit a bug to get its keys, but this wouldn’t reveal the keys of X+1.
But that’s not solving the problem. I need to be able to get the keys from firmware X+1 because otherwise a firmware upgrade locks me out of everything I’m using this device for.
We may have a solution:
That kind-of works. The problem is that you can brick your device (or, at least, lose access to all keys) by installing a buggy firmware that doesn’t allow updates. You need to build some kind of A/B model on top. Normally this is done by running firmware A, which installs firmware in slot B. Set a flag so that B boots next but A boots after that. B then boots and runs some checks. B then updates the flag so that B always boots next and deletes A.
With your model, the first time B boots, it would be able to decrypt the keys and reencrypt with its CDI. This makes me super nervous because that’s the point where the firmware has access to I/O and to the keys, so I just need to find one buggy firmware that’s newer than the current one to exfiltrate all of your keys. You really need to make sure that you keep up to date with updates (a few iPhone jailbreaks have worked this way: don’t upgrade for a while, wait for a vulnerability to be found, upgrade to the vulnerable version, run exploit).
This does meet my requirement for an untrusted OS being able to do the upgrade though and it’s probably easy for the driver on the host to do a periodic version check and push out updates if there’s an old version so you need to actively cooperate with an attacker to allow the delayed-upgrade attacks.
I’m happy now that there is a process, but it’s taken a thread four times longer than your original post to get there. This is the kind of thing that keeps me coming back here, thanks for your patience!
I still think you want to do a lot more in fixed-function logic if you want something secure against someone with more than a few hundred dollars to throw at breaking it though.
I’m happy now that there is a process, but it’s taken a thread four times longer than your original post to get there. This is the kind of thing that keeps me coming back here, thanks for your patience!
Thank you for yours. I found this process in no small part thanks to this discussion, and I’ve learned a few things too. I don’t think there’s much I seriously disagree with any more, so I’ll just reply with some of my thoughts.
my reply to them, and so you’re relying on my attention span, which is severely impacted by the nice weather.
Heat wave ongoing at home, my nights are no cooler than 28°C… and I’m kind of high on sleep deprivation. Next summer we’ll definitely install an A/C.
Probably quibbling about semantics, but to me the ISA is the set of instructions that run. Interfaces to things that run outside of the main pipeline may be architectural but they’re not part of the instruction set architecture.
Makes sense. I prefer to use a slightly more expansive definition: the ISA is everything I need to know to make software for a piece of hardware. To me it’s a slightly more useful definition because it really defines the contours of what I want to know, and the limits of what I believe is acceptable for an NDA.
Typically you don’t even get the datasheets for these things without an NDA.
Fuck them I guess, then? I want full ownership of what I buy, and that includes the right to explain how to use it. Hopefully these NDAs only happen in business-to-business transactions, where those freedom considerations matter a lot less.
These attacks are now feasible with hardware that costs a couple of thousand dollars. Five years ago it cost tens of thousands of dollars. Within the lifetime of the device, I expect it to be hundreds of dollars.
Okay, this is so much worse than I thought. I guess I can assume any police department or dedicated criminal has those, or will soon. Great. Now hardware security requires power side channel resistance. That sets expectations I guess. Good to know regardless.
It’s non-trivial but there are techniques for building constant-power large multipliers.
That’s good. Though I’m not entirely sold on the utility of huge multipliers. For instance when I compare Libsodium (that uses 128-bit multipliers) with Monocypher (that stops at 64 bits), the advantage of the bigger multipliers is only about 2x. Now the actual measurement is skewed by the crazy complex out-of-order architecture: I don’t know how many multiplier units there are in my CPU, and I haven’t looked at the assembly. Still, I suspect that, roughly speaking, the speed of bignum arithmetic is proportional to the length of your biggest multiplier. (Schoolbook multiplication suggests a quadratic relation instead, but bigger multipliers are most likely slower.)
It may therefore be enough to use smaller multipliers and chain them or loop with them (with a hardware control unit, microcode, or even firmware). And if we can have constant power multipliers, then constant power everything is not so far out of reach. Though again, given the consequences on hardware design, this probably means sticking to a simple and weak CPU.
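To make that concrete, here is roughly what chaining smaller multipliers looks like in software; a sketch only, and hardware or microcode could do the same thing with wider limbs:

```c
#include <stdint.h>

/* 64x64 -> 128-bit multiply built from four 32x32 -> 64 multiplies,
 * schoolbook style. This is the software version of "chain smaller
 * multipliers". */
static void mul_64x64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    uint64_t p00 = a0 * b0;   /* operands fit in 32 bits, so each product  */
    uint64_t p01 = a0 * b1;   /* only needs a 32x32 -> 64 multiplier       */
    uint64_t p10 = a1 * b0;
    uint64_t p11 = a1 * b1;

    uint64_t mid  = p01 + (p00 >> 32);   /* cannot overflow 64 bits        */
    uint64_t mid2 = mid + p10;           /* this one can: keep a carry     */
    uint64_t carry = (mid2 < p10) ? (uint64_t)1 << 32 : 0;

    *lo = (mid2 << 32) | (uint32_t)p00;
    *hi = p11 + (mid2 >> 32) + carry;
}
```

The same shape repeats at every size, which is why I suspect the proportional (rather than quadratic) relation in practice.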
I am dual-booting Linux and Windows, using BitLocker for encrypting my NTFS partition and LUKS2 for my ext4 one.
Ah, that one. It slipped my mind. That looks legitimate indeed. Still, I would like to try and cop out of this one by using small firmware.
The idea is simple: what if the firmware in this particular case is used only for OS boot, and maybe there’s a standard so that everybody agrees on, if not a single cipher suite, at least a very small set thereof? If the thing does nothing more than measuring code and giving the go/no-go, then all you need is a KDF? That’s a couple hundred lines of code at worst, significantly smaller than even TweetNaCl. So we throw all the formal methods we can at it, like writing a certified F* compiler that outputs RISC-V code directly, test the hell out of this thing… and maybe we’ll never need to update it?
Perhaps I’m grasping at straws, but I did say it was a cop-out.
I still don’t think that DICE gives you enough unless you can ensure that your firmware is bug free. You need some signing infrastructure, attestations, and a trust model on top. And that’s complicated.
Bug-free firmware is (almost?) as critical as bug-free fixed functions, no doubt about that. Which is why I want to keep it as small as possible. Just please don’t destroy my dreams…
On the other hand, if you get a chance to look at what Apple does in their Secure Element, you might long for the days of something as simple as a TPM. […] Once you have a secure box, everything wants to live in the secure box.
…You just destroyed my dreams. Well, we both saw that thread on no one actually wanting simplicity. No solution there, except perhaps implanting a bomb in their hearts that will explode if someone manages to exploit a bug in the wild.
To be honest, excluding cryptographic primitives and communication library, to me the maximum acceptable firmware size is around 50 lines of C code. 200 as an absolute maximum. If it has to be any higher, this seriously impacts the trust I have in it. As for the size of the cryptographic code, anything bigger than Monocypher is a bust in my opinion. We may not have the same tastes with respect to fixed-function units vs firmware, but I do agree on one thing: what we trust the keys with should be really, really, really small…
…even if in practice it won’t be.
That kind-of works. The problem is that you can brick your device (or, at least, lose access to all keys) by installing a buggy firmware that doesn’t allow updates. You need to build some kind of A/B model on top.
I’m guessing that it’s easy to test that the new firmware still allows updates. A/B is safer in that respect, but that’s still more stuff to add to the thing, and as always I want the bare minimum.
This makes me super nervous because that’s the point where the firmware has access to I/O and to the keys, so I just need to find one buggy firmware that’s newer than the current one to exfiltrate all of your keys. You really need to make sure that you keep up to date with updates (a few iPhone jailbreaks have worked this way: don’t upgrade for a while, wait for a vulnerability to be found, upgrade to the vulnerable version, run exploit).
I’m nervous for the same reason. The iPhone jail can take a hike though. I guess most users would have regular control over their machine, and can update regularly… except they can no longer do that once their computer is stolen by a determined adversary.
Hmm, so hardware security mandates that no firmware, current or future, can ever be exploited into exfiltrating the keys. So we want to update it as infrequently as possible. This means minimising the reasons for updates, so the API of the firmware has to be very stable. Which it is more likely to be if we manage to keep it small. Again.
Makes sense. I prefer to use a slightly more expansive definition: the ISA is everything I need to know to make software for a piece of hardware. To me it’s a slightly more useful definition because it really defines the contours of what I want to know, and the limits of what I believe is acceptable for an NDA.
That’s still architecture (as opposed to microarchitecture, which may change between versions or across vendors) but it’s not instruction set architecture. The difference between the two is important in a lot of cases (for example, the Arm GIC specification is architectural, but it’s not instruction set architecture).
That’s good. Though I’m not entirely sold on the utility of huge multipliers. For instance when I compare Libsodium (that uses 128-bit multipliers) with Monocypher (that stops at 64 bits), the advantage of the bigger multipliers is only about 2x.
As you say, this makes more difference on smaller in-order cores. It’s also worth noting that a few multiplication units have special cases for multiplies by zero, which means that you may find that splitting into smaller parts introduces power and timing side channels.
It’s far more important for post-quantum algorithms though. These seem to involve a lot of huge (on the order of KiBs) numbers that need multiplying so having a nicely pipelined big number multiplier can improve performance and let you do all of the power / perf optimisations that you want in your normal multiply (assuming you have one - a lot of embedded cores lack hardware multiply or divide).
The idea is simple: what if the firmware in this particular case is used only for OS boot, and maybe there’s a standard so that everybody agrees on, if not a single cipher suite, at least a very small set thereof? If the thing does nothing more than measuring code and giving the go/no-go, then all you need is a KDF? That’s a couple hundred lines of code at worst, significantly smaller than even TweetNaCl. So we throw all the formal methods we can at it, like writing a certified F* compiler that outputs RISC-V code directly, test the hell out of this thing… and maybe we’ll never need to update it?
Unfortunately, that’s exactly what the TPM was supposed to be (though it also needed to define the communication protocol, which is a critical part of the system). Once you’ve covered all of the use cases, I think you’ll end up with something almost as complex as the TPM spec. Actually, possibly worse because doing it now people would insist on some post-quantum signature algorithms and the ability to plug in new ones after there’s consensus on the right ones to use.
To be honest, excluding cryptographic primitives and communication library, to me the maximum acceptable firmware size is around 50 lines of C code. 200 as an absolute maximum. If it has to be any higher, this seriously impacts the trust I have in it. As for the size of the cryptographic code, anything bigger than Monocypher is a bust in my opinion. We may not have the same tastes with respect to fixed-function units vs firmware, but I do agree on one thing: what we trust the keys with should be really, really, really small…
The bit I’d be most worried about is the protocol parsing and I/O code, and you’d be lucky to get that down to 200 lines of code. Fortunately, verification of protocol parsing is quite easy. One of the spinoffs from EverCrypt is a thing that lets you define a binary protocol and will then generate C serialisers and deserialisers. As long as you sprinkle enough volatile in there that the compiler doesn’t introduce TOCTOU bugs, you’re probably fine.
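To make the volatile point concrete, here is a rough sketch of the discipline I mean, with a made-up mailbox shared with the host: read every byte of the shared buffer exactly once, into private memory, before validating anything, so the compiler cannot re-fetch bytes the host may have changed since the check:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox shared with the (untrusted) host. The host can
 * rewrite it at any time, so every byte must be read exactly once. */
#define MAILBOX_SIZE 256
extern volatile uint8_t mailbox[MAILBOX_SIZE];

/* Copy the message out of the shared mailbox into private RAM before
 * validating anything, so a later re-read can't see different bytes
 * (the classic double-fetch / TOCTOU bug). */
static size_t read_request(uint8_t out[MAILBOX_SIZE])
{
    size_t len = mailbox[0];            /* single read of the length byte */
    if (len > MAILBOX_SIZE - 1)
        return 0;                       /* never trust host-supplied sizes */
    for (size_t i = 0; i < len; i++)
        out[i] = mailbox[1 + i];        /* volatile: exactly one read each */
    return len;                         /* parse `out`, never the mailbox  */
}
```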
But I think you’re coming around to part of my world view which is that the key thing that you need from the hardware is support for fine-grained compartmentalisation. If you have compartmentalisation, you can make the amount of code that has access to the keys tiny and move the rest of it out of your TCB for key confidentiality.
I’m nervous for the same reason. The iPhone jail can take a hike though. I guess most users would have regular control over their machine, and can update regularly… except they can no longer do that once their computer is stolen by a determined adversary.
That’s the problem and why I consider most confidential computing things to be dual-use technologies: it’s very hard to build something that can be used for letting me run code on a computer that I’ve rented without the owner being able to corrupt or inspect it, but doesn’t allow DRM-like applications. Once you think of them as dual-use, that leads to treating them in law like other dual-use technologies and regulating the use, not the technology. I’d personally love to see fair use strengthened in statute law such that any form of DRM that prevents the end user from exercising their fair use rights is, if not corrected within a small time window, grounds for immediate revocation of copyright. If you rely on vigilante justice then you don’t get the protection of law.
Hmm, so hardware security mandates that no firmware, current or future, can ever be exploited into exfiltrating the keys. So we want to update it as infrequently as possible. This means minimising the reasons for updates, so the API of the firmware has to be very stable. Which it is more likely to be if we manage to keep it small. Again.
You should take a look at the CHERIoT platform. I think we’re providing you with a lot of the building blocks that you need. Oh, and the first consumer hardware (hopefully shipping next year) will come with OpenTitan so you get some fixed-function crypto bits and a DICE implementation out of the box, in addition to object granularity memory safety and (up to) function-granularity compartmentalisation.
That’s still architecture (as opposed to microarchitecture, which may change between versions or across vendors) but it’s not instruction set architecture. The difference between the two is important in a lot of cases (for example, the Arm GIC specification is architectural, but it’s not instruction set architecture).
Oh, what I called ISA you call “architecture”. Makes sense. Besides, all I want is a shortcut to point to what I mean, so “architecture” it is.
It’s also worth noting that a few multiplication units have special cases for multiplies by zero, which means that you may find that splitting into smaller parts introduces power and timing side channels.
Crap, I forgot about those. Well obviously if we design a CPU to use in a secure element we wouldn’t use that kind of shortcut.
It’s far more important for post-quantum algorithms though. These seem to involve a lot of huge (on the order of KiBs) numbers that need multiplying
I didn’t know about those. If most such algs do indeed multiply huge numbers together, huge multipliers have more value than I thought.
The bit I’d be most worried about is the protocol parsing and I/O code, and you’d be lucky to get that down to 200 lines of code.
Okay, let’s check what Tillitis have done with their bootloader. It’s one of their biggest firmware images from what I could gather. So, their protocol parser seems to be about 200 lines, give or take, though their coding style makes it artificially high. I think my coding style would squeeze it into 100. They have a 150-line standard lib, which I don’t think I can meaningfully reduce. They have a BLAKE2s implementation for the KDF of course. Their main is fairly big, at over 300 lines.
OK, maybe I was a little optimistic there. I’ll need to try stuff out; maybe there’s a way to compress their code further, or at least put the common bits in a library. About that: one neat thing the TKey does is give programs access to parts of the bootloader’s code, most notably BLAKE2s. That way programs can avoid bringing some of their own code, making not only their source code smaller, but their binary as well, and increasing available RAM in the process.
The same thing can be done with the basic communication facilities. You could have a simple API that lets you send and receive messages of limited size, and handles the tag & size for you. If it’s good enough, everyone would use it, and the only job left is parsing what goes inside the messages. Which is easy if we’re sticking to reasonable fixed or TLV formats. Which we can: we’re controlling both sides of the channel.
I’ll need to experiment to know for sure.
That’s the problem and why I consider most confidential computing things to be dual-use technologies
That’s an excellent point, and it actually influences an unrelated essay I may write soon: in the Andor Star Wars series (highly recommended by the way), there’s the notion of imperial tech. Widespread, convenient, but ultimately serving the Empire before its denizens. Like that radio whose spyware got Cassian spotted at some point. An effective rebellion needs to do away with those, which is why Nemik uses an old, hard-to-use navigator to help with a heist: it’s free from Empire influence.
The problem however with that dichotomy is that it fails to account for dual use. There’s only so much time for a Cory Doctorow lecture in a Star Wars series after all. But it does call into question some of the ideas I had for our real world, and I need to think about it.
You should take a look at the CHERIoT platform.
Will do.
But I think you’re coming around to part of my world view which is that the key thing that you need from the hardware is support for fine-grained compartmentalisation.
I do indeed. Though I still have a problem with how it might complicate the architecture of that hardware. Here’s an idea I’ve just had: divide the firmware into isolated compartments. A given compartment can do two things:
Access keys unique to that compartment.
Call other compartments (with an RPC mechanism, compartments need memory protection from each other).
And then there’s one “main” compartment, that can also do I/O.
How to use this? If we want to keep things easy we could put all the firmware in the main compartment. No isolation, but easier prototyping. But if we want things to be secure, then we write one compartment for what would otherwise be a fixed-function unit (one compartment for X25519, one for EdDSA…), isolate more complex protocols in their own compartments as well (the Noise XK compartment would for instance call into the X25519 compartment), and the main compartment would only do the I/O and parsing.
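To make the idea concrete, here is a hypothetical sketch of what the compartment interface could look like; none of these names exist anywhere, it is just to show how little each compartment needs to see:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compartment interface, purely illustrative. */

typedef int compartment_id;   /* e.g. COMP_X25519, COMP_EDDSA, COMP_NOISE_XK */

/* Only callable from inside a compartment: returns that compartment's
 * own key, derived from the CDI and the compartment's identity.
 * No compartment can obtain another compartment's key. */
int compartment_get_key(uint8_t key[32]);

/* Cross-compartment RPC: the kernel (or hardware) switches memory
 * protection domains, copies `req` in and `resp` out; nothing else is
 * shared. The main compartment uses this to reach the crypto ones. */
int compartment_call(compartment_id target,
                     const void *req,  size_t req_len,
                     void       *resp, size_t resp_cap);

/* Only the main compartment gets these; everything else is I/O-less. */
int io_read (void *buf, size_t cap);
int io_write(const void *buf, size_t len);
```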
Many of the techniques in the article actually describe how TPMs are implemented, but providing direct access to the keys to the programmable bit of the RoT is something that no reputable RoT has done for a good ten years because it is terrible for security.
Is that what it’s doing? I didn’t read the design that way…
If you want to plug in arbitrary crypto algorithms, you need to run them on the programmable core, not in fixed function units. If you want to run them on the programmable core then you need it to have raw access to keys. If the programmable core has access to keys then you have no way of protecting them from bugs in the firmware. If you have persistent keys then one buggy or malicious firmware image can leak keys stored by others.
Damn, you really did miss the central point of my entire post. Please re-read it, and tell me where you get lost, or what specific point you think is incorrect. I’ve had feedback about my article not being crystal clear so that may be on me. Unfortunately I don’t know how I can make it better yet.
In the meantime, I can answer more directly:
If you want to plug in arbitrary crypto algorithms, you need to run them on the programmable core, not in fixed function units.
Correct so far.
If you want to run them on the programmable core then you need it to have raw access to keys.
Not quite. The programmable core does not need access to root keys. It does need access to some key, but DICE makes sure this key is independent from the root key, and unique to the particular firmware being loaded.
If you have persistent keys then one buggy or malicious firmware image can leak keys stored by others.
No, it cannot. DICE makes it flat out impossible.
Programmable firmware cannot read the root key, so it cannot leak it. It cannot read (or compute) the derived keys of other firmware, so it cannot leak those either. The only key it can leak is its own. Malicious firmware can leak its own key, but that doesn’t do anything. Buggy firmware is more problematic (its secret is trusted until we find the bug), but then you fix it, and the new firmware automatically gets a new key.
I hope this helps clear things up, because we can’t have a meaningful discussion if you don’t understand this point.
Not quite. The programmable core does not need access to root keys. It does need access to some key, but DICE makes sure this key is independent from the root key, and unique to the particular firmware being loaded.
Only if you have fixed-function hardware doing key derivation based on PCRs. And that means that you depend on fixed-function implementations of PCRs and a KDF, at an absolute minimum, and you can’t plug in arbitrary KDFs.
You seem to be currently assuming that the thing in the PCR is the entire firmware image. As others have pointed out, that complicates your update process because now you need your old firmware to export your keys, wrapped in a key that the new firmware can decrypt.
I’m not even sure how you would do that, because the only key that the new firmware can trust is one that’s derived from the secret and the PCR (anything not derived from the device secret could be run on an emulator and used to leak your keys), and what you really want is something where the old version can derive a public key and the new version can derive a secret key. Again, this is something that you can do if you have asymmetric crypto in hardware (as with a typical TPM implementation): you can allow arbitrary code to ask the fixed-function unit to derive a key pair from the secret and an arbitrary value, which doesn’t grant access to the secret key, and release the secret key only when the input is a PCR value. But now you’re relying on more key derivation in hardware.
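As a sketch, the fixed-function interface I’m describing would look something like this (hypothetical names, not a real TPM or DICE API):

```c
#include <stdint.h>

/* Hypothetical fixed-function key derivation interface. */

/* Anyone (including the old firmware) may ask for the public half of the
 * key pair derived from the device secret and an arbitrary firmware
 * hash, e.g. the hash of the image it is about to hand over to. */
void derive_public(uint8_t pub[32], const uint8_t fw_hash[32]);

/* The private half is only ever released for the firmware that is
 * actually running: the input is forced to be the current PCR value,
 * so old firmware can wrap keys to a future image but can never
 * unwrap anything addressed to that image. */
void derive_private(uint8_t priv[32] /* input PCR is implicit */);
```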
More conventional uses of DICE use a hash of the signing key as the value embedded in the PCR, which means that two firmware images signed by the same key can have access to the same keys. But that means you need signature verification in hardware[1] and that same logic is responsible for anti-replay. Anti-replay is hard if you want to allow multiple different firmwares to be installed by different users. This is normally implemented with two things:
A counter in NVRAM. Each firmware comes with a signed value and will not be loaded if that value is less than the one in NVRAM. Firmware is allowed to store a larger value in this space later, so it can keep allowing downgrades until it’s sure the new version works.
A unary value stored in fuses. This is used in conjunction with a second, higher-bits version value in the firmware. Blow a fuse and you burn all of the lower-bits values. This can be done a limited number of times and is intended only for when you accidentally sign firmware that can maliciously set the NVRAM value to the maximum, or a few other really bad cases.
These are easy only with a single signing entity. If you want different signatures then you probably can’t do the fuse thing and the NVRAM value would need to be one of a bank (if you allowed a finite number of firmware sources to be protected).
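A rough sketch of how those two checks might combine; the names and the exact interaction between the counter and the fuses are made up, not any particular vendor’s scheme:

```c
#include <stdbool.h>
#include <stdint.h>

/* Anti-replay check: hypothetical names only. */
struct fw_version {
    uint32_t major;   /* the "higher-bits" value, compared against fuses */
    uint32_t minor;   /* the signed value, compared against NVRAM        */
};

extern uint32_t nvram_read_floor(void);   /* counter the firmware may raise */
extern uint32_t fuses_blown(void);        /* unary, can only ever grow      */

static bool may_load(const struct fw_version *fw)
{
    if (fw->major < fuses_blown())
        return false;                /* a blown fuse burned this whole range */
    if (fw->major == fuses_blown() && fw->minor < nvram_read_floor())
        return false;                /* older than the stored NVRAM floor    */
    return true;                     /* signature check happens elsewhere    */
}
```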
You’re also pushing a lot of complexity into software here and it’s really unclear to me what your goal is.
[1] Or initial-boot firmware in ROM, which amounts to almost the same thing.
Not quite. The programmable core does not need access to root keys. It does need access to some key, but DICE makes sure this key is independent from the root key, and unique to the particular firmware being loaded.
Only if you have fixed-function hardware doing key derivation based on PCRs. And that means that you depend on fixed-function implementations of PCRs and a KDF, at an absolute minimum, and you can’t plug in arbitrary KDFs.
Yes! That’s how DICE works! There’s a fixed bootloader with a fixed KDF inside that does the key derivation.
And indeed I can’t plug in an arbitrary KDF at this stage. But I don’t care, because what KDF is used is immaterial to my perception of the CDI. If I put the same firmware on a different HSM it will get a different random CDI, and won’t even be able to tell whether v1.0.2 and v1.0.3 are using different KDFs.
You seem to be currently assuming that the thing in the PCR is the entire firmware image.
Well… I think I do? The important part isn’t where the image is stored, it’s the fact that it is measured in its entirety.
As others have pointed out, that complicates your update process because now you need your old firmware to export your keys, wrapped in a key that the new firmware can decrypt.
We can’t change the firmware without changing the keys, so yes, the update process is made more complicated: any wrapped keys must be unwrapped, stored somewhere safe, and wrapped again… not sure how best to do it, short of having explicit support for this in the fixed functions (which I really want to minimise). As for the firmware keys themselves, they’ll be gone, so if anything is derived from them we need to rotate it.
One thing the fixed function could do is reserve a small region of RAM to store arbitrary data to pass along to the next firmware. Something like the following:
1. Old firmware stores sensitive cleartext data in the special region.
2. Old firmware declares the hash of the next firmware.
3. Old firmware asks the HSM to trigger a reset.
4. Reboot. The fixed-function bootloader now has control.
5. Untrusted new firmware is loaded.
6. Fixed bootloader hashes the new firmware.
7. If the new firmware’s hash matches what was declared in step 2, unlock the special region. Wipe it otherwise.
8. Derive the new CDI.
9. Start the new firmware.
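In rough C, the fixed bootloader’s side of this would look something like the following; every name is made up, and the old firmware’s certificate and version checks (steps 1 to 3) are left out:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HASH_SIZE 32

/* The special region that survives exactly one reset. */
struct handoff_region {
    bool    armed;                    /* set by the old firmware          */
    uint8_t next_fw_hash[HASH_SIZE];  /* hash declared in step 2          */
    uint8_t secrets[256];             /* cleartext data to pass along     */
};

extern struct handoff_region handoff;      /* survives one reset only     */
extern const uint8_t device_secret[32];    /* never readable by firmware  */

extern void measure_firmware(uint8_t out[HASH_SIZE]);
extern void kdf(uint8_t cdi[32], const uint8_t secret[32],
                const uint8_t fw_hash[HASH_SIZE]);
extern void start_firmware(const uint8_t cdi[32], bool handoff_unlocked);

void fixed_bootloader(void)   /* runs at every reset: steps 6 to 9 */
{
    uint8_t fw_hash[HASH_SIZE], cdi[32];

    measure_firmware(fw_hash);                           /* step 6 */

    bool unlocked = handoff.armed &&
        memcmp(fw_hash, handoff.next_fw_hash, HASH_SIZE) == 0;
    if (!unlocked)
        memset(&handoff, 0, sizeof handoff);             /* step 7: wipe */

    kdf(cdi, device_secret, fw_hash);                    /* step 8: DICE as usual */
    start_firmware(cdi, unlocked);                       /* step 9 */
}
```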
This should avoid the need for the asymmetric big guns, thus minimising the amount of code in the fixed function. The price we pay for that is preserving state between firmware loads, and to be honest I’m not entirely comfortable with that.
Also note that the old firmware must be able to authenticate the new firmware (most likely by checking a certificate), else I could just use the upgrade path to upload malicious firmware. Oh, and if the old firmware has a vulnerability there, we’re kinda doomed.
More conventional uses of DICE use a hash of the signing key as the value embedded in the PCR, which means that two firmware images signed by the same key can have access to the same keys.
If the firmware is signed instead of locally measured, this is no longer DICE. Though if we limit ourselves to manufacturer-provided firmware this could be a valid alternative. I do think however that arbitrary, user-provided firmware is too powerful to pass up.
I would really like to understand what problems you are solving with this, because it’s clearly targeting a different threat model and a different set of problems to a TPM. This makes it very hard for me to have a useful opinion because it definitely doesn’t solve the problems that a TPM solves, but it might solve some other interesting problems. I just don’t know what they are.
A security token, similar to a YubiKey, only more flexible. But those are (i) discrete, and (ii) should be carried on my person instead of tied to my computer. Like, well… a key. Main threat model: phishing, stolen laptop (assuming the key isn’t next to it).
A security token, similar to the TPM, only simpler and more flexible. Something that’s always on my computer, that, despite being flexible to the point of allowing arbitrary code, can reasonably guarantee that its own private keys will never leak. Online attacks are something for my OS to do, I just want to make sure my keys don’t have to be rotated every time a CVE pops up.
A root of trust, that helps guarantee that the firmware & software I want to execute, is the one to execute. Something that guarantees that unwanted bootloader or kernel modifications don’t survive a reboot. The main threat model is software attacks (vulnerabilities in my programs, OS, or me clicking on the wrong thing), but I would like some resistance against physical access too though that’s impossible with discrete chips (the main execution unit and the secure element must have a secure channel between them, or they must be one and the same).
Important thing about the root of trust: I need to be able to tell the machine to trust arbitrary software. My software my choice. Letting big companies dictate what I’m allowed to execute on my machine is not a goal.
I feel like this could use a screen to authenticate the program itself. If I try using the key to log into somewhere, but it gets sent the program for getting my disk decryption password, I should be able to reject the request.
I assume the User Supplied Secret is supposed to help with that, but then you’d need unique secrets per use-case. You can’t just store them in your password manager, as it’s probably on the workstation you’re working from—so, the workstation that could perform this attack.
With a screen, I could just peek to see if it’s displaying “LOGIN SOMEWHERE” instead of “DECRYPT DISK”.
If I understand what you’re saying correctly that’s handled automatically. Say we have two use cases:
Login somewhere
Decrypt disk
In all likelihood those two use cases would have their own dedicated TKey programs.
The “login somewhere” program would probably be a signature program that supplies the public key upon request and signs challenges.
The “decrypt disk” program would probably be a simple symmetric encryption & decryption program, that you would use to decrypt an encryption key stored somewhere in the LUKS/TrueCrypt header.
Those programs are different, and as such use different CDIs. Swap them unwittingly, and not only will they not work (because their protocols would be different and their responses would not make sense), they cannot work, because they simply have the wrong key.
To be fair, the same HSM program may be used for different use cases. For instance, “login-somewhere” and “sign-package” would probably use the exact same signature HSM program, and that program would therefore have the same CDI, and ultimately, the same signing key pair (as you noted, relying on the USS is not enough, the user may mix passwords up). Problem is, you really really don’t want to use the same key pair for both use cases: what if you log in to the website, and as a challenge it makes you sign a nefarious package? Congratulations, you’ve just endorsed a virus that will make its way into the next Debian distribution!
The standard counter for this is to use a domain separation string. Our signing program, instead of using the CDI directly as a private key, should instead hash it together with a domain separation string stating its purpose. In our case that would be something like "login://example.com" and "endorse-debian-package" respectively. That way you’d have no more than one purpose per key pair, and there’s no way that logging in somewhere would cause you to endorse a virus.
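A minimal sketch of what that derivation could look like, assuming a keyed BLAKE2s helper with the obvious shape (the exact KDF doesn’t matter, any keyed hash would do):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed helper: one-shot keyed BLAKE2s,
 * (out, out_len, key, key_len, msg, msg_len). */
extern void blake2s_keyed(uint8_t *out, size_t out_len,
                          const uint8_t *key, size_t key_len,
                          const uint8_t *msg, size_t msg_len);

/* One private key per purpose: same CDI, different domain string,
 * unrelated keys. Signing a web login challenge can never double as
 * endorsing a package. */
static void derive_signing_key(uint8_t sk[32],
                               const uint8_t cdi[32],
                               const char *domain) /* e.g. "login://example.com" */
{
    blake2s_keyed(sk, 32, cdi, 32,
                  (const uint8_t *)domain, strlen(domain));
}
```

Calling it with "endorse-debian-package" instead of "login://example.com" yields an entirely unrelated key pair, which is the whole point.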
Now that was for the HSM side. The screen you speak of would happen at the host side. I can only agree with you here: when you plug in the key for some purpose, it’s nice to have a visual indicator somewhere that tells you what your key is doing for you. That’s one of the purposes of the touch sensor, by the way: if the key requires you to touch it before it does its thing, it gives you the opportunity to look at your screen, and see what your computer is up to.
But even then that is mostly kind of automatic in many cases: when you sign a Debian package, you must select a .deb file to sign. That’s a fundamentally different UI from logging into a web site, which would likely use a browser extension to talk to your key directly. So in this case there’s no need to explicitly differentiate the two… Then again, sending “LOGIN SOMEWHERE” or “DECRYPT DISK” to the OS’s notification system doesn’t hurt.
If you can swap the programs sent, you likely control the further communication. I was thinking that a malicious workstation would send the disk encryption program and then communicate with it just as the bootloader would.
I’m mostly thinking about minimizing the potential damage if the entire system is compromised.
Ah, I see… but there’s no way to control a user’s screen from a separate USB port, so that’s kinda moot.
The only guarantee you have in the case of full system compromise is that the keys inside the HSM don’t leak: if the firmware is legitimate and bug-free nothing will leak, and if the firmware is malicious it won’t have the right keys to begin with. But I don’t think you can stop a compromised host from loading legitimate firmware and diverting it from its primary purpose.
I meant a screen on the device. Maybe the key could be sold with a connector that you could attach an optional screen to, or something like that.
But I don’t think you can stop a compromised host from loading legitimate firmware and diverting it from its primary purpose.
You can, if the device has a screen, and some sort of input (touch sensors in the TKey). You won’t really be able to see the details if you’re e.g. signing a package, but you could see the type of operation that’s happening, and the domain separation string. It’s not perfect, an inattentive user might not notice that something is off, but imo it’s still a big improvement.
Updating the images changes the key, so… you need to revoke the old key and use the new one. If you need the old key for the migration process (encrypted disk, wrapped keys…), use the old image to decrypt, and encrypt again with the new image.
The above was also my main issue when reading the post. I thought that having a persistent keypair that is tied to the machine is an explicit feature of a TPM. Derive subkeys all you want but that root should remain static.
So when your secret suddenly changes due to software updates and you have no way of obtaining the correct secret without running an old, now known-insecure copy of your program .. you have a problem, no?
Correct, I do have a problem. But I believe the solution is easy: control the environment so the update process itself doesn’t leak the old keys. Just because the old firmware is vulnerable doesn’t mean my update program has to exploit it.
The problem doesn’t go away entirely with the classical approach: if your old firmware is vulnerable you need to assume its keys have been leaked, so you can no longer talk to your TPM over an untrusted channel to perform the update. You need the controlled environment just the same. And on top of that you need to change the possibly leaked old keys, and cross fingers that you have enough fuses left to blow.
Compatibility issues with existing bootloaders would probably prevent you from using anything other than TPM 2.0. A second obstacle is that discrete TPMs are connected to your motherboard through an I2C bus, while the current TKey needs a USB port.
In principle though, they’re both HSMs with similar capabilities (though the TKey is more flexible). I see no obstacle to using the TKey to do secure boot, especially on an embedded device you can write all the firmware from.
(The one thing I haven’t wrapped my head around yet is how secure boot can even work with a discrete HSM: what prevents me from sending one bootloader to the HSM, and then booting with another bootloader anyway? I feel like there’s a missing component that enforces that the submitted and the actual bootloader are one and the same.)
I think you have to trust some of the mobo firmware/hardware to be tamper-resistant. And the security of the system depends on that tamper resistance.
A separate portable HSM like the TKey could help a bit by forcing an attacker to get the contents of the internal HSM (so they can imitate it), which is harder than bypassing it. But you can do something similar with only an internal HSM by requiring key material from the internal HSM to decrypt your drives, so it doesn’t seem like a big deal to me.
But if the attacker completely controls the laptop without you knowing, including the internal HSM, then I don’t think there’s any way a discrete HSM can help.
But if the attacker completely controls the laptop without you knowing, including the internal HSM, then I don’t think there’s any way a discrete HSM can help.
Yes, this is one thing I didn’t quite know (only suspected) and have been convinced of only in the last few days. With a discrete TPM we can always send one bootloader for measurement, and execute another anyway. This opens up the TPM, and if BitLocker/LUKS didn’t require a user password we would be able to decrypt the hard drive despite using hostile/pirate/forensic tools instead of the expected chain of trust.
This means I made a mistake in the “Measured Boot” section of my post; I’ll correct it. And ping @Summer as well: sorry, the answer is no, because even though you could use a discrete TKey-like chip to plug into the I2C bus of your motherboard instead of a TPM, discrete chips most likely don’t actually work for secure boot. It only takes one Evil Maid to insert a TPM-Genie or similar between the motherboard and the discrete chip, and it’s game over.
But. If hardware security is not a goal, I think there is a way. If the motherboard can guarantee that it sends the bootloader it will execute over the I2C bus, and nobody “borrows” your laptop to insert a Chip in the Middle, then you can guarantee that the bootloader you execute is the one that is measured. Therefore, no software bootloader-altering attack will survive a reboot.
As another note, if an attacker has physical access to the machine (required to break into the internal HSM unless it is buggy), then they can do lots of other attacks if they instrument the laptop and then return it to you:
reading RAM (this isn’t a problem if your CPU transparently encrypts RAM, but I don’t know if any CPU does this)
attaching a device to the keyboard and then using that remotely as a keylogger or to type commands into the laptop once the user has already authenticated themselves (or similar for the pointer or any other trusted input peripherals)
attaching a device to the display and recording or transmitting what it sees (you can fit a long screen recording on a 1TB microSD if you’re okay with a low refresh rate)
I think really good tamper-resistant cases (if such things exist) or rigorous tamper-detection procedures are maybe the more important defence than clever tricks with HSMs for organisations that want to protect against physical attacks on portable hardware.
reading RAM (this isn’t a problem if your CPU transparently encrypts RAM, but I don’t know if any CPU does this)
Not mainstream, but:
The Xbox CPUs have done this since the Xbox 1.
AMD SEV does this, but in a stupid way. SEV-SNP does it in a somewhat less stupid way.
Intel TME does this with or without VMs, TDX then relies on it.
Arm CCA makes it optional (I think, possibly only encryption with multiple keys is optional) because they want to support a variety of threat models, but the expectation is that most implementations will provide encryption.
Neat. Thanks for sharing. If the hardware is fast enough then I’d hope that this becomes universal to protect against physical attacks and rowhammer-style attacks.
I’m not sure it protects against rowhammer: you can still induce bit flips, they’ll just flip bits in ciphertext, which will flip more bits in plaintext. It may make toggling the specific bits in the row that you’re using harder. Typically these things use AES-XTS with either a single key or a per-VM key, so someone trying to do RowHammer attacks will be able to deterministically set or clear bits, it will just be a bit tricky to work out which bits they’re setting.
On the Xbox One, this was patched into the physical memory map. The top bits of the address were used as a key selector. On a system designed to support a 40-bit physical address space with 8-12 GiB of physical memory, there was a lot of space for this. The keys for each key slot were provisioned by Pluton, with a command exposed to the hypervisor to generate a new key in a key slot. Each game ran in a separate VM that had a new key. For extra fun, memory was not zeroed when assigning it to game VMs (and, from the guest kernel, to games) because that was slow, so games could see other games’ data, encrypted with one random AES key and decrypted with another (effectively, encrypting with two AES keys).
Sure, you’ll probably still be able to flip bits by hammering rows, but I think gaining information with it will be much harder with encryption. I’m not 100% on any of this, but I think not having knowledge of the bit patterns physically written to memory may make it more difficult to flip bits, too.
Can you specify what was stupid about AMD-SEV? I tried to work with it and remember being disappointed that it didn’t run a fully encrypted VM image out of the box, but you may have something more precise in mind?
SEV was broken even before they shipped hardware. The main stupidity was that they didn’t HMAC the register state when they encrypted it, so a malicious hypervisor could tamper with it, see what happened, and try again. Some folks figured out that you could use that to find the trap handler code in the guest and then build code reuse attacks that let you compromise it. They fixed that specific stupidity with SEV-ES, but that left the rollback ability which still lets you do some bad things. SNP closes most of those holes (though it doesn’t do anti rollback protection for memory, so malicious DIMMs can do some bad things in cooperation with a malicious hypervisor) and is almost a sensible platform. Except that they put data plane things in their RoT (it hashes the encrypted VM contents, rather than doing that on the host core where it’s fast and just signing the hash in the RoT) and their RoT is vulnerable to glitch injection attacks (doubly annoying for MS because we offered them a free license to one that is hardened against these attacks, which they’d already integrated with another SoC, and they decided to use the vulnerable one instead).
I worked with Arm on CCA and it’s what I actually want: it lets you build a privilege-separated hypervisor where the bit that’s in your TCB for confidentiality and integrity is part of your secure boot attestation and the bit that isn’t (which can be much bigger) can be updated independently. It’s a great system design that should be copied for other things.
Looks like I underestimated how difficult tamper resistance actually is. The problem seems fundamentally unsolvable: they take my machine, modify it a little bit, and I get an altered machine that spies on my keystrokes, on my screen… so realistically, if I have reason to believe my laptop may have been tampered with, I can no longer trust it, and I need to put it in the trash bin right away… Damn.
At least the “don’t let thieves decrypt the drive” problem is mostly solved.
I’m sure I’m telling you things you already know, but with security you have to have a threat model or it’s just worrying/paranoia. Probably nobody in the whole world is interested in doing an evil-maid attack on you, and even if they are interested they would probably prefer a similar but less tough target if you took basic steps to protect yourself. If you have a threat model then you can take action. If you assume that your attacker has infinite motivation and resources then there is nothing you can do.
I’m sure I’m telling you things you already know, but with security you have to have a threat model or it’s just worrying/paranoia.
I have two relevant-ish anecdotes about that.
The first dates back years: I was attending a conference on security, mostly aimed at people who are more likely to be targeted than others, because their work is not only sensitive but also controversial. The presenter considered the Evil Maid attack likely enough that it was irresponsible to leave your laptop unattended at the hotel, and that special precautions should be taken if you were asked to hand over your phone before entering a… microphone-free room (do you actually trust whoever took your phone not to try and tamper with it?). So as much as I don’t think those things apply to me right now, they do apply to some people, and it’s nice to give them (and perhaps more importantly, the Snowdens among them) options.
The second one was me working on AMD SEV at the beginning of this very year. So we had this “sensitive” server app that we were supposed to sell to foreign powers. Since we can’t have them peek into our national “secret sauce” (a fairly ordinary sauce actually), we were investigating thwarting reverse engineering efforts. Two options were on the table: code obfuscation and DRM. Seeing the promise of AMD SEV, we went the DRM route.
Cool, so AMD SEV transparently encrypts all RAM. But once I got to actually set it up, I realised it wasn’t an all-encompassing solution: AMD SEV doesn’t run a fully encrypted virtual machine; only the RAM is encrypted. All I/O was left in plaintext, and the VM image was also in cleartext. Crap, we need full disk encryption.
But then we have two problems: where do we put the encryption keys? We can’t have our representative type a password every time our overseas client wants to reboot the machine, so they must be hidden in the TPM. Oh, and the TPM should only unlock itself if the right bootloader and OS are running, so we need some kind of Trusted Boot.
At this point I just gave up. I was struggling enough with AMD SEV, I didn’t want to suffer the TPM on top, and most of all I neither believed in their threat model nor liked their business model. I mean, the stuff was a fancy firewall. We make extra effort to hide our code from them, and they’re supposed to trust us to secure their network?
So as much as I don’t think those things apply to me right now, they do apply to some people, and it’s nice to give them (and perhaps more importantly, the Snowdens among them) options.
Sure, and I hope you were able to find a job/project more to your tastes after the firewall DRM project.
The DICE idea was interesting and cool, thank you for sharing.
Do you feel it is worth sticking to a simple RISC processor without any cryptography accelerating hardware? Seems that AES and other symmetric constructions can be done consistently in hardware. What about elliptic curves?
Do you feel it is worth sticking to a simple RISC processor without any cryptography accelerating hardware?
Well… the lack of hardware acceleration is rarely a show stopper, but adding it basically never hurts. So I would say no, it’s not worth it. Sticking to a simple RISC processor, yes. Abandoning any and all kinds of crypto hardware acceleration, hell no. Thankfully RISC-V ratified a set of cryptographic extensions that includes things like AES, carry-less multiply (for GHASH), and SHA-2 acceleration.
Now, a dedicated crypto core that implements a big part, or the entirety, of a crypto primitive: I’m not sure it is worth the trouble. Sure it would be blazing fast, but that’s a lot of silicon dedicated to a single primitive, which had better be central to your whole HSM. For a general-purpose HSM similar to the TKey, I would first try to optimise the scalar CPU itself: 3-port register file, execution pipeline, basic branch prediction… all the things that let me approach 1 instruction per cycle.
Then I would reach for scalar crypto extensions like the ones RISC-V ratified; those easily multiply your speed by 2 to 4 (likely more for AES and GHASH) for very cheap. Then I would consider the big guns like vector instructions and dedicated crypto cores. I’d likely go for vector instructions if my priority is ChaCha20, or AES and GHASH crypto cores if AES-GCM is more important.
One thing that’s missing from RISC-V at the moment that I would like to add, though (as a custom extension most likely), is a rotate-immediate-and-xor instruction. It would provide a neat speed boost to ChaCha20 and BLAKE2s. Though I won’t complain if the only thing I have is a rotate instruction (which RISC-V already has an extension for). Having to do the (x << n) ^ (x >> (32 - n)) dance (3 instructions instead of 1) is a real bummer.
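For reference, here is the standard ChaCha20 quarter round in plain C (written with the usual shift-and-or rotate): every rotation immediately follows an XOR on the same word, which is exactly what a fused xor-then-rotate instruction would collapse into one operation.

```c
#include <stdint.h>

/* Plain C rotate: without a rotate instruction this is 3 ops
 * (2 shifts + 1 or); with one it's a single instruction. */
#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* ChaCha20 quarter round: each line is add, xor, then rotate. */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = ROTL32(*d, 16);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 12);
    *a += *b; *d ^= *a; *d = ROTL32(*d,  8);
    *c += *d; *b ^= *c; *b = ROTL32(*b,  7);
}
```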
What about elliptic curves?
Basically the only thing elliptic curves need is bignum arithmetic. The best way to accelerate them is to have a big-ass multiply unit. Or several multiply units running in parallel, but that means reaching for vector instructions or going out of order, both of which I would consider kind of last resorts. Besides, the multiply instructions from the RISC-V M extension are already pretty good: one to compute the lower half, one to compute the upper half, that’s as efficient as it gets on a simple in-order core.
We could also consider a multiply-accumulate instruction (d <- a × b + c), but needing 3 inputs means the register file needs 3 read ports instead of just 2, and that tends to cost quite a lot of silicon. And don’t get me started on the 1-cycle version of this: 3 inputs, 2 outputs (lower and upper), now the register file needs to have 5 ports… it’ll work for sure, but that’s a big ask for an otherwise tiny crypto chip. (If I recall correctly, register files with N registers and P ports require something like O(N×P²) silicon.)
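To illustrate on the software side: one limb step of schoolbook bignum multiplication is essentially that multiply-accumulate, and on RV32IM it lowers to mul plus mulhu plus a couple of additions, no 3-input instruction needed (a sketch, not taken from any particular library):

    #include <stdint.h>

    /* One limb step of schoolbook multiplication: a*b plus the limb already
       in the accumulator plus the incoming carry. The 64-bit product maps to
       RISC-V mul (lower half) and mulhu (upper half). */
    static inline uint32_t mac32(uint32_t a, uint32_t b,
                                 uint32_t acc, uint32_t *carry)
    {
        uint64_t t = (uint64_t)a * b + acc + *carry;
        *carry = (uint32_t)(t >> 32);   /* upper half becomes the new carry */
        return (uint32_t)t;             /* lower half is the result limb    */
    }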
Also I have yet to test it for myself, but I’m told the TKey can do a Curve25519 signature verification faster than the Cortex-M0 (which lacks the 32->64 multiply instructions), despite running a core that doesn’t even have a pipeline, on a tiny FPGA. If elliptic curves can run fast enough even on that, I wouldn’t worry too much about explicit support. Just make sure you have a good execution pipeline and blazing fast multiply units, and you’ll be fine.
But we do have a base case: it’s the bootloader, that derives the CDI from the root UDS and the firmware it loads. That bootloader is tiny and provided by the manufacturer. Surely they can shake out all the bugs from a C program that hardly requires a couple hundred lines? At these sizes even the compiled binary could be audited.
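To make it concrete, the whole job of that bootloader amounts to something like the sketch below. hash() is a placeholder for whatever primitive the bootloader ships (BLAKE2s on the TKey), and the derivation formula is illustrative, not any vendor’s actual one.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Placeholder for the bootloader's hash primitive (BLAKE2s on the TKey). */
    void hash(uint8_t out[32], const uint8_t *in, size_t len);

    /* Derive the CDI from the unique device secret (UDS) and a measurement
       of the firmware about to run, then lock the UDS away before jumping. */
    void derive_cdi(uint8_t cdi[32], const uint8_t uds[32],
                    const uint8_t *firmware, size_t firmware_len)
    {
        uint8_t buf[64];
        memcpy(buf, uds, 32);                    /* UDS first              */
        hash(buf + 32, firmware, firmware_len);  /* firmware measurement   */
        hash(cdi, buf, sizeof buf);              /* CDI = H(UDS || H(fw))  */
        /* ...then latch off UDS access and jump to the firmware. */
    }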
Firmware with actual functionality is another matter, but as I said in the article, they’re easier to make than one generic firmware to address all use cases. And if one has a bug that leaks its own CDI, we can correct the bug and the new version will have its new uncompromised CDI.
Ah, Treacherous Computing. As I’ve said, just add that capability to the manufacturer’s default firmware. That way the OS can check the firmware’s key against the manufacturer’s certificate.
Well, strictly speaking the OS is a bit late to check anything, and Secure Boot doesn’t work to begin with. If however the hard drive is encrypted by keys derived from the CDI, decryption is only possible when the approved firmware runs (else we get useless keys). Then the OS running at all is proof that we were running the correct firmware, and if the DICE chip is integrated into the main CPU (so MitM is not possible), its manufacturer issued firmware can refuse to give up the encryption keys if the bootloader isn’t right.
Assuming updates are even a thing… old firmware lets the new firmware in, checks it is signed by the manufacturer, checks that it is a newer version, new firmware is swapped in, new public key (yeah, the CDI changed, so…) is sent to the manufacturer (on a secure channel of course), and the manufacturer issues a new certificate.
To prevent rollbacks my first thought would be a persistent, increment-only hardware counter that would be hashed together with the CDI instead of using the CDI directly. If every firmware version does this, incrementing the counter instantly changes the keys of older versions. With the wrong keys they’re no longer approved, and rollback attempts are detected as soon as we check the certificate. We don’t even have to explicitly revoke old keys with that approach.
That was the first idea that popped up, we can probably do better.
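For concreteness, that first idea is just one extra hash over a counter the hardware can only ever increment. A sketch with placeholder primitives (hash() and read_monotonic_counter() are hypothetical):

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Placeholders: the firmware's hash, and the increment-only counter. */
    void     hash(uint8_t out[32], const uint8_t *in, size_t len);
    uint32_t read_monotonic_counter(void);

    /* Never use the CDI directly: derive the working key from CDI + counter.
       If every firmware version derives its keys this way, bumping the
       counter changes the keys of older versions, so a rolled-back firmware
       no longer holds a certified key. */
    void derive_working_key(uint8_t key[32], const uint8_t cdi[32])
    {
        uint8_t  buf[36];
        uint32_t ctr = read_monotonic_counter();
        memcpy(buf, cdi, 32);
        buf[32] = (uint8_t) ctr;         /* counter, little endian */
        buf[33] = (uint8_t)(ctr >> 8);
        buf[34] = (uint8_t)(ctr >> 16);
        buf[35] = (uint8_t)(ctr >> 24);
        hash(key, buf, sizeof buf);      /* key = H(CDI || counter) */
    }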
The problems the TPM solves aren’t hard, they’re many.
Hardware-wise, manufacturers of current secure elements can easily add DICE capability at no loss of security. They have the hardware secret sauce to do it. Unlike Tillitis, they don’t have to use an FPGA whose only threat model was making a token effort to protect their customers’ proprietary bitstream from IP theft.
Software-wise, DICE automatically makes things easier by (i) allowing us to only address the use cases we care about, and (ii) addressing them separately. Even if end users are derps, the manufacturer can publish a bunch of official firmware for the most popular use cases. They’re already doing that after all.
Which is why DICE does not do this. DICE firmware does not, I repeat, does not access the root key of the device. Only the tiny bootloader does. The main firmware only gets access to a derived key that is specific to it, and it alone. Compromising the root key of the device from the main firmware is flat out impossible.
Sorry for re-stating the obvious, but it’s hard to interpret what you just wrote under the assumption that you understood that. I’m not sure what your point is here.
No, you’re just talking out of your ass here. You need to explain how data can flow from secrets to the side channels with the DICE approach, in a way that they do not with the regular approach. You also need to be aware of the actual threat model: the timing side channel is almost always relevant, but in practice easy to address — even Daniel J. Bernstein said so. Glitch attacks and power analysis are harder, but they require physical access and as such are out of many threat models. They’re important when the user is the enemy, but you know what I think of Treacherous Computing.
For instance, it’s reasonable to assume that under the normal operation of the HSM, its immediate environment is trusted enough not to purposefully inject glitches or analyse power. It’s not always a valid assumption, but not everybody is making a credit card to be plugged into many untrusted terminals.
Then there’s the question of what can be extracted at which point. The bootloader is fixed, and hashes the program and secret together. We already know a way to mitigate the power side channel: hash the UDS in its own block, and hash the rest of the program starting on the next block. (Ideally we’d want a constant power hash, but for RAX designs, constant power addition is very expensive, so let’s settle for the mitigation instead.)
Glitching the loading of the program would just yield a different CDI, not sure we can do much there. And once the program is loaded, access to the UDS is closed off, so no amount of power analysis and glitching can shake that off (assuming adequate protection of the latch, but that should be easy, compared to protecting the integrity of the entire program’s runtime state.)
No, any form of security requires the root of trust to be out of the control of the attacker. The attacker is assumed to be able to compromise the later stages of execution because the DICE model is based on inductive proofs and gives you guarantees only about the base state of a system that has unbounded states later. Each step is, effectively, promising not to do certain bad things (e.g. GRUB promises not to lie about the kernel that it has loaded) but once you get to a general purpose OS you can run arbitrary code. Once you get to this point, your lower layers are expected to lock down the system such that an attacker cannot do specified bad things. For example, the TPM may have unlocked certain keys (often implicitly by using a KDF over a secret mixed with some PCR values) but does not permit that key to be exfiltrated, so an attacker who compromises my OS cannot steal my SSH keys or disk encryption keys (which means that they can do online attacks during their compromise).
That is absolutely untrue for the hardware RoTs that I’ve worked with. They have fixed-function hardware that stores keys and exposes a set of operations on them (including ACLs that authorise what operations a key can be used for and what derived keys can be used for). They run the TPM stack on the programmable core, but they assume that this can be compromised. An attacker who compromises the TPM stack has very little more access than someone who compromises the first-stage boot loader on the host.
Okay, so now you need your device to be able to make network connections or you need to rely on the OS to provide that channel. Now you have a very wide attack surface. Being able to support this use case was actually one of the motivations for CHERIoT because it isn’t feasible with existing hardware within the threat models of people who rely on TPMs.
Okay, I think this is where we disagree. The primary use case for a TPM, for me, is protecting my data if my machine is stolen. There’s no point using disk encryption if an attacker also has access to the key. You seem to think that DRM (repeating the phrase ‘Treacherous Computing’ does not inspire confidence; you seem to use the phrase to dismiss all of the things that people have done to improve security by assuming malicious intent) is the only threat model where attackers have physical access. I think that attackers with physical access (whether that’s via an ‘evil maid’ attack or outright theft) or who have compromised my OS are the only use cases where I want a hardware root of trust. If a system isn’t robust in those two cases, then it fails my threat model. By moving access to keys and crypto algorithms out of fixed-function units and by moving more of the code on the programmable core into the TCB, you make it harder to defend against this threat model. I’m not sure what your threat model actually is; you might have a good solution for it, but it isn’t the one that I care about.
Okay.
I don’t like this induction analogy, but even then we do have a base case: the DICE bootloader that is a fixed function. The firmware on top has its own key and therefore can be integrated in the root of trust (one with actual functionality this time), and, well… just like a TPM it’s not supposed to reveal its own secrets.
If we can’t do bug-free firmware, we’re kinda doomed anyway. Speaking of which…
You feel like you’re contradicting me, but I feel like you’re making my case for me: those fixed functions they wrote, they have to make sure they don’t have any bug that would expose the secrets, right? This is exactly analogous to the firmware I spoke of. Nothing stops this firmware from providing fixed functions to the untrusted host…
…unless you need the variable functions on top to be trusted anyway, kind of security by layers. We’d need something like a 2-stage execution environment, where the first can set up some functions, and the second is denied access to the CDI of the first stage, but can issue commands to it nonetheless. I guess DICE isn’t enough for that. Whether this complexity is worth the trouble is another question though.
I use a password for that. It’s easier for me to trust a secret that’s not even on my machine. (Though ideally I’d use a password and a secure element.)
Protection against theft should be straightforward. Evil Maids however are very powerful: a key logger, something that intercepts the video signals… I’m afraid those would be hard to detect, and mitigating them would pretty much require every major component of the laptop to contain a secure element so all communications between component can be encrypted and authenticated. This is no picnic.
My, if I ever come to actually fear an Evil Maid, there’s no way I’m leaving my laptop at the hotel.
Only to the extent the firmware is more likely to have bugs than fixed-function units. Keeping the firmware small and specialised improves our odds dramatically. Because of course, the attacker can’t change DICE firmware without changing the keys. If they do, the disk simply won’t decrypt. Their only chance is to exploit a bug in the firmware.
Then you probably don’t like DICE, since the entire model is based on inductive proofs of security.
I am really struggling to understand how the security claims that you’re making map back to things that the security proofs about DICE give you.
Right, and they do that by being implemented in a substrate that is amenable to formal verification, including side-channel resistance. In particular, it has no arbitrary control flow (everything is dataflow down wires, control signals are data), and it has almost no dynamic dataflow (values can flow only to places where there are wires). Doing the same thing on a general-purpose programmable core that does not provide hardware memory safety is vastly harder. It is easy to validate that keys cannot be used for the wrong purpose because those accesses have values coming from the relevant bit in the ACL and are anded with that value. There is no way of getting a key except via a wire from the register file that contains the key.
And then you decide that you need to protect two different things at the higher level from each other. And two different things at the lower level. And now you have TrustZone and two privilege modes. Only now you have shared resources between different trust domains, so you have side channels. Again, there’s a reason people stopped building hardware RoTs like this! The TPM is intentionally inflexible because as soon as you start allowing programmable functionality you need multiple trust domains, as soon as you have multiple trust domains you need both fine-grained sharing and strong isolation and those are both hard problems.
So you enter the password into your bootloader and now it’s in memory. Anyone who manages to compromise your OS once can exfiltrate it. The entire point of a security coprocessor is to ensure that keys are not accessible to someone who compromises your OS.
But, again, you haven’t given me a threat model. It seems like you’re happy with the level of security that you get from no hardware and you’re unhappy with the level of security that you get from a TPM, but somehow want something different. You’ve written a crypto library, so I presume you know what a threat model looks like: what capabilities do you assume your attacker has? What are your attackers’ goals? What tools will they be able to construct from their base capabilities towards that goal? How do you prevent them from doing so? If you start from a threat model, we can have a reasonable discussion about whether your approach addresses it, but it seems like you’re starting from a bunch of technologies that people building secure elements stopped using ages ago because they aren’t sufficient for the threat models for these devices, and saying that they’re a better way of building secure elements.
To be honest I don’t believe you. Please give me evidence.
Okay, it’s simple: you start with two things: DICE hardware you trust, and a firmware you trust.
You load the firmware on the hardware, and this hardware+firmware pair constitutes an HSM. That’s a bit strange, but strictly speaking DICE hardware is not by itself an HSM. It only becomes so when loaded with a particular piece of firmware. (In practice the bootloader alone doesn’t really count, since it doesn’t provide any functionality users want. Users want actual cryptographic services, loading firmware is only a means to that end.) Anyway, with trusted hardware and trusted firmware, you now have a trusted HSM. Ask for its public key, and register that as “trusted”.
Now let’s say you have those 3 things: the DICE hardware, a firmware image that claims to be the one you trust, and the public key you registered earlier.
Testing whether you can trust this hardware+firmware pair is simple: load the firmware, then check the device against the public key you registered (ask it to sign a challenge, or just compare the public key it reports). If the hardware, bootloader, or firmware changed, the CDI changed, the keys changed, and the check fails.
And that’s it. Notice the similarity between that and a fixed-function HSM: it’s the same, you just need to start by loading firmware you don’t trust yet. The actual verification after that is exactly the same as for a classic HSM.
Of course this all hinges on the guarantee that changing the firmware unpredictably changes the CDI. That we trust the hardware+bootloader to provide that guarantee. If you have reasons to distrust this guarantee I’d like to know why.
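In host-side C pseudocode, that check could look like the sketch below. Every helper (hsm_load_firmware(), hsm_sign(), signature_check(), random_bytes()) is a hypothetical stand-in for the device’s channel and your signature library.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical host-side helpers. signature_check() returns 0 when the
       signature verifies under the given public key. */
    int  hsm_load_firmware(const uint8_t *fw, size_t fw_len);
    int  hsm_sign(uint8_t sig[64], const uint8_t *msg, size_t msg_len);
    int  signature_check(const uint8_t pubkey[32], const uint8_t sig[64],
                         const uint8_t *msg, size_t msg_len);
    void random_bytes(uint8_t *buf, size_t len);

    /* If the hardware, bootloader, or firmware differ from what was
       registered, the CDI differs, the key pair differs, and the
       challenge signature won't verify. */
    int hsm_is_trusted(const uint8_t registered_pubkey[32],
                       const uint8_t *fw, size_t fw_len)
    {
        uint8_t challenge[32], sig[64];
        if (hsm_load_firmware(fw, fw_len) != 0) return 0;
        random_bytes(challenge, sizeof challenge);
        if (hsm_sign(sig, challenge, sizeof challenge) != 0) return 0;
        return signature_check(registered_pubkey, sig,
                               challenge, sizeof challenge) == 0;
    }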
Now what about Dragons?
You’re clearly assuming that firmware is not amenable to formal verification. This is blatantly false: even regular cryptographic libraries can be formally analysed, and some have been. Including side channel resistance. And since firmware runs on a specified piece of hardware, that verification is even easier.
No, you’re making a category error there. Programs (barring the self modifying relics) have a fixed control flow, just like hardware. When we talk about arbitrary control flow we don’t talk about programs having it, we’re talking about our ability to make or load arbitrary programs. Once the program is chosen the control flow is frozen.
Same thing for the dynamic flow: cryptographic libraries have almost no dynamic flow, and that’s what makes them secure against the most important side channels. So while some pieces of firmware can have dynamic data flow, not all of them do, and obviously a sane cryptographic engineer would only trust the ones that have as little of it as is reasonable.
Again a category error, I believe. This is not about guaranteeing anything about arbitrary firmware, we want to guarantee that a particular piece of firmware is void of memory errors (among other errors). Hardware memory safety tricks like ASLR or executable XOR writeable help, but you can completely sidestep the problem and write your firmware in Rust.
What you call “vastly harder” is at worst a speed bump.
Most of the details of these things are protected by NDAs, but if you let me know which manufacturers you’ve talked to I can point you in the right direction in their designs, if we’ve worked with any of the same people.
That works but only because you’re sidestepping all of the hard problems. As a user, I don’t care about the code identity I care about the guarantees. That’s always the hard bit in any attestation system and if you tie software on the host to specific versions of the RoT then you end up with something very fragile. In the RoT designs I’m familiar with, this is sidestepped entirely as a problem. In a TPM, the useful guarantees I get are all about negative properties: it’s not programmable and so I know you cannot do things that are not exposed, I don’t have to prove a completeness property over some general-purpose code (ignoring firmware TPMs, which were the worst idea ever and were all broken by Spectre if not before).
Yes and no. I’ve worked with the EverCrypt team and they got further than most. When we tried to use their code in production, we discovered that their proofs of temporal safety were holding only because they never freed memory (not great for a resource-constrained environment). Those proofs also depend on compiling with CompCert and may or may not hold with other compilers. Most importantly, they hold only if no code in the system that they are linked with ever hits undefined behaviour. If, for example, you have a memory safety bug in the parser for mailbox messages from your host, none of the proof in EverCrypt hold because you have just violated one of their axioms.
With the Low* work, they are able to prove that the implementations are constant time, in conjunction with a modest microarchitectural model. They are not able to prove anything about power because most ALUs have data-dependent power consumption. The techniques used to harden against these in hardware (e.g. running identical pipelines in parallel with XOR’d inputs so that the power is always uniform) are simply not expressible in instruction sets. The Low*-based proofs also depend on there being no speculative execution (if I can train a branch predictor to go to the wrong place, it doesn’t matter that there are no timing leaks in the correct path) but that’s probably fine.
No, pretty much every program will use the branch and link register instruction. Proving anything about control flow is now a data flow problem: you have to prove that the set of values that will reach the input of that instruction is constrained. This is possible only by assuming that you have constrained memory access, so you’re returning to the ‘memory safety is an axiom’ world. Now, with CHERIoT, we can make that guarantee, but you won’t find it in any other embedded ISA.
This only helps the verification if you have strong compartmentalisation. Again, not something provided by most non-CHERIoT embedded systems.
ASLR is totally inappropriate: it barely works when you have a full 32-bit address space, and you have no chance of getting useful levels of entropy without an MMU. Even if it did work, it’s a probabilistic defence and so is inappropriate for building a foundation for formal verification. Immutable code is the default on embedded devices (many are Harvard architecture) but you’re still vulnerable to code reuse. You can write in Rust, but you can’t verify anything about timing in Rust because timing isn’t part of Rust’s abstract machine. Anything involving your assembly routines and anything involving I/O will be in unsafe Rust code, so there’s still scope for bugs even if you verify all of the safe Rust code (Rust verification tools are improving but they’re nowhere near the level that you’d require for a system like this).
I say vastly harder based on experiences of teams that I have worked with. They have shipped hardware RoTs and formally verified crypto implementations. The software verification effort took more people and took longer. The hardware RoT was deployed, at scale, to people with physical access some of whom had a large financial incentive to break it. It is still doing fine with no compromises. The verified crypto implementation was broken within a week of internal testing by finding bugs in the integration with the surrounding system. I consider the thing that multiple vendors have done and that has stood up to attacks in production vastly easier than the thing that hypothetically could be done but has not actually worked any time someone has tried it, but maybe that’s just me. If you want to prove me wrong and ship a formally verified firmware stack for a RoT that is resistant to timing and power side channels and glitching attacks, I’d be very happy to see it: even if I never use it, the tools that you’d have to build along the way would be game changers for the industry.
But you still haven’t told me what problem you’re trying to solve or what your threat model is.
Fuck, I forgot about that. Evidence not available, then. Fuck. Still, I care less about implementation details than I care about the user-visible interface. Do those poor sods also put these under NDA? Like, they sell a chip with an ISA, and then they tell you “this ISA is a corporate/state secret, sign the NDA please”?
I bet they do. Fuck them.
Same threats as anyone else. I’m just trying to have something simpler achieve the same goals. Because as a user, I just can’t stand these huge TPM specs. Specs that apparently had a critical vulnerability, discovered March of this year. Not in the hardware, not in the software. In the spec.
I have an hypothesis: software teams are drooling incompetents compared to hardware teams. Maybe it’s less about the actual difficulty of software, and more about how hardware teams understand the stakes of what they’re doing and are trained (and tooled) accordingly, while software teams simply don’t and aren’t. I still remember this blog post by someone who worked with hardware teams, and then noticed how software teams just didn’t know how to test their stuff.
I’m aware of at least one language that can (forgot the name, it’s designed specifically for cryptography), and I believe cryptographic libraries have been written in it. Good point about having to verify the compilers as well though. That obviously has been done, though whether it was done with the relevant verified language(s) I don’t know.
Jump and link to a hard-coded constant. Unless indirect calls are involved, but that would typically mean the main program calls arbitrary code… They could be used internally of course, and we might find we can still account for the jump-and-link destination… or just avoid indirect calls entirely, if that makes verification easier.
Now if it’s something as trivial as “call foo if condition1, call bar if condition 2”, well… first of all that’s not constant time, but let’s say we’re doing signature verification and don’t care about leaking data just yet: a fixed function hardware equivalent would do exactly the same. How could it not, at some point there is stuff to do.
Which means constant power can’t be achieved in software, I’m aware. This is a definite disadvantage, and it does exclude some use cases (like credit cards). I maintain that you don’t always need to address the energy side channel. Or even electromagnetic emissions.
How as a user would you have any guarantee? There’s only one way I’m aware of, it goes in 2 steps:
That’s true of any HSM, DICE or not. And what are you on about with “tying to a specific version of the RoT”? Isn’t that what happens anyway? Doesn’t the manufacturer have to sign the entirety of the chip, firmware included, regardless of the approach taken? The only additional difficulty I see with DICE is that since changing the firmware changes the keys, updates to the firmware are harder. But that’s quite moot if the alternative is fixed functions etched in immutable hardware: can’t update those at all.
But then DICE does have an advantage: someone other than the manufacturer could write the firmware and vouch for it. A downstream user (or trusted secondary shop) can then load the firmware and register its keys.
Neither is DICE firmware. A DICE HSM is not programmable by default: it has a “fixed-function” firmware, identical except in implementation details to fixed-function hardware.
One could load arbitrary firmware onto their DICE hardware, but only one such firmware has been signed and can be checked for identity. It can’t be swapped for anything else. Programs aren’t general purpose, the CPU running them is. Freeze the program and you get a special purpose appliance.
Perhaps, but since fusing the TPM to the main execution chip is the only way to have actual hardware security… that can be better than a discrete chip, even one that has magical levels of hardware security.
The ISA for the programmable part is often open but that’s not the interesting bit. The interfaces to the fixed-function units and the protection models are often under NDA. Some of the security also depends on exactly what you do in your EDA tools. Apparently if you try to fab Pluton with the default Cadence config it happily optimises away a load of mitigations. RoT vendors don’t like talking about their security features in public because it paints a target on them; they’re happy to talk about them to (potential) customers.
That’s less true with things like OpenTitan or Caliptra, but I haven’t looked at either in detail. I am pushing for both to adopt the CHERIoT Ibex as the programmable component because it enables some quite interesting things. We can statically verify (in the binary) which compartments in a firmware image have access to which bits of MMIO space, so we can allow (for example) an Intel-provided compartment to have access to back-door interfaces to the CPU that let it attest to CPU state, but not have access to the key storage, and verify that no component except the Intel-provided compartment has access to this bit of the state, so only Intel can compromise the security of the Intel processor using the RoT.
So, to be clear, the attacker capabilities are:
The attacker’s goal is to exfiltrate a key that will allow them to produce an emulation of the device that is indistinguishable from the real device, from the perspective of software.
The requirements for software are such that I must be able to:
I probably missed something.
But it sounds like you also want to be able to run arbitrary code on the device to perform OS-specific functionality (again, this is actually one of the target use cases for CHERIoT because it’s really hard).
I don’t think we disagree that TPM is a clusterfuck, but I think it’s an interface that you can run on a secure system (we’ve actually run the TPM reference stack on a CHERIoT implementation), I just don’t think that you’re solving most of the problems that I want a RoT to solve.
Absolutely not in this case. The folks on the EverCrypt project were some of the smartest people I’ve met (the kind of people I’d trust to build the theorem prover that I depend on, not just the kind that I’d trust to use it correctly).
AMD and Intel routinely mess up security critical things.
F* / Low*, which is what EverCrypt used.
Kind of. Formally verified compilers come with a lot of caveats. CompCert, for example, does not guarantee anything if your input code contains any undefined behaviour. This is fine for F*, which generates C code from an ML dialect that guarantees no UB, but makes CompCert pretty useless for C code written by humans. Even then, it’s not completely clear how well this works because both F* and CompCert have a formal model of C semantics but it might not be the same formal model of C semantics and any mismatch can invalidate the proofs.
Formal verification for hardware also comes with some caveats. My favourite hardware security PoC was a TrustZone exploit in a verified Arm core. They ran two wires close together so that if you rapidly toggled the value in a register you’d induce a current in a wire that led to the S state bit and would let you enter S state from unprivileged code. The core was correct at the RTL layer but not correct with respect to analogue effects. RoT designs typically also include mitigations at these layers but they have to be designed based on the specific circuits and it’s really hard to put them in general-purpose cores. The threat model for an AES engine is much easier to express than the threat model for an add instruction (are either of the operands of the add secret? It depends on the surrounding code. Is it preferable for a glitch to give a lower or a higher value for the result of an add? It depends on the surrounding code).
This is why you typically put the programmable logic outside of the trust. The bit that runs the TPM stack is not part of the TCB for key confidentiality or integrity. It is responsible for some bits of PCR state manipulation and command parsing, but if you compromise it then you still can’t exfiltrate keys. You might be able to add different things to PCR state, but that’s about it.
Return instructions are also computed jumps (jump to the link register). A single stack bug can hijack control flow this way. Hence memory safety being a prerequisite for CFI.
It’s about the negative cases. It’s easy to verify that a key stored in a key register never travels somewhere else: don’t put wires anywhere other than the input to the crypto engines. You can statically enumerate all of the possible dataflow paths. It’s much harder to verify that a value stored at some location in memory is never read by anything that can lead to I/O because you have to ensure that no load instruction that leads to I/O can read that address. That’s a global alias analysis problem.
It’s a hard problem. It’s simpler if you can enforce constraints. If the TCB logic is fixed function, your proof obligations on the programmable bit are much less. Ideally, the programmable bit is in the TCB for availability and not confidentiality or integrity, so you just punt on it entirely. Now you have a much simpler problem of getting an attestation over the device, rather than the device plus a software stack. The attestation is a claim from the manufacturer that it provides some security guarantees.
That’s what I mean about the guarantees. Identity is a building block for attestation, it’s not sufficient. The important thing is that manufacturer X makes claims about software Y running on hardware Z. If these claims are untrue, you can sue them. That’s what you’re getting from attestation: accountability that you can use to build legal liability. In the worst case, you need to universally quantify these claims over all values of Y because the OS doesn’t want to carry a list of valid firmware versions. Often, you make slightly weaker claims that rely on the ability to have monotonic versions of Y (e.g. ‘I promise that this gives security guarantees as long as you’re running the latest firmware and, if there are security bugs in the firmware I will fix them within a week’), which is where you get the requirements to be able to do secure update with anti-rollback protection.
DICE (with slight tweaks) is used in most of these things already. The problem is not the firmware, it’s the space of things that a firmware image can possibly do and how you get a guarantee that your firmware is secure in the presence of all of the capabilities listed for attackers above. And it’s about ensuring that things like:
Those are the hard problems. DICE is a small part of solving them, it is not sufficient by itself.
Imagine I have one of these devices and I have an emulator for one that gives me complete visibility into all of the secret keys. How do you enable me to move the keys from firmware X on the real device to firmware X+N on the real device but not to:
DICE is an important building block for enforcing this kind of policy, but it’s nowhere near sufficient by itself, and without being able to enforce that policy you have no security for your secrets and may as well just store them in the kernel.
This gets even harder when you’re using it as part of your secure boot chain (which is what I want from a root of trust) because any change to the DICE signature for the RoT’s firmware will change the PCR values for the next steps of the boot. I can’t rely on code that loads after the RoT’s firmware to verify its attestation because it is able to tamper with any of that code, so I must be able to provide an attestation over trust properties from the RoT initialisation that I can use for the PCR values, not an attestation simply of identity.
This is why the good designs have a separate core in the same package as the main chip. If you want to see it done really well, take a look at the Xbox One. There’s not much public, but you can probably find people who have reverse engineered bits of it.
Looks like we’re converging. Thanks for the detailed reply.
Well if the NSA gets a hold of Snowden’s computer I guess that’s what we have. I do reckon under this model though that there’s no escaping the energy side channel. Which kind of means masking everything that ever accesses the secrets, and… my that’s expensive. And we can probably forget about RAX designs (I’m told masking addition is expensive), and stick to stuff like AES and SHA-3 instead.
One thing that seems impossible to mitigate though is bignum arithmetic. How can we deal with elliptic curves or RSA when everything must draw constant power? Do we switch to curves with binary fields? I’m told the security literature is less sold on their mathematical security than prime fields, but that’s the only way I can think of to avoid multiplying stuff.
Or we bite the bullet and make a general purpose CPU where every operation that is naturally constant time, is masked to be constant energy as well. Addition, multiplication, bit manipulation… all constant energy, guaranteed by the CPU. And while we’re at it stay in-order and remove speculative execution, cryptographic code is unlikely to benefit from out of order speculative cores anyway. Making a CPU like this is probably a bear, but this could have the tremendous advantage of making constant time code automatically constant energy.
I won’t dispute that it’s a hard problem, and I understand the appeal of limiting constant-energy hardware to specialised operations instead.
I’m not sold on everything here. Specifically:
If the device derives one key pair for encryption, and another key pair for signature I kind of get your wish. But if I want to perform signatures for Debian packages and personal emails, I could allow users to use different keys if I accept a domain separation string, but I can’t prevent them from using the same domain separation string for two different purposes.
Also, isn’t the guarantee that different domain separation strings cause us to use unrelated keys enough?
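That guarantee is cheap to provide. A sketch, with a placeholder hash() and an illustrative derivation (a real KDF would also encode lengths): two different domain strings yield unrelated keys, but the device cannot tell whether two callers picked the same string.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Placeholder hash primitive. */
    void hash(uint8_t out[32], const uint8_t *in, size_t len);

    /* Per-purpose key: unrelated keys for unrelated domain strings. */
    void derive_purpose_key(uint8_t key[32], const uint8_t cdi[32],
                            const char *domain)
    {
        uint8_t buf[32 + 64];
        size_t  dlen = strlen(domain);
        if (dlen > 64) dlen = 64;          /* keep the example bounded */
        memcpy(buf, cdi, 32);
        memcpy(buf + 32, domain, dlen);
        hash(key, buf, 32 + dlen);         /* key = H(CDI || domain)   */
    }

Calling it with, say, "debian-packages" and "personal-email" gives two unrelated signing keys; nothing in the device stops a user from reusing "debian-packages" for something else, which is the limitation mentioned above.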
It’s a nice-to-have if firmware can’t be expected to be bug free, but I’m not entirely sure which use case requires not only that the keys are preserved, but that the upgrade can happen in a hostile and offline environment. For instance, if we have encrypted secrets and need to change the firmware, we could decrypt the thing, change the firmware, then encrypt the thing back. That simple procedure only works in a trusted environment, but I’m not sure how much of a show stopper this really is.
Protection models are implementation details, so that’s not too bad. An open ISA is good. The interface to the fixed-function units however is part of the ISA: the firmware needs to call it one way or another.
I would almost forgive the vendor for not disclosing the exact algorithm used under the hood. Is it AES-GCM? ChaPoly? I don’t really care if I know where to put the plaintext, where to retrieve the ciphertext, and how big the authentication tag is. But I do need to know how to interact with the unit, so I hope this particular part is not under NDA.
Yeah, I want the end user to have a choice what firmware they use. The primary use case would be the ability to use different cryptographic primitives (the old one is broken, or the new one is faster, whatever).
I also want to enable end users to load arbitrary firmware they would then trust on first use. Yes, this means their entire computer, including the insecure parts, is assumed trusted as well. But there’s a difference between trusting your computer now, and trusting it for 2 years straight working abroad and leaving your laptop at the hotel. It’s a freedom/security trade-off mostly.
Thanks. But there’s something more fundamental about it, I think: the TPM specifically supports a lot of cryptographic primitives, and a lot of use cases. Because it kinda has to: too many users to satisfy there. Doing something similar for a single customer having a precise use case in mind would automatically divide the size of the specs by a couple orders of magnitude.
At the same time though, the generality (if not genericness) of the TPM is desirable, and DICE seems to be a nice way to keep that generality while keeping things simple. Well, simple if you punt on the firmware: at some point someone does have to write bug-free code. Hence the “which is harder, software or hardware?”. I’ve always thought the two were comparable, and bet on tailored software being much smaller than kitchen-sink hardware, and as such, proportionally easier.
It’s kind of cheating, but DICE gives you that out of the box: firmware X and X+1 are different, so they get different keys. One could load firmware X and exploit a bug to get its keys, but this wouldn’t reveal the keys of X+1.
Depends on what you mean. Keys accessible from X are toast, and there’s no way the user can perform an upgrade from X to X+1 in an untrusted environment. But if the user can consider themselves “safe enough” to trust the environment around the chip, they can perform the upgrade and just assume no MitM screws up the update process.
We may have a solution:
Maybe we could use fixed-function units instead, but I’m deliberately avoiding those to minimise hardware requirements. In any case, I agree DICE is not enough. Though if my special region trick works, it’s pretty close. Now, assuming no exploitable bug in the relevant firmware:
If I’m using version X and want to upgrade to version X+2, skipping X+1 because that one has a critical key extraction vulnerability, I’m probably screwed if I can’t get access to a revocation list (and most likely a reliable clock as well): I’m liable to be tricked into upgrading to X+1. The only mitigation I see here is upgrading regularly, hoping exploits for X+1 don’t have time to take effect before I upgrade to X+2.
Oh, and one huge red flag about my approach: the CDI is transferred from one firmware to the next, so the very mechanism we use to mitigate vulnerabilities, is itself a source of vulnerabilities! I really don’t like that kind of… security/security trade-off.
Sorry if I missed something, my screen is no longer big enough to fit a useful subset of your comment and my reply on them and so you’re relying on my attention span, which is severely impacted by the nice weather.
These attacks are now feasible with hardware that costs a couple of thousand dollars. Five years ago it cost tens of thousands of dollars. Within the lifetime of the device, I expect it to be hundreds of dollars.
It’s non-trivial but there are techniques for building constant-power large multipliers. I expect to see some of these things exposed in a slightly more generic way as people start caring more about post-quantum security (for devices with a 10+ year lifetime, support for post-quantum encryption is now a requirement, but no one knows what the right algorithm is).
It’s not just that it’s hard, the problem is that it impacts power / clock frequency for everything. With a clean split, you don’t care too much about leaks from the general-purpose core because that’s outside your TCB for confidentiality and integrity and so you can make it fast / efficient and you can make the fixed-function bits fast and secure but less power efficient.
It’s mostly about defence in depth (which is a good principle for the whole system). For example, for WebAuthn, you really want to have a single secret that’s used with a KDF and some other data to generate a key that’s used for signing. You want to enforce the policy that the secret used with the KDF never leaves the device and is not used except as input to a KDF. You also want to enforce a policy on the derived keys that they also never leave the device and are used only for signing. This makes it harder for a compromised OS to leak the key (especially if you also throw in some rate limiting).
My original post had one: I am dual-booting Linux and Windows, using BitLocker for encrypting my NTFS partition and LUKS2 for my ext4 one. Linux doesn’t trust Windows, Windows doesn’t trust Linux. With the TPM today, both can hold disk encryption keys (which can be further protected by a PIN / password) that the other can’t use. If an attacker installs malware that compromises the NT kernel and gets full access to the TPM interface, they still can’t decrypt my Linux partition.
From this use case, it follows that either Linux or Windows should be able to upgrade the firmware on the device, without destroying the utility for the other. If everything that’s using the device needs to cooperate in updates then I have a difficult operational problem and the likely outcome is people don’t update the firmware and keep running vulnerable versions.
Given that even formally verified software isn’t bug free (formal verification aims to ensure that all of your bugs exist in the spec, sometimes it just guarantees that they exist outside of your abstract machine), I think assuming that the firmware contains bugs is a safe axiom.
Probably quibbling about semantics, but to me the ISA is the set of instructions that run. Interfaces to things that run outside of the main pipeline may be architectural but they’re not part of the instruction set architecture.
Typically you don’t even get the datasheets for these things without an NDA. The algorithms used may be in marketing material (because people looking for FIPS compliance read those before bothering to sign the NDA). How keys are protected and the security model is definitely NDA’d in most cases. In a few cases because the offerings are utter crap and everyone would point and laugh, others because they’re doing quite clever things that they don’t want competitors to copy.
Maybe. On the other hand, if you get a chance to look at what Apple does in their Secure Element, you might long for the days of something as simple as a TPM. Feature creep is really easy. A couple of requests that I’ve seen for people to run on the RoT:
Once you have a secure box, everything wants to live in the secure box. See also: TrustZone.
I still don’t think that DICE gives you enough unless you can ensure that your firmware is bug free. You need some signing infrastructure, attestations, and a trust model on top. And that’s complicated.
But that’s not solving the problem. I need to be able to get the keys from firmware X+1 because otherwise a firmware upgrade locks me out of everything I’m using this device for.
That kind-of works. The problem is that you can brick your device (or, at least, lose access to all keys) by installing a buggy firmware that doesn’t allow updates. You need to build some kind of A/B model on top. Normally this is done by running firmware A, which installs firmware in slot B. Set a flag so that B boots next but A boots after that. B then boots and runs some checks. B then updates the flag so that B always boots next and deletes A.
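A sketch of that flag dance, with a hypothetical save_flags() persisting to flash (not any particular bootloader’s actual format):

    #include <stdint.h>

    enum slot { SLOT_A, SLOT_B };

    struct boot_flags {
        enum slot permanent;   /* known-good slot we normally boot      */
        enum slot try_once;    /* freshly installed slot to test        */
        uint8_t   try_pending; /* non-zero while the test boot is armed */
    };

    /* Hypothetical: persist the flags to flash. */
    void save_flags(const struct boot_flags *f);

    /* Boot-time selection: try the new slot exactly once; if it crashes
       before committing, the next boot falls back to the old slot. */
    enum slot pick_boot_slot(struct boot_flags *f)
    {
        if (f->try_pending) {
            f->try_pending = 0;
            save_flags(f);
            return f->try_once;
        }
        return f->permanent;
    }

    /* Called by the new firmware once its self-checks pass: make it
       permanent (the old slot can then be erased). */
    void commit_update(struct boot_flags *f, enum slot s)
    {
        f->permanent = s;
        save_flags(f);
    }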
With your model, the first time B boots, it would be able to decrypt the keys and reencrypt with its CDI. This makes me super nervous because that’s the point the firmware has access to I/O and to the keys, so I just need to find one buggy firmware that’s newer than the current one to exfiltrate all of your keys. You really need to make sure that you keep up to date with updates (a few iPhone jailbreaks have worked this way: don’t upgrade for a while, wait for a vulnerability to be found, upgrade to the vulnerable version, run exploit).
This does meet my requirement for an untrusted OS being able to do the upgrade though and it’s probably easy for the driver on the host to do a periodic version check and push out updates if there’s an old version so you need to actively cooperate with an attacker to allow the delayed-upgrade attacks.
I’m happy now that there is a process, but it’s taken a thread four times longer than your original post to get there. This is the kind of thing that keeps me coming back here, thanks for your patience!
I still think you want to do a lot more in fixed-function logic if you want something secure against someone with more than a few hundred dollars to throw at breaking it though.
Thank you for yours. I found this process in no small part thanks to this discussion, and I’ve learned a few things too. I don’t think there’s much I seriously disagree with any more, so I’ll just reply with some of my thoughts.
Heat wave ongoing at home, my nights are no cooler than 28°C… and I’m kind of high on sleep deprivation. Next summer we’ll definitely install an A/C.
Makes sense. I prefer to use a slightly more expansive definition: the ISA is everything I need to know to make software for a piece of hardware. To me it’s a slightly more useful definition because it really defines the contours of what I want to know, and the limits of what I believe is acceptable for an NDA.
Fuck them I guess, then? I want full ownership of what I buy, and that includes the right to explain how to use it. Hopefully these NDAs only happen in business-to-business transactions, where those freedom considerations matter a lot less.
Okay, this is so much worse than I thought. I guess I can assume any police department or dedicated criminal has those, or soon will. Great. Now hardware security requires power side channel resistance. That sets expectations I guess. Good to know regardless.
That’s good. Though I’m not entirely sold on the utility of huge multipliers. For instance when I compare Libsodium (that uses 128-bit multipliers) with Monocypher (that stops at 64 bits), the advantage of the bigger multipliers is only about 2x. Now the actual measurement is skewed by the crazy complex out-of-order architecture: I don’t know how many multiplier units there are in my CPU, and I haven’t looked at the assembly. Still, I suspect that, roughly speaking, the speed of bignum arithmetic is proportional to the length of your biggest multiplier. (Schoolbook multiplication suggests a quadratic relation instead, but bigger multipliers are most likely slower.)
It may therefore be enough to use smaller multipliers and chain them or loop with them (with a hardware control unit, microcode, or even firmware). And if we can have constant power multipliers, then constant power everything is not so far out of reach. Though again, given the consequences on hardware design, this probably means sticking to a simple and weak CPU.
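For instance, a 64×64→128 multiply can be chained out of four 32×32→64 multiplies plus some carry handling; the same schoolbook pattern scales up to full bignum limbs (plain C sketch):

    #include <stdint.h>

    typedef struct { uint64_t lo, hi; } u128;

    /* Schoolbook 64x64->128 multiply built from 32x32->64 pieces, to show
       how smaller multipliers can be chained. */
    static u128 mul_64x64(uint64_t a, uint64_t b)
    {
        uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
        uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);
        uint64_t p00 = (uint64_t)a0 * b0;
        uint64_t p01 = (uint64_t)a0 * b1;
        uint64_t p10 = (uint64_t)a1 * b0;
        uint64_t p11 = (uint64_t)a1 * b1;
        /* Middle column: high half of p00 plus the low halves of the
           cross products; its own high bits carry into the top word. */
        uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
        u128 r;
        r.lo = (mid << 32) | (uint32_t)p00;
        r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
        return r;
    }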
Ah, that one. It slipped my mind. That looks legitimate indeed. Still, I would like to try and cop out of this one by using small firmware.
The idea is simple: what if the firmware in this particular case is used only for OS boot, and maybe there’s a standard so that everybody agrees on, if not a single cipher suite, at least a very small set thereof? If the thing does nothing more than measuring code and giving the go/no-go, then all you need is a KDF? That’s a couple hundred lines of code at worst, significantly smaller than even TweetNaCl. So we throw all the formal methods we can at it, like writing a certified F* compiler that outputs RISC-V code directly, test the hell out of this thing… and maybe we’ll never need to update it?
Perhaps I’m grasping at straws, but I did say it was a cop-out.
Bug-free firmware is (almost?) as critical as bug-free fixed functions, no doubt about that. Which is why I want to keep it as small as possible. Just please don’t destroy my dreams…
…You just destroyed my dreams. Well, we both saw that thread on no one actually wanting simplicity. No solution there, except perhaps implanting a bomb in their hearts that will explode if someone manages to exploit a bug in the wild.
To be honest, excluding cryptographic primitives and communication library, to me the maximum acceptable firmware size is around 50 lines of C code. 200 as an absolute maximum. If it has to be any higher, this seriously impacts the trust I have in it. As for the size of the cryptographic code, anything bigger than Monocypher is a bust in my opinion. We may not have the same tastes with respect to fixed-function units vs firmware, but I do agree on one thing: what we trust the keys with should be really, really, really small…
…even if in practice it won’t be.
I’m guessing that it’s easy to test that the new firmware still allows update. A/B is safer in that respect, but that’s still more stuff to add to the thing, and as always I want the bare minimum.
I’m nervous for the same reason. The iPhone jail can take a hike though. I guess most users would have regular control over their machine, and can update regularly… except they can no longer do that once their computer is stolen by a determined adversary.
Hmm, so hardware security mandates that no firmware, current or future, can ever be exploited into exfiltrating the keys. So we want to update it as infrequently as possible. This means minimising the reasons for updates, so the API of the firmware has to be very stable. Which it is more likely to be if we manage to keep it small. Again.
That’s still architecture (as opposed to microarchitecture, which may change between versions or across vendors) but it’s not instruction set architecture. The difference between the two is important in a lot of cases (for example, the Arm GIC specification is architectural, but it’s not instruction set architecture).
As you say, this makes more difference on smaller in-order cores. It’s also worth noting that a few multiplication units have special cases for multiplies by zero, which means that you may find that splitting into smaller parts introduces power and timing side channels.
It’s far more important for post-quantum algorithms though. These seem to involve a lot of huge (on the order of KiBs) numbers that need multiplying, so having a nicely pipelined big number multiplier can improve performance and let you do all of the power / perf optimisations that you want in your normal multiply (assuming you have one - a lot of embedded cores lack hardware multiply or divide).
Unfortunately, that’s exactly what the TPM was supposed to be (though it also needed to define the communication protocol, which is a critical part of the system). Once you’ve covered all of the use cases, I think you’ll end up with something almost as complex as the TPM spec. Actually, possibly worse because doing it now people would insist on some post-quantum signature algorithms and the ability to plug in new ones after there’s consensus on the right ones to use.
The bit I’d be most worried about is the protocol parsing and I/O code, and you’d be lucky to get that down to 200 lines of code. Fortunately, verification of protocol parsing is quite easy. One of the spinoffs from EverCrypt is a thing that lets you define a binary protocol and will then generate C serialisers and deserialisers. As long as you sprinkle enough volatile in there that the compiler doesn’t introduce TOCTOU bugs, you’re probably fine.
But I think you’re coming around to part of my world view which is that the key thing that you need from the hardware is support for fine-grained compartmentalisation. If you have compartmentalisation, you can make the amount of code that has access to the keys tiny and move the rest of it out of your TCB for key confidentiality.
That’s the problem and why I consider most confidential computing things to be dual-use technologies: it’s very hard to build something that can be used for letting me run code on a computer that I’ve rented without the owner being able to corrupt or inspect it, but doesn’t allow DRM-like applications. Once you think of them as dual-use, that leads to treating them in law like other dual-use technologies and regulating the use, not the technology. I’d personally love to see fair use strengthened in statute law such that any form of DRM that prevents the end user from exercising their fair use rights is, if not corrected within a small time window, grounds for immediate revocation of copyright. If you rely on vigilante justice then you don’t get the protection of law.
You should take a look at the CHERIoT platform. I think we’re providing you with a lot of the building blocks that you need. Oh, and the first consumer hardware (hopefully shipping next year) will come with OpenTitan so you get some fixed-function crypto bits and a DICE implementation out of the box, in addition to object granularity memory safety and (up to) function-granularity compartmentalisation.
Oh, what I called ISA you call “architecture”. Makes sense. Besides, all I want is a shortcut to point to what I mean, so “architecture” it is.
Crap, I forgot about those. Well obviously if we design a CPU to use in a secure element we wouldn’t use that kind of shortcut.
I didn’t know about those. If most such algs do indeed multiply huge numbers together, huge multipliers have more value than I thought.
Okay, let’s check what Tillitis have done with their bootloader. It’s one of their biggest firmware images, from what I could gather. So, their protocol parser seems to be about 200 lines, give or take, though their coding style makes it artificially high; I think my coding style would squeeze it into 100. They have a 150-line standard lib, which I don’t think I can meaningfully reduce. They have a BLAKE2s implementation for the KDF, of course. Their main is fairly big, over 300 lines.
OK, maybe I was a little optimistic there. I’ll need to try stuff out; maybe there’s a way to compress their code further, or at least put the common bits in a library. About that: one neat thing the TKey does is give programs access to parts of the code that makes up the bootloader, most notably BLAKE2s. That way programs can avoid bringing some of their own code, making not only their source code smaller but their binary as well, and increasing available RAM in the process.
The same thing can be done with the basic communication facilities. You could have a simple API that lets you send and receive messages of limited size, and handles the tag & size for you. If it’s good enough everyone would use it, and the only job left is parsing what goes inside the messages. Which is easy if we’re sticking to reasonable fixed or TLV formats. Which we can: we’re controlling both sides of the channel.
I’ll need to experiment to know for sure.
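For what it’s worth, the kind of framing I have in mind is tiny. A rough sketch with made-up names, a 1-byte tag and a 1-byte length (the byte-level I/O primitives are assumed to exist):

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_PAYLOAD 120   /* small enough to fit in one frame, say */

/* Assumed to exist: blocking byte I/O over the host channel. */
int read_byte(uint8_t *b);
int write_byte(uint8_t b);

/* Frame layout: [tag:1][len:1][payload:len] */
int recv_msg(uint8_t *tag, uint8_t payload[MAX_PAYLOAD], uint8_t *len)
{
    if (read_byte(tag) < 0 || read_byte(len) < 0) return -1;
    if (*len > MAX_PAYLOAD) return -1;             /* reject oversized frames */
    for (uint8_t i = 0; i < *len; i++)
        if (read_byte(&payload[i]) < 0) return -1;
    return 0;
}

int send_msg(uint8_t tag, const uint8_t *payload, uint8_t len)
{
    if (len > MAX_PAYLOAD) return -1;
    if (write_byte(tag) < 0 || write_byte(len) < 0) return -1;
    for (uint8_t i = 0; i < len; i++)
        if (write_byte(payload[i]) < 0) return -1;
    return 0;
}
```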
That’s an excellent point, and it actually influences an unrelated essay I may write soon: in the Andor Star Wars series (highly recommended by the way), there’s the notion of imperial tech. Widespread, convenient, but ultimately serving the Empire before its denizens. Like that radio whose spyware got Cassian spotted at some point. An effective rebellion needs to do away with those, which is why Nemik uses an old, hard-to-use navigator to help with a heist: it’s free from Empire influence.
The problem with that dichotomy, however, is that it fails to account for dual use. There’s only so much time for a Cory Doctorow lecture in a Star Wars series, after all. But it does call into question some of the ideas I had for our real world, and I need to think about it.
Will do.
I do indeed. Though I still have a problem with how it might complicate the architecture of that hardware. Here’s an idea I’ve just got: divide the firmware into isolated compartments. A given compartment can do 2 things:
And then there’s one “main” compartment, that can also do I/O.
How to use this? If we want to keep things easy we could put all the firmware in the main compartment. No isolation, but easier prototyping. But if we want things to be secure, then we write one compartment for what would otherwise be a fixed-function unit (one compartment for X25519, one for EdDSA…), isolate more complex protocols in their own compartments as well (the Noise XK compartment would for instance call into the X25519 compartment), and the main compartment would only do the I/O and parsing.
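To make that split concrete, here is roughly how I picture it (every name is hypothetical; the isolation itself would be enforced by the hardware, CHERIoT-style compartments for instance):

```c
#include <stdint.h>
#include <stddef.h>

/* Entry points exported by the isolated compartments. Neither touches I/O. */
extern void x25519_scalarmult(uint8_t out[32], const uint8_t scalar[32],
                              const uint8_t point[32]);
extern int  noise_xk_handle(uint8_t *reply, size_t *reply_len,
                            const uint8_t *msg, size_t msg_len);

/* Framing primitives, only reachable from the main compartment. */
extern int recv_msg(uint8_t *tag, uint8_t *payload, uint8_t *len);
extern int send_msg(uint8_t tag, const uint8_t *payload, uint8_t len);

/* Main compartment: parse, dispatch, reply. Keys never live here. */
void main_compartment(void)
{
    uint8_t tag, len, payload[120], reply[120];
    for (;;) {
        if (recv_msg(&tag, payload, &len) < 0)
            continue;
        size_t reply_len = sizeof reply;
        if (tag == 0x01 && noise_xk_handle(reply, &reply_len, payload, len) == 0)
            send_msg(0x81, reply, (uint8_t)reply_len);
        /* other tags -> other compartments */
    }
}
```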
Is that what it’s doing? I didn’t read the design that way…
If you want to plug in arbitrary crypto algorithms, you need to run them on the programmable core, not in fixed-function units. If you want to run them on the programmable core then you need it to have raw access to keys. If the programmable core has access to keys then you have no way of protecting them from bugs in the firmware. If you have persistent keys then one buggy or malicious firmware image can leak keys stored by others.
Damn, you really did miss the central point of my entire post. Please re-read it, and tell me where you get lost, or what specific point you think is incorrect. I’ve had feedback about my article not being crystal clear so that may be on me. Unfortunately I don’t know how I can make it better yet.
In the meantime, I can answer more directly:
Correct so far.
Not quite. The programmable core does not need access to root keys. It does need access to some key, but DICE makes sure this key is independent from the root key, and unique to the particular firmware being loaded.
No, it cannot. DICE makes it flat out impossible.
Programmable firmware cannot read the root key, so it cannot leak it. It cannot read (or compute) the derived keys of other firmware, so it cannot leak those either. The only key it can leak is its own. Malicious firmware can leak its own key, but that doesn’t gain the attacker anything. Buggy firmware is more problematic (its own secret is trusted until we find the bug), but then you fix it, and the new firmware automatically gets a new key.
I hope this helps clear things up, because we can’t have a meaningful discussion if you don’t understand this point.
Only if you have fixed-function hardware doing key derivation based on PCRs. And that means that you depend on fixed-function implementations of PCRs and a KDF, at an absolute minimum, and you can’t plug in arbitrary KDFs.
You seem to be currently assuming that the thing in the PCR is the entire firmware image. As others have pointed out, that complicates your update process because now you need your old firmware to export your keys, wrapped in a key that the new firmware can decrypt.
I’m not even sure how you would do that, because the only key that the new firmware can trust is one that’s derived from the secret and the PCR (anything not derived from the device secret could be run on an emulator and used to leak your keys), and what you really want is something where the old version can derive a public key and the new version can derive a secret key. Again, this is something that you can do if you have asymmetric crypto in hardware (as with a typical TPM implementation): you can let arbitrary code ask the fixed-function unit to derive a key pair from the secret and an arbitrary value, which doesn’t grant access to the secret key, and only release the secret key when the input is the current PCR value. But now you’re relying on more key derivation in hardware.
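Roughly, the interface I’m describing looks like this (a sketch, all names are mine): the old firmware asks for the public half for the measurement it expects the new image to have, wraps its keys to that, and only the new firmware, once actually measured, can derive the matching secret half.

```c
#include <stdint.h>
#include <string.h>

/* Fixed-function primitives, assumed to live behind the hardware boundary. */
void kdf(uint8_t out[32], const uint8_t secret[32], const uint8_t input[32]);
void x25519_base(uint8_t pub[32], const uint8_t priv[32]);  /* pub = priv * G */

static uint8_t device_secret[32];  /* never readable by firmware          */
static uint8_t current_pcr[32];    /* measurement of the running firmware */

/* Anyone may ask "what key pair *would* a firmware with this measurement
 * get?" -- but only receives the public half. */
void derive_public(uint8_t pub[32], const uint8_t measurement[32])
{
    uint8_t priv[32];
    kdf(priv, device_secret, measurement);
    x25519_base(pub, priv);
    memset(priv, 0, sizeof priv);  /* a real implementation needs a non-optimisable wipe */
}

/* The secret half is only ever derived for the PCR value currently latched,
 * i.e. for the firmware that is actually running. */
void derive_secret(uint8_t priv[32])
{
    kdf(priv, device_secret, current_pcr);
}
```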
More conventional uses of DICE use a hash of the signing key as the value embedded in the PCR, which means that two firmware images signed by the same key can have access to the same keys. But that means you need signature verification in hardware[1], and that same logic is responsible for anti-replay. Anti-replay is hard if you want to allow multiple different firmware images to be installed by different users. This is normally implemented with two things: fuses that are blown to record the minimum firmware version the device will accept, and a rollback-protection value stored in NVRAM.
These are easy only with a single signing entity. If you want different signatures then you probably can’t do the fuse thing and the NVRAM value would need to be one of a bank (if you allowed a finite number of firmware sources to be protected).
You’re also pushing a lot of complexity into software here and it’s really unclear to me what your goal is.
[1] Or initial-boot firmware in ROM, which amounts to almost the same thing.
Yes! That’s how DICE works! There’s a fixed bootloader with a fixed KDF inside that does the key derivation.
And indeed I can’t plug in an arbitrary KDF at this stage. But I don’t care, because which KDF is used is immaterial to my perception of the CDI. If I put the same firmware on a different HSM it will get a different, random-looking CDI, and it won’t even be able to tell whether bootloader v1.0.2 and v1.0.3 use different KDFs.
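Spelled out, the derivation the fixed bootloader performs is roughly this (a sketch; the primitive names are placeholders — the TKey for instance uses BLAKE2s as its KDF):

```c
#include <stdint.h>
#include <stddef.h>

/* Placeholder primitives, assumed to live in the boot ROM. */
void hash(uint8_t out[32], const uint8_t *msg, size_t len);
void kdf(uint8_t out[32], const uint8_t key[32], const uint8_t *msg, size_t len);

/* CDI = KDF(UDS, H(firmware)). The UDS (Unique Device Secret) never
 * leaves the bootloader; the firmware only ever sees its own CDI. */
void derive_cdi(uint8_t cdi[32],
                const uint8_t uds[32],       /* device secret (fuses/OTP) */
                const uint8_t *firmware, size_t fw_len)
{
    uint8_t fw_hash[32];
    hash(fw_hash, firmware, fw_len);         /* measure the whole image    */
    kdf(cdi, uds, fw_hash, 32);              /* bind the CDI to this image */
    /* The bootloader then locks access to the UDS, hands `cdi` to the
     * firmware, and jumps to its entry point. */
}
```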
Well… I think I do? The important part isn’t where the image is stored, it’s the fact that it is measured in its entirety.
We can’t change the firmware without changing the keys, so yes, the update process is made more complicated: any wrapped keys must be unwrapped, stored somewhere safe, and wrapped again… I’m not sure how best to do it, short of having explicit support for this in the fixed functions (which I really want to minimise). As for the firmware keys themselves, they’ll be gone, so anything derived from them needs to be rotated.
One thing the fixed function could do is reserve a small region of RAM to store arbitrary data to pass along to the next firmware. Something like the following:
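(A very rough sketch with made-up names; the only thing the fixed function would guarantee is that this region survives a firmware reload and is wiped on power-up. Authenticating the successor stays the old firmware’s job.)

```c
#include <stdint.h>
#include <string.h>

#define HANDOFF_SIZE  256
#define HANDOFF_MAGIC 0x48414e44u   /* "HAND": is there anything here? */

struct handoff {
    uint32_t magic;
    uint32_t len;                   /* number of valid bytes in data[]     */
    uint8_t  data[HANDOFF_SIZE];    /* re-wrapped keys, counters, whatever */
};

/* Placed by the linker in the RAM region the fixed function preserves. */
extern struct handoff HANDOFF;

/* Old firmware: stash state for the (already authenticated) successor. */
int handoff_write(const uint8_t *data, uint32_t len)
{
    if (len > HANDOFF_SIZE) return -1;
    memcpy(HANDOFF.data, data, len);
    HANDOFF.len   = len;
    HANDOFF.magic = HANDOFF_MAGIC;
    return 0;
}

/* New firmware: pick up whatever the previous image left, then wipe it. */
int handoff_read(uint8_t *out, uint32_t *len)
{
    if (HANDOFF.magic != HANDOFF_MAGIC) return -1;
    *len = HANDOFF.len;
    memcpy(out, HANDOFF.data, HANDOFF.len);
    memset(&HANDOFF, 0, sizeof HANDOFF);   /* don't leave secrets lying around */
    return 0;
}
```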
This should avoid the need for the asymmetric big guns, thus minimising the amount of code in the fixed function. The price we pay for that is preserving state between firmware loads, and to be honest I’m not entirely comfortable with that.
Also note that the old firmware must be able to authenticate the new firmware (most likely by checking a certificate), else I could just use the upgrade path to upload malicious firmware. Oh, and if the old firmware has a vulnerability there, we’re kinda doomed.
If the firmware is signed instead of locally measured, this is no longer DICE. Though if we limit ourselves to manufacturer-provided firmware, this could be a valid alternative. I do think, however, that arbitrary, user-provided firmware is too powerful to pass up.
I would really like to understand what problems you are solving with this, because it’s clearly targeting a different threat model and a different set of problems to a TPM. This makes it very hard for me to have a useful opinion because it definitely doesn’t solve the problems that a TPM solves, but it might solve some other interesting problems. I just don’t know what they are.
I want a couple things:
A security token, similar to a YubiKey, only more flexible. But those are (i) discrete, and (ii) should be carried on my person instead of tied to my computer. Like, well… a key. Main threat model: phishing, stolen laptop (assuming the key isn’t next to it).
A security token, similar to the TPM, only simpler and more flexible. Something that’s always on my computer and that, despite being flexible to the point of allowing arbitrary code, can reasonably guarantee that its own private keys will never leak. Online attacks are something for my OS to deal with; I just want to make sure my keys don’t have to be rotated every time a CVE pops up.
A root of trust that helps guarantee that the firmware & software I want to execute is what actually executes. Something that guarantees that unwanted bootloader or kernel modifications don’t survive a reboot. The main threat model is software attacks (vulnerabilities in my programs, my OS, or me clicking on the wrong thing), but I would like some resistance against physical access too, though that’s impossible with discrete chips (the main execution unit and the secure element must have a secure channel between them, or they must be one and the same).
Important thing about the root of trust: I need to be able to tell the machine to trust arbitrary software. My software my choice. Letting big companies dictate what I’m allowed to execute on my machine is not a goal.
I feel like this could use a screen to authenticate the program itself. If I try using the key to log into somewhere, but it gets sent the program for getting my disk decryption password, I should be able to reject the request.
I assume the User Supplied Secret is supposed to help with that, but then you’d need unique secrets per use-case. You can’t just store them in your password manager, as it’s probably on the workstation you’re working from—so, the workstation that could perform this attack.
With a screen, I could just peek to see if it’s displaying “LOGIN SOMEWHERE” instead of “DECRYPT DISK”.
If I understand what you’re saying correctly, that’s handled automatically. Say we have two use cases: logging in somewhere, and decrypting your disk.
In all likelihood those two use cases would have their own dedicated TKey programs.
Those programs are different, and as such use different CDIs. Swap them unwittingly, and not only will they not work (because their protocols would be different and their responses would not make sense), they cannot work, because they simply have the wrong key.
To be fair, the same HSM program may be used for different use cases. For instance, “login-somewhere” and “sign-package” would probably use the exact same signature HSM program, and that program would therefore have the same CDI, and ultimately, the same signing key pair (as you noted, relying on the USS is not enough, the user may mix passwords up). Problem is, you really really don’t want to use the same key pair for both use cases: what if you log into the website, and as a challenge it makes you sign a nefarious package? Congratulations, you’ve just endorsed a virus that will make its way into the next Debian distribution!
The standard counter for this is to use a domain separation string. Our signing program, instead of using the CDI directly as a private key, should instead hash it together with a domain separation string stating its purpose. In our case that would be something like `"login://example.com"` and `"endorse-debian-package"` respectively. That way you’d have no more than one purpose per key pair, and there’s no way that logging in somewhere would cause you to endorse viruses.

Now that was for the HSM side. The screen you speak of would be on the host side. I can only agree with you here: when you plug the key in for some purpose, it’s nice to have a visual indicator somewhere that tells you what your key is doing for you. That’s one of the purposes of the touch sensor, by the way: if the key requires you to touch it before it does its thing, it gives you the opportunity to look at your screen and see what your computer is up to.
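To make the domain-separation trick above concrete, a minimal sketch (the helper names are made up; the TKey would naturally reuse its BLAKE2s here):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Placeholder: any collision-resistant hash will do. */
void blake2s(uint8_t out[32], const uint8_t *msg, size_t len);

/* One key pair per purpose: secret = H(CDI || domain). A real version
 * would use a keyed hash or length-prefix the domain to avoid ambiguity. */
void derive_purpose_key(uint8_t secret_key[32],
                        const uint8_t cdi[32],
                        const char *domain)  /* e.g. "login://example.com" */
{
    uint8_t buf[32 + 64];
    size_t n = strlen(domain);
    if (n > 64) n = 64;               /* keep the sketch simple */
    memcpy(buf, cdi, 32);
    memcpy(buf + 32, domain, n);
    blake2s(secret_key, buf, 32 + n);
}
```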
But even then that is mostly automatic in many cases: when you sign a Debian package, you must select a `.deb` file to sign. That’s a fundamentally different UI from logging into a web site, which would likely use a browser extension to talk to your key directly. So in this case there’s no need to explicitly differentiate the two… Then again, sending “LOGIN SOMEWHERE” or “DECRYPT DISK” to the OS’s notification system doesn’t hurt.

If you can swap the programs sent, you likely control the further communication. I was thinking that a malicious workstation would send the disk encryption program and then communicate with it just as the bootloader would.
I’m mostly thinking about minimizing the potential damage if the entire system is compromised.
Ah, I see… but there’s no way to control a user’s screen from a separate USB port, so that’s kinda moot.
The only guarantee you have in the case of full system compromise is that the keys inside the HSM don’t leak: if the firmware is legitimate and bug-free nothing will leak, and if the firmware is malicious it won’t have the right keys to begin with. But I don’t think you can stop a compromised host from loading legitimate firmware and diverting it from its primary purpose.
I meant a screen on the device. Maybe the key could be sold with a connector that you could attach an optional screen to, or something like that.
You can, if the device has a screen, and some sort of input (touch sensors in the TKey). You won’t really be able to see the details if you’re e.g. signing a package, but you could see the type of operation that’s happening, and the domain separation string. It’s not perfect, an inattentive user might not notice that something is off, but imo it’s still a big improvement.
edit: ugh, I keep forgetting about Markdown doing that
(Skip a line between the quote and the response, else the response will show as if part of the quote.)
Didn’t think of the on device screen, sorry. Yes, that does sound like a good idea.
How do security updates work if the derived secret is bound to the image?
… I guess if images were able to call into each other, then a newer image could wrap an older image. That doesn’t solve feature upgrades though. Hmm.
Updating the images changes the key, so… you need to revoke the old key and use the new one. If you need the old key for the migration process (encrypted disk, wrapped keys…), use the old image to decrypt, and encrypt again with the new image.
I reckon it’s bloody tedious.
The above was also my main issue when reading the post. I thought that having a persistent keypair that is tied to the machine is an explicit feature of a TPM. Derive subkeys all you want but that root should remain static.
So when your secret suddenly changes due to software updates and you have no way of obtaining the correct secret without running an old, now known-insecure copy of your program .. you have a problem, no?
Correct, I do have a problem. But I believe the solution is easy: control the environment so the update process itself doesn’t leak the old keys. Just because the old firmware is vulnerable doesn’t mean my update program has to exploit it.
The problem doesn’t go away entirely with the classical approach: if your old firmware is vulnerable you need to assume its keys have been leaked, so you can no longer talk to your TPM over an untrusted channel to perform the update. You need the controlled environment just the same. And on top of that you need to change the possibly leaked old keys, and cross fingers that you have enough fuses left to blow.
Would it be possible to use a TKey for secure boot? I don’t know much about that process itself.
Compatibility issues with existing bootloaders would probably prevent you from using anything other than TPM 2.0. A second obstacle is that discrete TPMs are connected to your motherboard through an I2C (or similar) bus, while the current TKey needs a USB port.
In principle though, they’re both HSMs with similar capabilities (though the TKey is more flexible). I see no obstacle to using the TKey to do secure boot, especially on an embedded device where you control all the firmware.
(The one thing I haven’t wrapped my head around yet is how secure boot can even work with a discrete HSM: what prevents me from sending one bootloader to the HSM, and then booting with another bootloader anyway? I feel like there’s a missing component that should enforce that the submitted bootloader and the one actually executed are one and the same.)
I think you have to trust some of the mobo firmware/hardware to be tamper-resistant. And the security of the system depends on that tamper resistance.
A separate portable HSM like the TKey could help a bit by forcing an attacker to get the contents of the internal HSM (so they can imitate it), which is harder than bypassing it. But you can do something similar with only an internal HSM by requiring key material from the internal HSM to decrypt your drives, so it doesn’t seem like a big deal to me.
But if the attacker completely controls the laptop without you knowing, including the internal HSM, then I don’t think there’s any way a discrete HSM can help.
Yes, this is one thing I didn’t quite know (only suspected) and have been convinced of only in the last few days. With a discrete TPM we can always send one bootloader for measurement, and execute another anyway. This opens up the TPM, and if BitLocker/LUKS didn’t require a user password we should be able to decrypt the hard drive despite using hostile/pirate/forensic tools instead of the expected chain of trust.
This means I made a mistake in the “Measured Boot” section of my post; I’ll correct it. And ping @Summer as well: sorry, the answer is no, because even though you could use a discrete TKey-like chip to plug into the I2C bus of your motherboard instead of a TPM, discrete chips most likely don’t actually work for secure boot. It only takes one Evil Maid to insert a TPM-Genie or similar between the motherboard and the discrete chip, and it’s game over.
But. If hardware security is not a goal, I think there is a way. If the motherboard can guarantee that it sends the bootloader it will execute over the I2C bus, and nobody “borrows” your laptop to insert a Chip in the Middle, then you can guarantee that the bootloader you execute is the one that is measured. Therefore, no software bootloader-altering attack will survive a reboot.
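In code, the flow I have in mind is roughly this (everything here is hypothetical, and it assumes nobody has inserted a chip in the middle):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical primitives provided by the board. */
void hsm_measure(const uint8_t *image, size_t len);  /* send image over the bus */
void jump_to(const uint8_t *image);                  /* never returns           */

extern const uint8_t BOOTLOADER[];    /* the image sitting in flash */
extern const size_t  BOOTLOADER_LEN;

void boot_rom_main(void)
{
    /* Send the exact bytes we are about to run for measurement... */
    hsm_measure(BOOTLOADER, BOOTLOADER_LEN);
    /* ...then run them. Later stages ask the chip to unseal keys, and the
     * chip only complies if the recorded measurement matches the one the
     * keys were sealed against. A software-only attack that alters the
     * bootloader changes the measurement and gets nothing. */
    jump_to(BOOTLOADER);
}
```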
As another note, if an attacker has physical access to the machine (required to break into the internal HSM unless it is buggy), then they can do lots of other attacks if they instrument the laptop and then return it to you: a hardware keylogger, something that snoops on your screen, and so on.
I think really good tamper-resistant cases (if such things exist) or rigorous tamper-detection procedures are maybe the more important defence than clever tricks with HSMs for organisations that want to protect against physical attacks on portable hardware.
Not mainstream, but:
Neat. Thanks for sharing. If the hardware is fast enough then I’d hope that this becomes universal to protect against physical attacks and rowhammer-style attacks.
I’m not sure it protects against RowHammer: you can still induce bit flips; they’ll just flip bits in ciphertext, which will flip more bits in plaintext. It may make toggling the specific bits in the row that you’re using harder. Typically these things use AES-XTS with either a single key or a per-VM key, so someone trying to do RowHammer attacks will be able to deterministically set or clear bits; it will just be a bit tricky to work out which bits they’re setting.
On the Xbox One, this was patched into the physical memory map. The top bits of the address were used as a key selector. On a system designed to support a 40-bit physical address space with 8-12 GiB of physical memory, there was a lot of space for this. The keys for each key slot were provisioned by Pluton, with a command exposed to the hypervisor to generate a new key in a key slot. Each game ran in a separate VM that had a new key. For extra fun, memory was not zeroed when assigning it to game VMs (and, from the guest kernel to games) because that was slow, so games could see other game’s data, encrypted with one random AES key and decrypted with another (effectively, encrypting with two AES keys).
Sure, you’ll probably still be able to flip bits by hammering rows, but I think gaining information with it will be much harder with encryption. I’m not 100% on any of this, but I think not having knowledge of the bit patterns physically written to memory may make it more difficult to flip bits, too.
Can you specify what was stupid about AMD-SEV? I tried to work with it and remember being disappointed that it didn’t run a fully encrypted VM image out of the box, but you may have something more precise in mind?
SEV was broken even before they shipped hardware. The main stupidity was that they didn’t HMAC the register state when they encrypted it, so a malicious hypervisor could tamper with it, see what happened, and try again. Some folks figured out that you could use that to find the trap handler code in the guest and then build code reuse attacks that let you compromise it. They fixed that specific stupidity with SEV-ES, but that left the rollback ability which still lets you do some bad things. SNP closes most of those holes (though it doesn’t do anti rollback protection for memory, so malicious DIMMs can do some bad things in cooperation with a malicious hypervisor) and is almost a sensible platform. Except that they put data plane things in their RoT (it hashes the encrypted VM contents, rather than doing that on the host core where it’s fast and just signing the hash in the RoT) and their RoT is vulnerable to glitch injection attacks (doubly annoying for MS because we offered them a free license to one that is hardened against these attacks, which they’d already integrated with another SoC, and they decided to use the vulnerable one instead).
I worked with Arm on CCA and it’s what I actually want: it lets you build a privilege-separated hypervisor where the bit that’s in your TCB for confidentiality and integrity is part of your secure boot attestation and the bit that isn’t (which can be much bigger) can be updated independently. It’s a great system design that should be copied for other things.
There’s AMD SEV for this.
Looks like I underestimated how difficult tamper resistance actually is. It seems the problem is fundamentally unsolvable: they take my machine, modify it a little bit, and I get an altered machine that spies on my keystrokes, on my screen… so realistically, if I have reason to believe my laptop may have been tampered with, I can no longer trust it, and I need to put it in the trash bin right away… Damn.
At least the “don’t let thieves decrypt the drive” problem is mostly solved.
Cool. I didn’t know about AMD SEV.
I’m sure I’m telling you things you already know, but with security you have to have a threat model or it’s just worrying/paranoia. Probably nobody in the whole world is interested in doing an evil-maid attack on you, and even if they are interested they would probably prefer a similar but less tough target if you took basic steps to protect yourself. If you have a threat model then you can take action. If you assume that your attacker has infinite motivation and resources then there is nothing you can do.
I have two relevant-ish anecdotes about that.
The first dates back years: I was attending a conference on security, mostly aimed at people who are more likely to be targeted than others, because their work is not only sensitive but also controversial. The presenter considered the Evil Maid attack likely enough that it was irresponsible to leave your laptop unattended at the hotel, and that special precautions should be taken if you were asked to hand over your phone before entering a… microphone-free room (do you actually trust whoever took your phone not to try and tamper with it?). So as much as I don’t think those things apply to me right now, they do apply to some people, and it’s nice to give them (and perhaps more importantly, the Snowdens among them) options.
The second one was me working on AMD SEV at the beginning of this very year. So we had this “sensitive” server app that we were supposed to sell to foreign powers. Since we can’t have them peek into our national “secret sauce” (a fairly ordinary sauce actually), we were investigating thwarting reverse engineering efforts. Two options were on the table, code obfuscation and DRM. Seeing the promise of AMD SEV we went the DRM route.
Cool, so AMD SEV transparently encrypts all RAM. But once I got to actually set it up, I realised it wasn’t an all-encompassing solution: AMD-SEV doesn’t run a fully encrypted virtual machine, only the RAM is encrypted. All I/O was left in plaintext, and the VM image was also in cleartext. Crap, we need full disk encryption.
But then we have two problems: where do we put the encryption keys? Can’t have our representative type a password every time our overseas client wants to reboot the machine. So it must be hidden in the TPM. Oh, and the TPM should only unlock itself if the right bootloader and OS are running, so we need some kind of Trusted Boot.
At this point I just gave up. I was struggling enough with AMD SEV, I didn’t want to suffer the TPM on top, and most of all I neither believed in their threat model nor liked their business model. I mean, the stuff was a fancy firewall. We make extra effort to hide our code from them, and they’re supposed to trust us to secure their network?
Sure, and I hope you were able to find a job/project more to your tastes after the firewall DRM project.
The DICE idea was interesting and cool, thank you for sharing.
Do you feel it is worth sticking to a simple RISC processor without any cryptography accelerating hardware? Seems that AES and other symmetric constructions can be done consistently in hardware. What about elliptic curves?
Well… the lack of hardware acceleration is rarely a show stopper, but adding it basically never hurts. So I would say no, it’s not worth it. Sticking to a simple RISC processor, yes. Abandoning any and all kinds of crypto hardware acceleration, hell no. Thankfully RISC-V ratified a set of cryptographic extensions that includes things like AES, carry-less multiply (for GHASH), and SHA-2 acceleration.
Now, a dedicated crypto core that implements a big part (or the entirety) of a crypto primitive, I’m not sure is worth the trouble. Sure it would be blazing fast, but that’s a lot of silicon dedicated to a single primitive, which had better be central to your whole HSM. For a general-purpose HSM similar to the TKey, I would first try to optimise the scalar CPU itself: 3-port register file, execution pipeline, basic branch prediction… all the things that let me approach 1 instruction per cycle.
Then I would reach for scalar crypto extensions like the ones RISC-V ratified; those easily multiply your speed by 2 to 4 (likely more for AES and GHASH) for very cheap. Then I would consider the big guns like vector instructions and dedicated crypto cores. I’d likely go for the vector instructions if my priority is ChaCha20, or AES and GHASH crypto cores if AES-GCM is more important.
One thing that’s missing from RISC-V at the moment, and that I would like to add (as a custom extension most likely), is a rotate-immediate-and-xor instruction. It would provide a neat speed boost to ChaCha20 and BLAKE2s. Though I won’t complain if the only thing I have is a rotate instruction (which RISC-V already has an extension for). Having to do the `(x << n) ^ (x >> (32 - n))` dance (3 instructions instead of 1) is a real bummer.

Basically the only thing elliptic curves need is bignum arithmetic. The best way to accelerate them is to have a big-ass multiply unit. Or several multiply units running in parallel, but that means reaching for vector instructions or going out of order, both of which I would consider kind of last resorts. Besides, the multiply instructions from the RISC-V M extension are already pretty good: one to compute the lower half, one to compute the upper half; that’s as efficient as it gets on a simple in-order core.
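Going back to the rotate point for a second, here is where it bites (sketch): a ChaCha20 quarter-round is nothing but add / xor / rotate, so every `ROTL32` below costs 3 instructions without a rotate, 1 with the Zbb rotate, and each xor-then-rotate pair is what a fused instruction would collapse further.

```c
#include <stdint.h>

/* The 3-op dance; with Zbb this compiles down to a single rotate. */
#define ROTL32(x, n) (((x) << (n)) ^ ((x) >> (32 - (n))))

static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b;  *d ^= *a;  *d = ROTL32(*d, 16);
    *c += *d;  *b ^= *c;  *b = ROTL32(*b, 12);
    *a += *b;  *d ^= *a;  *d = ROTL32(*d,  8);
    *c += *d;  *b ^= *c;  *b = ROTL32(*b,  7);
}
```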
We could also consider a multiply-accumulate instruction (`d <- a × b + c`), but needing 3 inputs means the register file needs 3 read ports instead of just 2, and that tends to cost quite a lot of silicon. And don’t get me started on the 1-cycle version of this: 3 inputs, 2 outputs (lower and upper), now the register file needs 5 ports… it’ll work for sure, but that’s a big ask for an otherwise tiny crypto chip. (If I recall correctly, register files with N registers and P ports require something like O(N×P²) silicon.)

Also, I have yet to test it for myself, but I’m told the TKey can do a Curve25519 signature verification faster than the Cortex-M0 (which lacks the 32→64 multiply instructions), despite running a core that doesn’t even have a pipeline, on a tiny FPGA. If elliptic curves can run fast enough even on that, I wouldn’t worry too much about explicit support. Just make sure you have a good execution pipeline and blazing fast multiply units, and you’ll be fine.
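To illustrate the mul/mulh point, the inner loop of a schoolbook bignum multiply on rv32im looks roughly like this (sketch); the 32×32→64 product in the middle is exactly one `mul` plus one `mulhu`:

```c
#include <stdint.h>
#include <stddef.h>

/* r[0..a_len] += a[0..a_len-1] * b, limbs stored little-endian.
 * (Final carry propagation past r[a_len] is ignored for brevity.) */
static void bignum_mul_add(uint32_t *r, const uint32_t *a, size_t a_len, uint32_t b)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < a_len; i++) {
        uint64_t t = (uint64_t)a[i] * b + r[i] + carry;  /* mul + mulhu under the hood */
        r[i]  = (uint32_t)t;
        carry = t >> 32;
    }
    r[a_len] += (uint32_t)carry;
}
```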