We previously had a blog about this discussed on lobste.rs, but now the code (including RTL for the core) is open source. Happy to answer any questions.
Congrats on the publication! I can’t wait to dig into it and find you some questions.
One extra fun thing: A student in Tom Melham’s group at Oxford has been formally verifying the implementation of this core against the ISA spec. He’s hoping to complete that work by the summer, at which point we’d have a verified implementation of a secure ISA for embedded systems. Our TCB for confidentiality and integrity is around 300 instructions and that should also be amenable to formal verification, which should give us some very nice end-to-end properties on the system.
Is there some hardware in the pipeline that mere mortals will be able to obtain to play with this? (Sorry if this is mentioned somewhere but skimming through most of it I didn’t notice anything to this effect.)
If you have an FPGA to play with, you should be able to synthesize the CHERIoT Ibex core, though it will require some integration work. We’re hoping to enable it in some open-source FPGA toolchains. The dev container for the RTOS includes the simulator built from the formal model (which you can build yourself if an OCaml toolchain doesn’t scare you).
I didn’t have time to look at replacing xmake with build2 before the release, but now it’s open and you can see what the build process does, I’d love to see how build2 would work in this context.
At the risk of venturing too far off-topic, or getting too deep into zero-sum thinking, I wonder if the rise of CHERI means that we can and should abort efforts to rewrite stuff in memory-safe languages like Rust. If we take for granted that we’re going to leave behind folks stuck on old hardware anyway, and fixing memory-safety through something like CHERI just requires another hardware upgrade cycle, then maybe that gives us freedom, or even an obligation (particularly for those of us that actually like Rust), to go back to the old workhorses of C++ and even C.
I think CHERI shifts the narrative for memory-safe languages somewhat. In terms of memory safety, they are no longer about improving confidentiality and integrity, but they are still a win for availability. In CHERIoT, we can reuse the FreeRTOS network stack almost unmodified, but that doesn’t prevent it from having memory-safety bugs. If an attacker can find and exploit them, they can crash the network stack. We can limit the fallout from that, but it still crashes. In contrast, if someone were to rewrite the network stack in Rust, then they could eliminate those bugs at design time.
The flip side of that is that rewriting the network stack in Rust is a lot more effort than recompiling it with CHERI. Of the code that we’ve tried building for CHERIoT, we have very rarely needed to modify the code for CHERI, only to add compartmentalisation. In the worst cases, we’ve needed to modify around 1% of the lines of code, so the cost of rewriting in Rust is at least 100x higher.
For new code, it’s almost certainly a better choice to use a safer language. For existing code, compiling it for CHERI and putting it in a sandbox is cheaper than rewriting it (assuming CHERI hardware) and may give you sufficient availability guarantees.
The important thing to remember about rewriting existing software is the opportunity cost. Any time spent rewriting something in Rust is time not spent writing software in Rust that solves new problems.
“The flip side of that is that rewriting the network stack in Rust is a lot more effort than recompiling it with CHERI.”
“assuming CHERI hardware”
Which leads to the real problem. The market has tanked or forced a detour on [almost?] every company that bet on secure hardware for about sixty years. The B5000 had some memory safety in 1961. System/38 had capability security. The market chose against those techniques for raw price/performance, integration, and backward compatibility. Even today, buyers would be more likely to avoid really leveraging MCP or IBM i for security. In embedded, people avoided secure processors (esp Java CPU’s) for similar reasons. Even mid-sized to high-profit companies mostly avoided RTOS’s like INTEGRITY-178B and hypervisors like LynxSecure, even though those cost a fraction of a percent of their overall costs.
Intel had it worse. Schell said a Burroughs guy at Intel added segments to x86 for him, which secure OS’s used (eg GEMSOS). The market was mostly annoyed with that and didn’t use it. Then, Intel lost billions on i432, i960, and Itanium. I liked the i960MX and Secure64 built on Itanium; companies using both had to port later. Azul’s Vegas were the safest bet since you can hide them underneath enterprise Java, but they got ditched IIRC. It looks like market suicide to make or target a secure, clean-slate architecture. Whereas memory-safe languages and software-level mitigations have been getting stronger in the market over time.
I really like your work on CHERI because it stays close to existing architectures vs dumping compatibility. Others took too many design risks. Yet, we’ve seen the RISC market abandon good RISC’s, the server market abandon good server CPU’s, and Java abandon Java CPU’s. The reasons appear to be:
The market-leading CPU’s had the highest performance-per-dollar-per-watt-etc. Even if prototypes met specs, the same design might not meet those requirements as a full-custom, high-speed CPU.
Compatibility with x86. That still dominates much of the desktop and server market. There’s a lot of reasons to think it would be harder to get right than ARM or RISC-V. The ARM market for non-embedded is huge now, though.
For enterprises, integration into server boards they can plug into their kind of infrastructure. These things need to be ready to go with all the extras they like having on them. For consumers, it’s whatever goes into their phones and stuff.
Integration with common management software that lets them deploy and administer it as easily as their Windows and Linux machines. Now maybe on clouds, too. For consumers, integration with their favorite apps.
Whoever sells secure hardware in major markets has to make a tremendous investment in hardware/software development. There’s lots of ASIC blocks, prototype boards, ports, and so on. After all that is spent, they still have to sell it at or near the cost of everything else to be competitive.
That said, embedded and accelerator cards are the best bets. They might spur on the risk-averse markets, too. Your project is in one category but still with some of the risks. Accelerator cards would be things like ML boards, baseband boards, etc. I was hoping Cavium would slap secure CPU’s onto their network processors, but they went to ARM. Same pattern. I’m glad ARM has a CHERI prototype.
Meanwhile, I encourage people to do R&D in all directions, including rewrites and transpilers, in case secure hardware never arrives, is hard to buy, or the company goes bankrupt. (Again, again, and again.)
I agree on most of your points. The ‘massive investment’ bit is why the Digital Security by Design project is investing £170m in ecosystem bring-up, including having Arm fab a server-class test chip and development system. This can now run FreeBSD, Wayland and KDE in a memory-safe build (Chromium is not there yet but is closer than I expected), but it doesn’t give up on backwards compatibility: it can still run unmodified AArch64 binaries. Linux support is getting there, though Linux’s lack of the same abstractions made the bring-up cost higher than for FreeBSD.
In the embedded space, people are a lot more willing to recompile their code for each SoC. We’ve done a lot of work to make sure that, in the common case, that’s all that they have to do.
I would love to hear your thoughts on how to craft regulations that would mandate some of the security guarantees that CHERI provides.
I’m on the advisory board for the DESCRIBE project, which is looking at exactly that. It’s mostly comprised of social scientists, who are actually qualified to answer that question; I’m just there to help them understand what the technology can (and, more importantly, can’t) do.
On the government side, a number of people involved remember mandating Ada and what a colossal mistake that was. Regulation requiring the technology is unlikely, but the two biggest things that I think could make a difference are:
Making it a requirement in government procurement. This is part of the reason that we’re looking at RISC-V. Most governments have rules against mandating single-vendor technologies, but if you have a choice of Linux or FreeBSD on Arm or RISC-V then they can mandate it, and that gives a big incentive for x86, Windows, and macOS to support it.
Making vendors liable for damage done by the kinds of bugs that CHERI could have prevented in certain verticals. For example, if you have a memory safety bug that causes medical data to be compromised then the fines get multiplied by some factor. This then causes insurance companies to increase your premiums significantly if you don’t either rewrite your software stack (including the kernel) in a safe language or use CHERI hardware for your critical infrastructure.
I’ve done think-tank work in the past and I was thinking about doing some policy memos framing security as a human rights issue. The problem is finding a host institution; I could probably get an internship at Galois if there were some funding available….
Increasing liability (structured around bug bounties) and altering government procurement regulations to accept a ~20% cost increase for various security requirements both seem reasonable. However, verifiable security claims are something that I think could reasonably be mandated, such as having formal proofs of correctness for various components.
That’s a really good start. I think the JIT’s were the wall that some others hit after getting that far. That at least narrows the problem down. That it’s usually a few critical runtimes might save you trouble. A reusable one per ISA might help for future apps. I’m sure y’all are already all over that.
On the marketing end, one thing you might consider is hiding the CHERI hardware in non-security offerings. They already want to buy certain products for specific capabilities. Maybe they don’t care about security, or only a tiny percentage will. Put in a switch that lets them turn it on or off with something not easily bypassable, like an antifuse or a jumper. It’s a normal product (eg FreeRTOS) unless they want the protections on. The CHERI vendor can also offer them a low-cost license for the core to create demand. So, the sales of the non-security product pay for the secure hardware which, configured differently, gets used for its true purpose.
(Note: You still try to market secure-by-default products in parallel. This would just be an option to pitch suppliers, esp hot startups, with a free or cheap OEM license to get them to build what might have less demand.)
Similarly, my older concept was trying to put it into SaaS platforms. The customer buys the app mainly with configs or programs in high-level languages. If it performs well, they neither know nor care what it actually runs on. Companies can gradually mix in high-security chips with the load-balancer spreading it out. If the chips start failing or aren’t competitive, move load back to mature systems. You still need people who want to buy it with some risk there. It’s not a business-ending risk this way, though.
“I think the JIT’s were the wall that some others hit after getting that far.”
There’s work ongoing on OpenJDK and V8. There’s a long tail after that, but those are probably the key ones.
“On the marketing end, one thing you might consider is hiding the CHERI hardware in non-security offerings.”
Specifically in the CHERIoT case, one of the advantages is that you can use a shared heap between different phases of computation, provided by different vendors, with non-interference guarantees. Oh, and we’re giving the core away: any SoC maker can integrate it at no cost.
Even better. “Giving the core away” really jumps out. That demands a few follow-up questions:
You said CHERIoT. Is it just that core pre-made? Or can they integrate CHERI architecture, your build or their independent builds, into any commercial core with no licensing cost?
Is that in writing somewhere, so that suppliers know they can’t get patent-trolled by Microsoft or ARM? Are they doing the open-patent thing, or no patents on any of it? ISAs etc. are a patent minefield. Companies will want assurance.
The CHERIoT Ibex core is Apache licensed. You can do basically anything you want with it. The Apache license includes a patent grant, but we also have an agreement with all of the other companies involved in CHERI not to patent ‘capability-essential IP’, which basically covers anything necessary to implement a performant CHERI core. We have patented a design for a particular technique for accelerating temporal safety, but that’s not in the open source release (it may be at some point; there probably isn’t much competitive advantage in keeping it proprietary).
All of the software stack is MIT licensed, except the LLVM bits which are Apache+GPL linking exemption.
Arm doesn’t have any patents required for CHERI. They may have some on other parts of the core, but considering that it’s a similar design to 30-year-old RISC chips, I think it’s quite unlikely.
Thank you! I’ll definitely pass all you’ve told me along to people who might build it.
Please encourage them to reach out to me directly - we’re looking to connect to silicon partners that might want to build something based on it.
I consider this a regulatory issue: we need to be able to mandate the type of formal security guarantees that CHERI provides. I’m trying to get a position at a think tank to try working on this exact problem. But it’s not easy to get funding….
Regulation to produce higher-quality systems has been done before (pdf). Critics point out it had big issues (pdf). DO-178B proved it out again with that market still existing (i.e. DO-178C). I proposed on Lobsters just a few rules for routers that should reduce lots of DDOS attacks. I believe it can be done.
The biggest worry I have is that new regulations become just some pile of paperwork with boxes to check off. That’s costly and useless. That’s what a lot of Common Criteria certification was. On the other end, it might be so secure that its features and development pace drag behind the rest of the market. Steve Lipner said the VAX Security Kernel was unmarketable partly because high-assurance engineering made every change take two to three quarters to pull off. And there were many features nobody knew how to secure.
Another issue, explored in the second paper, is that requirements don’t make any sense. Regulators often cause that problem once they get too much control. Companies will respond deviously. Then, there’s gotta be enough suppliers so it meets competition requirements in places like DOD. If not compatible with profitable products, the lobbyists of the tech firms will probably kill it off to protect billions of their revenue. Examples of big companies’ responses to what threatened them included the DMCA, patent trolling, and Oracle’s API ruling.
Just a few things I remembered having to balance in my proposals in case they help you.
I’m aware of past attempts which all devolved into a bureaucratic exercise in paperwork and box ticking. There is a rumor (which I can’t debunk without spending WAY too much personal time grokking obscure NIST standards) that INTEGRITY didn’t actually get an EAL6 cert but claimed to have one anyway. My favorite paper on the subject is Certification and evaluation: A security economics perspective.
But to say that past attempts failed doesn’t mean it isn’t a good idea. It took decades for airbag technology to advance to the point that it was reliable enough to actually improve safety. They have been mandatory for 20 years now.
Formal verification becoming practical for small code bases would make it feasible to mandate correctness of specific sub-components, such as specific cryptographic operations and process isolation.
We can avoid prescriptive rules by mandating large public bug bounties and an insurance policy to cover them. That gives the insurance policy provider flexibility to adjust to whatever security procedures vendors come up with. If Comcast had to pay a million dollars for every router exploit, they would certainly be more invested in high assurance software….
“Formal verification becoming practical for small code bases would make it feasible to mandate correctness of specific sub-components, such as specific cryptographic operations and process isolation.”
“If Comcast had to pay a million dollars for every router exploit, they would certainly be more invested in high assurance software….”
I really like both of those concepts. Totally agree.
That paper was really good, too. Thanks for it.
If you know anyone in the policy space that would like to chat (yourself included), let me know. I have think-tank experience and have developed both some interesting infosec policy ideas and how to frame them effectively.
CHERI also makes both garbage collection and formal verification of certain properties easier.
Rust developers regularly give up on fighting the borrow checker and switch to some form of reference counting (which is less efficient than a garbage collector). Making garbage collection less of a performance barrier opens up a market for programming languages that target infrastructure code. That’s not to say it’s not possible to get rid of (A)RC within Rust, just that a language built for CHERI would offer a more efficient use of a given engineering budget.
Rust doesn’t provide any formal guarantees, even in the intermediate representations. CHERI’s formally verified architecture makes it easier to prove some security properties across programming languages and down to the binary level.
Rust means much more than memory safety. For instance (my favourite), it allows for type-level state machines, which let HAL implementors write more robust APIs that, e.g., make logical errors unrepresentable. As a result, as long as user code compiles, it has a higher probability of being correct, which means fewer classes of errors at runtime.
So, if one wants to use a specific peripheral in a microcontroller that requires a GPIO in a proper mode, we can represent this GPIO pin with its mode as a specific type and demand ownership over an instance thereof during construction of such a peripheral abstraction. Changing the mode of a GPIO should consume the previous instance and yield a new one with a new type. And so on, and so on. These are just simple examples, but the possibilities are endless. Maybe Ada could compete with Rust in this area, but C and C++ are just a no-go zone in comparison.
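Roughly, that pattern looks like the following sketch (the types and the PwmChannel peripheral are made up for illustration, not any particular HAL’s API):

```rust
use std::marker::PhantomData;

// Hypothetical pin modes, encoded as zero-sized marker types.
struct Input;
struct Output;

// A pin whose current mode is part of its type.
struct Pin<Mode> {
    _mode: PhantomData<Mode>,
}

impl Pin<Input> {
    // Changing the mode consumes the old value and yields a new type;
    // a real HAL would also write the mode register here.
    fn into_output(self) -> Pin<Output> {
        Pin { _mode: PhantomData }
    }
}

// A hypothetical peripheral that demands ownership of a correctly
// configured pin. Handing it an input-mode pin is a compile-time error.
struct PwmChannel {
    _pin: Pin<Output>,
}

impl PwmChannel {
    fn new(pin: Pin<Output>) -> Self {
        PwmChannel { _pin: pin }
    }
}

fn main() {
    let pin: Pin<Input> = Pin { _mode: PhantomData };
    let pin = pin.into_output(); // the Pin<Input> value no longer exists
    let _pwm = PwmChannel::new(pin); // only a Pin<Output> is accepted
    // PwmChannel::new(Pin { _mode: PhantomData::<Input> }); // would not compile
}
```

Because the mode change consumes the old value, code that tries to keep using the input-configured pin simply does not compile.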
I don’t understand the last sentence in your comment. You most definitely can write C++ code that expresses this kind of state machine and it’s one of the reasons that I prefer C++ to C for embedded development.
Move semantics are quite weak in C++ (although still much, much better than C). It’s not that difficult to make a mistake and use “an empty object” which was moved from - whereas in Rust such an object cannot be used.
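A tiny sketch of the difference (TxBuffer is a made-up type): in Rust, the moved-from binding is rejected at compile time, whereas a moved-from C++ object is still accessible in its “valid but unspecified” state.

```rust
// A hypothetical one-shot transmit buffer: sending it consumes it.
struct TxBuffer(Vec<u8>);

fn send(buf: TxBuffer) {
    // Pretend the buffer is handed off to hardware or another task here.
    let _ = buf;
}

fn main() {
    let buf = TxBuffer(vec![0x55; 64]);
    send(buf); // ownership moves into `send`
    // send(buf);           // error[E0382]: use of moved value: `buf`
    // let n = buf.0.len(); // also rejected: the binding is gone after the move
}
```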
Here is a comment from 6 days ago that states essentially the opposite. Who is one to believe? Life is so confusing… Brings to mind this quote by Bertrand Russell:
The fundamental cause of the trouble is that in the modern world the stupid are cocksure while the intelligent are full of doubt.
I’m not sure it’s exactly the same scenario – it’s concerned with reading from peripherals with a different type model than the host (the parent comment that I was replying to also has a similar example), which the compiler can “misunderstand” and optimize away.
Write access obviously doesn’t incur the same kind of pitfalls, although it can be bumpy, too. For instance, on platforms that only allow accessing GPIO ports, as opposed to individual pins, the GPIO API needs to expose pins but manage ownership at the level of ports rather than pins. That can be surprisingly difficult to represent, especially if you need to deviate from the straightforward case of exclusive ownership that’s acquired and relinquished synchronously, and only relinquished after an operation is completed.
Edit: FWIW, I wouldn’t be surprised if “accidental” exclusive ownership of dependent resources were a remarkably frequent source of priority-inversion bugs. However, this isn’t really a language’s fault. It’s very easy to bake it into a runtime model, too.
(Even later edit: that being said, none of this is really Rust’s fault per se, some things just can’t be adequately modeled at compile time. It’s a bad idea to take any compiler on its word, Rust or otherwise.)
It does not seem to state anything opposite to what I said. Modelling HW in software is hard, period. When dealing with unsafe code while writing a HAL library, one has to be extra cautious to avoid violating the platform’s invariants. The enums example is a very good one, as they are tricky to work with directly on an FFI boundary.
However, I don’t really see in what way Rust is worse than C or C++ in that regard. I don’t consider either C or especially C++ to be meaningfully more inspectable. If anything, it is still a massive improvement in all these areas: robust abstraction building, tooling and a coherent build system, ease of code reuse between the firmware and the host, and so on. With RTIC you can write safe, interrupt-driven, deadlock-free applications with resource sharing and priorities that do minimal locking using the stack resource policy and with barely any overhead. It has a research paper behind it if anyone is interested. It does not compile if you violate invariants. That’s unheard of by C/C++ standards. cargo expand can be used if a proc-macro is not trustworthy enough, and it’s possible to read it just fine.
I’m doubtful where it’s due and confident where it’s due. Embedded software development was for decades abysmally bad, with poor Windows-centric toolchains, reskins of Eclipse IDEs by different vendors (mcuxpressoide), code generation with placeholders for “where user code should go” (STM Cube), and build systems based on XMLs and magic wizard GUIs - the list could probably go on for a very long time. It feels insanely dishonest to write off Rust considering how empowering it is, regardless of its shortcomings. Although some tools are not there yet, admittedly.
I don’t think there was a need for calling me stupid by using a quote of a famous person. I just think we have very low standards in this specific branch of IT industry when it comes to the technology we use. I look forward to another lang/tech stack that will attempt to fix everything that Rust got wrong. Again, I think we could probably learn quite a lot from Ada, for example.
“However, I don’t really see in what way Rust is worse than C or C++ in that regard.”
The C and C++ type systems guarantee less. That’s normally a bad thing, but it conversely means that there’s less for the compiler to take advantage of. Low-level I/O generally steps outside of the language’s abstract machine. That opens more opportunities for accidentally invoking undefined behaviour in Rust than in C, because there are more things that the type system doesn’t allow but that hardware, or software in other languages, can do. Programming defensively against this is hard to get right for the same reason that signed integer overflow checking is hard in C: you have to make sure that all of your checks happen before the thing that would be undefined behaviour is reachable. Rust relies quite heavily on being able to prune impossible things during optimisation for performance. Hardware interfaces, particularly in the presence of glitch attacks, can easily enter the space of things that should be impossible.
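Concretely, the defensive pattern looks something like this sketch (the status register and LinkState enum are invented for illustration): the raw bits have to be validated while they are still an integer, because constructing the typed value from out-of-range bits is already undefined behaviour.

```rust
// Hypothetical device status values that a datasheet might define.
#[derive(Debug)]
#[repr(u32)]
enum LinkState {
    Down = 0,
    Negotiating = 1,
    Up = 2,
}

// Stand-in for a volatile MMIO read; a real driver would use
// core::ptr::read_volatile on the register's address.
fn read_status_raw() -> u32 {
    3 // a value the datasheet never mentions, or a glitched read
}

// Defensive conversion: validate the raw bits *before* a typed value exists.
fn link_state(raw: u32) -> Option<LinkState> {
    match raw {
        0 => Some(LinkState::Down),
        1 => Some(LinkState::Negotiating),
        2 => Some(LinkState::Up),
        _ => None, // out-of-range bits stay in the untyped world
    }
}

fn main() {
    let raw = read_status_raw();
    // The unsound shortcut, `unsafe { core::mem::transmute::<u32, LinkState>(raw) }`,
    // is immediate undefined behaviour for raw == 3, because the type system
    // promises that a LinkState is always one of the three declared values.
    match link_state(raw) {
        Some(state) => println!("link is {:?}", state),
        None => println!("register returned unexpected value {:#x}", raw),
    }
}
```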
Memory safety enforced at compile time is strictly better than doing it by crashing at runtime in case of violations. So, no, this doesn’t make Rust any less attractive, even as it makes C/C++ behave better.
I think it’s important to point out that hardware w/o capability enforcement is probably going to continue to exist for a very long time, so if you’re going to write new code, you may as well do it in a language that protects you in both cases.
CHERI won’t stop you from overflowing a small int, won’t make sure you free that malloc, and won’t keep you from writing to a closed file handle, and for a lot of attacks a runtime crash is as good as an overrun.
“for a lot of attacks a runtime crash is as good as an overrun.”
I don’t think I’d agree with that. The goal of most widely deployed mitigations is to take a bug that can attack confidentiality or integrity and turn it into an attack on availability. In general, security ranks the three properties, in descending importance, as:
Integrity
Confidentiality
Availability
A breach in integrity lets the attacker corrupt state and can usually be used to attack the other two. A breach in confidentiality may leak secrets that the rest of your security depends on (see: Heartbleed), or commercially or personally sensitive information, with potentially very expensive implications for a long time, including the reputational cost of reduced trust. A breach in availability may cost some money in the short term, but that generally matters only in safety-critical systems.
It’s also much easier to build in resistance to availability issues higher up in a system. If you are running a datacenter service and an attacker can compromise it to do their own work or leak customer data, that’s likely to cost millions of dollars. If they can crash a node, you restart it and block the attacking IP at the edge. You also record telemetry and fix the bug that they’re exploiting. As soon as the fix is deployed, the attack has no lasting effect. If they were able to install root kits or leak your customers’ data, the impact of the attack may extend for years after the breach.
Oh, and on the memory leak side: we can’t prevent it, but we can 100% accurately identify pointers in memory, so we can provide sanitisers that check it with periodic scans.
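As a purely illustrative sketch of such a scan (every function here is a hypothetical stand-in, not a CHERIoT API): because pointers are hardware-tagged capabilities, the mark phase never has to guess whether a word is a pointer, so the scan is precise rather than conservative.

```rust
// All names below are invented for illustration. The point is only that a
// word either carries a valid capability tag (so it is definitely a pointer)
// or it does not (so it is definitely not), with no guesswork.

struct Allocation { base: usize, len: usize, marked: bool }

// Hypothetical runtime primitives.
fn capability_addresses_in(_region: (usize, usize)) -> Vec<usize> { Vec::new() } // addresses held by tagged words
fn root_regions() -> Vec<(usize, usize)> { Vec::new() } // globals, stacks, register file

fn report_leaks(heap: &mut [Allocation]) {
    // Mark phase: follow only words whose capability tag is set.
    let mut work: Vec<usize> = root_regions()
        .into_iter()
        .flat_map(capability_addresses_in)
        .collect();
    while let Some(addr) = work.pop() {
        if let Some(a) = heap
            .iter_mut()
            .find(|a| !a.marked && addr >= a.base && addr < a.base + a.len)
        {
            a.marked = true;
            work.extend(capability_addresses_in((a.base, a.len)));
        }
    }
    // Sweep phase: anything unmarked has no live capability pointing at it.
    for a in heap.iter().filter(|a| !a.marked) {
        eprintln!("possible leak: allocation at {:#x} ({} bytes)", a.base, a.len);
    }
}

fn main() {
    // The allocation list would come from the allocator's own metadata.
    let mut heap: Vec<Allocation> = Vec::new();
    report_leaks(&mut heap);
}
```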