1. 33
    1. 11

      Thanks for writing this up, it’s always interesting to see the journey people take to understand CHERI and I wish we were better at making that shorter (hopefully this article will help).

      I’ve been told (after a decade of explaining CHERI to various people) that the key thing to understanding CHERI is that the hardware understands pointers. I think that’s true for a shallow understanding but it misses some important subtlety.

      Pointers are a language construct. On most architectures, the compiler lowers pointers to integers that include an address. On a few systems (e.g. 16-bit x86) it lowers them to two integers representing a segment and an offset. CHERI provides a data type that captures more of the source-level semantics when you lower pointers to it, in particular it allows the compiler to express bounds. CHERI capabilities are a merger of ideas from capabilities and fat pointers. Fat pointers are a fairly well-known idea in language implementations, as a pointer expressed as an address and some other metadata, typically bounds but sometimes type and other fun things (for example, Go’s slices and interface-typed pointers are both implemented as forms of fat pointer).
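
      To make the fat-pointer idea concrete, here is a minimal software sketch in C, assuming a base-plus-offset representation. The `fat_ptr` type and its helpers are hypothetical names for illustration, not any real language runtime’s layout, and nothing here is enforced by hardware.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical software fat pointer: an address plus bounds metadata. */
      struct fat_ptr {
          uint8_t *base;   /* start of the object */
          size_t   length; /* size of the object in bytes */
          size_t   offset; /* current position; address = base + offset */
      };

      /* Pointer arithmetic moves the position but never changes the bounds. */
      static struct fat_ptr fat_ptr_add(struct fat_ptr p, size_t delta) {
          p.offset += delta;
          return p;
      }

      /* A load succeeds only if the position is inside the bounds. */
      static bool fat_ptr_load(struct fat_ptr p, uint8_t *out) {
          if (p.offset >= p.length) {
              return false; /* out of bounds: refuse instead of reading */
          }
          *out = p.base[p.offset];
          return true;
      }
      ```

      Unlike a CHERI capability, any code can overwrite the `length` field here, which is the forgeability problem that CHERI closes.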

      I found it useful to understand capabilities in general before understanding CHERI capabilities. A capability is an unforgeable token of authority that, when presented, allows you to perform some action. This is important for the principle of intentionality (you must exercise privilege only when you explicitly choose to exercise that particular privilege). This is a huge win for designing secure systems: just because I have write access to my home directory, for example, doesn’t mean that I can be tricked into writing to it when I’m expecting to write into a temporary directory, because I will present the authorising capability for the temporary directory and that will not let me write to the home directory. Capsicum (from many of the same people as CHERI) makes UNIX file descriptors into capabilities (they are almost capabilities already).
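
      The temporary-directory example can be sketched with plain POSIX `openat`, where the directory file descriptor plays the role of the capability that the caller presents. `write_in_dir` is a hypothetical helper. On an ordinary POSIX system a name containing `..` or an absolute path could still escape the directory; it is Capsicum’s capability mode that rejects those lookups, and that enforcement is not shown here.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <fcntl.h>
      #include <string.h>
      #include <unistd.h>

      /*
       * Create a new file called `name` inside the directory that `dir_fd`
       * refers to and write `msg` to it. Returns 0 on success, -1 on failure.
       * The caller chooses which directory authority to exercise by choosing
       * which descriptor to pass.
       */
      static int write_in_dir(int dir_fd, const char *name, const char *msg) {
          int fd = openat(dir_fd, name, O_WRONLY | O_CREAT | O_EXCL, 0600);
          if (fd < 0) {
              return -1;
          }
          size_t len = strlen(msg);
          ssize_t written = write(fd, msg, len);
          close(fd);
          return (written == (ssize_t)len) ? 0 : -1;
      }
      ```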

      CHERI requires a capability to access memory. Every load or store instruction on a conventional architecture takes an address as a base operand. CHERI takes a capability and that capability must convey the relevant rights (e.g. load or store) to the range of memory being loaded or stored. The same applies to jump-register instructions, which must take a capability with a permit-execute permission.

      The thing that makes CHERI capabilities distinct from traditional fat pointers is that they are unforgeable and have monotonic permissions. The thing that makes them distinct from traditional capabilities is that they also contain an address and so identify a specific point as well as granting access to a range. They’re an address plus metadata, not ‘pointers with extra information’. The whole unit of address and extra information is a pointer; an address is only part of a pointer, even at the C abstract machine level, where using a pointer to one object to access another is undefined behaviour even if the two pointers happen to have the same address.
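
      A rough software model of the access check and of monotonicity, as a sketch only. The permission names, the field layout, and the choice to clear the tag on an out-of-range bounds request are assumptions for illustration; real CHERI ISAs define their own encodings and differ on whether such a request traps or untags.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical permission bits; real ISAs define their own set. */
      enum { PERM_LOAD = 1, PERM_STORE = 2, PERM_EXECUTE = 4 };

      /* Model of a capability: address, bounds, permissions, validity tag. */
      struct cap {
          uintptr_t address;
          uintptr_t base;
          size_t    length;
          unsigned  perms;
          bool      tag;
      };

      /* Derive a child capability. Permissions can only be masked off, and
       * bounds that are not inside the parent's bounds clear the tag. */
      static struct cap cap_derive(struct cap parent, uintptr_t base,
                                   size_t length, unsigned perms) {
          struct cap child = { base, base, length, parent.perms & perms,
                               parent.tag };
          bool in_bounds = base >= parent.base &&
                           length <= parent.length &&
                           base - parent.base <= parent.length - length;
          if (!in_bounds) {
              child.tag = false;
          }
          return child;
      }

      /* Would an access of `size` bytes at the capability's address be allowed? */
      static bool cap_permits(struct cap c, unsigned perm, size_t size) {
          return c.tag &&
                 (c.perms & perm) == perm &&
                 c.address >= c.base &&
                 size <= c.length &&
                 c.address - c.base <= c.length - size;
      }
      ```

      In this model, bounds and permissions are separate fields that together convey the right to perform an access.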

      Capabilities are a dynamic (i.e. run-time) concept and humans are notoriously bad at reasoning about the dynamic (or “temporal”) behaviour of programs

      That absolutely does not match my experience working on capability systems. Exercising privilege is an inherently temporal property (I do an action now). Many of the security bugs that I’ve seen in the wild have come from not having a dynamic mechanism to describe this and, instead, having to encode a static policy. Being able to say ‘this component may allocate memory from my pool for the duration of this call’, for example, makes it far easier to reason about the security properties than having some global configuration that says ‘this component may allocate memory from my pool’. Capabilities flow around your program just like data, so it is easy to couple the rights to perform an action with the thing that causes the action to need to be performed.

      When traversing a tree, for example, humans are good at understanding that they are operating over a tree node that changes over time. Understanding the set of capabilities that you have access to at a point in time is no different.

      Having a security policy that is distinct from your programming language is far harder to reason about than one where you can encode your security policy as a natural expression of your code.

      That means that I’m left hoping that capabilities will catch the bad cases that I haven’t thought of. History suggests that if a programmer hasn’t thought of, or hasn’t tested, something then it will not work as the programmer would hope.

      That’s the entire point of a capability system. Anything that you haven’t thought of will fail. You have to think of the set of things that are permitted, which is finite (and, usually, small) not the set of things that must be disallowed (which is infinite). In the worst case, you have a failure of availability and not of confidentiality or integrity.

      CHERI has a second “mode” of operations where one can use both normal 64-bit pointers and 128-bit capabilities alongside each other.

      Some CHERI systems do, in the same way that some 64-bit Arm cores can run 32-bit binaries. CHERIoT, for example, does not require it and we don’t have it in any of our implementations.

      To many people’s surprise, all current practical CHERI systems use hybrid mode in, at least, their OS kernels, because converting a large OS kernel to purecap is too much work to contemplate (kernels do a lot of low-level, unusual, things that don’t align well with capabilities).

      Are you sure? I was under the impression that CheriBSD had been defaulting to pure-cap mode for a while. Linux is further behind. CHERIoT is 100% pure-cap.

      In general, porting to pure-cap mode is far easier than trying to thread some capabilities through the system, because in the latter case you have to modify everything that touches those pointers. For example, we originally ported tcpdump to use hybrid mode and protect the packet buffers with capabilities. This involved a diff of well over a thousand lines. We then built it as a pure-cap binary and this resulted in a diff of under 10 lines. It’s also much easier to reason about the security guarantees in pure-cap mode.

      Hybrid mode is considered passé in the CHERI world

      No, it’s considered something that we put a lot of effort into, spent years working on, and discovered required orders of magnitude more porting effort than the pure-cap mode. It looked like it might be a lower-friction path to adoption (hey, the ABI doesn’t change!) but turned out to be vastly more work (rather than giving you a slightly stricter version of the C abstract machine, it gives you a much more complex abstract machine). In particular, hybrid mode gives you three colours of pointer:

      • Ones that are in the current sandbox (whatever that means) but about which you can make no other claims (they might be the result of an attacker-controlled integer converted to a pointer) and so you cannot use them to build any higher-level security guarantees.
      • Ones that point to the sandbox’s global address space (DDC) and where you can’t corrupt the state via this pointer, but where out-of-bounds or use-after-free bugs anywhere else in the system can corrupt this object.
      • Ones that point to memory reachable only via a capability explicitly constructed to this object. These are the only ones that you can use to build useful higher-level security properties.

      You have no way, in the static type system, to differentiate the second and third cases. You can build some constrained use cases where you remove the mechanism to construct the second kind (for example, in a WebAssembly FaaS system you have three WAsm memory objects - the program memory, the input buffer and the output buffer - each represented by capabilities) but in general it’s far more effort. If you have MS-WAsm, for example, then you can build the same kind of abstraction in a way that’s easier for programmers to target on your FaaS system.

      Hybrid “subprocesses” seem to me to be a new point in the design space.

      Colocated processes (‘coprocesses’) have been the subject of a lot of research with CHERI (the first version was, I think 6-7 years ago). You can do hybrid ones (though that requires DDC offsetting, which the RISC-V folks want to remove from their MVP CHERI and Arm doesn’t really like, or they require PIE binaries and some invasive changes to rtld) but they’re more useful in pure-capability mode because then it’s easy to share arbitrary objects. With hybrid mode, you either need invasive changes (plumbing __capability annotations throughout your code) or you need to do copying. It is nice that you can do efficient one-copy I/O between coprocesses in hybrid mode, but it’s even nicer that you can do zero-copy I/O between coprocesses in pure-cap mode.

      Pedantry time (sorry @ltratt!):

      Your C example will most likely not fault because you’re calling malloc, which typically rounds allocations up to 8 or 16 bytes. What memory allocator are you using that causes this to fault? (Note: I would expect this to trap with a stack allocation or a global).
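
      A small sketch of the rounding arithmetic, assuming a 16-byte granule and assuming the allocator sets bounds on the rounded size rather than the requested size. Both are assumptions; the helper names are hypothetical and real allocators vary.

      ```c
      #include <stdbool.h>
      #include <stddef.h>

      /* Round `size` up to the next multiple of `granule` (a power of two). */
      static size_t round_up(size_t size, size_t granule) {
          return (size + granule - 1) & ~(granule - 1);
      }

      /* True if byte `index` still falls inside the rounded-up allocation,
       * so an access there would be in bounds under the assumptions above. */
      static bool inside_rounded_allocation(size_t requested, size_t index,
                                            size_t granule) {
          return index < round_up(requested, granule);
      }
      ```

      With a hypothetical request of 10 bytes, index 10 is one past the end of the object but still inside the 16-byte block, so only an access at index 16 or beyond would be out of bounds.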

      I think you mean SIGSEGV when you say SEGFAULT.

      I found this confusing:

      Earlier I used read/write as an example of capability permissions. However, by far the most commonly used permissions are bounds

      A CHERI capability has bounds and permissions. These, between them, convey rights. It’s confusing that you say permissions when you mean the complete set of rights that are granted. Permit-store is a permission that conveys the right to store data in the range identified by the bounds. The bounds are not permissions.

      1. 1

        Colocated processes (‘coprocesses’) have been the subject of a lot of research with CHERI (the first version was, I think 6-7 years ago). You can do hybrid ones (though that requires DDC offsetting, which the RISC-V folks want to remove from their MVP CHERI and Arm doesn’t really like, or they require PIE binaries and some invasive changes to rtld) but they’re more useful in pure-capability mode because then it’s easy to share arbitrary objects. With hybrid mode, you either need invasive changes (plumbing __capability annotations throughout your code) or you need to do copying. It is nice that you can do efficient one-copy I/O between coprocesses in hybrid mode, but it’s even nicer that you can do zero-copy I/O between coprocesses in pure-cap mode.

        Could just go for a single-address space setup. You have capabilities, after all.

        This isn’t new either - IBM i is capability based and single-address space (with single-level storage, at that).

        1. 1

          Kind of. A 64-bit address space is probably big enough for most things (though a lot of things now take advantage of the sparsity available in 64-bit systems and consuming 40 bits isn’t too unusual in a single process, even though most of it is not backed by real memory). Most systems don’t actually give you 64 bits of virtual address space though, they give 48 or 56, which would limit you to a fairly small number of such processes.

          In CHERIoT, we have a flat address space because removing the need for an MMU was a big area saving, but we don’t expect to run POSIX programs. In the coprocess model, you can’t use fork, which breaks a lot of UNIX code (you can do vfork and coexecve) including things that use a zygote model. It’s useful to have both a traditional UNIX process model and the ability to collocate processes in a single address space. If processes aren’t sharing anything then there’s little benefit in collocating them.

          1. 1

            I don’t know of anybody who gives 56 bits of address space. x86 was recently extended to 57 bits, and arm to 52.

            It’s not clear how useful sparsity is if you’re safe.

            Edit: in what sense is a file descriptor not a capability? I think it is a capability wrt the unix unit of encapsulation of security (the process); obviously you can forge any integers you want intra-process, but that seems irrelevant.

            1. 2

              I don’t know of anybody who gives 56 bits of address space. x86 was recently extended to 57 bits, and arm to 52.

              57 is usually used as 56 with a user/kernel discriminator. Userspace processes have only a 56-bit address space.

              It’s not clear how useful sparsity is if you’re safe.

              Sparsity is used for a lot more than randomisation. In snmalloc, for example, we allocate a flat array that lets us map addresses to metadata. The fact that the kernel will initially not bother populating the pages, then populate them with CoW zero pages on read and populate them with fresh pages on write makes it a very efficient data structure for our fast paths. We aren’t the only people to use address space like this: since 64-bit systems made address space effectively free but didn’t make memory much cheaper, a lot of other people have come up with interesting data structures that rely on sparse address spaces. This includes a lot of managed languages, which will do tricks like reserve 2^35 bytes (32 GiB) of address space for their heap and use a 32-bit value left shifted by 3 as an offset into that region for the in-memory representation of pointers (assuming 8-byte-aligned objects). They will also often map the same region twice so that they can use MMU tricks for read / write barriers during relocation.
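
              The compressed-pointer trick can be sketched as follows: the in-memory form stores the offset into the reserved region divided by the 8-byte alignment, and it is scaled back up when dereferencing. The function names and the use of plain 64-bit integers for addresses are assumptions for illustration, not any particular runtime’s implementation.

              ```c
              #include <assert.h>
              #include <stdint.h>

              #define HEAP_SHIFT 3u /* objects are 8-byte aligned */

              /* Compress an address inside the reserved 2^35-byte region to 32 bits. */
              static uint32_t compress(uint64_t heap_base, uint64_t address) {
                  uint64_t offset = address - heap_base;
                  assert((offset & 7u) == 0);                   /* 8-byte aligned */
                  assert((offset >> HEAP_SHIFT) <= UINT32_MAX); /* inside region */
                  return (uint32_t)(offset >> HEAP_SHIFT);
              }

              /* Recover the full address: shift left by 3 and add the region base. */
              static uint64_t decompress(uint64_t heap_base, uint32_t compressed) {
                  return heap_base + ((uint64_t)compressed << HEAP_SHIFT);
              }
              ```

              The largest 32-bit value decodes to an offset of 2^35 - 8 bytes, which is why 32 bits are enough to address a 32 GiB heap of 8-byte-aligned objects.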

        2. 1

          Hello! Yes, I was wrong about the “all kernels are hybrid” – I know CheriBSD was hybrid and CHERI Linux is, but I had failed to realise that CheriBSD is now hybrid or purecap. Oops – I’ve put an “update” in the post!

          As to the dynamic flow of capabilities, I think this is something upon which reasonable people can differ. I think my opinions on this – which are now informed by many more years programming in statically typed languages – are different today than those I would have held 20 years ago. Of course, I might have been more right then than now!

          1. 1

            I’d like to understand your reasoning behind the claim that programmers are not good at temporal properties. I don’t believe this is true; I believe that programmers are not good at non-local properties.

            For example, use after free is a problem not because it is a temporal property but because (in a language with unbounded aliasing and no garbage collection), it is a non-local property. If you free an object, this impacts code somewhere else in the system. If you remove the non-locality either by requiring exclusive ownership or by providing a GC, programmers no longer have problems. Similarly, issues such as TOCTOU arise because modification happens concurrently somewhere else (it is a non-local property).

            Programs are easy to reason about if everything that affects a piece of code is visible. This is why functional programming is seeing a resurgence: your function depends only on its arguments and calling a function will not affect any state that you did not explicitly pass it. This makes it easy to reason about even if, for example, the function is implementing a state machine (temporal property) and will behave differently depending on the state encoded in one of the arguments, which you pass through as an opaque value.

            This lack of spooky action at a distance is why actor model systems are easier to scale than shared memory systems. The very first program that I wrote in Erlang, I wrote on my single-core laptop and then watched scale linearly when I ran it on a 64-core SGI machine. It handled concurrent clients and had a lot of temporal behaviour (including prediction models and prefetching), but all of the temporal behaviour was local.

            This is, to me, the huge win for capability systems. Exercising privilege is a local property. With CHERI, if I do not pass an object to a function, that object will not be mutated by the function unless it is reachable from globals reachable from that function (the latter is a potentially large caveat if you don’t have compartmentalisation). With Capsicum, I know a process will access and mutate only OS objects (files, IPC channels, and so on) that I explicitly hand it. When I wrote the process sandboxing code for Verona, the Capsicum version was a fraction of the code, faster, and easier to understand, than the Linux version with seccomp-bpf.

            This is orthogonal to the static vs dynamic typing argument. If you have a static type system then you can encode more about the rights that a capability must grant and check things earlier, at the possible expense of generality. In CHERIoT, we use C++ wrapper types for various forms of capability (these eventually compile down to a single pointer), but not across compartment boundaries because static type systems aren’t enforced between mutually distrusting code (I don’t trust you, I don’t trust your compiler, I will validate any arguments that you give me).

            I am, of course, hugely biased but, in spite of having four or five orders of magnitude less memory than I’m used to working with, I’ve found CHERIoT a far easier platform to build the kind of high-level structures that I want than any non-CHERI platform. We have both static (provisioned to a compartment at build time) and dynamic (created in a compartment and returned as a function result) capabilities.

            1. 1

              I completely agree about the difficulties of non-local properties, and I should have been more explicit about that, instead of taking it as a given – sorry!

              A brief (i.e. not-fully-thought-through) version of my current thinking is that non-local properties plus the ticking of time within a system combine to rapidly exceed my mental bandwidth. A simple example of this is memory “leaks” in garbage collected languages: the memory hasn’t really leaked (it’s a GCd language after all!) but programmers frequently forget that some no-longer-needed large objects are kept alive due to the need for some other small objects.

              Similarly, I find that as systems evolve, programmers often punch holes through it as they find that one part of a system needs information that they previously thought was only required in another. Rarely are the consequences of that hole-punching fully thought through, particularly in regard to the dynamic state of the system. I am frequently surprised to realise that at certain points the running system can now, via a tangle of functions and (dynamic) references, reach another part of a system that I thought it could not.

              This has led me, over time, to prefer cruder, but harder, barriers between components because it’s harder for me to unintentionally violate them. A very valid criticism is that I am projecting my own limitations as a programmer onto programmers in general, though!

              FWIW, I think the use case for purecap CHERI in small devices is really strong. So, thinking about it, there’s an element of “scale” here, but I’m not sure how to define it or where to draw the line.

              1. 1

                Similarly, I find that as systems evolve, programmers often punch holes through it as they find that one part of a system needs information that they previously thought was only required in another. Rarely are the consequences of that hole-punching fully thought through, particularly in regard to the dynamic state of the system. I am frequently surprised to realise that at certain points the running system can now, via a tangle of functions and (dynamic) references, reach another part of a system that I thought it could not.

                I don’t disagree, but I suspect that this is at least partly due to not having language-exposed compartmentalisation primitives. If you can reason about compartments in the source language then you see the hole punching and you’re encouraged to refactor your code to avoid it.

                FWIW, I think the use case for purecap CHERI in small devices is really strong. So, thinking about it, there’s an element of “scale” here, but I’m not sure how to define it or where to draw the line.

                I suspect it’s more maturity than scale. The focuses of CheriBSD and CHERIoT RTOS have been very different.

                The primary goal for CheriBSD has been to provide an environment that gives memory safety with the absolute minimum number of programmer-visible changes to POSIX. There’s some ongoing work (e.g. coprocesses, making file descriptors capabilities, and so on) on how you can extend POSIX to take advantage of CHERI features, but that’s been far less of a priority than getting to the point where you can build Wayland, KDE, Chrome (nearly there!) and have them all Just Work, as a drop-in replacement for a non-memory-safe system.

                In contrast, with CHERIoT we aimed to do clean-slate design of the OS and compartmentalisation model with the CHERI-derivative ISA. This let us build things that required invasive changes throughout the stack. We want to be able to reuse existing libraries easily but we don’t aim to reuse existing programs (such a concept doesn’t really exist in the small end of the embedded space). That allowed us to focus far more on programmer affordances and build something that owes as much to Smalltalk as to existing RTOSs.

                1. 1

                  language-exposed compartmentalisation primitives

                  Are there good examples of these to which you would point? I assume you mean something stronger than ordinary module boundaries.

                  1. 2

                    To my knowledge, no existing language has anything like this. I did some work in collaboration with some folks at INRIA Lille ages ago on an object spaces abstraction for dynamic languages. The idea was to have a runtime construct that has the same relationship to namespaces that objects do to classes. Every object lived in a space. Spaces could be nested, but there was an interception point at space boundaries so you could inspect and modify every message send that went from one space to another. We built some fun things on top of that (persistent spaces, transactional spaces with automatic rollback, spaces protected by a lock for concurrent access, and so on).

                    In Verona, we’re trying to build this into the static type system. Every object exists in a region. Regions have a sentinel object that dominates all others in it (i.e. all objects in a region are, eventually, reachable from the sentinel). Within a region, you can have any shape of object graph (trees, DAGs, cycles, whatever) but there is only one pointer from the outside and that points to the sentinel (we also have a notion of a weak pointer that points inside a region but can be made strong only if you hold a pointer to the region and if the object hasn’t been deallocated).

                    The region abstraction in Verona has some really nice properties. You can transfer ownership of a complex data structure between concurrent contexts by passing the linear pointer to the sentinel, so you get the properties of Rust’s Send trait, with arbitrary data structures. You can choose a memory management strategy per region. If you have a group of related objects with similar lifetimes, you can bump allocate them and free them as a single operation (no pointers from the outside means bulk free is safe). If you know you don’t have cycles, you can use reference counting (which can be non-atomic, because regions don’t permit concurrent mutation). If you want cycles, you can use a tracing GC (which doesn’t have to trace all of memory, only the region, and which is safe from concurrent mutation). And it’s a convenient hook to attach sandboxing to.
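
                    The bump-allocate-and-bulk-free strategy can be sketched in C as below. This is not Verona code: the `region` type and its functions are hypothetical, and nothing in C enforces the single-external-pointer property that makes the bulk free safe in Verona.

                    ```c
                    #include <stddef.h>
                    #include <stdlib.h>

                    /* Hypothetical region: one backing buffer, freed as a unit. */
                    struct region {
                        unsigned char *start;
                        size_t capacity;
                        size_t used;
                    };

                    /* Reserve `capacity` bytes. Returns 0 on success, -1 on failure. */
                    static int region_init(struct region *r, size_t capacity) {
                        r->start = malloc(capacity);
                        r->capacity = (r->start != NULL) ? capacity : 0;
                        r->used = 0;
                        return (r->start != NULL) ? 0 : -1;
                    }

                    /* Bump-allocate `size` bytes, 16-byte aligned; NULL when full. */
                    static void *region_alloc(struct region *r, size_t size) {
                        size_t aligned = (r->used + 15u) & ~(size_t)15u;
                        if (aligned > r->capacity || size > r->capacity - aligned) {
                            return NULL;
                        }
                        r->used = aligned + size;
                        return r->start + aligned;
                    }

                    /* Free every object in the region in a single operation. */
                    static void region_destroy(struct region *r) {
                        free(r->start);
                        r->start = NULL;
                        r->capacity = 0;
                        r->used = 0;
                    }
                    ```

                    Individual objects are never freed; the whole region goes away at once, which is only safe if no pointers into it survive outside.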

        3. 2

          It’s nice to see a CHERI-related link posted here almost every week. The project is definitely making a lot of progress.

          1. 7

            As the person leading a commercial RISC-V CHERI project, it was very nice a few weeks ago to see the UK Government’s silicon strategy document highlight RISC-V and CHERI as their two key areas for investment.
