
    Not visible in the link, but it looks as if this was published at ISCA, which explains why it doesn’t appear to have been reviewed by any security experts. If I had been a reviewer, my main objections would have been:

    • There are exploits being deployed in the wild against PAC that rely on copying valid return addresses over other valid return addresses to hijack control flow. These attacks are easier against this scheme (with their scheme, a valid return address is valid anywhere; with PAC, it’s only valid at a specific depth) and with PAC they’re already sufficient for arbitrary code execution. It may be that you can’t find the copy-return-address gadget with their code, but it’s not clear how you’d implement exception handling if you didn’t have it.
    • There’s prior work in the CPI space that demonstrates that, without memory safety, any CPI scheme can be bypassed fairly easily (see: ‘Missing the point(er)’, one of my favourite paper titles).
    • C requires type-oblivious copying: you must be able to copy pointers as data if you want an even vaguely efficient C implementation (if you can’t memcpy an arbitrary struct, you can’t implement a C compiler that works with non-trivial code).
    • C allows void* and function pointer types to be different, but POSIX’s dlsym effectively mandates that they be the same for any POSIX system and this, in turn, requires POSIX C implementations to support carrying code and data pointers in void* variables. A lot of non-trivial C/C++ codebases rely on storing a code pointer into a data-pointer-typed variable and loading it back; code in between loads and stores it as plain data.
    • Real-world C code uses intptr_t (if you’re lucky, long if you’re not) to store pointers in memory and do arithmetic on pointers (the most common case is clearing the low bits to guarantee alignment); it’s really unclear how this scheme works with those idioms.
    • Real-world software steals the top bits from pointers (e.g. NaN boxing), so a technique that relies on silently modifying these bits on L1 eviction is likely to be very exciting to debug.
    • I don’t think their metadata encoding actually works in the presence of an attacker. It uses only one pointer bit to distinguish between ‘contains extra inline metadata’ and ‘doesn’t contain inline metadata’. If an attacker can control the layout of a cache line, then any interleaving of pointers and data is possible, and crafting a data bit pattern that looks like a pointer seems feasible. Suppose, as an attacker, I control the first 8 bytes of a 64-byte cache line. Their one-bit metadata tells the L2 -> L1 logic that there’s some metadata there. I can craft my 8-byte value so that it looks like a valid code pointer (for example) and it will happily be marked in L1 as a code pointer, while the real code pointer in the second 8 bytes is treated as plain data.
    • Wrapping free and clearing metadata across the entire allocation is incredibly bad for allocators on systems that do lazy commit (i.e. all modern operating systems) because it will trigger page faults on free: if you allocate 1 MiB, touch the first 4 KiB, and then call free, the wrapper will cause all of the rest of the allocation to be committed by the OS. This pattern doesn’t show up in the SPEC suite, but it’s done depressingly often in real C/C++ code.
    • I’m really surprised their compiler works, it almost certainly doesn’t do what they think it does. The ElementType for a pointer in LLVM is not guaranteed to be the same between stores and loads and often isn’t for unions and for C++. It’s also going away soon because it doesn’t provide any meaningful information but it leads people to think that it does. Looking at their eval, it doesn’t look as if they actually tried to run the code on any realistic simulator, so who knows what it actually did. Their compiler is enforcing some property, but there’s no real evidence that it’s CPI.
    • SPEC is an awful workload for evaluating this kind of thing. First, it actually contains several memory safety bugs, so if you can run it without errors then that’s a big red flag. Second, it doesn’t test any of the exciting corner cases of C/C++/POSIX that make these schemes actually difficult.
    • Their setjmp / longjmp looks as if it would prevent return address tampering, but not protect forward-edge code pointers (or data pointers), so an attacker can tamper with any pointer in a jump buffer.

    I’ve been on programme committees that have rejected better hardware security papers than this. It’s representative of a common set of things that really bug me in academic work: the rush to submit a paper and the lack of RSE funding in grants means that they weren’t able to do the engineering work that’s required to properly evaluate it with a large corpus of real-world C/C++ code and a proper red team, so they never hit any of the corner cases that break this kind of scheme. This, combined with our field’s refusal to publish negative results, means that they have no incentive to take it to that level, because if they find problems then they can’t publish. I would love to read a paper that had done a proper analysis of this and shown why it didn’t work, but ISCA would never accept that paper.

    Oh, and on a personal note, their evaluation of CHERI appears to be based on the ISCA paper from 2014 and not any of the follow-on work (hint: capabilities haven’t been 256 bits in CHERI since around 2016).