1. 78
  1.  

    1. 26

      Mentioned in a side comment, but worth linking directly: std::ptr now includes a thorough discussion of the stabilized strict provenance APIs. From its introduction there:

      “Strict Provenance” refers to a set of APIs designed to make working with provenance more explicit. They are intended as substitutes for casting a pointer to an integer and back.

      Entirely avoiding integer-to-pointer casts successfully side-steps the inherent ambiguity of that operation. This benefits compiler optimizations, and it is pretty much a requirement for using tools like Miri and architectures like CHERI that aim to detect and diagnose pointer misuse.

      One of the legitimate critiques of Rust (including especially from folks who like Zig, but also more generally) has been that writing safe unsafe Rust has been difficult—unergonomic, difficult to check that you got it right, and generally second-class. This is far from fixing that, but it’s a useful step in making it easier to write safe unsafe Rust, so I’m excited about it!

      1. 5

        The API really helps sanitizers and with documenting intent, but does it help with unsafe ease of use? Things like p.addr() vs p as usize and with_exposed_provenance(i) vs i as *mut _ seem mostly for annotation/intent purposes. Maybe p.map_addr(|i| i | bits) is nicer for pointer tagging, but the most common “unsafe is hard” interaction is often between raw pointers and references when it comes to aliasing, provenance, and boilerplate.
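
        To make the “annotation/intent” point concrete, here is a small sketch (the helper names has_alignment and has_alignment_cast are made up for illustration). Both functions compute the same answer; the difference is that addr() only reads the address, while the as usize cast also exposes the provenance of the pointer:

        ```rust
        /// True if the address of `p` is a multiple of `align` (a power of two).
        /// Only the address is inspected; provenance is not exposed.
        fn has_alignment<T>(p: *const T, align: usize) -> bool {
            assert!(align.is_power_of_two());
            p.addr() & (align - 1) == 0
        }

        /// Same check written with a cast. The result is identical, but the cast
        /// also exposes the provenance of `p`.
        fn has_alignment_cast<T>(p: *const T, align: usize) -> bool {
            assert!(align.is_power_of_two());
            (p as usize) & (align - 1) == 0
        }
        ```

        Nothing about the call site changes between the two, which is why the benefit here is mostly about stating intent to the compiler and to tools like Miri.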

        1. 16

          I’m not sure if it helps ergonomics (I think it does, but I’m very biased) but it definitely makes it easier to support CHERI. On a CHERI system we represent pointers as CHERI Capabilities, which are double the size of an address and contain an address, base, top, and permissions, along with a non-addressable validity (tag) bit that indicates whether something is really a capability (write data over any byte of a capability and the tag bit is cleared. The only way you can set it is by starting with one of the primordial capabilities and doing valid operations that don’t violate memory safety).

          In C, you can cast a pointer to intptr_t, do arithmetic on it, and then cast it back (note: the arithmetic bit is actually implementation defined, but practically you don’t have a usable C/C++ implementation if it doesn’t work). For this to work, we also represent this type as a capability. When you do arithmetic on it, we extract the address, do the arithmetic, and then reinsert the address. This is awkward because a + b is now actually: extract the address from a, add b, and then set the address of a to the result (which might now result in an out-of-bounds and therefore unusable capability). Swap the operand order and the operation changes. We have some compiler warnings to try to catch this.

          C is sufficiently expressive for all of the things that we want to do, because it has intptr_t (pointer type on which integer operations are defined), size_t (size of an object), and ptrdiff_t (difference between two pointers).

          Rust, unfortunately, only has usize and isize. These correspond to (the unsigned and signed variants of) both intptr_t and size_t. If we want to support round tripping pointers, we have to lower usize to a capability, but that doubles the size for all uses, and most uses are things like array indexes, which are just integers. Strict provenance fixes this: we can lower usize to an address-sized integer and use std::ptr for pointers. It also fixes the argument-order problem that Rust with usize and C with intptr_t have: it is now unambiguous that the std::ptr is the capability and any other operand is explicitly an integer.
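
          A minimal sketch of the operand-order point (align_up is a hypothetical helper for illustration, not a std API): the pointer is the only thing that carries provenance, and the new address is a plain usize computed separately and then installed with with_addr.

          ```rust
          /// Round the address of `p` up to a multiple of `align` (a power of two).
          /// The result keeps the provenance of `p`; the caller must still make sure
          /// it is in bounds before dereferencing it.
          fn align_up(p: *const u8, align: usize) -> *const u8 {
              assert!(align.is_power_of_two());
              // Plain integer arithmetic on the address only.
              let aligned = p.addr().wrapping_add(align - 1) & !(align - 1);
              // The pointer is the receiver, the integer is the argument.
              p.with_addr(aligned)
          }
          ```

          There is no way to write this with the operands swapped, which is the ambiguity that a + b on intptr_t has.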

          It is now ten years since the first discussion I had about this problem with Rust folks, so great to see it in. I’m looking forward to getting Rust support in CHERIoT later in the year.

          The strict provenance model maps 1:1 to CHERI LLVM IR, which (modulo a little bit of folding) maps directly to the ISA, so (at least for us) it is far easier to understand what the hardware is doing by looking at the source code for Rust with strict provenance.

          1. 4

            The strict provenance model maps 1:1 to CHERI LLVM IR, which maps directly to the ISA

            Does this mean that all safe uses of these APIs in Rust produce legal CHERI capabilities that map to the same model that CHERI uses, or just that they have a mapping to the ISA?

            The example I’m trying to understand is that you could use ptr::with_addr and ptr::addr to zero and then restore a capability’s address, but it won’t be the exact same capability anymore because if you go far enough away the bounds lose precision.

            1. 8

              Does this mean that all safe uses of these APIs in Rust produce legal CHERI capabilities, or just that they have a mapping?

              Anything that you can write with these is lowered unambiguously to CHERI machine code in such a way that you can look at the machine code and understand what the source looked like. Anything that uses these and is memory safe will run on a CHERI system without trapping (within the representable bounds restrictions).

              In Rust, unsafe code is code where the programmer, rather than the compiler, is responsible for enforcing some memory-safety and type-safety properties. This gives the compiler enough information that the hardware can enforce the memory-safety ones and a subset of the type-safety ones.

              The example I’m trying to understand is that you could use ptr::with_addr and ptr::addr to zero and then restore a capability’s address, but it won’t be the exact same capability anymore because if you go far enough away the bounds lose precision.

              Yes, that will result in an invalid capability. That’s unfortunate, but it’s the tradeoff we made for capabilities to be double, not quadruple, the address size (logically, they contain three addresses: the top, base, and address. In practice there’s redundancy between these for all in-bounds cases and so we can compress a lot, at the expense of not being able to represent all out-of-bounds cases and not being able to precisely represent all sizes of object). This means that, if you set the address to zero, you’re pretty much guaranteed that the capability will be invalid.

              As a rule of thumb, try to avoid taking pointers out of the bounds of the original object with ptr::with_addr.

              You could set the address to the object’s base and then reset it later; that would be fine. I’m not 100% sure what is and isn’t defined with std::ptr, but my expectation is that you are responsible for ensuring that a pointer is valid at the end of the unsafe block, in which case a compiler could potentially track the address independently and combine only before dereference and at the end of the block. That would be a less obvious lowering though.
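
              As a sketch of the “set it to the base and reset it later” variant (park_at_base_and_restore is a made-up name, and it assumes base and p point into the same object), every intermediate pointer stays inside the original bounds:

              ```rust
              /// Move `p` to the start of its object and back again using only
              /// `addr`/`with_addr`. `base` must point at the same object as `p`.
              fn park_at_base_and_restore(p: *mut u8, base: *mut u8) -> *mut u8 {
                  let saved = p.addr();
                  // Still inside the object, unlike `p.with_addr(0)`.
                  let parked = p.with_addr(base.addr());
                  parked.with_addr(saved)
              }
              ```

              Swapping base.addr() for 0 would give the zeroing example from the question, which is the case that goes out of representable bounds.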

              1. 4

                but my expectation is that you are responsible for ensuring that a pointer is valid at the end of the unsafe block, in which case a compiler could potentially track the address independently and combine only before dereference and at the end of the block.

                Not so, actually. Creating a raw pointer and doing stuff with it is 100% safe Rust, but dereferencing a raw pointer can only be inside unsafe blocks. unsafe doesn’t change what the compiler does or does not track. Raw pointers in Rust don’t have to be valid, except for when dereferenced, so the unsafe block is essentially there for the programmer to promise that they acquired the pointer from a legitimate source when they try to actually use the pointer.

                Regarding out of bounds pointers, the documentation still states that using .wrapping_offset() to bring a pointer way out of bounds is A-OK as long as you bring it within bounds again before dereferencing. Probably this needs to be amended to state that this may not work on platforms that strictly enforce provenance.
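
                A small sketch of both points (read_after_detour and DETOUR are names invented for this example): all of the pointer arithmetic is safe code, and only the final read needs unsafe. This follows the documented wrapping_add/wrapping_sub rules; whether the intermediate pointer survives on a CHERI target is exactly the open question above.

                ```rust
                /// Take a pointer far outside the array and bring it back before reading.
                fn read_after_detour(data: &[u32; 4], index: usize) -> u32 {
                    assert!(index < data.len());
                    const DETOUR: usize = 1 << 20; // in elements, far beyond the array
                    let start = data.as_ptr();
                    // Safe: the out-of-bounds pointer is created but never dereferenced.
                    let far = start.wrapping_add(DETOUR);
                    let back = far.wrapping_sub(DETOUR).wrapping_add(index);
                    // SAFETY: `back` is derived from `data` and points at `data[index]`.
                    unsafe { *back }
                }
                ```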

                1. 2

                  Okay, that makes sense. So the strict provenance API maps well to the ISA but the strict provenance model doesn’t? i.e. in the zeroing example the ptr has the same strict provenance in Rust by the end, but a different CHERI provenance?

                  EDIT: After rereading your last paragraph, I suppose this point depends on how far the language can keep track of the ptr so it can avoid invalidating it intermediately. I think this depends on the condition that using an invalid/OOB pointer is UB so it doesn’t matter if the provenance is different in the ISA for that case.

                  1. 4

                    Yes, I’d need to reread the final version to make sure I fully understand what is / isn’t allowed in the Rust strict provenance model. There are probably corner cases beyond the one that you describe where it is specified in Rust but would not work on a CHERI system, though I would expect them to be fairly easy to avoid.

                    The weird corner cases are things like per-CPU storage. In the FreeBSD kernel, this is built by partially reassociating the arithmetic to make it faster on the common path (I suspect this is pointless on modern CPUs, but it made a difference 20+ years ago). This kind of thing doesn’t work on CHERI or Rust with strict provenance.

                    Do you have a use case for wanting to set the address of a pointer to zero and then set it back to something sensible later?

                    1. 2

                      Do you have a use case for wanting to set the address of a pointer to zero and then set it back to something sensible later?

                      No, I think you’re right that this is just an edge case that’s unlikely to come up or is at least easy to work around if you know it’s happening.

                      1. 2

                        Do you have a use case for wanting to set the address of a pointer to zero and then set it back to something sensible later?

                        Wouldn’t stealing high bits of a pointer (because they are unused on 64-bit) have the same problem? That is, taking the pointer value out of bounds?

                        1. 4

                          Yes, though Morello does support this in TBI mode: the top eight bits are ignored for bounds calculation and so you can put things in them (there are also explicit set-flags and get-flags instructions to operate on these bits).

                          64-bit virtual address spaces are coming to everyone’s roadmap and MTE also uses them (which limits you to a 60-bit virtual address space), so stealing top bits is increasingly a bad idea. In the CHERI+MTE composition, four of these bits are used for the memory colour, so also can’t be used.

                          Stealing the low bits is fine, and these are under your control because you can determine how strongly aligned the object is.
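
                          A minimal sketch of low-bit stealing under strict provenance (Node, TAG_MASK, pack and unpack are hypothetical names for illustration). The type is deliberately over-aligned, so the number of spare bits is something you choose, and the tagged address never leaves the object:

                          ```rust
                          use std::mem::align_of;

                          /// Over-aligned on purpose so the low four address bits are always zero.
                          #[repr(align(16))]
                          struct Node {
                              value: u64,
                          }

                          /// Bits that are guaranteed to be zero in any valid `*mut Node`.
                          const TAG_MASK: usize = align_of::<Node>() - 1;

                          /// Store `tag` in the spare low bits, keeping the provenance of `p`.
                          fn pack(p: *mut Node, tag: usize) -> *mut Node {
                              assert!(tag <= TAG_MASK);
                              p.map_addr(|a| a | tag)
                          }

                          /// Split a tagged pointer back into the real pointer and the tag.
                          fn unpack(p: *mut Node) -> (*mut Node, usize) {
                              (p.map_addr(|a| a & !TAG_MASK), p.addr() & TAG_MASK)
                          }
                          ```

                          Because the tag is smaller than the alignment (and therefore smaller than the object), the tagged pointer still points inside the Node it came from.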

                  2. [Comment removed by author]

                2. 8

                  I think @david_chisnall covered a lot of it well in the sibling thread, but I will add that I think there are a bunch of things which go into “ease of use”, and they include things like sanitizers and documenting intent. Combined with the &raw const and &raw mut syntax for raw pointers introduced in Rust 1.82, Miri can do a lot better than it could before because it has provenance available. So it depends on how you’re thinking about “ease of use”. My original comment was thinking about writing unsafe Rust as an end-to-end experience, because I think that is how you should think about it—just like that is how you should think about writing C or C++ or Zig or Odin!

                  Meanwhile, those new bits of syntax themselves are easier to use than casts (as well as reading more like the rest of the language!). They aren’t in this release, so I didn’t mention them directly here, but they are directly related to it, and they are “ease of use” features from an authoring POV. The upside of Rust’s rolling release cadence is that these things just keep shipping; the downside is that there is not usually a big bang release where there’s a significant shift. It’s just the steady accumulation of improvements, so when you look back to even six months or a year ago, things are significantly better on a number of these axes. I expect that to continue to be the case.
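
                  For a concrete flavor of the &raw const syntax, here is a small sketch (the Packed struct and read_value are invented for this example). Taking an ordinary reference to an under-aligned field of a packed struct is rejected by the compiler, whereas &raw const produces the raw pointer directly, with no intermediate reference:

                  ```rust
                  /// A packed layout: `value` sits at offset 1, so it is not 4-byte aligned.
                  #[repr(C, packed)]
                  struct Packed {
                      tag: u8,
                      value: u32,
                  }

                  fn read_value(p: &Packed) -> u32 {
                      // Creates a raw pointer to the field without ever forming `&p.value`.
                      let field = &raw const p.value;
                      // SAFETY: `field` points at an initialized `u32` inside `*p`;
                      // `read_unaligned` tolerates the missing alignment.
                      unsafe { field.read_unaligned() }
                  }
                  ```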

              2. 20
                • Cargo considers Rust versions for dependency version selection
                • Migration to the new trait solver begins
                • Strict provenance APIs

                A small version, but quite impactful for the future of the toolchain.

                1. 19

                  Having rust-version in Cargo.toml will become more important.

                  If you don’t know what version your crates require, check the releases page on lib.rs. I guesstimate the MSRV based on cargo check, clippy, and dependencies, e.g.: https://lib.rs/crates/vergen/versions

                  And if you’re not sure how old a Rust you should support, check the crates.io request stats: https://lib.rs/stats#rustc-usage

                  1. 3

                    That’s a neat graph. I think when I publish crates, I use whatever my current toolchain is (usually pull in latest stable within a week).

                    1. 1

                      Wow. This is awesome! I need to show it around.

                    2. 5

                      This is curious:

                      The new resolver will be enabled by default for projects using the 2024 edition (which will stabilize in 1.85).

                      When verifying the latest dependencies in CI, you can override this:

                      $ CARGO_RESOLVER_INCOMPATIBLE_RUST_VERSIONS=allow cargo update
                      

                      If I am reading this correctly, the ecosystem property where the set of latest versions is always CIed together will weaken after it.

                      1. 4

                        I’m curious how the MSRV-aware resolver will cause crate authors’ behavior to change over time. In theory it means they can move their MSRVs a little more aggressively, though it opens up thorny questions about backporting security fixes and such.

                        1. 5

                          From a NixOS package maintainer’s point of view, it’s very appreciated if Rust applications do not immediately bump their MSRV to latest stable. We can’t always immediately keep up with Rust releases and sometimes also cannot justify backporting a new stable release to still supported stable NixOS versions, as was the case with 1.80 which introduced a breaking change in type inference.

                          1. 5

                            There is no inherent reason you must compile different binaries with the same Rust version, though. That is a self-imposed restriction.

                            (My projects which are packaged in NixOS will stay at stable-2 for the most part.)

                            1. 3

                              Do you have tooling for automatically evaluating upgrading (and removing incompatible reverse-deps) versus staying behind (and potentially keeping the bugs that were fixed upstream)?

                              From what I have seen in the Haskell part of Nixpkgs, this is an extremely manual process and I can’t believe people are just doing this work for years on end without improving the tooling. Makes me think that Nix must have bad extensibility since it is so difficult to automate.

                          2. 1

                            Do I understand correctly that under strict provenance, the old JNI technique of converting a native pointer to a Java long, then converting that back to a native pointer, no longer works? I guess the alternative is for the long to be an index into a global array or a key into a global map. Yes, ideally we’d use the Java FFM API everywhere, but there’s Android.

                            1. 9

                              no longer works

                              Strict provenance is an “overly strict” memory model. It doesn’t say anything doesn’t work, it only says “these things definitely do work”. Like Celeritas says, you can still step outside of strict provenance and use exposed provenance exactly as you were before (and in fact code that doesn’t change to the new APIs does that by default).

                              Alternatively, at least on platforms with 64 bit pointers, you could probably have the rust definition of struct JavaLong to be struct JavaLong(*mut ()) and implement all the usual integer methods on that. The principle here being that “pointers can do everything integers can do, and optionally carry strict-provenance information”. What benefit this has when you’re effectively exposing provenance anyways the second you pass the pointer over the FFI is… unclear.
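
                              A rough sketch of what that might look like, assuming a 64-bit target (the method names are made up here, and only one integer-style operation is shown). Values that really are integers get a pointer with no provenance, and values that came from a pointer keep theirs:

                              ```rust
                              use std::ptr;

                              /// Hypothetical Rust-side stand-in for a Java `long` that may hold a pointer.
                              #[derive(Clone, Copy)]
                              struct JavaLong(*mut ());

                              impl JavaLong {
                                  /// A plain integer: an address with no provenance attached.
                                  fn from_int(v: i64) -> Self {
                                      JavaLong(ptr::without_provenance_mut(v as usize))
                                  }

                                  /// A real pointer: address and provenance are both kept.
                                  fn from_ptr<T>(p: *mut T) -> Self {
                                      JavaLong(p.cast())
                                  }

                                  fn as_int(self) -> i64 {
                                      self.0.addr() as i64
                                  }

                                  fn as_ptr<T>(self) -> *mut T {
                                      self.0.cast()
                                  }

                                  /// Integer-style arithmetic that only touches the address.
                                  fn wrapping_add(self, rhs: i64) -> Self {
                                      JavaLong(self.0.map_addr(|a| a.wrapping_add(rhs as usize)))
                                  }
                              }
                              ```

                              This only helps while the value stays on the Rust side; once it crosses the FFI as an actual integer the provenance is gone, which is the caveat already noted above.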

                              1. 5

                                It still works so long as you use expose_provenance and with_exposed_provenance to handle the round trip.
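
                                A minimal sketch of that round trip, using the stabilized names expose_provenance and ptr::with_exposed_provenance_mut (to_jlong and from_jlong are hypothetical helpers, and the i64 stands in for a Java long on a 64-bit target):

                                ```rust
                                use std::ptr;

                                /// Hand a pointer to the foreign side as a 64-bit integer.
                                /// `expose_provenance` marks the provenance as exposed so it can be
                                /// picked up again later.
                                fn to_jlong<T>(p: *mut T) -> i64 {
                                    p.expose_provenance() as i64
                                }

                                /// Rebuild the pointer from the integer that came back.
                                fn from_jlong<T>(handle: i64) -> *mut T {
                                    ptr::with_exposed_provenance_mut(handle as usize)
                                }
                                ```

                                A typical use would be Box::into_raw on the way out and Box::from_raw on the pointer returned by from_jlong when the Java side releases the handle.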