1. 64

  2. 13

    Nice article!

    To C programmers, this can sound like a downside of Rust - there are many places in real-world code where you do in fact statically know that your index is in-bounds

    It’s worth mentioning that Rust is able to elide bounds checks in a lot of cases: but that this is a product of the optimiser, not a static guarantee.

    1. 13

      This isn’t specific to Rust. LLVM is very good at this kind of elision, it’s part of the scalar evolution framework that is able to compute the range of possible values for a loop induction variable, which then lets the check know that it will pass for all possible inputs if the upper bound can be shown to be less than or equal to the length. The same benefit is visible with C++ bounds-checked types and with other safer front ends.

      That said, the folks at Kings College London who are working on CHERI Rust reported about a 10% speed up from a compiler mode that disables all dynamic memory-safety checks, geomean across the benchmarks that they ran, so I’m not sure I believe this except for codebases where the bounds are all very static. That’s probably the case for encryption libraries where basically everything except the input and output are fixed length, but encryption libraries are not even slightly representative of other code.

      That’s not to say that you shouldn’t use Rust. If you write correct C code then you will need to manually insert bounds checks in almost all of the same places and the performance delta will be much less (and probably favour Rust because the type system can communicate more information to optimisers).

      1. 3

        It’s a huge part of a a small but important problem I have rust: it relies extensively on llvm to make what it does possible. Without llvm’s “magic” there is no way that rust could compete with any other language, let alone C and C++ (in terms of performance). The compiler emits so much unoptimized, redundant IR that’s chock-full of branches and relies on llvm to figure out what can be stripped altogether. Unfortunately every time rust updates its llvm backend, there are regressions (and wins!) in terms of codegen. There are no guarantees in the way that llvm process IR.

        1. 5

          I think that’s unavoidable for any nontrivial language. Clang has the same problem, the difference is that LLVM developers are more likely to test clang and there are some vaguely decent C/C++ benchmark suites. Rust could significantly improve this by having a nightly build not that built and ran a load of Rust code with LLVM trunk and reported regressions.

          So far, I’ve seen very little engagement from the Rust compiler folks with LLVM upstream. There have been several threads on subject like pointer provenance that will directly impact Rust, with no on saying ‘I work on the Rust compiler and this is what would work for us’.

          1. 2

            Without llvm’s “magic” there is no way that rust could compete with any other language, let alone C and C++ (in terms of performance).

            AIUI without LLVM’s “magic” there is no way C and C++ would compete in terms of performance.

        2. 1

          When you say “a lot of cases”, which cases do you have in mind? I read about Rust’s ability to elide bounds checks but so far I’ve never actually seen it happen. I don’t often program in Rust though.

          1. 8

            one case is when you have two array indexing operations in the same function where one index is known to be at least as large as the other, for example: https://godbolt.org/z/Y3sbhe9YG (note the cmpq $4, %rsi followed by the jbe to the panicking branch only appears once despite two indexing ops). At the end of the day this is “just” a pretty simple local optimization as far as LLVM can be concerned

            1. 3

              Thank you. A Compiler Explorer example too - perfect!

              Update: I wonder how often that situation comes up in practice. Do you need to have both indices be hard coded? Rust defines integer overflow in release mode as twos-complement wrapping, so you can’t be sure that x+1 is at least as large as x.

              This has two bounds checks: https://godbolt.org/z/rbGWhMrEY

            2. 1

              I have not confirmed this but I am pretty confident that if you iterate over an array you don’t have bounds checks, because the iteration is already bound.

              EDIT: here’s how this happens for vectors, this is very hard to grok but basically when iterating over a vector the code doesn’t do bounds checking (because of the idea that we have already confirmed that the vector has N elements). So a lot of code that is “do thing over X collection” just completely removes these checks!

              I do think that a lot of this code is… like maybe Rust is expecting the compiler to do a lot of work to simplify a lot of this.

              1. 1

                Yes, using iterators is the only situation I’ve ever come across before in my own Rust programming where there are no bounds checks. However, I thought that was because Rust didn’t inject bounds checks in that situation, rather than because they were added but later elided.

          2. 11

            Very nice. I wonder if we can get something similar for integer overflow checks? I know they’re enabled by default in debug builds and disabled by default in release builds, but I’m curious how big a difference they actually make or whether, like bounds checking, it’s usually optimized out?

            1. 18

              hmm i might actually do this. now that I have the setup to run these benchmarks it’s likely pretty easy to do more experiments. Thanks for the idea!

              1. 4

                I’d love to read this! Would you mind sharing it here if you decide to do it?

                1. 8

                  definitely will

                2. 2

                  Go for it! Apparently there are compiler flags to force overflow checks on or off without needing to hack the compiler itself (force-overflow-checks=off? [profile.release] overflow-checks = true?) but I don’t know exactly what the implications of those are or whether they still exist.

              2. 10

                This matches my experience - the lack of real world cost to bounds checking let us make the webkit vector types bounds check on indexed access with literally zero impact in any benchmark.

                In general loops that are small enough to make bounds checks directly problematic are rare, and the cases where bounds check can meaningfully hurt are things like autovectorisation which remains remarkably easy to break in other ways anyway.

                1. 5

                  Great article!

                  If one wanted an actual conclusive number, how might they get it? Higher sample sizes? Longer run time?

                  I also wonder how much LLVM’s own optimization and assumptions are in play, and whether one might fork LLVM to take out bounds checks instead.

                  1. 2

                    I’d start with a better representation of the results. Definitely more than one number. Criterion.rs can handle different distributions and outliers for example. Step 1 is basically “show me the distribution of the results”.

                  2. 4

                    I find the fact that things are performing worse after disabling bounds checks unsatisfying. IME, it’s not uncommon to get worse numbers after an optimization for a few runs but you should be able to at least get the same numbers as before after a few more runs, provide the optimized version does strictly less work compared to the original (as is presumably the case here). I think something else is going on here that we don’t understand.

                    EDIT: here is one theory that tries to explain the slowdown.