1. 11
  2. 2

    The C++ DoNotOptimize from the article is available in the Rust standard library as std::hint::black_box, with essentially the same implementation.

    1. 2

      I don’t think I agree with the premise of this article. Starting from this code:

      bench_input = 42;
      start_time = time();
      bench_output = run_bench(bench_input);
      result = time() - start_time;
      

      The compiler may not move the benchmark call before the first time call unless it can prove that this move is not observable within the language semantics. If time is a system call, it’s game over: it may modify any global that run_benchmark reads. If it has complete visibility into the benchmark and the benchmark doesn’t read any globals then that may be fine.

      The last transform, completely eliding the benchmark run because it can be shown to have no side effects and its result is unused, is far more plausible, but that is also generally a sign of a misleading benchmark. This is especially true in this example, where the input is a compile-time constant: even if you do use the result, the compiler is free to evaluate the whole thing at compile time. Even if it doesn’t, it may generate code that is specialised for that one input and so assumes more about the input than holds in the general case.
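
      To make the compile-time point concrete, here is a minimal sketch. The body of run_bench is a hypothetical stand-in (a sum of squares), not the benchmark from the article. With a constant input, the language itself allows the whole call to be evaluated during compilation, so there may be nothing left to time at run time:

      ```cpp
      #include <cstdint>

      // Hypothetical stand-in for run_bench: a pure function with no side effects.
      constexpr std::uint64_t run_bench(std::uint64_t n) {
          std::uint64_t sum = 0;
          for (std::uint64_t i = 1; i <= n; ++i) {
              sum += i * i;  // sum of squares 1^2 + 2^2 + ... + n^2
          }
          return sum;
      }

      // The input is a literal, so the compiler can (and here must) compute the
      // result while compiling. An optimiser may do the same for a non-constexpr
      // function whose body it can see.
      static_assert(run_bench(42) == 25585, "evaluated at compile time");
      ```

      The constexpr keyword only forces the issue for the static_assert; an optimiser is allowed to fold an ordinary function the same way if it can see the body and the input.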

      The DoNotOptimize function is doing two things:

      • It is a compiler barrier or, in C++11 atomics terminology, a signal fence. It prevents the compiler reordering things across the boundary.
      • It causes a value to escape from the compiler’s ability to analyse. This is slightly scary because LLVM actually does look inside inline assembly blocks in back-end optimisations and there’s no guarantee that it won’t in the future look there in mid-level optimisers. These would be free to observe that the instruction sequence there (no instructions) has well-defined semantics and does not capture or permute the value, and so optimise this away.
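
      For reference, the usual shape of such a helper looks roughly like the sketch below. This is an assumption about the general technique, not the article's or any library's exact code; the name do_not_optimize is made up here, and it relies on GNU extended inline assembly, so it only builds with GCC or Clang:

      ```cpp
      // GCC/Clang only: uses GNU extended inline assembly.
      // Hypothetical helper illustrating the "empty asm" trick.
      template <typename T>
      inline void do_not_optimize(T& value) {
          // The asm body contains no instructions. Passing the address as an
          // input makes the value "escape", and the "memory" clobber tells the
          // compiler that memory may have been read or written, which also
          // makes the statement act as a compiler barrier.
          asm volatile("" : : "g"(&value) : "memory");
      }
      ```

      Nothing happens at run time; the whole effect depends on the compiler treating the asm statement as opaque, which is exactly the assumption questioned above.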

      You can do both of these without inline assembly. The signal fence is std::atomic_signal_fence in <atomic> (C++) or atomic_signal_fence in <stdatomic.h> (C). The second is a bit trickier, but you generally need to either call a function that the compiler can’t see (difficult with LTO) or store to, and read back from, an atomic variable.
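
      A minimal sketch of the assembly-free version, assuming a hypothetical run_bench (again a sum of squares) and using std::chrono::steady_clock in place of time(). The names time_bench, bench_input and bench_output are made up for illustration. It relies on compilers not folding atomic accesses today, which is current practice rather than a guarantee:

      ```cpp
      #include <atomic>
      #include <chrono>
      #include <cstdint>

      // Hypothetical workload standing in for run_bench.
      std::uint64_t run_bench(std::uint64_t n) {
          std::uint64_t sum = 0;
          for (std::uint64_t i = 1; i <= n; ++i) {
              sum += i * i;
          }
          return sum;
      }

      // Atomics used to pass values in and out of the timed region.
      std::atomic<std::uint64_t> bench_input{42};
      std::atomic<std::uint64_t> bench_output{0};

      std::chrono::nanoseconds time_bench() {
          const auto start = std::chrono::steady_clock::now();
          // Compiler-only barrier: emits no instructions, but memory accesses
          // should not be reordered across it by the compiler.
          std::atomic_signal_fence(std::memory_order_seq_cst);

          // Reading the input from an atomic hides the constant 42 from the optimiser.
          const std::uint64_t input = bench_input.load(std::memory_order_relaxed);
          // Storing the result to an atomic means the call cannot be dropped as unused.
          bench_output.store(run_bench(input), std::memory_order_relaxed);

          std::atomic_signal_fence(std::memory_order_seq_cst);
          const auto stop = std::chrono::steady_clock::now();
          return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
      }
      ```

      After calling time_bench(), the result can be read back with bench_output.load(). The input load and the output store both sit between the two fences, so the work that connects them is pinned inside the timed region.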

      1. 1

        > it may modify any global that run_benchmark reads.

        I think the concern is that run_benchmark might not read any globals.

        It’s also not completely out of the question that a compiler could have special knowledge of time() and know that it doesn’t touch program globals. As far as I know that’s not the case at the moment, however (and if it were, you might need further steps to guarantee that nothing got moved out from between the two calls to time()).

        > If it has complete visibility into the benchmark and the benchmark doesn’t read any globals then that may be fine.

        Exactly :)