See https://buttondown.email/nelhage/archive/f6e8eddc-b96c-4e66-a648-006f9ebb6678 or https://www.sqlite.org/cpu.html for an explanation of why you might want this.
Inspired by nelhage’s article, I did a writeup here: https://pythonspeed.com/articles/consistent-benchmarking-in-ci/
I have started using my own manually created Cachegrind benchmarks in an actual project, and it’s pretty good. It took a bunch of work to get consistency (e.g. pinning hash seeds, and using Conda-installed Python so it’s the exact same interpreter build), and even then for some reason I’m getting a 0.3% difference between running in GitHub Actions and running on my computer (a different Rust version? minor Valgrind version differences? different Linux or glibc versions?). But given that this is different hardware, a difference that small is actually quite good.
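(For anyone curious why tiny count differences show up as a percentage drift: the final benchmark number is typically a weighted combination of Cachegrind’s event counts. Here’s a minimal sketch in Python — the function name and the 1/5/35 weights are illustrative assumptions based on a rough L1/last-level/RAM cost model, not the exact formula from the article or from any particular tool.)

```python
def combined_score(instructions, l1_misses, ll_misses):
    """Combine Cachegrind event counts into one cycle-like estimate.

    Assumption: the weights (1, 5, 35) approximate the relative cost
    of an L1 hit, a last-level (LL) cache hit, and a RAM access.
    Real tools tune these weights differently.
    """
    l1_hits = instructions - l1_misses   # every instruction at least touches L1
    ll_hits = l1_misses - ll_misses      # L1 misses that the LL cache served
    ram_accesses = ll_misses             # LL misses go all the way to RAM
    return l1_hits + 5 * ll_hits + 35 * ram_accesses
```

Because RAM accesses are weighted so heavily, even a small change in miss counts (from a different glibc, allocator, or Valgrind version) can move the combined score by a few tenths of a percent while the instruction count barely changes.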
Do people still use coz? I never hear about it.
Nice! Is there a reason why Cachegrind instrumentation and one-shot benchmarks couldn’t be added as a mode to criterion-rs rather than a separate project? It’d be nice to be able to freely switch between wall-clock measurements/stats and Cachegrind cycle estimates.
(Author here) The design of a good Cachegrind-based benchmark is different from the design of a good Criterion.rs benchmark, so I thought it would be more confusing than helpful to combine the two. I do intend to allow cargo-criterion to run both types of benchmarks in one run, though (it can do so right now, but it won’t do anything special with the results from Iai benchmarks).
The author has a ticket for it: https://github.com/bheisler/iai/issues/3