The Rust version also went from 23.7 seconds to 23.1 seconds. I’m not sure whether this is a Rayon vs CpuPool thing or whether it is due to printing out the results in parallel with the computations.
Something I’ve seen: if you time programs that print to a terminal, you can get some mildly confusing effects where writing to the terminal all in one go takes a notable amount of wall-clock time but negligible user and system time. This happens if the terminal is an ssh session over a slow network, or if the terminal’s implementation is itself surprisingly wasteful of CPU (that CPU is burned in the terminal’s process, so it doesn’t show up in your program’s user or system time). Some virtual terminal implementations, like ansi-term buffers inside Emacs, can burn enough CPU to be noticeable on benchmarks that print a lot of text, but not enough that the cost is ever noticeable in normal usage. An easy way to rule this out is to redirect output to a regular file.
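If you want to see the split from inside the program, here is a rough sketch that times the computation and the output separately. It is not the benchmark from the post: `compute` is a made-up stand-in workload and `timed_write` is a hypothetical helper, so treat it as an illustration only.

```rust
use std::io::{self, BufWriter, Write};
use std::time::{Duration, Instant};

// Hypothetical stand-in for the real workload (an assumption, not the
// code being benchmarked in the post).
fn compute(n: u64) -> Vec<u64> {
    (0..n).map(|i| i * i).collect()
}

// Writes one result per line to any sink and returns the wall-clock
// time spent writing, including the final flush.
fn timed_write<W: Write>(sink: W, results: &[u64]) -> io::Result<Duration> {
    let start = Instant::now();
    let mut out = BufWriter::new(sink);
    for r in results {
        writeln!(out, "{}", r)?;
    }
    out.flush()?;
    Ok(start.elapsed())
}

fn main() -> io::Result<()> {
    // Time the computation on its own, before any output happens.
    let start = Instant::now();
    let results = compute(100_000);
    let compute_time = start.elapsed();

    // Time the output on its own; stdout may be a terminal or a file.
    let stdout = io::stdout();
    let output_time = timed_write(stdout.lock(), &results)?;

    // Report on stderr so the timings stay out of the redirected output.
    eprintln!("compute: {:?}, output: {:?}", compute_time, output_time);
    Ok(())
}
```

Run it once printing straight to the terminal and once with stdout redirected to a regular file. If the output figure drops sharply in the second run while the compute figure stays about the same, the terminal was what you were measuring.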
I’ve definitely seen terminal I/O in a benchmark dominate over the code being benchmarked.