One question I didn’t see explicitly addressed there: are they all using the same random numbers?
Ideally, for benchmarking, I think you want every implementation to use the same deterministic RNG and seed value. Otherwise you could get confounding from, say, one implementation with a bad RNG that generates the same few numbers very often, causing the same entries to be hit repeatedly and artificially improving the cache hit ratio.
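For example, here is a rough sketch of what I mean in Ruby (the seed value and variable names are just illustrative): construct each implementation’s RNG from the same fixed seed, so every run draws the identical sequence.

```ruby
# Hypothetical sketch: share one fixed seed across all implementations
# so every benchmark run sees the exact same random sequence.
SHARED_SEED = 42 # arbitrary, but identical for every impl under test

rng_a = Random.new(SHARED_SEED) # e.g. RNG handed to implementation A
rng_b = Random.new(SHARED_SEED) # e.g. RNG handed to implementation B

# Both draw the same sequence, so playout choices match exactly.
draws_a = Array.new(5) { rng_a.rand(1000) }
draws_b = Array.new(5) { rng_b.rand(1000) }
raise "sequences diverged" unless draws_a == draws_b
```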
I didn’t. The reasoning behind it is that the standard deviation is very small in almost all cases, and in TruffleRuby’s case the big stddev has to do with deoptimizations. So I didn’t deem it necessary, as the recorded values are very stable, which I’d attribute to the law of large numbers (every single iteration basically does 1000 small iterations/playouts), so it evens out. Couple that with lots of iterations/time and it should be fine here, I think, although I agree it’s a best practice in general.
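To illustrate the evening-out effect I mean (a hypothetical sketch, not the benchmark’s actual code): results that are already averages over many playouts have a much smaller spread than individual playouts, roughly by a factor of the square root of the batch size.

```ruby
# Sample standard deviation of an array of floats.
def stddev(xs)
  mean = xs.sum(0.0) / xs.size
  Math.sqrt(xs.sum(0.0) { |x| (x - mean)**2 } / xs.size)
end

rng = Random.new(123) # fixed seed only so this sketch is reproducible

# 1000 single "playout" results vs. 1000 results that each average
# 1000 playouts, mimicking an iteration that bundles many playouts.
single  = Array.new(1_000) { rng.rand }
batched = Array.new(1_000) { Array.new(1_000) { rng.rand }.sum / 1000.0 }

puts stddev(single)  # roughly 0.29 for uniform draws on [0, 1)
puts stddev(batched) # roughly sqrt(1000) times smaller
```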
On the other hand, if one locked the RNG seed to a specific value, it also has the problem (imo) that it might never trigger some edge cases (which could potentially benefit the JITs).
So, overall I don’t think it’d make any discernible difference here.