
    One question I didn’t see explicitly addressed there: are they all using the same random numbers?

    Ideally, for benchmarking, I think you want every impl to use the same deterministic RNG and seed value. Otherwise you could get confounding from, say, one impl with a bad RNG that generates the same few numbers very often, causing the same entries to be hit repeatedly and making its cache hit ratio better.


      Great question!

      I didn’t. The reasoning is that the standard deviation is very small in almost all cases, and in TruffleRuby’s case the large stddev is due to deoptimizations. So I didn’t deem it necessary, since the recorded values are very stable. I guess that is the law of large numbers at work: every single iteration basically does 1000 small iterations/playouts, so it evens out. Couple that with lots of iterations/time and I think it should be fine here, although I agree it is a best practice in general.

      On the other hand, locking the RNG seed to a specific value also has a problem (imo): it might never trigger some edge cases, and not hitting them would potentially benefit the JITs.

      So overall, I don’t think it would make any discernible difference here.
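For reference, the fixed-seed setup suggested in the question could look roughly like the Ruby sketch below. It assumes each implementation can be handed its own `Random` instance; `playout` is a hypothetical stand-in for a real playout, and the seed value 42 is arbitrary. Two generators built from the same seed yield an identical stream of numbers, so every implementation would see the same "random" moves.

```ruby
# Sketch: give each benchmarked implementation its own RNG built from the
# same fixed seed, so all of them see an identical stream of random numbers.
SEED = 42 # arbitrary value, assumed for illustration

# Hypothetical stand-in for one playout: picks 5 random points on a 19x19 board (0...361).
def playout(rng)
  Array.new(5) { rng.rand(361) }
end

rng_a = Random.new(SEED) # RNG handed to "implementation A"
rng_b = Random.new(SEED) # RNG handed to "implementation B"

moves_a = Array.new(1000) { playout(rng_a) }
moves_b = Array.new(1000) { playout(rng_b) }

puts moves_a == moves_b # => true: same seed, same sequence
```

The trade-off raised in the reply still applies: a single fixed seed makes runs comparable, but it also pins every run to the same sequence of positions.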