1. 39
  1. 5

    Store forwarding! A rare and highly unpredictable adversary.

    I wonder if it would have run as fast as the 32-byte version if the compiler had emitted either a non-inlined call to memcpy or even seven 4-byte MOVs instead of two 16-byte MOVs? I'd kinda expect this to be limited by memory bandwidth either way.

    1. 4

      This was a very good and in-depth post. I learned that perf had a ton of counters. Thanks!