1. 39
    1. 5

      Store forwarding! A rare and highly unpredictable adversary.

      I wonder if it would have run as fast as the 32-byte version if the compiler had emitted either a non-inlined call to memcpy or even seven 4-byte MOVs instead of two 16-byte MOVs? I would kinda expect this to be limited by memory bandwidth either way?

    2. 4

      This was a very good and in-depth post. I learned that perf had a ton of counters. Thanks!
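For reference, here is a minimal C sketch of the three copy shapes the store-forwarding reply asks about. It assumes a hypothetical 28-byte struct made of seven 4-byte fields, since that is the size at which "seven 4-byte MOVs" and "two 16-byte MOVs" (overlapping at bytes 12..15) both cover the object exactly. The names `struct rec`, `copy_memcpy`, `copy_seven4`, `copy_two16` and `copies_agree` are illustrative and do not come from the post. Whether a given compiler keeps these as separate MOVs, merges them, or inlines the `memcpy` depends on the compiler and optimization level, so the disassembly is the only reliable check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 28-byte record (seven 4-byte fields); the post's real struct may differ. */
struct rec {
    uint32_t f[7];
};
_Static_assert(sizeof(struct rec) == 28, "expected a 28-byte record");

/* Shape 1: plain memcpy; a compiler is free to inline this or call the library. */
void copy_memcpy(struct rec *dst, const struct rec *src) {
    memcpy(dst, src, sizeof *dst);
}

/* Shape 2: seven independent 4-byte load/store pairs. */
void copy_seven4(struct rec *dst, const struct rec *src) {
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    for (size_t i = 0; i < 7; i++) {
        uint32_t w;
        memcpy(&w, s + 4 * i, 4);  /* one 4-byte load */
        memcpy(d + 4 * i, &w, 4);  /* one 4-byte store */
    }
}

/* Shape 3: two overlapping 16-byte copies covering bytes [0,16) and [12,28). */
void copy_two16(struct rec *dst, const struct rec *src) {
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    unsigned char lo[16], hi[16];
    memcpy(lo, s, 16);       /* load bytes 0..15 */
    memcpy(hi, s + 12, 16);  /* load bytes 12..27, overlapping 12..15 */
    memcpy(d, lo, 16);       /* store bytes 0..15 */
    memcpy(d + 12, hi, 16);  /* store bytes 12..27; the overlap gets identical values */
}

/* Returns true if all three shapes produce byte-identical copies of *src. */
bool copies_agree(const struct rec *src) {
    struct rec a, b, c;
    copy_memcpy(&a, src);
    copy_seven4(&b, src);
    copy_two16(&c, src);
    return memcmp(&a, src, sizeof a) == 0
        && memcmp(&a, &b, sizeof a) == 0
        && memcmp(&a, &c, sizeof a) == 0;
}
```

All three are semantically identical; they differ only in the width and number of memory operations. That difference is what matters for store forwarding: on many x86 cores a load can only be forwarded from an earlier store that fully contains it, so a wide load that reads bytes written by several narrower stores (or that straddles two stores) typically has to wait for those stores to commit instead.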