    I am not sure whether the number of executed instructions is a good proxy for performance, given that cache misses can dominate run time and that different instructions take different amounts of time to execute. Does perf report the number of machine-code instructions or the number of uops actually executed, and how does it count REP-prefixed instructions?
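
To make the concern concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the operation names, the per-operation cycle costs in `ASSUMED_CYCLES`, and the two traces are made up, and the model simply adds up latencies, ignoring pipelining and out-of-order execution. It only shows that two instruction streams with the same instruction count can have very different estimated costs.

```python
# Toy cost model. All cycle costs are hypothetical, not measurements
# from any real CPU.
ASSUMED_CYCLES = {
    "add": 1,         # assumed cheap ALU operation
    "load_hit": 4,    # assumed load that hits in cache
    "load_miss": 200, # assumed load that misses to main memory
}


def instruction_count(trace):
    """Number of instructions in the trace (what a plain instruction counter sees)."""
    return len(trace)


def estimated_cycles(trace):
    """Sum of the assumed per-instruction costs (no overlap modelled)."""
    return sum(ASSUMED_CYCLES[op] for op in trace)


# Two hypothetical traces with identical instruction counts.
cache_friendly = ["load_hit", "add", "load_hit", "add"]
cache_hostile = ["load_miss", "add", "load_miss", "add"]

for name, trace in [("friendly", cache_friendly), ("hostile", cache_hostile)]:
    print(name, instruction_count(trace), estimated_cycles(trace))
```

Under these assumed costs both traces count four instructions, but the estimated cycle totals differ by a large factor, which is why I would expect a cycle count or wall-clock time to say more than an instruction count alone.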