I guess I don’t know enough about CPU architecture to understand the reason behind this change. Could someone explain why Intel would want to increase the number of cycles a pause takes so dramatically? Is it meant as an efficiency tradeoff, where fewer explicit pauses are needed while waiting for locks?
Indeed, a timing change like this is normally driven by power efficiency constraints or targets. I’d conjecture that, based on internal evaluation benchmarks, Intel decided this allowed their cores to drop to a lower power state more aggressively with an acceptable performance loss (which is exactly what Intel’s docs say, as shown in the article). It would seem the .NET spinlock implementation depended on knowing the latency of the pause instruction. I wouldn’t call this a hardware performance regression; it just looks like software didn’t support the hardware well yet, and official support from MS is coming. It’s still a well-done exploration of the performance regression in that workload.
EDIT: as someone pointed out in the HN thread, the change in cache configuration in Skylake is another possible (and probably bigger) motivation for changing the pause latency. Specifically, he points out that a dirty read from another core’s L2 has higher latency than the previous generation’s dirty hit in the inclusive L3. I’d assume a shared hit isn’t much better.
EDIT2: DVFS latencies are on the order of milliseconds for Intel Speed Shift, orders of magnitude too large to be useful in this context. The “small power benefit” mentioned would just be the reduction in dynamic power from the reduced spinning.