This is a really good overview. The one thing that might be worth adding is that some CPUs will do remote stores. Rather than claiming the cache line and having it bounce back, they will send the value over the cache network and write it directly into the remote cache line. This is a huge win for producer-consumer patterns where the producer is not reading the cache line that it’s writing to, but it makes false sharing even harder to reason about.
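For illustration, here is a minimal C++ sketch of the access pattern where remote stores would help. Software cannot request a remote store; whether one happens is up to the hardware. The code only sets up the case where the producer stores to a line it never reads back, and each side's counter sits on its own cache line to avoid false sharing. The 64-byte line size and the names `PaddedCounter`, `Channel`, `produce`, `consume`, and `run` are assumptions made for the example.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumption: 64-byte cache lines (common on x86-64, not universal).
constexpr std::size_t kCacheLine = 64;

// Hypothetical helper: aligning to a full line gives each counter its own
// cache line, so the two sides never falsely share one.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

struct Channel {
    PaddedCounter seq;    // written only by the producer
    PaddedCounter acked;  // written only by the consumer
};

// Producer: store-only on its line; it never loads `seq` back, which is
// the pattern a remote-store-capable CPU could serve without pulling the
// line into the producer's cache.
void produce(Channel& ch, std::uint64_t n) {
    for (std::uint64_t i = 1; i <= n; ++i) {
        ch.seq.value.store(i, std::memory_order_release);
    }
}

// Consumer: spins until it observes the final sequence number, then
// records it on its own line.
void consume(Channel& ch, std::uint64_t n) {
    std::uint64_t seen = 0;
    while (seen != n) {
        seen = ch.seq.value.load(std::memory_order_acquire);
    }
    ch.acked.value.store(seen, std::memory_order_release);
}

// Runs one producer (this thread) against one consumer thread and returns
// the value the consumer acknowledged.
std::uint64_t run(std::uint64_t n) {
    Channel ch;
    std::thread consumer([&ch, n] { consume(ch, n); });
    produce(ch, n);
    consumer.join();
    return ch.acked.value.load(std::memory_order_acquire);
}
```

Without the padding, `seq` and `acked` could land on the same line, and every consumer store would contend with the producer's stores even though they touch different variables.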
Thanks for the feedback, I’ll make sure to edit my post. The mechanism you described sounds like a more powerful version of non-temporal stores. These μarch details are really intriguing.
Which CPUs?