So are all of these microcode optimisations and little bits of info about bottlenecks and other optimisations all useless if this stuff doesn’t get into compilers…?
Well, it’ll be immediately useful to anyone who bought a Ryzen and wants to hand-code something that will run as fast as possible on the machine on their desk. ☺ It’s also possible for compilers to emit code that fits within these envelopes (and knowing what the target envelopes are makes it easier to inspect generated code, even if you’re not planning to write the assembly yourself by hand).
Production compilers do end up with microarchitecture-specific optimisations in them after a while. gcc has -mtune= and -march= options for specific microarchitectures: -mtune= adjusts instruction selection and scheduling for a given core without changing the ISA baseline, while -march= also enables that core’s instruction-set extensions. Support seems to get added pretty quickly, but I don’t know how much difference it makes in practice compared to baseline -mtune=generic. (Does Intel have their own employees submitting patches to gcc so that their newest CPUs will look good when people run software compiled with gcc on them?)
It’s interesting to read about the differences between microarchitectures that make them achieve different amounts of work per clock cycle in practice. :)