This is all interesting, and I’m glad it’s documented because some of the gotchas are worth remembering, but I fear this often gets oversimplified and people run away from using rdtsc even when it would work just fine. For example, the possibility that the cycle counter on a newly hotplugged CPU may not be in sync has never once concerned me, yet it’s the kind of objection that always seems to turn up.
On a platform whose gethrtime() or clock_gettime() doesn’t need a system call to operate, there’s precious little need to muck around with the TSC directly at all. Despite what the article says, I struggle to imagine an application that needs this operation to be faster than the 15-40ns these calls take on a modern system.
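If you want to check the per-call figure on your own machine, a rough sketch like the following will do. It assumes a POSIX system with CLOCK_MONOTONIC, and clock_gettime_cost_ns() is just a hypothetical helper name. The result includes a little loop overhead and will vary with the kernel, the clock source, and whether the call is serviced without entering the kernel.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime() under strict C modes */
#endif
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
static uint64_t ts_to_ns(const struct timespec *ts) {
    return (uint64_t)ts->tv_sec * 1000000000u + (uint64_t)ts->tv_nsec;
}

/* Hypothetical helper: average cost in nanoseconds of one
   clock_gettime(CLOCK_MONOTONIC) call, measured over `iters`
   back-to-back calls. Returns 0.0 if iters is 0 or the clock fails. */
double clock_gettime_cost_ns(unsigned iters) {
    struct timespec start, end, scratch;
    if (iters == 0 || clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return 0.0;
    for (unsigned i = 0; i < iters; i++)
        clock_gettime(CLOCK_MONOTONIC, &scratch); /* the calls being timed */
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(ts_to_ns(&end) - ts_to_ns(&start)) / iters;
}
```

Calling clock_gettime_cost_ns(1000000) and printing the result gives a ballpark number; run it a few times, since early iterations can be slower while caches warm up.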
If you’re doing instrumentation, incrementing a counter costs around 10-15ns even with the most efficient thread-safe methods. Adding another 15-40ns to time each event is quite a hit in an inner loop, though you’d need to be running it millions of times per second for it to matter.
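To make the instrumentation case concrete, here’s a minimal sketch of a probe that either just counts hits or also accumulates elapsed time. The struct probe type and the probe_* functions are made-up names for illustration, and it assumes C11 stdatomic.h plus POSIX clock_gettime(). Timing an event costs two clock reads and a second atomic add on top of the plain counter increment.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime() under strict C modes */
#endif
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical instrumentation record: hit count plus total time spent. */
struct probe {
    atomic_uint_fast64_t hits;
    atomic_uint_fast64_t total_ns;
};

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Counting only: a single relaxed atomic add. */
static inline void probe_count(struct probe *p) {
    atomic_fetch_add_explicit(&p->hits, 1, memory_order_relaxed);
}

/* Counting plus timing: take a timestamp before the work... */
static inline uint64_t probe_start(void) {
    return now_ns();
}

/* ...and another after it, then record the hit and the elapsed time. */
static inline void probe_stop(struct probe *p, uint64_t t0) {
    uint64_t elapsed = now_ns() - t0;
    atomic_fetch_add_explicit(&p->hits, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&p->total_ns, elapsed, memory_order_relaxed);
}
```

Relaxed ordering should be enough here because the counters are only read for reporting, not used to publish other data between threads.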