Some of the numbers given seem pretty off – my Haswell running Linux 4.6 can do a getpid (a real kernel syscall getpid, not a libc-cached one) in about 35ns (~100 cycles), a solid order of magnitude faster than listed. (I don’t have any measurements on hand, but the function call numbers look pretty pessimistic to me too.)
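For anyone who wants to check the getpid figure on their own machine, here is a rough sketch of the kind of microbenchmark I mean. It assumes Linux with glibc; the helper names (now_ns, getpid_syscall_ns) are made up for illustration. It goes through syscall(SYS_getpid) so that a libc-cached getpid() can't skew the result. Expect the number to vary with CPU, kernel version and enabled mitigations.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: average cost in ns of one raw getpid syscall,
 * measured over `iters` back-to-back calls. syscall(SYS_getpid) enters
 * the kernel every time, bypassing any caching in the libc wrapper. */
double getpid_syscall_ns(long iters)
{
    assert(iters > 0);
    volatile long sink = 0;          /* keeps the loop from being optimized away */
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++)
        sink = syscall(SYS_getpid);
    uint64_t end = now_ns();
    (void)sink;
    return (double)(end - start) / (double)iters;
}
```

Calling getpid_syscall_ns(10000000) and printing the result gives an average per-call cost; the loop overhead is included, so treat it as an upper bound.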
Also, the discussion of __builtin_expect()’s effects w.r.t. branch prediction is, I think, kind of missing the intent. Perhaps there’s some obscure, little-used old feature lurking in there that I’m not aware of, but I don’t think there’s anything in the x86 instruction encoding that compilers actually use to explicitly “hint” to the hardware which outcome is likely for a given branch. Rather, I think the primary reason for the programmer to tell the compiler whether a branch is likely to be taken is so the compiler can keep the hot/likely code paths bunched together and segregated from the cold/unlikely ones for better icache utilization (i.e. so you’re not wasting cache space pulling unlikely-to-be-executed instructions into L1 just because they happen to be sprinkled through the same cache lines as the hot path).
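To make the layout point concrete, here is a minimal sketch assuming GCC or Clang. The likely/unlikely macros follow a common convention (the Linux kernel uses similar ones), and sum_checked is a made-up function for illustration. The hint doesn't change what the code computes; it only tells the compiler which side of the branch to treat as the cold one when it lays out the generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional wrappers around the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sums an array of non-negative values and
 * returns -1 on the first negative value, which we claim is rare. */
long sum_checked(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(v[i] < 0))
            return -1;      /* cold path: compiler may move this out of line */
        total += v[i];      /* hot path: stays in the fall-through sequence */
    }
    return total;
}
```

Comparing the output of gcc -O2 -S with and without the unlikely() wrapper is a reasonable way to see whether the block ordering actually changes for a given compiler version.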
There are x86 branch hints (the 0x2E and 0x3E prefix bytes, for “not taken” and “taken” respectively), but no compiler I know of emits them and modern cores ignore them; they’re largely deprecated.
I would love to have this as a poster. If the author is reading: Is this available as a poster or as an SVG?