    Although I haven’t finished reading this, it does seem like he’s provided ample evidence that the title is true!

    It’s not quite accurate that pushing and popping on the F18 don’t cause any data movement. There are dedicated registers for the top two stack elements (T and S), sitting above the 8-word circular buffer; that’s why there are 10 stack entries, not 8. So each push or pop still copies values between those registers and the buffer. Keeping at least the top-of-stack element in a fixed register will probably produce tighter, and maybe faster, assembly. (Although I guess if you’re specializing 8 versions of every machine-code subroutine, that may not be among your top priorities!)
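
    To make the data-movement point concrete, here is a rough toy model of a stack laid out the way I understand the F18 data stack: two fixed registers on top of an 8-cell circular buffer. The class name, field names, and the `moves` counter are all made up for illustration, and the real hardware presumably does these copies in parallel rather than one after another, so treat this as a sketch and not a description of the actual circuit.

```python
class F18StyleStack:
    """Toy model (hypothetical names): T and S are fixed registers,
    and the remaining 8 entries live in a circular buffer."""

    RING_SIZE = 8

    def __init__(self):
        self.t = 0                      # top of stack, always in the same register
        self.s = 0                      # second element, also a fixed register
        self.ring = [0] * self.RING_SIZE
        self.top = 0                    # index of the ring cell just below S
        self.moves = 0                  # copies between T, S and the ring

    def push(self, value):
        # Only the ring pointer moves inside the buffer; the buffer contents stay put.
        self.top = (self.top + 1) % self.RING_SIZE
        self.ring[self.top] = self.s    # copy 1: S spills into the ring
        self.s = self.t                 # copy 2: T moves down into S
        self.t = value
        self.moves += 2

    def pop(self):
        value = self.t
        self.t = self.s                 # copy 1: S moves up into T
        self.s = self.ring[self.top]    # copy 2: ring refills S
        self.top = (self.top - 1) % self.RING_SIZE
        self.moves += 2
        return value


if __name__ == "__main__":
    stack = F18StyleStack()
    for n in range(1, 11):              # 2 registers + 8 ring cells = 10 entries
        stack.push(n)
    print([stack.pop() for _ in range(10)])
    print("register/ring copies:", stack.moves)
```

    In this model nothing inside the circular buffer ever shifts, which is presumably what "no data movement" was getting at, but every push and pop still copies two values through T and S. An eleventh push would silently overwrite the oldest entry, since the buffer just wraps around.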