1. 13
  1. 2

    I think it would be better to replace xor rax, rax with either mov rax, 0 or xor eax, eax; the former is clearer, while the latter is more efficient.

    1. 3

      The second is definitely better than the first. Most (all?) Intel CPUs have special handling for the xor of a register with itself that isn’t always (ever?) present for the mov version. The xor trick was popular on the 8086 because it was the shortest encoding, so Intel has optimised it as the canonical version. On vaguely modern Intel CPUs, it is handled entirely in decode and just updates the entry in the register rename engine to point to a canonical zero physical register.

      As well as being fast, this can also be used to break false loop dependencies. If a register is live at the end of a basic block (i.e. where there’s a branch), then the CPU has to keep its value for all speculative execution until the branch is resolved, even if the correct path does not use that value. If you put an xor at the end of the block, the CPU does not consume a rename register for it.

      This doesn’t matter in most code, but it can have a huge impact in loops: if a loop is short, the CPU may be executing 10 iterations of it speculatively and need to keep 10 intermediate values live, just in case it has to unwind execution and resume from the loop-exit path where the register might be used. Exhausting the rename registers can seriously hurt performance. In the worst case of this I’ve seen, adding a single xor doubled the performance of a hot loop.

      1. 2

        > The second is definitely better than the first.

        For performance, yes, indisputably. However, I still suggested the mov version for pedagogical reasons, as the OP’s article seems to be an introductory piece.

        1. 1

          For those interested, more details on the special handling of xor can be found here: https://randomascii.wordpress.com/2012/12/29/the-surprising-subtleties-of-zeroing-a-register/

        2. 1

          Optimal from a code-size perspective, because of the REX prefix needed for 64-bit registers?

          1. 1

            Right, yes.

            Since any instruction that writes a 4-byte register clears the upper 4 bytes of the corresponding 8-byte register, you don’t need to bother explicitly clearing the entire thing.

            (A related note is that if you do need an 8-byte register, it’s better to use r8-r15, since any instruction that touches them needs a REX prefix anyway, whereas 32-bit operations on the legacy registers can avoid one.)
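The code-size argument in the thread (xor eax, eax avoiding the REX prefix that xor rax, rax needs, and r8-r15 needing a REX prefix either way) can be checked with a small sketch. The Python below is a minimal, hypothetical encoder covering only xor reg, reg with the same register in both operands (opcode 0x31 with a register-direct ModRM byte). The two mov encodings are written out by hand as one common encoding each; assemblers differ, and some rewrite mov rax, 0 into the shorter mov eax, 0. The names encode_xor_self, MOV_RAX_0 and MOV_EAX_0 are illustrative, not from any library.

```python
# Minimal, hypothetical encoder for "xor reg, reg" (same register twice),
# using opcode 0x31 (XOR r/m, r) with a register-direct ModRM byte.
# Register numbers: 0 = rax/eax, 1 = rcx/ecx, ..., 8 = r8/r8d, ..., 15 = r15/r15d.


def encode_xor_self(reg: int, wide: bool) -> bytes:
    """Return the machine code for xor reg, reg; wide=True selects the 64-bit form."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must be in 0..15")
    low = reg & 7    # the 3 bits that fit in the ModRM byte
    high = reg >> 3  # 1 for r8..r15, which must be signalled in REX
    # REX = 0100WRXB: W selects 64-bit operand size; R and B extend the
    # ModRM reg and rm fields (both name the same register here).
    rex = 0x40 | (int(wide) << 3) | (high << 2) | high
    modrm = 0xC0 | (low << 3) | low  # mod=11 (register), reg=rm=low
    prefix = bytes([rex]) if rex != 0x40 else b""  # emit REX only when needed
    return prefix + bytes([0x31, modrm])


# The mov forms, hand-encoded (one common encoding each):
MOV_RAX_0 = bytes.fromhex("48c7c000000000")  # mov rax, 0: REX.W C7 /0 imm32
MOV_EAX_0 = bytes.fromhex("b800000000")      # mov eax, 0: B8+r imm32

if __name__ == "__main__":
    for name, code in [
        ("mov rax, 0", MOV_RAX_0),
        ("mov eax, 0", MOV_EAX_0),
        ("xor rax, rax", encode_xor_self(0, True)),
        ("xor eax, eax", encode_xor_self(0, False)),
        ("xor r8, r8", encode_xor_self(8, True)),
        ("xor r8d, r8d", encode_xor_self(8, False)),
    ]:
        print(f"{name:<13} {code.hex(' ')}  ({len(code)} bytes)")
```

Running it shows xor eax, eax at 2 bytes against 3 for xor rax, rax, while for r8 the 32-bit and 64-bit forms are both 3 bytes because the REX prefix is needed either way.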
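The rule behind "you don’t need to bother explicitly clearing the entire thing" can be modelled in a few lines. This sketch (the function name write_low_bits is hypothetical) computes a 64-bit register’s value after a write of a given width: 32-bit destinations zero-extend into the whole register, while 8- and 16-bit destinations such as al and ax leave the upper bits untouched. It models only the architectural result, not timing.

```python
MASK64 = (1 << 64) - 1


def write_low_bits(old64: int, value: int, bits: int) -> int:
    """Return a 64-bit register's value after writing `value` at width `bits`.

    bits=64: e.g. rax; bits=32: e.g. eax; bits=16: e.g. ax; bits=8: e.g. al
    (the low byte only; high-byte registers such as ah are not modelled).
    """
    if bits == 64:
        return value & MASK64
    if bits == 32:
        # Writing a 32-bit register zero-extends into the full 64 bits.
        return value & 0xFFFF_FFFF
    if bits in (8, 16):
        # Writing an 8- or 16-bit register preserves all the other bits.
        mask = (1 << bits) - 1
        return (old64 & ~mask & MASK64) | (value & mask)
    raise ValueError("bits must be 8, 16, 32 or 64")


old = 0xDEAD_BEEF_1234_5678
print(hex(write_low_bits(old, 0, 32)))  # xor eax, eax -> 0x0 (whole register cleared)
print(hex(write_low_bits(old, 0, 16)))  # xor ax, ax -> 0xdeadbeef12340000
```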
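The rename-stage handling of xor described in the long reply can be pictured with a toy model. The Python below is a sketch under heavy simplifying assumptions: ToyRenamer, P_zero and the method names are hypothetical, physical registers are never freed at retirement, flags and move elimination are ignored, and it follows the comment’s description of a canonical zero register rather than documenting any specific CPU. It shows the two properties that make xor r, r cheap at rename: no dependency on the old value of r, and no fresh physical register, whereas in this model mov r, 0 still allocates one.

```python
class ToyRenamer:
    """A deliberately simplified register-rename model (not any real CPU).

    Each architectural register maps to a physical register. Ordinary writes
    allocate a fresh physical register; the zero idiom xor r, r instead points
    r at a shared physical register that always holds zero.
    """

    ZERO = "P_zero"  # hypothetical canonical zero physical register

    def __init__(self, arch_regs):
        # Initially, each architectural register has its own physical register.
        self.map = {r: f"P{i}" for i, r in enumerate(arch_regs)}
        self.next_id = len(arch_regs)
        self.allocated = 0  # physical registers allocated since construction

    def _alloc(self) -> str:
        name = f"P{self.next_id}"
        self.next_id += 1
        self.allocated += 1
        return name

    def xor(self, dst: str, src: str) -> list:
        """Rename xor dst, src; return the physical registers it must wait for."""
        if dst == src:
            # Zero idiom: the result is 0 whatever dst held, so there is no
            # input dependency and no new physical register is needed.
            self.map[dst] = self.ZERO
            return []
        deps = [self.map[dst], self.map[src]]  # read both operands first
        self.map[dst] = self._alloc()
        return deps

    def mov_imm(self, dst: str) -> list:
        """Rename mov dst, imm: no register inputs, but a fresh destination."""
        self.map[dst] = self._alloc()
        return []


r = ToyRenamer(["rax", "rcx"])
r.xor("rax", "rax")  # zero idiom: no dependencies, nothing allocated
r.mov_imm("rcx")     # mov rcx, 0: still allocates a physical register
print(r.allocated)   # 1
```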