Two things caught my eye.

The microsecond scaling (multiply by a frequency and divide by 1 million). This 32-bit CPU lacks a dedicated division instruction, so long division on 64-bit integers indeed generates quite a lot of code.
However, the division in hertz * us / 1_000_000 can typically be optimized by the compiler because the divisor is a compile-time constant. In that case, the division can be replaced by a (widening) multiplication followed by a right shift (https://en.wikipedia.org/wiki/Division_algorithm#Division_by_a_constant), which is much less code and faster than the generic long-division algorithm.
The proposed subtraction loop is sub-optimal, and even though this CPU lacks a 128-bit multiplication (perhaps the reason why LLVM didn't try to optimize this division), the following "magic" expression generates less code (whether optimizing for speed or for size): https://godbolt.org/z/qY9v7qh3a
fn div_by_million(x: u64) -> u32 {
    // 4835703278458516699 == ceil(2^82 / 1_000_000): multiplying and
    // shifting right by 82 computes x / 1_000_000 without a division.
    ((x as u128).wrapping_mul(4835703278458516699) >> 82) as u32
}
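As a quick sanity check (my addition, not from the original comment), the magic expression can be compared against ordinary division; it agrees for every input whose quotient fits in a u32, the function's return type:

```rust
// Sanity check: the multiply-and-shift trick must agree with plain
// division wherever the quotient fits in a u32.
fn div_by_million(x: u64) -> u32 {
    ((x as u128).wrapping_mul(4835703278458516699) >> 82) as u32
}

fn main() {
    let samples: &[u64] = &[
        0,
        1,
        999_999,
        1_000_000,
        1_000_001,
        123_456_789_012,
        // Largest input whose quotient still fits in a u32:
        (u32::MAX as u64) * 1_000_000 + 999_999,
    ];
    for &x in samples {
        assert_eq!(div_by_million(x) as u64, x / 1_000_000);
    }
    println!("all samples match");
}
```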
The early return in print_full_process(), where the added if true leads to dead code elimination. However, the existing CONFIG.debug_panics is a const expression, so setting that to false should likewise cause the whole function to be treated as dead code. And being able to eliminate dead code was part of the discussion when this config was added (https://github.com/tock/tock/pull/1443).
Here I wonder whether the authors didn't try to change this config (arguably, it's not trivial to find how to change Tock's config among all the Makefiles), or whether the compiler failed to optimize it away (which would be more concerning).
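For illustration, here is a minimal, hypothetical sketch of the const-config pattern being discussed (the names mirror the discussion; this is not Tock's actual code). Because CONFIG is a const item, the guard is statically decided, and with the flag set to false the rest of the function body is dead code the compiler can remove, much like an if true early return:

```rust
// Hypothetical sketch of a const config flag gating debug printing.
struct Config {
    debug_panics: bool,
}

// A const item: the compiler sees the flag's value at compile time.
const CONFIG: Config = Config { debug_panics: false };

// Returns whether anything was printed, so the effect is observable.
fn print_full_process() -> bool {
    if !CONFIG.debug_panics {
        // Statically taken when debug_panics is false; everything below
        // becomes dead code and can be eliminated.
        return false;
    }
    // Potentially large debug-printing code would live here.
    println!("full process state");
    true
}

fn main() {
    assert!(!print_full_process());
    println!("debug printing compiled out");
}
```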