Nice! I read about this trick of eliminating carries in Parhami’s ‘Computer Arithmetic’, but I didn’t think it would be actually useful in programming (I thought it was more for extremely niche circuits).
(At least, it looks like the same trick on first glance, I’ll have to have a better look later)
Some of these tricks are done automatically by compilers. I was curious when I read the SuperMalloc paper that the author spent some time discussing some Python code that he’s written to generate C that applied some of these tricks to a division on the fast path. It turns out, clang did the exact same transform if you just wrote the naive C code.
I think you mean the trick where the compiler replaces a division by a constant by a multiplication with a constant and a bit shift? I think this trick is a bit more common (I actually implemented it myself for a compiler) than the one described in the OP.
One problem is that there’s no way to describe a 256-bit addition in C, so support for this usually feels a bit clumsy, similar to the support for the popcnt instruction (i.e. the compiler needs to detect that you are trying to do the same thing as the instruction does, and it’s debatable how much the compiler is allowed to optimize).
Multiple-word arithmetic is often easier to write in assembly, because C has no support to detect overflow. In assembly, there is usually some equivalent of
add a0, b0
adc a1, b1
In C, you would need to do something clumsy like
a0_new = a0 + b0;
carry = has_overflown(a0, b0, a0_new);
a1 = a1 + b1 + carry;