I never really understood how you’re supposed to sanely make use of these CPU features in general and specifically from C - I mean, if you assume everyone’s a Gentoo user, sure, these flags will be optimally tuned specifically to the user’s machine. But normally you’d want to distribute general-purpose binaries and/or have a build process that’s as portable as possible and makes as few assumptions as possible.
The attribute was meant to choose the optimal codepath at runtime based on the end user’s machine (not the build machine), by compiling the same function with multiple optimization flags (in this case selecting different instruction sets).
So then regular users who don’t use Gentoo can benefit from using the newer instructions their CPUs provide without producing a binary that requires that particular CPU as a minimum.
However, as the article points out, it is not that simple, because this only works reliably if you use gcc+glibc. Other toolchains (clang, musl-libc) have various issues.
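For anyone who hasn't seen it, here is a rough sketch of what this looks like in C. I'm assuming the attribute in question is GCC's target_clones; the function sum_floats, the MULTIVERSION macro and the choice of "avx2" are made up for illustration. The guard only turns the attribute on for gcc+glibc on x86-64 and falls back to a single plain build everywhere else.

```c
#include <stddef.h>
#include <stdlib.h>  /* on glibc this also ends up defining __GLIBC__ */

/* Assumption: only enable multiversioning for GCC + glibc on x86-64,
   the combination described above as working reliably. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__GLIBC__) && defined(__x86_64__)
#define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#else
#define MULTIVERSION /* single ordinary build on other toolchains */
#endif

/* Hypothetical example function: the compiler emits one clone per listed
   target plus a resolver that picks one when the program is loaded. */
MULTIVERSION
float sum_floats(const float *values, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

Callers just call sum_floats() as usual; which clone runs is decided on the end user's machine, not the build machine.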
Is there a reason you would implement modulo like this rather than using an actual relevant CPU instruction?
The entire article was talking about how different architectures and arch versions have different instruction support. The given definition of modulo is more mathematically stable, that’s why you want it.
Right, I was more wondering about how x86 DIV and IDIV produce a remainder, and as far as I know mod is the same as abs(remainder). Is that not true? Is there no equivalent for floats?
In number theory and in Euclidean division, yes, but there are several definitions in common use. https://en.wikipedia.org/wiki/Modulo_operation#Variants_of_the_definition
The two main ones are truncated division (never found a case where I want this), where the remainder takes the sign of the dividend (numerator), and floored division (the definition in the article, and almost always what I want), where it takes the sign of the divisor (denominator). Most programming languages provide truncated division. C, in its wisdom, let the implementation decide for negative operands until C99, which settled on truncation.
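To make the difference concrete, here is a minimal integer sketch in C. The names trunc_mod and floor_mod are mine, not from the article, and it assumes a C99 or later compiler (so % truncates) and a non-zero divisor.

```c
/* Truncated remainder: this is what % gives in C99 and later.
   The sign of the result follows the dividend a. */
int trunc_mod(int a, int b)
{
    return a % b;
}

/* Floored modulo: the sign of the result follows the divisor b.
   Start from the truncated remainder and shift it by b when it is
   non-zero and its sign disagrees with the sign of b. */
int floor_mod(int a, int b)
{
    int r = a % b;
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;
    return r;
}
```

With these, trunc_mod(-25, 10) is -5 while floor_mod(-25, 10) is 5. For floats, fmod from math.h is the truncated counterpart, and the same sign fix-up can be applied to its result.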
Edit: In Julia and some other languages that care about this sort of thing you can use a range to make it unambiguous: mod(-25, 1:10)
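If you want the same thing in C, something like the following should do. This is only a sketch: mod_range is a made-up helper, and it assumes lo <= hi and that hi - lo + 1 and x - lo don't overflow an int.

```c
/* Map x into the inclusive range [lo, hi], wrapping around at both ends.
   Assumes lo <= hi and that the intermediate values fit in an int. */
int mod_range(int x, int lo, int hi)
{
    int span = hi - lo + 1;   /* number of values in the range */
    int r = (x - lo) % span;  /* truncated remainder, may be negative */
    if (r < 0)
        r += span;            /* shift into 0 .. span-1 */
    return lo + r;
}
```

Under that reading, mod_range(-25, 1, 10) comes out as 5, and the result can never land outside 1..10 no matter what sign x has.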