1. 55
  2. 3

    This is one of the reasons I’m a big fan of scoped denormals. You do something like this:

    defn no-denormals<?T> (body: () -> ?T, daz?:True|False) -> T :
      val mxcsr = get-mxcsr() ; retrieve mxcsr register (x86)
      set-denormals-are-zero(daz?) ; set DAZ/FTZ mode
      val ret = body() ; execute the function body
      set-mxcsr(mxcsr) ; reset MXCSR
      ret
    
    ; in context
    within no-denormals(true) :
      do-expensive-math()
    
    within no-denormals(false) :
      do-precise-math()
    

    Or use RAII, like ScopedNoDenormals from JUCE. I think you could do the same thing with a custom context manager (a with block) in Python, but it’s only as good as the code you’re calling (callees can overwrite MXCSR themselves), so YMMV. Automatically auditing code to make sure MXCSR stays consistent is definitely an interesting approach to finding the problem.
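For the RAII variant, here is a minimal C++ sketch of the idea. It is not JUCE's actual implementation: the class name ScopedDenormalMode and the helper half_of_smallest_normal are made up for illustration, and it assumes an x86 target where floating-point math goes through SSE, so that MXCSR is the register that matters.

```cpp
#include <limits>
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr (x86 SSE only)

// MXCSR control bits: flush-to-zero (bit 15) and denormals-are-zero (bit 6).
constexpr unsigned kFtz = 0x8000u;
constexpr unsigned kDaz = 0x0040u;

// Hypothetical guard (not JUCE's class): saves MXCSR on entry, switches the
// denormal mode, and restores the saved value when the scope ends.
class ScopedDenormalMode {
 public:
  explicit ScopedDenormalMode(bool flush) : saved_(_mm_getcsr()) {
    _mm_setcsr(flush ? (saved_ | kFtz | kDaz) : (saved_ & ~(kFtz | kDaz)));
  }
  ~ScopedDenormalMode() { _mm_setcsr(saved_); }

  // Non-copyable, so the saved state is restored exactly once.
  ScopedDenormalMode(const ScopedDenormalMode&) = delete;
  ScopedDenormalMode& operator=(const ScopedDenormalMode&) = delete;

 private:
  unsigned saved_;
};

// Illustration only: halving the smallest normal float gives a denormal
// result unless flush-to-zero is active. The volatile variables stop the
// compiler from folding the multiply or moving it out of the guarded scope.
float half_of_smallest_normal() {
  volatile float smallest = std::numeric_limits<float>::min();
  volatile float half = 0.5f;
  volatile float result = smallest * half;
  return result;
}
```

Calling half_of_smallest_normal() inside a scope holding ScopedDenormalMode(true) should give exactly zero, and a small positive value under ScopedDenormalMode(false). The same caveat applies as above: a callee that rewrites MXCSR itself defeats the guard until the destructor runs.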

    1. 1

      The danger here is that the mode switch can be annoyingly expensive (maybe less so on modern CPUs, say less than 5-10 years old?)

      1. 4

        ~20–30 cycles

        On nvidia gpus, afaik, it is controlled by a prefix which can be applied to any fp instruction—no implicit state—so it is free.

    2. 2

      Ok, so I’m super surprised that clang is doing this, as the -ffast-math docs (https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffast-math) do not list any changes to the way denormals are handled, nor in fact any change in runtime behavior. Everything listed is a compile-time assumption about fp math behaving like infinite-precision math, etc.

      The clang docs explicitly enumerate the additional fp flags that fast-math implies, and that list does not include the separate -fdenormal-fp-math option, which makes it even weirder.

      Finally, I agree with the author that it is bananas that a shared library thinks it is reasonable to change global CPU state.

      1. 7

        The documentation is probably accurate, but the setting of the CPU flags is coming from the system startup code, not the compiler directly.

        On my Debian Bullseye machine, when I build a trivial shared object using the command

        gcc -ffast-math -fpic -shared -o t.so t.c
        

        it links with /usr/lib/gcc/x86_64-linux-gnu/10/crtfastmath.o. This is the file with the set_fast_math function that does the CPU flag setting. If you build with clang it will link with the same file. This happens when -ffast-math is in effect.

        I tried with -rtlib=compiler-rt, -stdlib=libc++ and -fuse-ld=lld, and all combinations link to that file when -ffast-math is turned on. I haven’t tinkered enough to know how to avoid linking with it, short of writing your own and inserting it in the link at the correct place.

        Incidentally, here’s the offending code in GCC (actually, libgcc). It’s been around for a long time.
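As a rough illustration of what that startup code amounts to, and of how you might audit for it, here is a small C++ sketch. It is not the libgcc source: both function names are hypothetical, it assumes x86 with SSE, and as far as I know the real set_fast_math is registered as a constructor so it runs when the object is loaded, rather than being called explicitly as it would be here.

```cpp
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr (x86 SSE only)

// FTZ (bit 15) and DAZ (bit 6) in MXCSR.
constexpr unsigned kFtzDazMask = 0x8000u | 0x0040u;

// Audit helper (hypothetical name): true if the calling thread currently
// has flush-to-zero or denormals-are-zero switched on.
bool denormal_flushing_enabled() {
  return (_mm_getcsr() & kFtzDazMask) != 0;
}

// Stand-in for the effect of the fast-math startup code: set both bits and
// never restore them. The real code runs at load time; this sketch is an
// ordinary function so it can be called deliberately.
void simulate_fast_math_startup() {
  _mm_setcsr(_mm_getcsr() | kFtzDazMask);
}
```

Checking denormal_flushing_enabled() before and after loading a suspect shared library is one cheap way to spot a library that changes the mode behind your back. MXCSR is per-thread state, so the check only describes the thread that runs it.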

      2. 2

        Ah yes, -ffast-math strikes again.