1. 5
  1. 2

    An excellent example of C/C++ not actually giving programmers “close-to-metal” or “full control” over CPUs.

    1. 2

      The fault is with the standard library, not the language itself. But in this case there’s no way to portably implement this functionality, so effectively it is a language issue.

      The workaround would be for someone to write a library that provides this, built from a bunch of OS-, compiler-, and CPU-specific code that emits the appropriate assembly instructions.

      (The simdjson library appears to do something similar: it has several different implementations of some of its core API, which are selected at both compile time and runtime based on the CPU's capabilities.)
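
      To make the idea concrete, here is a minimal sketch of that kind of dispatch. It is not simdjson's actual code. The byte-counting kernel and the names `count_generic`, `count_avx2`, `select_count`, and `count_bytes` are made up for illustration, and it assumes GCC or Clang on x86-64 (anything else just gets the generic path).

      ```cpp
      #include <cstddef>
      #include <cstdint>

      // Hypothetical kernel: count the bytes in [data, data + len) equal to `needle`.
      using CountFn = std::size_t (*)(const std::uint8_t*, std::size_t, std::uint8_t);

      // Baseline version, compiled for the default target.
      static std::size_t count_generic(const std::uint8_t* data, std::size_t len,
                                       std::uint8_t needle) {
          std::size_t n = 0;
          for (std::size_t i = 0; i < len; ++i) n += (data[i] == needle);
          return n;
      }

      #if defined(__x86_64__) && defined(__GNUC__)
      // Same loop, but the compiler is allowed to use AVX2 inside this function only.
      __attribute__((target("avx2")))
      static std::size_t count_avx2(const std::uint8_t* data, std::size_t len,
                                    std::uint8_t needle) {
          std::size_t n = 0;
          for (std::size_t i = 0; i < len; ++i) n += (data[i] == needle);
          return n;
      }
      #endif

      // Compile-time check (the #if) plus a runtime check of what the CPU reports.
      static CountFn select_count() {
      #if defined(__x86_64__) && defined(__GNUC__)
          __builtin_cpu_init();
          if (__builtin_cpu_supports("avx2")) return count_avx2;
      #endif
          return count_generic;
      }

      // Public entry point: the implementation is chosen once, on the first call.
      std::size_t count_bytes(const std::uint8_t* data, std::size_t len,
                              std::uint8_t needle) {
          static const CountFn impl = select_count();
          return impl(data, len, needle);
      }
      ```

      Callers only ever see `count_bytes`; which variant runs depends on the machine the binary ends up on.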

    2. 1

      I don’t use C++, so there may be some subtleties I am missing. But why is it necessary to reimplement everything from scratch? It seems simpler to make a custom class inheriting from the standard atomic one, and then override custom_atomic_class<int128_t>::cas.

      Also:

      “no lock-free DWCAS on Linux.”

      It can be done with inline assembly, or with the __sync* builtins, though those come with some compromises.
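
      A rough sketch of the inline-assembly route, assuming GCC or Clang on x86-64 and a CPU that has `cmpxchg16b`. The `U128` struct and the `dwcas` function are hypothetical names, not from any library, and the mutex branch is only there so the sketch still compiles elsewhere (it is obviously not lock-free).

      ```cpp
      #include <cstdint>
      #include <mutex>

      // 16-byte value; cmpxchg16b requires the memory operand to be 16-byte aligned.
      struct alignas(16) U128 {
          std::uint64_t lo;
          std::uint64_t hi;
      };

      // Double-width compare-and-swap.
      // Returns true if *target equalled `expected` and was replaced by `desired`.
      // On failure, `expected` is overwritten with the value that was observed.
      inline bool dwcas(U128* target, U128& expected, U128 desired) {
      #if defined(__x86_64__) && defined(__GNUC__)
          bool ok;
          __asm__ __volatile__(
              "lock cmpxchg16b %1\n\t"  // compares RDX:RAX with *target
              "setz %0"                 // ZF is set when the swap happened
              : "=q"(ok), "+m"(*target), "+a"(expected.lo), "+d"(expected.hi)
              : "b"(desired.lo), "c"(desired.hi)
              : "cc", "memory");
          return ok;
      #else
          // Fallback for other targets: correct, but uses a lock.
          static std::mutex guard;
          std::lock_guard<std::mutex> lock(guard);
          if (target->lo == expected.lo && target->hi == expected.hi) {
              *target = desired;
              return true;
          }
          expected = *target;
          return false;
      #endif
      }
      ```

      The `__sync_bool_compare_and_swap` route on an `__int128` looks shorter, but as far as I know it needs `-mcx16` on x86-64 to be inlined, which is one of the compromises.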