The fault is with the standard library, not the language itself. But in this case there’s no way to portably implement this functionality, so effectively it is a language issue.
The workaround would be for someone to write a library that provides this, built from a bunch of OS-, compiler-, and CPU-specific code that inserts the appropriate assembly instructions.
(The simdjson library appears to do something similar: it has several different implementations of parts of its core API, which are selected at both compile time and run time based on the CPU's capabilities.)
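To make the compile-time plus run-time selection idea concrete, here is a rough sketch. It is not simdjson's actual mechanism, and every name in it (Dwcas, detect_dwcas, dwcas_name) is made up for illustration. It assumes GCC or Clang on x86-64 for the CPUID path and falls back to a lock-based strategy everywhere else.

```cpp
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>  // GCC/Clang helper for the CPUID instruction
#define SKETCH_HAVE_CPUID 1
#endif

// Hypothetical strategy tag: which double-width CAS implementation to use.
enum class Dwcas { Cmpxchg16b, LockBased };

// Compile time decides which probes are even compiled in;
// run time asks the actual CPU what it supports.
inline Dwcas detect_dwcas() {
#ifdef SKETCH_HAVE_CPUID
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // CPUID leaf 1: ECX bit 13 reports CMPXCHG16B support.
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 13))) {
        return Dwcas::Cmpxchg16b;
    }
#endif
    return Dwcas::LockBased;  // portable but not lock-free
}

// Human-readable name, e.g. for logging which path was picked.
inline const char* dwcas_name(Dwcas d) {
    return d == Dwcas::Cmpxchg16b ? "cmpxchg16b" : "lock-based fallback";
}
```

A library would presumably call detect_dwcas() once at startup and store a function pointer to the matching implementation, so the check is not repeated on every operation.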
I don’t use C++, so there may be some subtleties I am missing. But why is it necessary to reimplement everything from scratch? It seems simpler to make a custom class inheriting from the standard atomic one, and then override custom_atomic_class<int128_t>::cas.
Also:
no lock-free DWCAS on Linux.
It can be done with inline assembly. Or with __sync*, though there are some compromises there.
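For the inline assembly route, something along these lines should work with GCC or Clang on x86-64. Treat it as a sketch: Pair and dwcas are hypothetical names, the CPU is assumed to support cmpxchg16b, and the non-x86 branch is a plain mutex fallback that is not lock-free at all.

```cpp
#include <cstdint>
#include <mutex>

// Two 64-bit words, 16-byte aligned: cmpxchg16b faults on a misaligned operand.
struct alignas(16) Pair {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Hypothetical helper. If *target equals expected, store desired and return true.
// Otherwise copy the value actually observed into expected and return false.
inline bool dwcas(Pair* target, Pair& expected, Pair desired) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    bool ok;
    // cmpxchg16b compares RDX:RAX with the 16-byte memory operand.
    // Equal: ZF is set and RCX:RBX is stored. Not equal: memory is loaded into RDX:RAX.
    __asm__ __volatile__(
        "lock cmpxchg16b %[mem]\n\t"
        "setz %[ok]"
        : [ok] "=q"(ok), [mem] "+m"(*target),
          "+a"(expected.lo), "+d"(expected.hi)
        : "b"(desired.lo), "c"(desired.hi)
        : "cc", "memory");
    return ok;
#else
    // Fallback for other targets: correct, but lock-based rather than lock-free.
    static std::mutex guard;
    std::lock_guard<std::mutex> lock(guard);
    if (target->lo == expected.lo && target->hi == expected.hi) {
        *target = desired;
        return true;
    }
    expected = *target;
    return false;
#endif
}
```

On failure the caller gets the current value back in expected, so the usual retry loop (recompute desired from expected, try again) works the same way it does with compare_exchange on std::atomic.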
An excellent example of C/C++ not actually giving programmers “close-to-metal” or “full control” over CPUs.