…this is a sort of weird quest. What’s wrong with memset? The conclusion implies that there’s something wrong/incomplete with a language that can’t implement something as fast as memset without compiler help, but C compilers commonly recognize memset and optimize it specially anyway, last I checked. Or the libc implementation involves hand-optimized C or assembly, which amounts to almost the same thing. Am I missing something about C++ and std::fill() in particular?
I didn’t quite follow your point.
C (and C++) compilers both recognize code that looks like memset (e.g., a loop setting bytes in a linear pattern) and replace it with a call to memset, and they also sometimes replace explicit calls to memset with something else (like inlined loads and stores).
In the case of std::fill, there is an additional layer of optimization at the source (standard library) level: the implementation explicitly calls memset in cases where that is known to be safe. In the case described in the post, all of these fail: the stdlib optimization fails as described, and gcc doesn’t replace the byte-store loop with memset, at least at -O2, so the performance is seriously worse.
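To make the three layers concrete, here is a minimal sketch of the same byte fill written three ways. The function names (`fill_loop`, `fill_std`, `fill_memset`, `fills_agree`) are made up for illustration and do not come from the post. All three produce the same bytes; whether the first two actually end up as a memset call depends on the compiler, its version, the optimization level, and the standard library, so treat this as something to compile and inspect rather than as a statement of what any particular toolchain does. It also does not reproduce the specific failing case from the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// 1. A plain byte-store loop. A compiler may recognize this idiom and
//    replace it with a call to memset, but it is not required to.
void fill_loop(char* p, std::size_t n, char value) {
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = value;
    }
}

// 2. std::fill. A standard library may dispatch to memset at the source
//    level for byte-sized types; if it does not, the compiler sees an
//    ordinary loop like the one above.
void fill_std(char* p, std::size_t n, char value) {
    std::fill(p, p + n, value);
}

// 3. An explicit memset call. The compiler may still replace it with
//    inlined stores, for example for small constant sizes.
void fill_memset(char* p, std::size_t n, char value) {
    std::memset(p, static_cast<unsigned char>(value), n);
}

// Hypothetical helper: fills three buffers, one per method, and reports
// whether they end up byte-for-byte identical.
bool fills_agree(std::size_t n, char value) {
    assert(n > 0);  // keep the pointers passed to memset non-null
    std::vector<char> a(n, 1), b(n, 2), c(n, 3);
    fill_loop(a.data(), n, value);
    fill_std(b.data(), n, value);
    fill_memset(c.data(), n, value);
    return a == b && b == c;
}
```

Since the results are identical by construction, any difference between the three is purely in the generated code. Comparing the assembly for each function (for example with `-O2 -S`) shows which of them your compiler turned into a memset call.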
If by “what’s wrong with memset” you mean “why can’t you use it freely in C++”, there are several reasons.