I’ve no experience in this domain. Is using compiler intrinsics that bad?
It would be interesting to see how this compares with glm in terms of generated code and breadth of API.
I was wondering how they would do swizzling. Even in C++14 (17?) it isn’t a solved problem, and it is pretty much as bad as I feared:
#define wzyx shuffle4_rw4<_MM_SHUFFLE(0,1,2,3)>()
#define zwyx shuffle4_rw4<_MM_SHUFFLE(0,1,3,2)>()
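For anyone puzzling over why `_MM_SHUFFLE(0,1,2,3)` gets the name wzyx: the macro packs four 2-bit lane indices into one immediate, with the first argument landing in the highest lane. Here is a rough scalar sketch of that decoding. `shuffle_imm` and `shuffle4` are made-up names for illustration, not the library's `shuffle4_rw4` (whose implementation isn't shown here), and the loop only mimics what a single-source SSE shuffle would do per lane.

```cpp
#include <array>
#include <cassert>

// Same bit layout as the _MM_SHUFFLE macro: two bits per output lane,
// with the first argument selecting the source for the highest lane.
constexpr int shuffle_imm(int lane3, int lane2, int lane1, int lane0) {
    return (lane3 << 6) | (lane2 << 4) | (lane1 << 2) | lane0;
}

// Scalar stand-in (hypothetical helper) for a single-source 4-lane shuffle:
// output lane i takes the input lane named by bits [2i, 2i+1] of imm.
inline std::array<float, 4> shuffle4(const std::array<float, 4>& v, int imm) {
    std::array<float, 4> r{};
    for (int i = 0; i < 4; ++i) {
        r[i] = v[(imm >> (2 * i)) & 3];
    }
    return r;
}

// The wzyx immediate from the macro above.
static_assert(shuffle_imm(0, 1, 2, 3) == 0x1B, "wzyx packs to 0x1B");
```

So with x, y, z, w in lanes 0 to 3, the arguments read right to left give the output lanes left to right, which is why (0,1,2,3) is a full reverse.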
It’s just a pity the language doesn’t support swizzling natively. But I guess, unless you are trying to copy the GLSL functionality, swizzling isn’t that useful.
Do we need SIMD vector libraries any more? Can’t we just use float or glm and turn on automatic vectorisation?
FWIW clang adds swizzling to C and C++ as a language extension and it’s quite pleasant.
Wow, did not know that!
Unfortunately it looks more like the expanded part of the macro, rather than the wzyx part :(
// Reverse 4-element vector V1.
__builtin_shufflevector(V1, V1, 3, 2, 1, 0)
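As far as I can tell from the clang docs, both spellings exist: vectors declared with `ext_vector_type` accept GLSL-style member swizzles, while `__builtin_shufflevector` is the lower-level form. A minimal sketch, assuming clang for the first branch; the non-clang branch is just a fallback I added (assuming GCC-style `vector_size` support) so the snippet still builds elsewhere, and `reverse_swizzle` / `reverse_builtin` are illustrative names.

```cpp
#include <cassert>

#if defined(__clang__)
// Clang's OpenCL-style vector extension: components are addressable by name.
typedef float float4 __attribute__((ext_vector_type(4)));

// Swizzle written as a member access.
inline float4 reverse_swizzle(float4 v) { return v.wzyx; }

// The same permutation via the builtin quoted above.
inline float4 reverse_builtin(float4 v) {
    return __builtin_shufflevector(v, v, 3, 2, 1, 0);
}
#else
// Fallback for other compilers (assumes vector_size and subscripting work):
// no named swizzles here, so the lanes are picked out by index.
typedef float float4 __attribute__((vector_size(16)));

inline float4 reverse_swizzle(float4 v) {
    float4 r = {v[3], v[2], v[1], v[0]};
    return r;
}

inline float4 reverse_builtin(float4 v) { return reverse_swizzle(v); }
#endif
```

The member-access form only applies to `ext_vector_type` vectors, so it doesn't help if your types come from `<xmmintrin.h>` or a library's own wrapper.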
I’ve heard more than once that autovectorisers do more harm than good.
I’ve not run into it myself, but GCC occasionally surprises me with how bad it is at optimising code that should be no problem, so I can believe it. My favourite recent example is moving decompressor state from a struct to loose variables in an LZ4-alike making it run 2-3x faster. :|