I would like to grumble about making useless abstractions here. If performance is a concern, then math libraries are exactly such a thing. As this article points out, you basically have to babysit the compiler to make sure it produces good code. Good luck and have fun – I feel like there are far too many opportunities for things to break without notice (compiler versions, flags, minor changes in code).
Using SSE registers as a representation for 3D or 4D vectors is not the way to do things in 2016. You end up with a bunch of unnecessary shuffling and redundant computations, because sometimes you need to operate on, say, just x and y – and re-packing and then unpacking your vectors again is not going to help. The problem gets worse still when you try to scale up to 256-bit AVX (and beyond).
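To illustrate the shuffling cost: here is a minimal sketch (names are mine, not from the article) of the AoS-style layout being criticized, where one vec3 occupies a whole SSE register. Even a trivial operation on just two components needs a shuffle, and only one lane of the add does useful work.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical AoS vector: a single vec3 packed into one SSE register,
// with the fourth lane wasted as padding.
struct Vec3AoS {
    __m128 v;  // { x, y, z, _pad }
};

// Compute x + y for one vector. The components sit in different lanes
// of the same register, so we must shuffle y down to lane 0 first.
inline float sum_xy(Vec3AoS a) {
    // Broadcast lane 1 (y) across the register.
    __m128 y = _mm_shuffle_ps(a.v, a.v, _MM_SHUFFLE(1, 1, 1, 1));
    // Scalar add: only lane 0 does useful work; the rest is wasted width.
    __m128 r = _mm_add_ss(a.v, y);
    return _mm_cvtss_f32(r);
}
```

The shuffle here is pure overhead: it exists only because the data layout put x and y in lanes of one register rather than in separate streams.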
Then, after the useless abstraction is done and used everywhere in performance-critical code, people complain that moving to a SoA representation is difficult. SoA requires less shuffling, lets you selectively operate on specific components of a vector while utilizing the full width of a SIMD register, and scales easily to wider SIMD extensions. Yes, the move is difficult – after you make it so :)
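For contrast, a minimal sketch of the SoA layout being advocated (again, names are mine): four vec3s stored as one register per component. Operating on a single component now uses every lane, with no shuffles at all.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical SoA batch: four vec3s stored component-wise,
// one SSE register per component. No padding lanes.
struct Vec3SoA4 {
    __m128 x, y, z;  // x[0..3], y[0..3], z[0..3]
};

// Add only the y components of two batches: one instruction, no shuffles,
// and all four lanes do useful work on four vectors at once.
inline __m128 add_y(const Vec3SoA4& a, const Vec3SoA4& b) {
    return _mm_add_ps(a.y, b.y);
}
```

Widening this to AVX is just swapping `__m128` for `__m256` and processing eight vectors per batch, which is the scaling argument above.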
Yeah it’s not super great, but we have that other SIMD vector story up so I thought I’d post this to go with it.
You’re right that this is probably not going to give you the fastest code, but you can drop it in with almost no effort and see if it’s good enough.
The problems I can see are: getting it to work at all, like you mentioned (e.g. GCC has no __vectorcall, though IIRC it works out OK there anyway); the 33% extra memory usage, which is non-trivial and I can imagine it hurting performance in some cases; and having to swizzle back to three components when you want to upload to the GPU.
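Where the 33% figure comes from, sketched out (struct and function names are illustrative, not from the article): padding a vec3 to one SSE register makes it 16 bytes instead of 12, and the GPU-upload swizzle is a repack loop back to the tight layout.

```cpp
#include <cstddef>

// A vec3 padded out to a full SSE register: 16 bytes instead of 12,
// i.e. 4/3 the memory of a tightly packed array – the 33% overhead.
struct alignas(16) PaddedVec3 { float x, y, z, _pad; };

// The tight 12-byte layout a GPU vertex buffer typically expects.
struct TightVec3 { float x, y, z; };

static_assert(sizeof(PaddedVec3) == 16, "one SSE register per vector");
static_assert(sizeof(TightVec3) == 12, "three floats, no padding");

// "Swizzling back" before upload: a plain copy that drops the pad lane.
inline void repack(const PaddedVec3* src, TightVec3* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = {src[i].x, src[i].y, src[i].z};
}
```

The repack is cheap per element, but it is an extra pass over the data that the tight layout never needed.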
All this is good, but it'd be great to see testing get more of a mention. For basic stuff like this, you both can and should have complete test coverage.
That’ll also save your ass when refactoring for speed experiments.