I for one would be fascinated by a more extensive treatment of the subject.
I’ll second that. I love reading about clever trickery to make code go fast, and I’ve just recently gotten interested in GPGPU stuff and this seems like both.
Same, I would love reading about this. I’ve just begun dabbling in GPGPU (and SIMD), and reading about the journey and thought process of others is very enlightening.
If you end up writing this post, I’d like to request that you include some of your mistakes/dead paths as well. Reading about how others have failed helps me to learn.