I’m curious: is the SIMD code in Pillow-SIMD all written with x86-specific SIMD intrinsics? It sounds like it is, but the article didn’t explicitly say.
I’m wondering a) could the same be done on phones, which means ARM rather than x86, and b) would phones just do the convolution on their GPUs anyway?
Actually, come to think of it, c) if you shoved even a modest GPU into a server and did the convolution with OpenCL, could that be faster still?
I didn’t look very hard, but the article talks about AVX2, the commit log talks about SSE4.1/AVX, and this file is all intrinsics.