Thanks for this, Raph. To me, GPGPU has always seemed fairly difficult to get into, especially given how fragmented the ecosystem is.
I’ve been interested in the work done under the linebender umbrella for a while now, too. It’s nice to see some good lessons come out of some already-good development work.
I have very little knowledge about GPU programming and the various toolchains for it, but I've worked on implementing CUDA support in GNAT (the Ada compiler based on GCC), and in my experience the single-source vs. compute-shader distinction made at the beginning of the video makes little sense.
When you compile your program, NVCC (NVIDIA's CUDA compiler) first calls the system's GCC to preprocess your files. It then splits your sources into multiple files, separating what should be compiled for the host from what should be compiled for the device. Next, it uses CICC (NVIDIA's LLVM-based compiler, which reads C and emits PTX assembly) to turn your shaders/kernels into GPU assembly code. That assembly is turned into a GPU object file, which is packed together with its assembly into a fat binary, and the fat binary is in turn turned into a host object file. Finally, NVCC compiles your source file for the host, inserting a small amount of startup code into your binary to load the GPU code onto the GPU and turning your kernel<<<128,1>>>(...) calls into regular function calls that dispatch the kernels.
NVCC's separate compilation process (which allows kernels to be linked together) is slightly different but follows the same idea. Clang and GNAT do basically the same thing, and if I recall correctly, this is also what happens when you use OpenACC in GCC.
So the single-source approach is really just macro processing plus toolchain drivers that automate regular compute shader programming :).
P.S.: I’m a huge fan of your work, thank you for all the articles you wrote on your blog!
I'm not sure how much we disagree here. Sure, once compiled there are very similar things going on under the hood, but from the point of view of developer experience things are pretty different. The fact that shaders are written in the same language as host code is significant. The fact that they're in the same source file, with the splitting and separate compilation taken care of by the toolchain, is significant. The fact that there's a runtime that takes care of establishing the GPU connection is significant. The fact that memory is unified, so pointers can be shared between host and device code, is significant.
One of the main practical consequences of all these distinctions is that you can pretty easily develop and debug your logic using host code, then move it to GPU execution just by changing a few attributes.
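To make the "debug on the host, then move it to the GPU" workflow concrete, here is a minimal sketch in CUDA C++. It is my own illustration, not code from piet-gpu or the video, and the names HOST_DEVICE, saxpy_elem, saxpy_host, saxpy_kernel and saxpy_gpu are hypothetical. Built with an ordinary C++ compiler, only the host path exists. Built with nvcc (which defines __CUDACC__), the same per-element function is also called from a kernel, and managed (unified) memory lets host and device share the same pointers. Error checking is omitted for brevity.

```cpp
// Sketch: one source file, compiled as plain C++ (host only) by g++/clang++,
// or as CUDA by nvcc, which defines __CUDACC__ and includes the CUDA runtime.
#include <cstddef>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__   // hypothetical helper macro
#else
#define HOST_DEVICE                       // plain C++: attributes vanish
#endif

// The per-element logic, written once and callable from host or device code.
HOST_DEVICE inline float saxpy_elem(float a, float x, float y) {
    return a * x + y;
}

// Host path: an ordinary loop you can step through in a regular debugger.
inline void saxpy_host(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = saxpy_elem(a, x[i], y[i]);
    }
}

#ifdef __CUDACC__
// Device path: the toolchain splits this kernel out for the GPU.
__global__ void saxpy_kernel(float a, const float* x, float* y, std::size_t n) {
    std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = saxpy_elem(a, x[i], y[i]);
    }
}

// Runs the same logic on the GPU. The host loops below read and write the very
// pointers the kernel uses; with managed memory no cudaMemcpy is needed.
inline void saxpy_gpu(float a, const float* x_host, float* y_host, std::size_t n) {
    if (n == 0) return;  // a launch with zero blocks would be an error
    float* x = nullptr;
    float* y = nullptr;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (std::size_t i = 0; i < n; ++i) { x[i] = x_host[i]; y[i] = y_host[i]; }

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    saxpy_kernel<<<blocks, threads>>>(a, x, y, n);  // lowered to a host-side call
    cudaDeviceSynchronize();                        // wait before the host reads y

    for (std::size_t i = 0; i < n; ++i) { y_host[i] = y[i]; }
    cudaFree(x);
    cudaFree(y);
}
#endif
```

Compiled with g++ the file only exposes saxpy_host; compiling it as CUDA (for example with nvcc -x cu) adds saxpy_gpu, whose results should match the host path up to floating-point differences such as FMA contraction.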
There are efforts to do single-source on top of Vulkan (clvk, an implementation of OpenCL on Vulkan, is probably the most notable), but they're not really working yet, and I think the jury is out on whether it's better to try to replicate the single-source approach or to do explicit compute shaders. Certainly for the goals of piet-gpu the choice is pretty clear.
In any case, thanks for the kind words, and if there are ways to improve my materials I’m open to suggestions!