Lately, I’ve been learning about GPGPU programming.
When I search for Erlang as an aid for this, it seems that while Erlang is well designed for the SMP model (as noted by the article), it works poorly for SPMD. Given the rise of GPU computing, it seems that when a supercomputer in a box arrives (if it hasn’t arrived already), it will arrive as an enhanced GPU co-processor that uses an SPMD model.
I’ve also been fitfully learning functional programming techniques, and it seems like the monad idea could be a great fit for GPU computing, especially having actions generate a sequence of commands in a little language which would then be run on the GPU. I intend to slowly go through Bartosz Milewski’s Monads in C++ and possibly create some variant of boost-proto that serves my purposes.
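To make the "actions generate commands in a little language" idea concrete, here is a rough sketch in Python rather than C++ or Haskell. Everything in it is made up for illustration: `Program`, its `load`/`add`/`scale` methods, and `run_on_cpu` are hypothetical names, and the interpreter runs on the CPU as a stand-in for a real backend that would turn the recorded commands into GPU kernels.

```python
from dataclasses import dataclass, field


@dataclass
class Program:
    """Records commands instead of executing them (hypothetical little language)."""
    commands: list = field(default_factory=list)
    next_id: int = 0

    def _emit(self, op, *args):
        # Each command writes to a fresh named slot and returns that name.
        dest = f"v{self.next_id}"
        self.next_id += 1
        self.commands.append((op, dest, *args))
        return dest

    def load(self, name):
        return self._emit("load", name)

    def add(self, a, b):
        return self._emit("add", a, b)

    def scale(self, a, k):
        return self._emit("scale", a, k)


def run_on_cpu(program, inputs):
    """Stand-in interpreter; a GPU backend would consume the same command list."""
    env = {}
    for op, dest, *args in program.commands:
        if op == "load":
            env[dest] = list(inputs[args[0]])
        elif op == "add":
            env[dest] = [x + y for x, y in zip(env[args[0]], env[args[1]])]
        elif op == "scale":
            env[dest] = [x * args[1] for x in env[args[0]]]
        else:
            raise ValueError(f"unknown op: {op}")
    return env


# Building the program does no arithmetic; it only records four commands.
p = Program()
x = p.load("x")
y = p.load("y")
result = p.add(p.scale(x, 2), y)

env = run_on_cpu(p, {"x": [1, 2, 3], "y": [10, 20, 30]})
print(env[result])  # [12, 24, 36]
```

The point of the split is that the command list is plain data, so something else can inspect, reorder, or batch it before anything touches the device.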
Erlang is really not designed, or implemented, for problems that currently would run on a GPU, though. Erlang is about problems that are I/O-bound.
I also think monads are probably not that meaningful for GPUs. Instead, applicatives, or just array operations, probably make more sense.
GPUs are pretty much vector-operation machines.
If I understand correctly, the challenge is that performing a bunch of vector functions within the GPU is faster than moving data into and out of the GPU. If you have a bunch of vector operations to perform on data that’s too big to fit into the GPU all at once, you wind up with a complex combinatorial scheduling problem (what to move in or take out, and when). If you aren’t automatically going to perform the operations on all this data but will instead perform them conditionally based on other values, the problem becomes more complicated still.
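A toy way to see why the transfers dominate, again as a Python sketch under simplifying assumptions: only elementwise operations, a made-up `capacity` standing in for device memory, and a counter standing in for real host-to-device copies. `run_chunked` is a hypothetical helper, not part of any GPU library.

```python
def run_chunked(data, ops, capacity, fuse):
    """Apply elementwise ops to data in chunks of `capacity` items.

    Returns (result, transfers), where transfers counts simulated copies
    into and out of the device.
    """
    chunks = [data[i:i + capacity] for i in range(0, len(data), capacity)]
    transfers = 0

    if fuse:
        # Copy each chunk in once, run every op on it, copy it out once.
        out = []
        for chunk in chunks:
            transfers += 1  # copy in
            for op in ops:
                chunk = [op(v) for v in chunk]
            transfers += 1  # copy out
            out.extend(chunk)
        return out, transfers

    # Naive schedule: run one op over all chunks before starting the next op,
    # so every chunk is copied in and out once per op.
    current = chunks
    for op in ops:
        nxt = []
        for chunk in current:
            transfers += 2  # copy in and copy out
            nxt.append([op(v) for v in chunk])
        current = nxt
    return [v for chunk in current for v in chunk], transfers


data = list(range(10))
ops = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]

fused_result, fused_transfers = run_chunked(data, ops, capacity=4, fuse=True)
naive_result, naive_transfers = run_chunked(data, ops, capacity=4, fuse=False)

print(fused_result == naive_result)      # True
print(fused_transfers, naive_transfers)  # 6 18
```

Both schedules compute the same values; only the number of copies differs. Once operations depend on neighbouring elements or on conditions computed elsewhere, chunks can no longer be treated independently like this, which is where the hard scheduling problem comes in.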
What I’d like to do is have a way to isolate the scheduling problem from the statement of each vector operation. I’m thinking of something akin to the way proto is described in Bartosz’s article: writing a function would allocate space for the variable in one place and run the procedure in another place.
What I’d like to do is have a way to isolate the scheduling problem from the statement of each vector operation.
Check out Haxl; it does something similar (via Applicatives), where one can write sequential-looking code and it does some analysis and batches data requests up to reduce round trips.
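For anyone who hasn't seen the batching trick, here is a loose Python analogy. This is not Haxl's API (Haxl is a Haskell library); `Fetch`, `fetch`, `map2`, `run`, and `batch_load` are hypothetical names used only to show the shape of the idea: because the two arguments of an applicative-style combine are independent, their requests can be merged before anything is fetched.

```python
class Fetch:
    """A description of a computation plus the keys it needs (hypothetical)."""

    def __init__(self, keys, finish):
        self.keys = keys      # set of keys this computation needs
        self.finish = finish  # function: dict of fetched values -> result


def fetch(key):
    """Describe a single request without performing it."""
    return Fetch({key}, lambda got: got[key])


def map2(f, a, b):
    """Applicative-style combine: a and b are independent, so their keys merge."""
    return Fetch(a.keys | b.keys, lambda got: f(a.finish(got), b.finish(got)))


def run(plan, batch_load):
    """Fetch every needed key in one round trip, then compute the result."""
    got = batch_load(sorted(plan.keys))
    return plan.finish(got)


# A fake data source that records how many round trips were made.
DB = {"alice": 3, "bob": 4, "carol": 5}
calls = []


def batch_load(keys):
    calls.append(keys)
    return {k: DB[k] for k in keys}


def add(x, y):
    return x + y


# Reads like three separate lookups, but runs as a single batched request.
plan = map2(add, map2(add, fetch("alice"), fetch("bob")), fetch("carol"))
total = run(plan, batch_load)

print(total)  # 12
print(calls)  # [['alice', 'bob', 'carol']]
```

A monadic bind could not be merged this way in general, since the second request may depend on the result of the first; that is the usual argument for preferring applicative structure when batching matters.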
The Haskell community has put a lot of effort into doing some GPU stuff already which might be worth digging into depending on how serious you are.
Looks like it could be useful for this case.
I will look into it.
I’ve wondered about how applicable comonads would be, but I don’t understand them well enough to really answer. But I’m not sure how much types can address the hard part, which is really the code generation.