I only read the abstract and introduction, but this looks really interesting.
I don’t know a tonne about the semantics of early APLs (they mention APL\360). The broad strokes are, of course, still the same; but from what I hear there were some fundamental simplifications and improvements over the course of the ’70s and ’80s.
In the abstract, they mention that one component (the ‘D-machine’) prepares instructions for another (the ‘E-machine’) to execute. This looks remarkably similar to a modern architecture, with a CPU and GPU! GPU acceleration hasn’t been used a ton for APL, but it’s an area of active interest and research. The most interesting development is probably co-dfns, a bootstrapped APL compiler which itself runs on the GPU.
I don’t know much about combined CPU/GPU architectures, though I seem to recall the Dyalog folks got something working along those lines with their APL implementation.
I read the paper in question over the weekend. I wouldn’t say I grok it yet, but it’s started percolating. The D-machine is almost like an optimization stage in a modern JIT system. It’s pretty interesting. The way that arrays are described in memory is very similar to how ngn/apl represents its arrays (I think that this paper might have been the source of inspiration for that implementation, but I don’t know).
I do think that the array implementation described in the paper would make vectorization with SIMD instructions more difficult and would introduce cache misses, but I’m at the limit of my understanding here, so I don’t know if that’s true.
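If I’m reading the descriptor scheme right, the idea is that an array is a flat buffer plus a little header (offset, shape, strides), so structural operations like reversal and transpose only rewrite the header and never touch the data. Here’s a minimal Python sketch of that idea — all names are mine, and this is my guess at the mechanism, not the paper’s actual representation:

```python
from dataclasses import dataclass

@dataclass
class Desc:
    """A strided view over a flat buffer: offset + shape + strides."""
    data: list     # flat ravel of the array
    shape: tuple   # extent along each axis
    stride: tuple  # elements to skip per unit step along each axis
    offset: int = 0

    def at(self, idx):
        # Map a multidimensional index into the flat buffer.
        return self.data[self.offset + sum(i * s for i, s in zip(idx, self.stride))]

def reverse(d, axis=0):
    # Reverse along an axis: negate that stride, move the offset to
    # what was the last element. No data is copied.
    off = d.offset + (d.shape[axis] - 1) * d.stride[axis]
    st = list(d.stride)
    st[axis] = -st[axis]
    return Desc(d.data, d.shape, tuple(st), off)

def transpose(d):
    # Transpose: swap shape and stride entries. Again, no data movement.
    return Desc(d.data, d.shape[::-1], d.stride[::-1], d.offset)

# A 2x3 matrix holding 1..6, stored row-major.
a = Desc(list(range(1, 7)), (2, 3), (3, 1))
print(a.at((1, 2)))              # 6 (row 1, column 2)
print(reverse(a, 1).at((0, 0)))  # 3 (row 0 reversed starts at its last column)
print(transpose(a).at((2, 1)))   # 6
```

This also hints at the SIMD worry above: once strides can be negative or non-unit, the view is no longer a contiguous run of memory, so naive vector loads and cache-friendly sequential access stop being free.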
Credit to @marmoset for showing me this.
Thanks for posting this! Looking through it again really makes me want to try coding a toy version of the D and E machines - chapter IV seems very detailed with lots of examples (and pseudo-APL for the APL machine in an appendix - what a lark!). Also I just spotted that the D and E machines are described in sections D and E of chapter IV which is a nice touch :p