1.
  1.  

  2.

    Ok, now run it on a GPU :)

    I’m not very familiar with Fortran (I’ve done some accursed Fortran business programming in an unspecified version, Fortran 70 or its successor), but I assume modern Fortran code can also run on a GPU without a rewrite?
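
    For concreteness, something like the sketch below is what I have in mind: plain `do concurrent` loops with nothing GPU-specific in the source. NVIDIA’s nvfortran compiler reportedly offloads such loops with `-stdpar=gpu`, though I haven’t tried it, so treat both the flag and the offload behaviour as assumptions.

    ```fortran
    ! Sketch: standard Fortran with no GPU-specific code in the source.
    ! Assumption: the compiler (e.g. nvfortran with -stdpar=gpu) maps the
    ! DO CONCURRENT loop onto the GPU; otherwise it simply runs on the CPU.
    program saxpy_stdpar
      implicit none
      integer, parameter :: n = 1000000
      real :: x(n), y(n), a
      integer :: i

      a = 2.0
      x = 1.0
      y = 3.0

      ! The compiler is free to run this loop in parallel, on CPU or GPU.
      do concurrent (i = 1:n)
        y(i) = a * x(i) + y(i)
      end do

      print *, 'y(1) =', y(1)   ! expect 5.0
    end program saxpy_stdpar
    ```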

    1.
      1.

        And if it wasn’t, you can (usually) call C from Fortran[1]. Whether or not it would still be performant would need to be measured.

        [1] https://docs.oracle.com/cd/E19059-01/stud.8/817-5066/11_cfort.html
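
        For the modern, standardised route (as opposed to the older compiler-specific conventions the link describes), a minimal sketch using Fortran 2003’s ISO_C_BINDING is below. The C routine `c_dot` is a made-up example for illustration; its C implementation is not shown.

        ```fortran
        ! Sketch: calling a C function from Fortran via ISO_C_BINDING.
        ! Assumes a C routine with this prototype is linked in:
        !     double c_dot(const double *x, const double *y, int n);
        program call_c_example
          use, intrinsic :: iso_c_binding
          implicit none

          interface
            function c_dot(x, y, n) bind(c, name="c_dot") result(d)
              import :: c_double, c_int
              real(c_double), intent(in) :: x(*), y(*)
              integer(c_int), value      :: n
              real(c_double)             :: d
            end function c_dot
          end interface

          real(c_double) :: a(3) = [1.0_c_double, 2.0_c_double, 3.0_c_double]
          real(c_double) :: b(3) = [4.0_c_double, 5.0_c_double, 6.0_c_double]

          ! Fortran arrays pass straight through; n is passed by value.
          print *, c_dot(a, b, 3_c_int)   ! expect 32.0
        end program call_c_example
        ```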

      2.

        Many, many years ago, I worked on a Fortran compiler that could extract loops and run them on GPUs. Making Fortran code run on GPUs was easy, working out when it was beneficial to shove some state across the bus, run a shader, and wait for the result to come back was the hard part. Since then, NVIDIA bought PGI, who made one of the most mature Fortran compilers. They initially tried to contribute it to LLVM, but the code was pre-C99 C and so it ended up being used as a reference for flang, rather than simply being flang. I believe NVIDIA did some work to target GPU clusters for Fortran with it, but I haven’t followed the progress for about 10 years.
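
        For a rough idea of what the directive-based flavour of this looks like from the user’s side, here is a hedged OpenACC-style sketch (OpenACC being the model PGI championed; the clause names are from the open standard, nothing PGI-specific). The `copy(a)` clause is precisely the “shove some state across the bus” decision, made explicit by the programmer instead of guessed by the compiler.

        ```fortran
        ! Sketch: the loop body stays ordinary Fortran; the directive asks
        ! the compiler to move `a` to the device, run the loop there, and
        ! copy the result back.
        subroutine scale_on_gpu(a, n)
          implicit none
          integer, intent(in) :: n
          real, intent(inout) :: a(n)
          integer :: i

          !$acc parallel loop copy(a)
          do i = 1, n
            a(i) = 2.0 * a(i)
          end do
        end subroutine scale_on_gpu
        ```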

        1.

          I would describe that as the fundamental problem with automatic parallelisation: it is ambiguous whether it will be profitable to spend an entire microsecond or millisecond sending this data to a separate processor so it can be processed in parallel. A lot of efforts seem to have foundered on exactly that point.
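
          As a toy illustration of that ambiguity (every number below is an invented assumption, not a measurement): the transfer alone can cost the same order of time as the work it saves, and the compiler generally cannot measure either side at compile time.

          ```fortran
          ! Toy break-even estimate: is the time saved on the GPU larger than
          ! the time spent moving the data across the bus? All figures are
          ! illustrative assumptions.
          program offload_breakeven
            implicit none
            real :: n_bytes, bus_bw, cpu_time, gpu_time, xfer_time

            n_bytes  = 8.0e6      ! one million doubles, in bytes
            bus_bw   = 16.0e9     ! assumed bus bandwidth, bytes/s
            cpu_time = 2.0e-3     ! assumed CPU time for the loop, seconds
            gpu_time = 0.1e-3     ! assumed GPU kernel time, seconds

            xfer_time = 2.0 * n_bytes / bus_bw   ! data over, results back

            if (xfer_time + gpu_time < cpu_time) then
              print *, 'offload saves', cpu_time - (xfer_time + gpu_time), 's'
            else
              print *, 'offload loses', (xfer_time + gpu_time) - cpu_time, 's'
            end if
          end program offload_breakeven
          ```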

        2.

          I think, though I may be incorrect, that at least with CUDA that’s pretty old hat in the HPC world.

        3.

          It would be interesting to see how this (an idiomatic implementation of GPT) would look and work in Julia…