1.

By William F. Godoy, Pedro Valero-Lara, Caira Anderson, Katrina W. Lee, Ana Gainaru, Rafael Ferreira da Silva, Jeffrey S. Vetter.

Abstract:

We evaluate Julia as a single language and ecosystem paradigm powered by LLVM to develop workflow components for high-performance computing. We run a Gray-Scott, 2-variable diffusion-reaction application using a memory-bound, 7-point stencil kernel on Frontier, the US Department of Energy’s first exascale supercomputer. We evaluate the performance, scaling, and trade-offs of (i) the computational kernel on AMD’s MI250X GPUs, (ii) weak scaling up to 4,096 MPI processes/GPUs or 512 nodes, (iii) parallel I/O writes using the ADIOS2 library bindings, and (iv) Jupyter Notebooks for interactive analysis. Results suggest that although Julia generates a reasonable LLVM-IR, a nearly 50% performance difference exists vs. native AMD HIP stencil codes when running on the GPUs. As expected, we observed near-zero overhead when using MPI and parallel I/O bindings for system-wide installed implementations. Consequently, Julia emerges as a compelling high-performance and high-productivity workflow composition language, as measured on the fastest supercomputer in the world.
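For readers who have not used Julia's GPU stack, the sketch below shows roughly what a memory-bound 7-point (3-D Laplacian) stencil looks like when targeting AMD GPUs from Julia. It is a minimal illustration using KernelAbstractions.jl on top of AMDGPU.jl, not the paper's code; the kernel, array names, grid size, and workgroup shape are all assumptions.

```julia
using AMDGPU
using KernelAbstractions

# 7-point (3-D) Laplacian stencil: each interior cell reads itself and its
# six face neighbours. Names and sizes are illustrative only.
@kernel function laplacian!(du, @Const(u))
    i, j, k = @index(Global, NTuple)
    # Shift by one so the kernel only writes interior points.
    @inbounds du[i+1, j+1, k+1] =
        u[i,   j+1, k+1] + u[i+2, j+1, k+1] +
        u[i+1, j,   k+1] + u[i+1, j+2, k+1] +
        u[i+1, j+1, k  ] + u[i+1, j+1, k+2] -
        6f0 * u[i+1, j+1, k+1]
end

n  = 128
u  = ROCArray(rand(Float32, n, n, n))   # device arrays on one MI250X GCD
du = ROCArray(zeros(Float32, n, n, n))

backend = get_backend(u)
laplacian!(backend, (8, 8, 4))(du, u; ndrange = (n - 2, n - 2, n - 2))
KernelAbstractions.synchronize(backend)
```

The weak-scaling and near-zero-overhead results rely on MPI.jl dispatching to the system-installed MPI. A weak-scaling driver could be as simple as the sketch below; the one-rank-per-GPU mapping, fixed local size, and 1-D decomposition are illustrative assumptions, and the ADIOS2.jl output step is omitted.

```julia
using MPI

MPI.Init()
comm   = MPI.COMM_WORLD
rank   = MPI.Comm_rank(comm)
nprocs = MPI.Comm_size(comm)

# Weak scaling: the local domain is fixed, so the global problem
# grows with the number of ranks (nominally one rank per GPU).
n_local  = 128
n_global = n_local * nprocs          # illustrative 1-D decomposition

t0 = MPI.Wtime()
# ... per-rank stencil iterations and halo exchanges would go here ...
elapsed = MPI.Wtime() - t0

# The slowest rank sets the weak-scaling figure of merit.
t_max = MPI.Allreduce(elapsed, MPI.MAX, comm)
rank == 0 && println("ranks=$nprocs  global n=$n_global  time=$(t_max) s")

MPI.Finalize()
```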

  1.  

    1.

      Nice - this (Julia, Jupyter) looks like a much nicer environment than when I was looking at “HPC Workflows” ~15 years ago.

      Not having really watched the space since then, I’m curious what the process looks like in total, including submitting to job queues and moving data between systems. They show that you can use Julia to get good performance on GPU-based systems and then use Julia code to access the output via shared parallel I/O, but where does the analysis run? It’d be really cool if you could coordinate batch submission, data movement, and large vis jobs (do any vis jobs still require large shared systems?)