1. 3

  2. 1

    I wonder how this is different from the existing NAS CG conjugate gradient benchmark. From what I can tell, it might be mostly packaging and license?

    Also, while the slides are pretty readable, I was a little curious what exact points they were trying to make on the slides with results. The first one is clear, though: it’s always entertaining to see the difference between peak flops and app performance. Less than one percent of peak!

    1. 2

      > I wonder how this is different from the existing NAS CG conjugate gradient benchmark. From what I can tell, it might be mostly packaging and license?

      After reading your comment I got curious about this myself and did a little hunting. From the original paper on HPCG[0]:

      > The NAS Parallel Benchmarks (NPB) [3] include a CG benchmark. It shares many attributes with what is proposed here. Despite the wide use of this benchmark, it has the critical flaw that the matrix is chosen to have a random sparsity pattern with a uniform distribution of entries per row. This choice has led to the unfortunate result that a two-dimensional distribution of the matrix is optimal. Therefore, computation and communication patterns are non-physical. Furthermore, no preconditioning is present, so the important features of local sparse triangular solve are not represented and not easily introduced, again because of the choice of a non-physical sparsity pattern. Although NPB CG has been extensively used for HPC analysis, it is not appropriate as a broad metric for our effort.
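      To make the contrast a bit more concrete, here’s a rough toy sketch (mine, not code from either benchmark; the grid size and nonzeros-per-row are made-up values): HPCG builds its matrix from a 27-point stencil on a 3D grid, so every nonzero couples a point to a geometric neighbour, while NPB CG scatters a fixed number of nonzeros uniformly at random across each row.

      ```python
      import numpy as np

      def hpcg_like_rows(nx=4, ny=4, nz=4):
          """Column indices per row for a 27-point stencil on an nx*ny*nz grid:
          each point is coupled only to its geometric neighbours ('physical')."""
          rows = []
          for iz in range(nz):
              for iy in range(ny):
                  for ix in range(nx):
                      cols = []
                      for dz in (-1, 0, 1):
                          for dy in (-1, 0, 1):
                              for dx in (-1, 0, 1):
                                  jx, jy, jz = ix + dx, iy + dy, iz + dz
                                  if 0 <= jx < nx and 0 <= jy < ny and 0 <= jz < nz:
                                      cols.append(jx + nx * (jy + ny * jz))
                      rows.append(sorted(cols))
          return rows

      def npb_cg_like_rows(n=64, nnz_per_row=13, seed=0):
          """Column indices per row drawn uniformly at random: no geometry at all."""
          rng = np.random.default_rng(seed)
          return [sorted(rng.choice(n, size=nnz_per_row, replace=False)) for _ in range(n)]

      stencil = hpcg_like_rows()        # 4x4x4 grid -> 64 rows
      random_ = npb_cg_like_rows()      # 64 rows, 13 random nonzeros each

      # In the stencil matrix every nonzero stays within a fixed distance of the
      # diagonal, so a block-of-grid row distribution only ever talks to nearby
      # neighbours; the random matrix has nonzeros everywhere, which is why a
      # 2-D matrix distribution ends up optimal for NPB CG.
      print("stencil row 21:", stencil[21])
      print("random  row 21:", random_[21])
      print("max |col - row|, stencil:", max(abs(c - r) for r, cs in enumerate(stencil) for c in cs))
      print("max |col - row|, random: ", max(abs(c - r) for r, cs in enumerate(random_) for c in cs))
      ```

      The stencil structure is also what makes the local sparse triangular solves meaningful: HPCG uses them in its symmetric Gauss-Seidel preconditioner, which is the other thing the quote calls out as missing from NPB CG.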

      > Also, while the slides are pretty readable, I was a little curious what exact points they were trying to make on the slides with results.

      This is always the problem with a “bare” slide-deck. :P I think they’re mostly trying to show results on clusters that have well-known perf results in the HPC community, but it’s definitely hard to tell.

      Though I thought slide 34, which showed the substantial improvement from tuning on the K computer, was also interesting. I’d be curious how much tuning is necessary to get good perf on HPCG vs HPL… that might have some bearing on its future adoption.

      [0] http://www.sandia.gov/~maherou/docs/HPCG-Benchmark.pdf

      1. 1

        Interesting, thanks for digging. That proposal you linked is shorter and more readable than I expected :)

        So it sounds like NAS CG is more synthetic and unrepresentative than I would’ve thought. That’s interesting.

        That reminds me of a totally impossible but really interesting study I’ve always dreamed of doing: go through the back literature, look for results that depend on suspect benchmarks, and try to reproduce them with real workloads. It’d be like that recent drug study that went back and reviewed old results (Economist: “Trouble at the lab”). Maybe I’m a pessimist, but I’d expect trouble at the computer lab too…

        I was always fond of the “Flash vs. Simulated Flash” paper for the same kind of reason. Wish there was a forum for more things like that.