Interesting because it examines not just program size or instruction count, but the Critical Path Length (the longest chain of instructions in which each one consumes the result of the previous one) and the Instruction Level Parallelism (how wide a CPU would need to be to reach the fastest possible execution time – i.e. a run time equal to the CPL – assuming perfect branch prediction; equivalently, instruction count divided by CPL).
As such this is an idealized measure, independent of any current (or near-future) real-world CPU.
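To make those two metrics concrete, here is a minimal sketch (my own toy model, not the paper's tooling) that computes CPL and ILP from a dynamic instruction trace, assuming unit latency, unlimited execution width, and perfect branch prediction:

```python
# Each trace entry is (destination register, list of source registers).
trace = [
    ("x1", []),            # e.g. a load immediate: no inputs
    ("x2", []),
    ("x3", ["x1", "x2"]),  # add x3, x1, x2
    ("x4", ["x3"]),        # serialized behind the add
    ("x5", ["x1"]),        # independent of x3/x4: free parallelism
]

def cpl_and_ilp(trace):
    # depth[r] = cycle in which the latest value of register r is ready.
    depth = {}
    cpl = 0
    for dst, srcs in trace:
        # Completes one cycle after its last input becomes available.
        done = 1 + max((depth.get(s, 0) for s in srcs), default=0)
        depth[dst] = done
        cpl = max(cpl, done)
    # ILP = instruction count / CPL: the average machine width needed
    # to finish the whole trace in CPL cycles.
    return cpl, len(trace) / cpl

print(cpl_and_ilp(trace))  # CPL = 3 (x1 -> x3 -> x4), ILP = 5/3 ≈ 1.67
```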
They also repeat the analysis with a limited window of instructions (4, 16, 64, 200, 500, 1000 and 2000), effectively adding the effects of a limited-size ROB (Re-Order Buffer) and of limited decode and commit widths (for which they somewhat unrealistically only consider half the ROB size).
In all cases, the ISAs track each other closely. The largest difference is for CloverLeaf at a window size of 2000, where RISC-V has 12% less ILP available. The only case where RISC-V has more ILP at large window sizes is STREAM, with a 5.8% advantage. In every case, however, RISC-V has more ILP available at smaller window sizes (500 or less), with AArch64 overtaking it at larger window sizes.
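The windowed variant can be sketched the same way. Here a finite ROB is modeled (deliberately simply – the paper's exact mechanism may differ) by forbidding instruction i from starting before instruction i − W has completed, so parallelism further apart than W instructions in program order is invisible to the machine:

```python
def windowed_cpl(trace, window):
    depth = {}
    finish = []  # finish[i] = completion cycle of instruction i
    for i, (dst, srcs) in enumerate(trace):
        start = max((depth.get(s, 0) for s in srcs), default=0)
        if i >= window:
            # Stalled until the oldest ROB entry has retired.
            start = max(start, finish[i - window])
        finish.append(start + 1)  # unit latency
        depth[dst] = finish[-1]
    return max(finish)

# Two independent 6-instruction dependency chains, one after the other.
chain_a = [("a0", [])] + [("a0", ["a0"]) for _ in range(5)]
chain_b = [("b0", [])] + [("b0", ["b0"]) for _ in range(5)]
demo = chain_a + chain_b

for w in (4, 16):
    cpl = windowed_cpl(demo, w)
    print(f"W={w}: CPL={cpl}, ILP={len(demo)/cpl:.2f}")
# W=4:  CPL=9, ILP=1.33  (the window hides chain b behind chain a)
# W=16: CPL=6, ILP=2.00  (wide enough to see both chains at once)
```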
An odd comment. On AArch64, a xor-selfie (XORing a register with itself, e.g. eor x0, x0, x0) is architecturally not allowed to be dependency-breaking. (Not technically true, but close enough, and unlike on x86 there’s no reason for the microarchitecture to treat it specially – the idiomatic way to zero a register there is mov x0, #0.)
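To see why this matters for a trace analysis, here is a toy comparison reusing the cpl_and_ilp() sketch from above (how the paper's tooling actually handles this is not something I know):

```python
# A three-deep dependency chain on x0, then a "zero x0" idiom,
# then a consumer of x0.
dep_chain = [("x0", []), ("x0", ["x0"]), ("x0", ["x0"])]

# x86: `xor eax, eax` is a recognized zeroing idiom, so a trace analyzer
# may drop its source operands; the dependency chain is cut.
x86_trace = dep_chain + [("x0", []), ("x1", ["x0"])]

# AArch64: `eor x0, x0, x0` architecturally reads x0, so the dependency
# stays; the idiomatic `mov x0, #0` would have no inputs anyway.
a64_trace = dep_chain + [("x0", ["x0"]), ("x1", ["x0"])]

print(cpl_and_ilp(x86_trace))  # CPL=3: the idiom restarts the chain
print(cpl_and_ilp(a64_trace))  # CPL=5: the eor extends the chain
```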
Another odd comment. Because:
More generally, I found the paper somewhat shallow. I would have liked to see an exploration of where the differences come from. The applications they considered are also quite domain-specific.