The speed the author achieves is a direct result of the efficiency of the underlying random number generator algorithms. Yet the author mentions those algorithms only once:

I won’t bother you again other than stating that Neanderthal uses Philox and/or ARS5 RNG which is much, much, better than the stuff you get from the built-in rand.

From my perspective, the theory and implementation of the particular PRNG algorithms deserve most of the credit here.

The algorithms the author relies upon are hardware-friendly and can generate gigabytes of random numbers per second. ARS5, for example, is built on AES rounds, so it can use the AES-NI instructions on x86 CPUs.

Here are two quick links:

Overview of counter-based RNGs (CBRNGs) on Wikipedia, which covers ARS and Philox: https://en.wikipedia.org/wiki/Counter-based_random_number_generator_(CBRNG)

Benchmarks of the CBRNG in MKL: https://software.intel.com/en-us/articles/new-counter-based-random-number-generators-in-intel-math-kernel-library
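
To make "counter-based" concrete, here is a rough pure-Python sketch of Philox4x32-10 as I understand it from the Random123 reference code. This is not what Neanderthal or MKL actually run (those are tuned native implementations), and the helper name `block` is my own invention for illustration. The multipliers and key increments are the published constants as I remember them, so check them against the reference before relying on this. The point is only that each output block is a pure function of (counter, key), with no state carried from one block to the next:

```python
# Illustrative, unoptimized sketch of Philox4x32-10.
# Assumption: constants and round structure follow the Random123 reference.
MASK32 = 0xFFFFFFFF
M0, M1 = 0xD2511F53, 0xCD9E8D57  # round multipliers
W0, W1 = 0x9E3779B9, 0xBB67AE85  # per-round key increments


def philox4x32(counter, key, rounds=10):
    """Map a 4x32-bit counter and a 2x32-bit key to four 32-bit outputs."""
    c0, c1, c2, c3 = (c & MASK32 for c in counter)
    k0, k1 = (k & MASK32 for k in key)
    for r in range(rounds):
        if r > 0:
            # Bump the key before every round except the first.
            k0 = (k0 + W0) & MASK32
            k1 = (k1 + W1) & MASK32
        p0 = M0 * c0  # full 64-bit products; Python ints do not overflow
        p1 = M1 * c2
        c0, c1, c2, c3 = (
            (p1 >> 32) ^ c1 ^ k0,  # high half of p1, mixed with the key
            p1 & MASK32,           # low half of p1
            (p0 >> 32) ^ c3 ^ k1,  # high half of p0, mixed with the key
            p0 & MASK32,           # low half of p0
        )
    return (c0, c1, c2, c3)


def block(i, key=(0, 0)):
    """Hypothetical helper: the i-th output block, computed directly from i."""
    counter = tuple((i >> (32 * j)) & MASK32 for j in range(4))
    return philox4x32(counter, key)
```

Because `block(1_000_000)` can be computed without producing the million blocks before it, the work splits trivially across SIMD lanes, cores, or GPU threads: each worker takes its own counter range or its own key. That, plus a cheap round function, is where the throughput comes from.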

Furthermore, if the author had used a cryptographically secure random number generator, wouldn't this whole thing be much slower?

This reminds me of an old paper on using GPUs to generate white noise.