I was disappointed that they didn’t include PyPy and that the overhead of NumPy is a confounder. PyPy’s autovectorizing JIT is half a decade old, and it would be nice to compare it to Numba. I tried to make an apples-to-apples version of the benchmark, but getting versions of everything to line up is not easy, and the author didn’t include their particular versions. I tested this Python script, which has two hooks: the second hook is the actual benchmark, and the first hook shuffles the input list to prevent JITs from inlining its contents. I incanted timeit like so:
$RUNTIME -mtimeit -s 'from test import a, b' 'a(); b()'
That runs both hooks; substituting just a() or b() runs the first or second hook alone. I ran each hook on its own and also both in sequence, to verify that running both hooks is not faster than running just the first hook (i.e., that the shuffle is not being elided!). Here’s the time taken just to run the second hook:
PyPy 3.6.12: 132µs
PyPy 3.6.12 with --jit vec=1: 133µs
CPython 3.6.13: 5.1ms
CPython 3.8.8: 5.31ms
CPython 3.8.8 with Numba 0.52.0: 128ms
On the one hand, it seems that PyPy is unable to autovectorize this benchmark. On the other hand, its existing removal of interpreter overhead is enough to pull ahead when NumPy is not involved.
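For reference, here is a minimal sketch of the two-hook layout described above, assuming a module named test.py exposing a() and b() (those names come from the timeit invocation). The size of the input list and the work inside b() are placeholders of my own, not the original benchmark:

# test.py -- sketch of the two-hook layout; the benchmark body is a stand-in.
import random

# Shared input: a plain Python list, so NumPy overhead is not involved.
data = [random.random() for _ in range(10_000)]

def a():
    # First hook: shuffle the input in place so a JIT cannot treat its
    # contents as constants and optimize the benchmark away.
    random.shuffle(data)

def b():
    # Second hook: the benchmark proper (placeholder work here; the real
    # script does whatever computation is being measured).
    total = 0.0
    for x in data:
        total += x * x
    return total

With that layout, the command above times 'a(); b()' together, and swapping in 'a()' or 'b()' times each hook on its own.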