  2. 4

    Impressive benchmark, thank you. I want to use Neanderthal at work; I just haven’t had the opportunity yet.

    On a slightly unrelated note, I would be interested in implementing something like FAISS (https://engineering.fb.com/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) in Clojure, Neanderthal should definitely be appropriate for all the similarity calculations, correct?
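
    For context, the core similarity calculation here is essentially batched inner products / cosine similarities (matrix-vector products, which Neanderthal certainly covers). A rough NumPy sketch of the brute-force search that FAISS speeds up, with arbitrary placeholder shapes:

    import numpy as np

    # Brute-force cosine similarity search over a small placeholder dataset.
    rng = np.random.default_rng(0)
    database = rng.standard_normal((10_000, 128)).astype(np.float32)
    query = rng.standard_normal(128).astype(np.float32)

    # Normalize so plain dot products equal cosine similarities.
    database /= np.linalg.norm(database, axis=1, keepdims=True)
    query /= np.linalg.norm(query)

    scores = database @ query        # one matrix-vector product
    top5 = np.argsort(-scores)[:5]   # indices of the 5 most similar vectors
    print(top5, scores[top5])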

    1. 2

      I don’t know. I don’t know enough about the method they use to be able to say yes or no.

    2. 3

      I enjoy working in Clojure’s REPL, so I’ll use it for calling NumPy via libpython-clj. You may object again, since that might be a cause of NumPy’s troubles, but I claim it’s not, and I do it on purpose, to motivate you to fire up your favorite Python dev tool and try to prove me wrong.

      Have you checked if it makes a difference? It’s a pretty big confounding factor here.

      Put another way, if it turns out to add 500ms of latency and the “native numpy” is faster than your optimized clojure, does that change your argument at all?

      1. 2

        Yes, I have, and no, it doesn’t make a difference. But, as I said, if I’m wrong, you could easily debunk my argument by firing up your Python interpreter and doing this yourself.

        1. 3

          It’s less that “I’m debunking your argument” and more “I’ve seen way too many benchmarks completely ruined by problems like this, so it bothers me that you’d intentionally call out a problem and then not address it.” Like you’ve presumably used a pure python script to confirm it doesn’t make a difference, so why not include that too?
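
          Even a minimal pure-Python control would do; something like this sketch, where the matrix size, dtype, and repeat count are placeholders rather than the article’s actual parameters:

          import time
          import numpy as np

          n = 4096  # placeholder size; swap in the benchmark's real dimensions
          a = np.random.rand(n, n).astype(np.float32)
          b = np.random.rand(n, n).astype(np.float32)
          np.dot(a, b)  # warm-up so one-time initialization cost doesn't skew the timing

          runs = 10
          start = time.perf_counter()
          for _ in range(runs):
              np.dot(a, b)
          print(f"avg wall time per dot: {(time.perf_counter() - start) / runs * 1000:.1f} ms")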

          1. 1

            Because even if I did do that, someone would demand that I prove that I did not run 30 YT videos and a 3D shooter in the background. Why would you have to trust me at all? The point of my article is that you should not trust any blanket advertising, and should check whether what someone claims, be it me, the CuPy project, or Google, really works as advertised when /you/ use it on /your/ hardware.

            But yeah, I know my tools, and I know that you’d get identical results from a pure python script. I’ll write a follow-up post with additional interesting results, and I’ll include a profiler report that clearly shows where the time is spent.
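
            As a rough illustration of what I mean (the bench() function and its workload below are placeholders, not the benchmark’s actual code), something like a cProfile report sorted by cumulative time:

            import cProfile
            import pstats
            import numpy as np

            def bench():
                # Placeholder workload standing in for the real benchmark code.
                a = np.random.rand(4096, 4096).astype(np.float32)
                b = np.random.rand(4096, 4096).astype(np.float32)
                for _ in range(10):
                    np.dot(a, b)

            cProfile.run("bench()", "bench.prof")
            pstats.Stats("bench.prof").sort_stats("cumulative").print_stats(10)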

            1. 2

              That may have been your intention, but the article reads like the point is “look how great Clojure and Neanderthal are!” Which is totally fine! We’re all proud of the tools we make. But it also means you shouldn’t be idly brushing off issues that make the comparison look better for you than it actually is. It shows bad faith. You say:

              In case you have any comments and questions, please contact me. I would love to improve any part of this article, if possible!

              Doing your absolute best to eliminate any possible confounding factors, and to show why there’s a difference between NumPy and Neanderthal in the end, would be a significant improvement to the article.

              (And if your goal was not “Neanderthal is great” but “don’t blindly trust benchmarks”, then making that explicit in the first paragraph would be a significant improvement, too. As it stands, opening with “Clojure kicks ass on CPU” makes it pretty explicit that this is a post about how great Clojure is!)

              1. 1

                But look, these NumPy and CuPy measurements are faster when run from the Clojure REPL via libpython-clj than when run from a Python console or editor. If I included that info, would it make people leave these tangential arguments aside? Probably not. Someone would point out that of course the Python interpreter introduces lots of overhead, and that I’d need to go out of my way to circumvent it with some special tool XYZ to mitigate that noise.

                It’s a lot simpler. NumPy and CuPy are very useful tools. They are vastly more performant than the Python code most users would write themselves. But compared with better programming environments, they also leave some performance on the table. Maybe NumPy users don’t care about that; it’s up to them. But there are many programmers who would love to have similar functionality without Python.

                It is interesting to me to compare my favorite tools to these libraries, since many (even non-Python) programmers blindly assume that everything written with NumPy/CuPy/PyTorch/TensorFlow will perform optimally. It is not my job to fix Python’s issues.

                And I seriously mean “send your comments”. I will write a follow-up article in which I will clarify this further. But I can’t make NumPy/CuPy better than they are…

      2. 3

        I’ve been itching to try Clojure again, and this is very interesting. I need to put these books on my reading list.

        1. 4

          This benchmark is probably inaccurate; NumPy is plausibly both faster and slower than the article claims!

          The issue is that by default NumPy’s BLAS backend is multi-threaded (see https://pythonspeed.com/articles/parallelism-slower/). So wall clock time isn’t a good measure.

          I ran the same Python code twice: once with the default settings, which use multi-threaded BLAS:

          CPU times: user 3.84 s, sys: 517 ms, total: 4.36 s
          Wall time: 1.52 s
          

          And once with export OMP_NUM_THREADS=1 so that it uses single-threaded BLAS.

          CPU times: user 2.89 s, sys: 128 ms, total: 3.02 s
          Wall time: 3.05 s
          

          Notice that in single-threaded mode the wall-clock time roughly doubles (3.05 s vs. 1.52 s), but the total CPU time is noticeably lower (3.02 s vs. 4.36 s). So NumPy could potentially be both faster and slower than this article’s benchmark says.

          (There can also be benchmarking artifacts from hyperthreading, which I have turned off, and turboboost, which I haven’t turned off yet.)
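
          For anyone who wants to reproduce this without restarting the interpreter, the threadpoolctl package can also cap the BLAS thread pool at runtime. A rough sketch, using a placeholder 4096x4096 float32 workload rather than the article’s exact one:

          import time
          import numpy as np
          from threadpoolctl import threadpool_limits  # pip install threadpoolctl

          a = np.random.rand(4096, 4096).astype(np.float32)
          b = np.random.rand(4096, 4096).astype(np.float32)
          np.dot(a, b)  # warm-up

          def measure(label):
              # Compare wall-clock time against total CPU time (user + sys, across all threads).
              wall0, cpu0 = time.perf_counter(), time.process_time()
              np.dot(a, b)
              print(f"{label}: wall {time.perf_counter() - wall0:.2f} s, "
                    f"CPU {time.process_time() - cpu0:.2f} s")

          measure("multi-threaded BLAS (default)")
          with threadpool_limits(limits=1):  # similar in effect to OMP_NUM_THREADS=1, scoped to this block
              measure("single-threaded BLAS")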