1. 12
  1. 2

    I just played with the OCaml code a bit and cut the run time by more than half. The performance hit comes from using a regex to split each line rather than a plain split. The Python version uses plain string splitting.

    I generated a CSV with:

    awk 'BEGIN { while (count++<30000000)  print rand()","(rand()*10)","(rand()*100) }' > /tmp/data.csv

    I moved the OCaml code over to Core to get access to String.split, which splits on a single character instead of going through a regexp, which is pretty expensive.

    The new code is here: https://gist.github.com/orbitz/05afcda28a33f784d2fe


    $ ocamlfind ocamlopt -thread -package core,str -linkpkg -o sum-ocaml foo.ml
    $ time ./sum-ocaml data.csv 
    real    1m0.173s
    user    0m57.158s
    sys     0m2.403s


    $ time ./sum-ocaml data.csv 
    real    0m26.482s
    user    0m25.351s
    sys     0m0.912s

    And for comparison, the Python version on my system with my file:

    $ time python3 foo.py data.csv 
    real    1m58.709s
    user    1m57.410s
    sys     0m0.796s
    1. 2

      Well done. That’s more like what I would expect. Thanks!

    2. 1

      I’m surprised Python is so fast and OCaml isn’t even faster than it is!

      1. 2

        Looks like the close performance was due to an apples-to-oranges comparison. Switching the OCaml code to a plain split function makes OCaml faster than Python by a more expected margin (on my system at least).


        1. 1

          Hrm? The OCaml one was 3 seconds faster.

        2. 1

          From skimming the Python code, I can’t tell what it is expected to do. My only guess is that it is not very Pythonic (for example, it doesn’t use the built-in csv module and does many things strangely).
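
For readers following the thread, here is a minimal Python sketch of the three parsing approaches mentioned above: splitting each line with a regex, splitting on a literal comma, and using the built-in csv module. It is not the original foo.py or foo.ml. It assumes the task is to sum each of the three numeric columns (a guess based on the binary name sum-ocaml), and the sample data and function names are hypothetical. No timings are claimed; it only shows that the three approaches produce the same totals.

```python
import csv
import io
import re

# Hypothetical sample in the same shape as the awk-generated file:
# three comma-separated numeric columns per line.
SAMPLE = "0.5,2.5,10.0\n0.25,7.5,30.0\n"

COMMA = re.compile(",")


def sum_with_regex(lines):
    """Split each line with a compiled regex (the approach the thread calls expensive)."""
    totals = [0.0, 0.0, 0.0]
    for line in lines:
        for i, field in enumerate(COMMA.split(line.rstrip("\n"))):
            totals[i] += float(field)
    return totals


def sum_with_split(lines):
    """Split each line on a literal comma with str.split."""
    totals = [0.0, 0.0, 0.0]
    for line in lines:
        for i, field in enumerate(line.rstrip("\n").split(",")):
            totals[i] += float(field)
    return totals


def sum_with_csv(lines):
    """Let the built-in csv module parse each row."""
    totals = [0.0, 0.0, 0.0]
    for row in csv.reader(lines):
        for i, field in enumerate(row):
            totals[i] += float(field)
    return totals


for fn in (sum_with_regex, sum_with_split, sum_with_csv):
    # Each call gets a fresh file-like object over the same sample text.
    print(fn.__name__, fn(io.StringIO(SAMPLE)))
```

To compare run times on a real file, each function can be passed an open file object and timed with the shell's time command, as in the measurements above.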