I just played with the OCaml code a bit and halved the run time. The performance hit comes from using a regex to split each line rather than a plain string split. The Python version uses plain string splitting.
I generated a CSV with:
awk 'BEGIN { while (count++<30000000) print rand()","(rand()*10)","(rand()*100) }' > /tmp/data.csv
I moved the OCaml code over to Core to get access to String.split, which splits on a single character instead of going through a regexp, and the regexp is pretty expensive.
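The same regex-versus-plain-split distinction can be sketched in Python. This is only an illustration, not the OCaml code from the gist: the sample line and the names `split_with_regex`, `split_plain`, and `time_splitters` are made up here, and the timings will vary by machine.

```python
import re
import timeit

# Hypothetical sample line shaped like the generated CSV (three numeric fields).
LINE = "0.237788,8.45814,15.2208"

# Precompiled pattern, so the comparison measures splitting, not compilation.
COMMA = re.compile(",")


def split_with_regex(line):
    """Split on commas by going through the regex engine."""
    return COMMA.split(line)


def split_plain(line):
    """Split on a literal comma with the built-in string method."""
    return line.split(",")


def time_splitters(number=200_000):
    """Return seconds taken by each splitter for `number` calls on LINE."""
    return {
        fn.__name__: timeit.timeit(lambda: fn(LINE), number=number)
        for fn in (split_with_regex, split_plain)
    }


if __name__ == "__main__":
    # Both approaches must yield the same fields; only the cost differs.
    assert split_with_regex(LINE) == split_plain(LINE)
    for name, seconds in time_splitters().items():
        print(f"{name}: {seconds:.3f}s")
```

Both functions return the same list of fields, so swapping one for the other changes only the cost of the split, which is the point of the change described above.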
The new code is here: https://gist.github.com/orbitz/05afcda28a33f784d2fe
Original:
Modified:
And for comparison, the Python version on my system with my file:
Well done. That’s more like what I would expect. Thanks!
I’m surprised Python is so fast and OCaml isn’t even faster than it is!
Looks like the close performance came from an apples-to-oranges comparison. Changing the OCaml code to use a plain split function puts OCaml a bit faster than Python, which is more what I would expect (on my system at least).
https://lobste.rs/s/uk2o1q/four_mls_and_a_python/comments/64raes#c_64raes
Hrm? The OCaml one was 3 seconds faster.
From skimming the Python code I can’t tell what it is supposed to do. My only guess is that it is not very Pythonic (for example, it doesn’t use the built-in csv module and does many things strangely).
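For reference, a minimal sketch of what reading such a file with the built-in csv module could look like. Since it is unclear what the original script computes, summing each column is just a placeholder task, and `column_sums` is a hypothetical name. It assumes a headerless file where every row has the same number of numeric fields.

```python
import csv
import io


def column_sums(handle):
    """Sum each column of a headerless, all-numeric CSV stream.

    Assumes every non-empty row has the same number of fields.
    """
    sums = []
    for row in csv.reader(handle):
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        values = [float(field) for field in row]
        if not sums:
            sums = [0.0] * len(values)
        for i, value in enumerate(values):
            sums[i] += value
    return sums


if __name__ == "__main__":
    # In-memory stand-in for a file such as /tmp/data.csv.
    sample = io.StringIO("0.5,1.5,10\n0.25,2.5,20\n")
    print(column_sums(sample))
```

With a real file, the handle would come from `open(path, newline="")`, which is how the csv documentation recommends opening files for the reader.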