

    I have used Rust for machine learning and natural language processing for the last two years or so, after going from C++ (5 years) to Go with C (3 years) to Rust (2 years), and the author is spot on.

    Rust has a lot of potential benefits:

    • Compiled Rust code is obviously much faster than pure CPython, but I have also found it to be 2-3x faster than similar Go code for, e.g., data processing, mostly due to static polymorphism and more inlining/optimizations. Usually, it is in the C or C++ ballpark.
    • Rust makes it much easier to refactor large code bases. I do this frequently – you just change the types and let the compiler guide you. In this respect, refactoring Rust is very similar to Haskell or OCaml.
    • Rust makes it much easier to deploy software, similarly to Go, because it produces largely static binaries. E.g. for colleagues who use our software, we create Docker containers with Nix, which basically just contain glibc, libstdc++, Tensorflow, the binary, and a model.
    • A lot of linear algebra can be expressed and type-checked nicely with parametric polymorphism + traits, e.g. Array<f32, Ix2> or ArrayView<f64, Ix3>. You can’t go as far as Haskell, but typing turns a lot of Python + numpy runtime errors into compile-time errors.
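    A minimal, std-only sketch of the last two points, under stated assumptions: the `Matrix` type and its `matmul` method are hypothetical and are not the ndarray API. Here the shape is encoded in the type with const generics, so multiplying matrices whose inner dimensions differ is rejected by the compiler. The element type `T` is generic and gets monomorphized (static polymorphism), so `matmul` for `f32` and for `i64` are compiled as separate, specialized functions that the optimizer can inline. ndarray's `Ix2`/`Ix3` go less far: they fix the number of axes at compile time, while axis lengths are still checked at runtime.

```rust
use std::ops::{Add, Mul};

/// Hypothetical fixed-size matrix whose shape (R x C) is part of its type.
/// Not the ndarray API; just an illustration of type-checked shapes.
#[derive(Debug, PartialEq)]
struct Matrix<T, const R: usize, const C: usize> {
    data: [[T; C]; R],
}

impl<T, const R: usize, const C: usize> Matrix<T, R, C>
where
    T: Copy + Default + Add<Output = T> + Mul<Output = T>,
{
    fn new(data: [[T; C]; R]) -> Self {
        Matrix { data }
    }

    /// (R x C) * (C x K) -> (R x K). The inner dimension C must match
    /// at compile time; a mismatch is a type error, not a runtime panic.
    fn matmul<const K: usize>(&self, rhs: &Matrix<T, C, K>) -> Matrix<T, R, K> {
        let mut out = [[T::default(); K]; R];
        for i in 0..R {
            for j in 0..K {
                let mut acc = T::default();
                for k in 0..C {
                    acc = acc + self.data[i][k] * rhs.data[k][j];
                }
                out[i][j] = acc;
            }
        }
        Matrix { data: out }
    }
}

fn main() {
    let a: Matrix<f32, 2, 3> = Matrix::new([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]);
    let b: Matrix<f32, 3, 2> = Matrix::new([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]);
    let c: Matrix<f32, 2, 2> = a.matmul(&b);
    println!("{:?}", c.data);

    // This would not compile: matmul expects a Matrix<f32, 3, _>,
    // but `a` is a Matrix<f32, 2, 3>.
    // let bad = a.matmul(&a);
}
```

    In numpy, the commented-out product would only fail when that line runs; here it never makes it past the type checker.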

    However, as the author says, the ecosystem for machine learning and data science is still very limited. There are some great crates, such as ndarray and petgraph. There are some promising crates, such as tensorflow (good enough for running graphs defined in Python) and tch-rs (which allows you to build neural nets in Rust, but is still in early development). But outside of those, there are a lot of incomplete or abandoned crates. E.g. for a long time the HDF5 crate couldn’t actually read or write data (I forget which one). In order to use Rust for NLP, we had to implement a lot of functionality ourselves, e.g. crates for training word embeddings, using word embeddings, vector quantization, etc.

    I am convinced that Rust will eventually get there for machine learning and data science. But if you jump in now, you have to accept that you have to write a lot of functionality yourself.

    Some projects that we did in Rust besides the aforementioned crates:


      This matches my experience with Rust so far - there’s a whole ecosystem of tasks that could be done in it, but are difficult or time-consuming due to the immaturity or non-existence of various crates. Hopefully this will improve with time and increased general usage.