1. 9

Again, I wish for a D tag, but I included C and C++ because they’re closest to D’s syntax.

  1.  

  2. 2

    The Python code in the motivation is horrible. No one would write it this way.

    Of course data science is a fuzzy topic without a clear definition, but from the article it seems OP is most concerned with data-processing / plumbing code, i.e. business logic applied to large chunks of data.

    If I advocate a programming environment as a good data-science platform, I’d say it should at least offer some scientific libraries. So I’d wonder: Does D have machine learning libraries or bindings? What about statistical functions, estimators, or matrix operations?

    It would be fun to see a blog post reimplementing stuff from this blog post in Fortran 2008. People would be surprised.

    1. 1

      A lot of code related to managing data involves just massaging the data into the right shape (figuring out the right format), deciding what to do about warts in it (missing values, incorrect values), and so on. I’m sure I don’t have to convince you of this, but I do want to emphasise that this boring part is an important part that also deserves speed and attention.

      > So I’d wonder: Does D have machine learning libraries or bindings?

      It’s a bit young, but it’s already quite useful:

      http://docs.mir.dlang.io/latest/index.html

      Not to mention speedy!

      http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html

      > What about statistical functions?

      Haven’t used it, but at a glance, this looks useful:

      https://github.com/DlangScience/dstats

      1. 1

        Thanks for the links. Glad some kind of ecosystem is in development.

        > A lot of code related to managing data involves just massaging the data into the right shape (figuring out the right format), deciding what to do about warts in it (missing values, incorrect values), and so on. I’m sure I don’t have to convince you of this, but I do want to emphasise that this boring part is an important part that also deserves speed and attention.

        Okay, maybe this is a little out of line with the article, but it is related nevertheless. Languages like C++, Rust, and D are of course suitable for such data-processing tasks, and performant too. In my experience, though, operating at the single-value level makes it easy to get all tied up “in the inner loop”. A “glue language” like Python (or probably also Octave/Matlab, if used at the right level), on the other hand, makes it fairly easy to write code at a higher level, which is very useful for at least 95% of data-processing tasks. Logic implemented with numpy has a fair chance of being ported easily to PySpark/Spark or Dask. With a lot of the C++ code I have seen, it’s usually a rewrite from the ground up.
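
        To make the “inner loop” point concrete, here is a minimal sketch of the same cleanup step (filling missing values with the column mean) written both ways. The sample data and function names are made up for illustration, and it sticks to the standard library to stay self-contained; in practice the column-level version would more likely be a numpy or pandas expression.

```python
from statistics import fmean

# Hypothetical sample column; None marks a missing reading.
readings = [3.0, None, 4.5, None, 6.0]


def fill_missing_loop(values):
    """Single-value style: explicit loops and manual bookkeeping."""
    total = 0.0
    count = 0
    for v in values:
        if v is not None:
            total += v
            count += 1
    mean = total / count

    out = []
    for v in values:
        if v is None:
            out.append(mean)
        else:
            out.append(v)
    return out


def fill_missing_column(values):
    """Column-level style: state the whole transformation at once."""
    mean = fmean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


cleaned = fill_missing_column(readings)
```

        Both functions give the same result; the second is just closer to how the step would be described out loud, which is roughly what tends to carry over when the logic moves to another framework.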

        So nothing in the blog post really makes me think “it would be so useful to use D for my work”. When I read blog posts on R, Julia, or APL/J, I usually get the “oh, that is neat” feeling.