1. 14

I’ve noticed that dynamically typed programming languages, such as Python and R, are used much more often for statistical & machine learning tasks than languages with more stringent type systems. Statically typed languages tend to have a scarcer supply of feature-complete libraries for this class of application.

As I understand it (which is to say, not very well), dynamically typed languages often have robust facilities for procedurally generating, updating, and visualizing models—features that aren’t nearly as mature or extensible in statically typed implementations. All the same, many statically typed languages allow the programmer to directly represent properties of, and relationships between, objects within the program, instead of just leaving these implicit. One might think this would be a strength in domains such as machine learning… apparently not, or maybe not enough.

Why is this? What is the correlation between “loose” typing and ease of implementation for machine learning data structures & algorithms? I would greatly appreciate it if someone with experience in AI/ML—potentially in a statically typed language!—could give their two cents. But of course, anyone is welcome to take a stab.

  1.  

  2. 14

    There are kind of two things here. The high-level APIs for machine learning are typically in Python or R, but the actual implementations of the data structures and algorithms are mostly in C, C++, or Fortran, with Python/R bindings. The usual reason given for this split is that pure Python/R libraries are too slow, while using C/C++/Fortran libraries directly is too tedious and low-level.
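
    To make that concrete, here’s a toy comparison (just a sketch assuming NumPy is installed; exact timings will vary): the same dot product written as a pure-Python loop and as a single call into NumPy’s compiled routines.

        import time
        import numpy as np

        n = 1_000_000
        a = [1.0] * n
        b = [2.0] * n

        # Pure Python: every multiply and add goes through the interpreter.
        t0 = time.perf_counter()
        s = sum(x * y for x, y in zip(a, b))
        print("pure Python:", time.perf_counter() - t0)

        # NumPy hands the whole dot product to compiled C/BLAS code in one call.
        xs, ys = np.array(a), np.array(b)
        t0 = time.perf_counter()
        s = xs @ ys
        print("numpy:", time.perf_counter() - t0)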

    1. 4

      Judging from what I’ve seen in other domains, that depends. There’s no doubt that in a proper ML library, most of the fundamental parts are heavily optimized by, e.g., implementing them in a lower-level language and using FFI bindings. However, anything built on top of them is not. And that can include additional performance-sensitive components, written outside the performance haven of C/C++/Fortran. If these are useful enough for other devs to make use of them, inefficiency can spread through the ecosystem.

      No matter how much performance boilerplate you handle in a library, developers will always have their own data structures in mind. But perhaps in ML people mostly use pre-made components, to the degree that most of the performance-sensitive logic really does happen in more optimized code. Do you find this to be the case?

      1. 3

        people use mostly pre-made components

        Varies by area, but I think this is true in many of them. This is probably best established in the R world, since it’s older (going back decades to its predecessor, Bell Labs S). Far more people use the system as data analysts than implement new statistical methods. In fact, many S/R users traditionally didn’t see themselves as programming, but as using an interactive console, which is why it has features like saving/reloading the current workspace that support purely interactive usage without ever writing a program or script in a text editor. (It’s gotten a little more common to write explicit scripts lately, though, with the reproducible-science push.)

        Python/ML people are more likely to see Python as a programming language rather than just a data-analysis console, but if you look at what most people use it for, it’s the same kind of workflow of loading and cleaning data, choosing a model, setting model parameters, exporting or plotting results. The heavy lifting is done by something like TensorFlow which your average ML practitioner is not going to be contributing code to (even most ML researchers won’t be).
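
        As a rough sketch of that workflow (hypothetical file and column names; pandas and scikit-learn used purely as examples):

            import pandas as pd
            from sklearn.linear_model import LogisticRegression
            from sklearn.model_selection import train_test_split

            # Load and clean data (made-up CSV and columns).
            df = pd.read_csv("measurements.csv").dropna()
            X, y = df[["height", "weight"]], df["label"]

            # Choose a model and set its parameters.
            model = LogisticRegression(C=0.5, max_iter=1000)

            # Fit, evaluate, report results.
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
            model.fit(X_train, y_train)
            print("held-out accuracy:", model.score(X_test, y_test))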

    2. 19

      Seems like it’s mostly just “NumPy happened”. And people started building things on top of NumPy, and then things on top of these things…

      Also, machine learning doesn’t need types as much as something like compilers, web frameworks or GUI apps. The only type that matters for ML is matrices of floats, they don’t really have complex objects that need properties and relationships expressed.
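
      To illustrate (a minimal NumPy sketch with made-up sizes): a two-layer network’s forward pass is nothing but float matrices and elementwise functions.

          import numpy as np

          X = np.random.rand(32, 100)               # batch of 32 inputs, 100 features each
          W1, b1 = np.random.rand(100, 64), np.zeros(64)
          W2, b2 = np.random.rand(64, 10), np.zeros(10)

          hidden = np.maximum(0.0, X @ W1 + b1)     # ReLU(X·W1 + b1)
          logits = hidden @ W2 + b2                 # still just grids of floats, no rich objects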

      1. 11

        Types can be used for more than just stating that layers are matrices. Have a look at the Grenade Haskell library, which lets you fully specify the shape of the network in types: you get compile-time guarantees that the layers fit together, so you don’t get to the end of a few days of training only to find your network never made sense.

        1. 2

          I’ve always thought that Idris would be the best language ever for ML

          1. 1

            Sadly, Idris is not great w.r.t. unboxed data types. Much of the dependently typed machinery it implements involves a lot of boxing and pointer chasing… not the greatest for high-performance computing. That’s not inherent to dependent types, but it’s something language designers need to tackle in the future if they want to meet the needs of high-performance computing.

            1. 2

              Ah well yes, I was thinking more about an API layer for stuff like Tensorflow or Torch, where the Idris type system validates a DAG of operations at compile time and then it’s all translated with the bindings.

            2. 1

              The exascale-project languages like Chapel were my guess, since they (a) make parallelism way easier for many hardware targets and (b) were advertised to researchers in HPC labs. Didn’t happen. Still potential there, as multicore gets more heterogeneous.

          2. 3

            The only type that matters for ML is matrices of floats, they don’t really have complex objects that need properties and relationships expressed.

            Is this fact inherent to the study of AI & ML, or is it just how we’ve decided to model things?

            1. 3

              I guess it’s inherent to the modern hardware. The reason for this deep learning hype explosion is that processors (fully programmable GPUs, SIMD extensions in CPUs, now also more specialized hardware) have gotten very good at doing lots of matrix math in parallel. Someone rediscovered old neural network papers and realized that with these processors we can make the networks bigger and feed them “big data”, and the result is pretty good classifiers.

              1. 1

                On top of that, it’s cheaper. You can get what used to be an SGI Origin or Onyx2 worth of CPUs and RAM for new-car prices instead of mini-mansion prices. Moore’s law plus commodity clusters lowered the barrier to entry a lot.

              2. 2

                It is inherent to problems that can be represented in linear algebra. But many problems have different representations, like decision trees, for example. Regression and neural networks can mostly be written as matrix operations.
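
                For example, ordinary least-squares regression is one linear-algebra expression end to end (a small NumPy sketch with synthetic data):

                    import numpy as np

                    # Fit y ≈ X w by solving the normal equations (X^T X) w = X^T y.
                    X = np.random.rand(200, 3)
                    true_w = np.array([1.5, -2.0, 0.5])
                    y = X @ true_w + 0.01 * np.random.randn(200)

                    w = np.linalg.solve(X.T @ X, X.T @ y)   # matrix ops only, no per-example objects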

                1. 1

                  I concede that matrices are the most fundamental and optimizable representation for ML. They are literal grids of values, after all; you can’t get much denser than that! However, is it still possible that they do not always lend themselves to higher-level modeling?

                  For instance, any useful general-purpose computation boils down to some equivalent of machine code—or a Turing machine, for the theoretically-minded. Despite this, we purposefully code in languages that abstract away from this fundamental, optimizable representation. We make some sacrifices in efficiency* in order to perform higher-order reasoning more effectively. Could (or should) the same be done for ML?

                  (*Note: sometimes, by letting in abstractions, we actually find new optimizations we hadn’t thought of before, because they require a higher-level environment to conceive of and implement conveniently. See parallelism-by-default and lazy streams, as in Haskell. Parsing is yet another example of something that used to be done at a low level, but that is now done more efficiently & productively thanks to higher-level tools.)

                  1. 2

                    ML is not limited to neural networks. Other ML models use different representations.

                    Matrices are an abstraction as well. The abstraction doesn’t say that they are represented as dense arrays; in fact, many libraries can use sparse arrays as needed. And performance comes not only from a denser representation but from other effects like locality, and less overhead compared to the boxing/unboxing of higher-level type systems or the method dispatch of most OOP languages.

                    There is more abstraction at various levels. Some libraries allow the user to specify a neural network in terms of layers. Also, matrices are manipulated algebraically as symbolic variables, which makes formulas look simpler.

                    I guess a few libraries support some kind of dataflow-ish programming by connecting boxes in a graph and having variables propagate as in a circuit. That is very close to the algebraic representation if you think of the formulas as abstract syntax trees for example.

                    Maybe more abstraction could be useful in defining not only the models but all the data ingestion, training policies, and production/operations as well.
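
                    For instance (a small sketch, using SciPy and Keras purely as examples, with arbitrary shapes): the matrix abstraction doesn’t force a dense layout, and layers sit one level above raw matrices.

                        import numpy as np
                        from scipy import sparse
                        from tensorflow import keras

                        # Same algebra, different storage: a sparse representation when most entries are zero.
                        dense = np.eye(1000)
                        compressed = sparse.csr_matrix(dense)

                        # A layer-oriented description of a network, instead of writing the matrices by hand.
                        model = keras.Sequential([
                            keras.layers.Dense(64, activation="relu", input_shape=(100,)),
                            keras.layers.Dense(10),
                        ])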

            2. 14

              As someone who does ML in C++, built a production, petabyte-level computer vision system, and has interviewed many ML people, the reason is pretty obvious: ML people can’t code. Seriously, many are math people, academics with very little experience building production systems. It’s not their fault; it’s not what they trained for or are interested in. These high-level APIs exist to address their needs.

              1. 5

                I want to emphasize one thing that might make my previous comment more clear. The hard part of ML isn’t programming.

                The hard part of ML is data collection, feature selection, and algorithm construction.

                The only part where programming matters is building the training software and execution software. However most ML people care about the former, not the latter.

                1. 2

                  ML has certainly been growing fast. I see this as mirroring what’s happening in CS in general nowadays, with ML simply being the foremost frontier, and a hype word to boot.

                  However, I would temper your statement with an aspect of what u/zxtx said. Even those who are capable of building a project in a low-level language won’t always want to. It’s nice to be able to dodge the boilerplate while hacking on something new, and that goes for those who understand the low-level stuff, too. So I’m not too surprised that people aren’t using low-level languages for everyday ML development. (Libraries are another story, of course.)

                  Perhaps you know more about how to write ML projects in C++ without getting mired in boilerplate, though. Was this ever a problem for you? Or does low-level boilerplate generally not get in your way?

                  1. 1

                    Great question. I am not mired in boilerplate because C++ is a high level language. I use it because the transition from prototype to production is very smooth and natural.

                    I think ultimately it’s not fashionable to learn. I’m finding most younger programmers simply don’t have proficiency in it, meaning they haven’t developed the muscle memory, so it feels slower. The computer field is all about pop culture, which I believe is the actual answer to the OP’s question, now that I think about it. In other words, Python and R are fashionable, and that’s why they are being used.

                    1. 4

                      I think ultimately it’s not fashionable to learn

                      It’s not just fashion: it’s an incredibly complicated language. It’s so complicated that Edison Design Group does the C++ front-ends for most commercial suppliers, just because those suppliers know they’d screw it up themselves. Some C++ alternatives are easier to learn or provide extra benefits for the extra effort.

                      On top of that, it had really slow compiles compared to almost any language I was using when I considered C++. That breaks a developer’s mental state of flow. To test the problem, I mocked up some of the same constructs in a language designed for fast compiles, and things sped way up. It was clear C++ had fundamental design weaknesses. The designers of the D language confirmed my intuition with design choices that let it compile fast despite having many features and being C-like in style.

                      1. 1

                        It’s true the compile times are slow, but it doesn’t kill flow because you don’t need to compile while you program, only when you want to run and test. I would argue any dev style where you quickly switch between running and coding slows you down and takes you out of flow anyway.

                        In regards to it being complicated, this is true. However, C++17 is much more beginner-friendly. Even though 1980s C++ was arguably harder to learn than today’s, millions learned it anyway because of fashion. Don’t underestimate the power of fashion.

                        And lastly, D has its own design flaws, like introducing garbage collection. Why, in a language that has RAII, do you need or want garbage collection? Nobody writing modern C++ worries about leaking memory.

                        1. 1

                          “Don’t underestimate the power of fashion.”

                          You just said it’s out of fashion. So, it needs to be easier to learn and more advantageous than the languages in fashion, and I’m not sure that’s the case. The hardest comparison is against Rust, where I don’t know which will come out ahead for newcomers. I think reduced temporal errors are a big motivator to get through the complexity.

                          “And lastly, D has its own design flaws, like introducing garbage collection.”

                          You can use D without garbage collection. Article here. The Wirth languages all let you do that, too, with a keyword indicating the module was unsafe. So, they defaulted to the safest option, with the developer turning it off when necessary. Ada made GC optional, defaulting to unsafe (memory’s fuzzy), since real-time with no dynamic allocation was the most common usage. There are implementations of reference counting for it, and a RAII-like thing called controlled types, per some Ada folks on a forum.

                          So, even for C++ alternatives with garbage collection, those targeting the system space don’t mandate it. Feel free to turn it off using other methods like unsafe, memory pools, ref counting, and so on.

                          1. 2

                            Sorry, I had a very hard time grokking your response. What I meant was that Python and R are used for ML not because of technical reasons, but because they’re fashionable. There is social capital behind those tools now. C++ was fashionable from the late 80s to the late 90s in programming (not ML). Back then, Lisp and friends were popular for ML!

                            Do you mind clarifying your response about fashion ?

                            In regards to D, I still think garbage collection, even though it’s optional, is a design flaw. It was such a flaw that if you turned it off, you could not use the standard library, so they were forced to write a new one.

                            C++ is such a well-designed language that you can do pretty much any kind of programming (generic, OOP, functional, structural, actor) with it, and it’s still being updated and improved without compromising backwards compatibility. Bjarne is amazing. By this point, most language designers go off and create a new language, but not Bjarne. I would argue that’s why he is one of the greatest language designers ever: he was able to create a language that has never stopped improving.

                            Now WebAssembly is even getting web developers interested in C++ again!

                            1. 2

                              I was agreeing it went out of fashion. I don’t know about young folks, seeing as it happened during the management-driven push toward Java and C#. Those kept getting faster, too. Even stuff like Python replaced it for prototyping, sometimes production. Now there are C/C++ alternatives with compelling benefits, at least one of them massively popular. The young crowd is all over this stuff for jobs and/or fun, depending on the language.

                              So, I just don’t see people going with it a lot in the future, past the social inertia and optimized tooling that cause many to default to it. The language improvements recently have been great, though. I liked reading Bjarne’s papers, too, since the analyses and tradeoffs were really interesting. Hell, I even found a web application framework using it.

                          2. 1

                            I would argue any dev style where you quickly switch between running and coding slows you down and takes you out of flow anyway.

                            I have to disagree. REPL-driven development has only grown more & more useful over time. Now, when I say this, you may think of languages with high overhead such as Python and Clojure, and cringe. But nowadays you can get this affordance in a language that also leaves room for efficient compilation, such as Haskell. And don’t forget that even Python has tools for compiling to efficient code.

                            If you feel like having your mind opened on this matter, Bret Victor has done some very interesting work liberating coding from the “staring at a wall of text” paradigm. I think we’re all bettered by this type of work, but perhaps there’s something to be said for keeping the old, mature standbys in close proximity.

                            1. 2

                              Sorry, just to clarify. I LOVE REPL development! Bret Victor’s work is amazing. What I mean is anything that takes you out of the editor. For example, if you have to change windows to compile and run.

                              REPLs are completely part of the editor and live coding systems don’t take you out of the flow. But if you need to switch out of the editor and run by hand, then it takes you out of flow because it’s a context switch.

                              1. 2

                                100% agreed. Fast compilation times can’t fix a crappy development cycle.

                                1. 2

                                  I think the worst example of anti-flow programming is TDD. A REPL is infinitely better.

                                  1. 1

                                    Do the proponents of TDD knock REPLs? In my opinion, REPL-driven is just the next logical step in TDD’s progression.

                                    1. 2

                                      no, I’m knocking TDD ;-)

                        2. 3

                          How do you feel about platforms like https://onnx.ai/ which make it easy to write a model in a language like Python, but have it deployed into a production system likely written in C++?

                          1. 2

                            I think they are great but don’t go far enough, because we are entering a new paradigm where we write programs that write programs. People need to go further and write a DSL, not just a wrapper in Python. I think a visual language where you connect high-level blocks into a computation graph would be wonderful. And then feed it data and have it learn the parameters.
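
                            A toy sketch of the idea (hypothetical names, plain Python, and leaving out the learning part): describe the graph by connecting blocks, then feed it data.

                                class Node:
                                    def __init__(self, op, *inputs):
                                        self.op, self.inputs = op, inputs

                                    def run(self, feed):
                                        # Input nodes pull values from the feed; others evaluate their children first.
                                        if self.op == "input":
                                            return feed[self.inputs[0]]
                                        vals = [n.run(feed) for n in self.inputs]
                                        ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
                                        return ops[self.op](*vals)

                                # Wire high-level blocks into a graph: out = (x * w) + b
                                x, w, b = Node("input", "x"), Node("input", "w"), Node("input", "b")
                                out = Node("add", Node("mul", x, w), b)

                                print(out.run({"x": 3.0, "w": 2.0, "b": 1.0}))   # 7.0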

                            1. 2

                              So intuitively a DSL is the correct approach, but as can be seen with systems like Tensorflow, it leads to these impedance mismatches with the host language. This mismatch slows people down and ultimately leads them to systems that just try to extend the host language, like PyTorch.

                              1. 1

                                I guess when I think of DSLs, I don’t think of host languages. I’m thinking more about languages that exist by themselves, specific to a domain. In other words, there isn’t a host language like in Tensorflow.

                                1. 1

                                  Well for Tensorflow, I mean something like python as the host language.

                                  1. 1

                                    Wow, rereading my last sentence I can see how it had the opposite meaning from what I intended. I meant I was thinking of DSLs without a host language, unlike Tensorflow.

                      2. 2

                        Would you mind shedding some light on what this petabyte level computer vision system is? I’m very curious!

                        1. 3

                          It was a project within HERE maps to identify road features and signs to help automate map creation. Last time I was there it processed dozens of petabytes of LiDAR and imagery data from over 30 countries. It’s been a couple of years, so I can’t tell you where it’s at today.

                      3. 6

                        There are two different worlds here: prototyping and production. In production machine learning systems today you can easily find C++, Java, and even Scala, Haskell, and Ocaml, but not for prototyping: C++ and Java are too low-level and lack a REPL, and all of these languages lack even decent libs for visualization.

                        For prototyping, you need a REPL, dynamic code loading, and similar things. Only advanced static languages like Haskell and Scala have these features (and ghci still has lots of problems). Maybe if such languages had been relatively mainstream at the time Matlab and R appeared, Matlab and R wouldn’t have had their own languages but would have used some existing static language (there weren’t even decent popular dynamic languages then, maybe only Perl). A dynamic language is the more obvious choice here (at least for writing their own language, as the authors of Matlab had no resources for PLT research): a dynamic language is much simpler to implement.

                        Then, for those who can’t stand the horrible languages of Matlab and R, the same tools were developed, but now for an existing language, Python. Numpy is still very matlab-y, and you can’t even just easily map over data points for feature engineering.
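
                        A tiny example of what I mean (just a sketch): the idiomatic NumPy version works on whole arrays at once, while a per-data-point map falls back to Python-level iteration.

                            import numpy as np

                            points = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

                            # Array-at-a-time ("matlab-y") style: think in whole columns, not in data points.
                            norms = np.sqrt((points ** 2).sum(axis=1))

                            # Per-point map: reads naturally, but iterates in slow interpreted Python.
                            norms_again = np.array([np.sqrt(p @ p) for p in points])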

                        Now we have Python, and it’s even usable in production, so why might we need types? Maybe for building large systems at a high level types would be useful, but for algorithms — not much. You can’t even encode matrix sizes in most practical languages, except maybe with some shady type hackery, which is not very promising.

                        1. 5

                          The main reason is that, until very recently, machine learning mostly existed in a research context. In that context development speed is everything: what matters is how quickly you can design and run experiments. With a good FFI to C, both R and Python hit a good enough place for that. Historically, a lot of the numpy/scipy effort was about having an open-source Matlab, another dynamic language. Most of R’s growth is also fairly recent.

                          Although it can seem like all the action is in Python now, many machine learning tools were written for Java (Weka, Mallet, etc.). I’d love to see a data science ecosystem in Haskell or Ocaml or Rust.

                          1. 4

                            Funny you should mention Haskell, as that was exactly what I had in mind! I personally keep an eye on this doc to track whether Haskell’s ML picture has improved. It’s updated regularly, though some progress might not make it into the doc.

                            1. 5

                              Well there is also the https://www.datahaskell.org/ initiative

                          2. 4

                            Swift for TensorFlow is one attempt to close the prototype/production gap. It’s a modified version of the Swift compiler. IMO, Swift is about as easy to write as Python, but catches shape mismatches and other things at compile-time, so maybe it’ll catch on.

                            1. 2

                              Shape mismatches are caught on the first run in most frameworks – that certainly isn’t much of a value add.

                            2. 4
                              Answer wiki

                              (In order of how underrated I think they are)

                              1. 6

                                I’ve noticed that dynamically typed programming languages, such as Python and R, are used much more often for statistical

                                Sampling bias. Almost everyone I know uses Fortran or q/kdb+. If you look at blogs and job postings, you’re going to see a different distribution than if you try e.g. working with ocean data.

                                dynamically typed languages often have robust facilities for procedurally generating, updating… models … What is the correlation between “loose” typing and ease of implementation for machine learning data structures & algorithms?

                                This just … isn’t true. Almost every implementation “in Python” is using tensorflow or pandas or numpy wrappers which are basically C/C++. Nobody implements that crap in Python because Python is a terrible language. That is not going to change: You are unlikely to ever see machine learning (beyond a toy) done in Python. Python isn’t going to become a better language because being a good language isn’t one of their goals.

                                Why is this?

                                Notwithstanding the sampling bias, and at the risk of this turning into “worse is better”: when you see a Python tutorial for machine learning, you’re not seeing machine learning, but everything else. Loading your data from disk, comparing it to test values, writing images, plotting things (visualising).

                                These are necessary things, and they’re things that your machine learning expert isn’t able to do (or isn’t able to do quickly; or they think it’s beneath them).

                                Our expert seeks accessibility for these things, not performance or goodness (the propensity towards correct and useful code, quickly). Compare a jupyter notebook with a Java or Haskell environment and it’s instantly game over: loading files, tweaking knobs, processes, visualisations: they’re all right there and accessible.

                                But this only applies within our data scientist/machine learning expert space. Now that we’ve got data scientists with a few years’ experience, these tools are becoming the de facto standard toolchain for such jobs. The Fortran programmers and the q/kdb+ programmers don’t call themselves “data scientists” or “machine learning” experts, even though they can do the same maths, do modelling, and so on.

                                1. 3

                                  There is a lot to say about this, but I believe industrial-grade production systems run in typed languages, even though the model development can be done in Python. For example, you can use Python to explore in TensorFlow, save the model, and use it in a typed application.
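
                                  Roughly like this (a sketch with TensorFlow/Keras; the shapes and path are made up, and training is elided): the Python side exports a SavedModel that a C++ or Java serving process can then load.

                                      import tensorflow as tf

                                      # Explore and define the model in Python (training elided here).
                                      model = tf.keras.Sequential([
                                          tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
                                          tf.keras.layers.Dense(1),
                                      ])
                                      model.compile(optimizer="adam", loss="mse")

                                      # Hand the model to the typed production system as a SavedModel.
                                      tf.saved_model.save(model, "exported_model")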

                                  1. 2

                                    Ease of learning and ease of experimentation. I would argue that most machine learning researchers are mathematicians first and programmers second (no judgement implied there). Secondly, dynamically typed languages typically have a good REPL. That doesn’t mean you can’t have a good REPL with a statically typed language, but it’s not the culture.

                                    1. 2

                                      Subjectively, Python is more fun to write.

                                      It’s really nice to write something simple and have it do something very complex correctly. Since the libraries are in C anyway, the performance penalty isn’t even that bad.

                                      1. 2

                                        Most ML projects are small (in terms of source code size) and exploratory. Types don’t usually provide much benefit on small projects, and can actually get in the way on exploratory projects.