1. 31

  2. 6

    I hear there’s a lot of gold to be had in alchemy, though. It’s even better than a science if your goals line up with what it’s good at.

    1. 8

      Raising money from venture capitalists?

      1. 4

        Exactly! :)

    2. 1

      As I said elsewhere, to turn AI into a serious field we should start by fixing its parlance.

      How could non-engineers understand what we are really talking about if we use terms like “learning rate” and “training” that are strongly bound to their personal experience?

      We are fooling them (and often ourselves) with such anthropomorphic language.

      1. 6

        I am not sure how this is related to the article. The article seems to criticize (though very vaguely):

        • That the large number of parameters and non-linearities make the models uninterpretable (black boxes). Techniques like ablation can make models more interpretable.
        • That the optimization problem is not convex/concave, with a lot of local optima, saddle points, etc., and that the field has not come up with a better optimization method than variations of SGD.
        • That a subset of machine learning practitioners do not understand the underlying theory and as a result cargo-cult everything from non-linearities and learning rate schedules to attention mechanisms.

        It is also strange that the article suggests that machine learning is alchemy, since many machine learning algorithms are well-understood (e.g. logistic regression, linear SVMs, KNN classification).

        The issue you raise, that terms like “learning rate” and “training” are strongly bound to personal experience, is more related to explaining machine learning to people outside the field and to newcomers. For anyone who has taken an introductory course on machine learning, “learning rate” has a precise mathematical definition.

        Also, anthropomorphic language is used in a lot of technical fields. We talk about mothers/daughters or parents/children in trees, sink/swim for restoring the heap property in heaps, collisions in hash tables, semaphores in concurrency, etc. We do treat data structures and algorithms research as serious fields.

        1. 6

          I am not sure how this is related to the article.

          The point of my talk at the seminar was that the language we use is “Good for literature and thus Bad for Science” (slide 39). After the talk, one of the listeners told me: “Yeah, the current AI hype makes it seem like we are alchemists, playing with something we do not really understand and without caring much about ethics”.
          Note that this was before the Uber-driven and the Tesla-driven cars killed anyone.

          The language we use forges our understanding of the world: we think with that language, and our brain establishes relations through it even when we do not consciously express them.

          Researchers who call an ANN intelligent fool themselves.

          Also, anthropomorphic language is used in a lot of technical fields. We talk about mothers/daughters or parents/children in trees, sink/swim for restoring the heap property in heaps, collisions in hash tables, semaphores in concurrency, etc. We do treat data structures and algorithms research as serious fields.

          The difference is that there is a clear structural relation between the parent/child metaphor in, say, processes and the experience people have of it, and likewise for collisions, or for semaphores…

          On the contrary, there is no evident structural relation between the computation of an ANN and intelligence. It is just an approximation of a function. The fact that it can approximate functions that we do not know just means that we cannot trust its outputs.

          In other terms, if you cannot establish whether a piece of software is correct, it is not even broken! It’s… a toy? :-D

          Selling such unexplainable software as “intelligent” is one of the most stupid errors we are making this decade.

          1. 3

            The difference is that there is a clear structural relation between the parent/child metaphor in, say, processes and the experience people have of it, and likewise for collisions, or for semaphores…

            On the contrary, there is no evident structural relation between the computation of an ANN and intelligence. It is just an approximation of a function. The fact that it can approximate functions that we do not know just means that we cannot trust its outputs.

            You start from a hyperbolic description of machine learning (intelligence) that most practitioners in the field would never use or agree with.

            I work in computational linguistics, and deep learning has taken over our field in the last 5-10 years. However, neither I nor any of my colleagues would call our models ‘intelligent’. It is abundantly clear what happens in most networks: e.g., in classification the last layer is typically a softmax, which is more or less off-the-shelf logistic regression. The non-linear layers are used to transform the input space so that a problem that is not linearly separable becomes linearly separable. Nobody would dare to call that intelligence.
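            To make the “last layer is a softmax” point concrete, here is a minimal sketch of such a classification head; the weights and features below are invented for illustration and come from no real model:

```python
import numpy as np

def softmax(z):
    # subtract the max before exponentiating, for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

# A toy classification head: map a 2-dimensional feature vector
# (the output of the non-linear layers) to logits for 3 classes,
# then normalize. This is exactly multinomial logistic regression.
W = np.array([[1.0, -1.0,  0.5],
              [0.0,  2.0, -0.5]])
features = np.array([1.0, 2.0])

probs = softmax(features @ W)  # a probability distribution over 3 classes
```

            Nothing beyond a linear map and a normalized exponential is involved; `probs` sums to 1 and the predicted class is simply its argmax.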

            Outside the hyperbole, a lot of anthropomorphic terms are quite intuitive. An attention layer divides the network’s attention over inputs. Adversarial training uses examples specifically crafted to fool the model. A forget gate determines how much information the model should retain from a previous time step.
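            Mechanically, the forget gate mentioned above is just an elementwise sigmoid multiplier. A minimal sketch with invented weights (not a full LSTM, just the gate):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy forget gate: a sigmoid over the concatenated input and previous
# hidden state yields one value in (0, 1) per cell; multiplying the
# previous cell state by it scales part of that state toward zero.
W_f = np.array([[0.5, -0.2],
                [0.1,  0.3]])
b_f = np.array([0.0, 0.0])
x_and_h = np.array([1.0, -1.0])   # concatenated input and hidden state
c_prev = np.array([2.0, 2.0])     # previous cell state

f = sigmoid(W_f @ x_and_h + b_f)  # gate values, each strictly in (0, 1)
c_kept = f * c_prev               # the "retained" part of the state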

            The outlandish hyperbolic claims, such as ‘ANNs are intelligent’, come from a small subset of people who want VC money, more funding, or whatever. Or they are Google, Facebook, etc. and do it for publicity. Unfortunately, some well-known practitioners in the field have made such claims, but I suspect that in many cases there are ulterior motives. You are making a caricature of the field in general; I would recommend attending some conferences on computational linguistics (ACL, EMNLP), information retrieval, machine vision, etc. You will see that most practitioners actually live outside the hype bubble with its outlandish claims and would definitely not compare machine learning to human learning.

            1. 3

              I upvoted your answer because, while I don’t think you understood what I meant (my fault), I can understand your point of view.

              Still, let me use your response to explain it in a better way and with some examples.

              You start from a hyperbolic description of machine learning

              Learning in itself is a term strongly tied to people’s experience. People learn themselves.
              They tend to appreciate whoever is able to learn and consider them smart.
              Anything that can learn (a cat, a dog…) is qualified as intelligent.

              But machines do not learn.
              You are just approximating a function by calibrating the weights in a long chain of logistic regressions (or whatever).

              You, just like most researchers and people, get the impression that the network is learning just because a better calibration yields a better approximation of the target function. As I said at the University of Milan, this is pretty similar to what happened to the people who saw “L’Arrivée d’un train en gare de La Ciotat” for the first time.

              And the term “learning” is helping in this illusion. If you use “calibration” the illusion instantly disappears.
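              That “calibration” view can be made concrete with a toy one-weight model; all values below are invented for illustration:

```python
import numpy as np

# Calibrate a single weight w so that f(x) = w * x matches the target
# t(x) = 3 * x on a handful of sample points, by repeatedly nudging w
# against the gradient of the squared error. No "learning" beyond this.
xs = np.array([1.0, 2.0, 3.0])
targets = 3.0 * xs

w = 0.0
step = 0.05  # the step size usually called the "learning rate"
for _ in range(200):
    grad = np.mean(2.0 * (w * xs - targets) * xs)  # d(mean sq. error)/dw
    w -= step * grad
```

              After the loop, `w` sits near 3.0: a better calibration yields a better approximation of the target function, and nothing more.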

              An attention layer divides the network’s attention over inputs.

              This says nothing practical about what the “attention layer” does.

              If I say “the parent process has killed the child process”, anyone understands that the child process was somehow generated by the parent process and that the child process had something in common with the parent. They also understand that the child cannot react to input anymore, that it is dead.

              Your phrase can at most describe the goal of the attention layer, but what does it actually do to the input?
              Also, you need to apply your intuition of “intelligence” to get an insight into what the “network’s attention” is.

              Adversarial training uses examples specifically crafted to fool the model.

              I’d say that examples designed (more properly, approximated) to maximize the error of the approximation computed by the ANN are used to minimize that error.

              Again, to turn this into an insight along the lines of “Adversarial training uses examples specifically crafted to fool the model”, you need

              1. to apply your intuition of an intelligence that can be fooled
              2. to apply your intuition of an adversary
              3. to apply your intuition of training

              Neither 1, 2 nor 3 provides useful information about the mechanics of what is happening.
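              In mechanical terms, “crafted to fool the model” for a toy linear model reduces to a gradient step on the input; all values below are invented, and the sign-of-gradient step merely mimics a trick used in practice:

```python
import numpy as np

# Toy model: prediction w . x, squared-error loss against target y.
# An "adversarial" input moves x a small step in the direction that
# increases the loss, i.e. along the sign of the loss gradient w.r.t. x.
w = np.array([1.0, -2.0])
x = np.array([0.5, 0.5])
y = 0.0
eps = 0.1

grad_x = 2.0 * (w @ x - y) * w        # d(loss)/dx for loss = (w.x - y)^2
x_adv = x + eps * np.sign(grad_x)

loss = (w @ x - y) ** 2
loss_adv = (w @ x_adv - y) ** 2       # strictly larger than loss
```

              No fooling and no adversary appear anywhere in this description: only a function, its error, and a perturbation that increases that error.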

              A forget gate determines how much information the model should retain from a previous time step.

              Again, “forget” assumes someone who remembers.
              A set of weights in a graph is not a memory. It’s just a set of weights.
              Memory is your interpretation of those weights.

              I guess you could easily find more descriptive and less anthropomorphic names for all these things.

              But they have forged your mind so much that even if you know that there is no intelligence in an ANN, even if you know that it’s not intelligent at all, any external observer (including your peers) will hear your talk in terms that assume an intelligence. Those who listen to you will get an insight into what you are talking about only if they assume an intelligence in the machine.

              Indeed at the very beginning, you speak about “Artificial Intelligence”. About “Machine Learning”.

              I agree: it’s a very hyperbolic description of the status of the field. ;-)