1. 26

  2. 16

    Wait, you mean the human brain isn’t just doing massive vector multiplications and calling it “learning”?

    I think modern machine learning is simply leveraging how cheap computation has become. The underlying techniques aren’t actually new; they’ve just become economical, and commoditized enough that you don’t need a PhD in statistical computing to build a useful machine learning system.

    1. 5

      Worth noting that Jeff Hawkins has been advocating this approach for years now. Numenta has published some interesting papers as well: https://www.biorxiv.org/content/biorxiv/early/2017/09/28/162263.full.pdf

      1. 4

        Thanks! Haven’t seen this one.

        This particular bit stood out to me.

        “Grid cells in the entorhinal cortex (Hafting et al., 2005; Moser et al., 2008) encode the location of an animal’s body relative to an external environment. A sensory cortical column needs to encode the location of a part of the animal’s body (a sensory patch) relative to an external object.”

      2. 4

        Did they really have to write down all the “uh”s? It makes the text unreadable.

        1. 1

          Thank you for the criticism. It appears my brain works like this.

        2. 3

          I noticed this philosophical shift, somewhat ironically, after reading Norvig’s old book “Paradigms of Artificial Intelligence Programming.” I read “PAIP” to learn Lisp, but it piqued my interest in AI, and afterwards I searched for more books on the topic, and that’s when I noticed the difference.

          Most techniques in PAIP are outdated by today’s standards, but what they all have in common is that they generally rely on clever algorithms or crudely simulating how a person might solve a problem. The modern books, on the other hand, are all about building up probability models and filling them with training data.

          I’m far from an expert, but it seems the best solution is probably somewhere in between.

          1. 2

            However, most researchers in the know will tell you that Deep Learning is highly problematic because it requires a huge amount of data to train a good system. I have believed that because how these systems are trained is so different from how the brain learns, they simply cannot be evidence of a scientifically correct model.

            Why are they not scientifically correct? They may not be scientifically correct models of the brain, but they can be scientifically correct models of whatever phenomenon you are trying to model. Also, the optimal way to tackle a problem may differ between wetware and computers. From my perspective there are two types of computational models:

            • Computational models that attempt to do prediction as well as possible, without aiming to simulate the human brain. You try to get the best possible model and then try to interpret the model (what does it learn?). Such models are not arbitrary, but well-founded. For instance, one of the earliest motivations for RNNs was to capture longer-distance dependencies in natural language.

            • Computational models that attempt to simulate the human brain.

            Despite their name, for me most deep learning models are squarely in the first category.

            Also, as kghose pointed out, for many tasks the amount of data is not large. E.g. in many NLP tasks, supervised training sets are typically tens of thousands of sentences, and models are often competitive with non-experts (parsing) and experts (part-of-speech tagging). The problem is more that the current models are not very robust to domain and genre shifts. In NLP, you don’t need to construct adversaries; the average Twitter feed or SMS corpus is adversarial enough ;).

            1. 1

              Not sure what to make of this: the blog post names (or claims to name) the author of a double blind paper.

              Also, I’m not on board with this kvetching about the amount of data needed for deep learning. Humans get a lot of exemplars when learning something, and we do overfit data, except we call it specialization.

              1. 3

                Good catch about the paper. I updated the link to the correct one.

                With regard to needing lots of exemplars to learn something, let’s do a quick experiment. Imagine a red circle with vertical lines inside it and a white outline of a heart in the middle. Do you recognize this image as what I just described to you?

                It should be pretty damn fascinating that the example I gave you is the one you came up with in your head, and you only needed one. Hinton’s basic complaint about CNNs is that they use max-pooling to focus on important bits. I think it’s obvious we take parts and use evidence of parts to decide about the whole, which is Hinton’s intuition about this. In other words, our mind constructs, in computer graphics terms, a scene graph from what the eyes see. Hinton’s work is about how to model this scene graph using unsupervised learning.
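
                To make the max-pooling point concrete, here is a minimal sketch in plain Python. The function name `max_pool_2x2` and the toy grids are made up for illustration; they are not taken from Hinton’s work or from any library. It only shows that 2x2 max-pooling keeps the strongest activation in each window and drops where inside the window it occurred.

                ```python
                def max_pool_2x2(image):
                    """Non-overlapping 2x2 max-pooling over a list-of-lists grid.

                    Assumes the grid has an even number of rows and columns.
                    """
                    return [
                        [
                            max(image[r][c], image[r][c + 1],
                                image[r + 1][c], image[r + 1][c + 1])
                            for c in range(0, len(image[0]), 2)
                        ]
                        for r in range(0, len(image), 2)
                    ]

                # Two toy 4x4 "feature maps". The same activations (9 and 1) appear
                # in both, but at different positions inside their 2x2 windows.
                a = [
                    [9, 0, 0, 0],
                    [0, 0, 0, 0],
                    [0, 0, 0, 0],
                    [0, 0, 0, 1],
                ]
                b = [
                    [0, 0, 0, 0],
                    [0, 9, 0, 0],
                    [0, 0, 0, 0],
                    [0, 0, 1, 0],
                ]

                pooled_a = max_pool_2x2(a)
                pooled_b = max_pool_2x2(b)

                print(pooled_a)              # [[9, 0], [0, 1]]
                print(pooled_a == pooled_b)  # True: the within-window position is gone
                ```

                The two inputs differ, yet they pool to the same output, so whatever comes after the pooling layer cannot tell exactly where each part was relative to the others.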

                1. 6

                  Do you recognize this image as what I just described to you?

                  No, because it says “NO HOTLINKING”.

                  1. 2

                    Haha, I guess I needed a different image. I just picked a random one from Google Images. How embarrassing…

                    I took the time to draw a crude version of the picture in GIMP so you can see it.

                2. 2

                  What do you mean by “a lot of exemplars”? Between human learning and state-of-the-art machine learning I think we’re talking about very different definitions of “a lot.”

                  1. 1

                    My daughter, for example, will generalize birds to a certain extent, but she has to be told that penguins are birds, that geese are birds and so on (todo: I need to tell her that an ostrich is a bird). However, the set of birds itself is not that large in terms of things that are very different, so relative to the set of different things, she gets a lot of samples.

                    In general she’ll make mistakes where I’ll look at the picture and say, yeah, I see how you can think this is that, though it isn’t. However, she will learn very quickly from one instance of me telling her something. I know that is no longer true for me; now repetition is key.

                    That’s a general feature: younger brains are more “impressionable” or “plastic” in that a few instances of feedback will imprint very quickly, whereas older brains take more repetition.

                    1. 1

                      I was thinking about this myself, but I’m not sure how simple it is. As the parent of a toddler, it is incredible how quickly she generalizes concepts like “dog” from just a handful of weird, distorted cartoon versions in books and a handful of dogs in real life. But then again, her visual system is in learning mode every waking hour and is receiving a ton of data about the world that might help when it comes down to each individual class of object (“dog”). Also, when she sees a dog in real life, she sees it from a thousand subtly different angles and in a bunch of different poses in the course of a few seconds. So is that 1 data point or is it a 50,000-item training set?

                      Image processing is an interesting example in general because it’s something computers do so well now. But the existence of single-pixel attacks and the inscrutable nature of the model itself are certainly a disappointment. A system that was more capable of introspection would be easier to maintain and extend, and might teach us how to build systems in a less brute-force manner.

                      1. 2

                        That’s a good point about how you can’t really quantify how much training data the brain gets from the visual system. Language is much more easily quantified, and there has been a lot of work to show how much of the language faculty must be biologically determined - a big part of that is our ability to learn languages with very little training data. I guess it’s up to our intuitions to decide whether this applies to other areas of brain function; I suspect it does.