1. 23

  2. 3

    The fundamental problem is that current neural networks lack a way to express uncertainty about their answers, which would help in refining the output, and there is ongoing research in this area. Still, given how primitive chatbots were 10 years ago, there’s no denying we’ll get there within the next 10.
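    As a minimal sketch of one common uncertainty signal, assume a model that exposes a probability distribution over candidate answers. We can compute the entropy of that distribution and abstain when it is too flat. The function names, the 0.5-nat threshold and the toy probabilities are hypothetical, and this is not how GPT-3 itself produces answers.

    ```python
    import math

    def entropy(probs):
        """Shannon entropy (in nats) of a probability distribution."""
        return -sum(p * math.log(p) for p in probs if p > 0)

    def answer_or_abstain(candidates, probs, max_entropy=0.5):
        """Return the most likely candidate, or abstain when the
        distribution is too flat (hypothetical threshold)."""
        if entropy(probs) > max_entropy:
            return "I don't know"
        best = max(range(len(probs)), key=lambda i: probs[i])
        return candidates[best]

    # Confident: almost all probability mass on one answer.
    print(answer_or_abstain(["Paris", "Lyon", "Nice"], [0.95, 0.03, 0.02]))  # Paris
    # Uncertain: mass spread almost evenly, so the model abstains.
    print(answer_or_abstain(["one", "two", "none"], [0.34, 0.33, 0.33]))  # I don't know
    ```

    Entropy is only a crude signal: a model can be confidently wrong, which is part of why this is still a research problem.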

    1. 2

      TL;DR: Turing tests can still illuminate AI capabilities, but they are becoming obsolete and should be phased out.

      This is a fascinating article, and it shows many of GPT-3’s strengths and weaknesses well. It also shows many of the problems with Turing tests. I feel we are moving away from the Turing test as AI research progresses. There are already many subjects where a computer would fail the test because it is too competent; the trivia questions are the best example. I am a human, and I don’t know the answer to any of those questions. I read them less than a minute ago and didn’t bother remembering them, since they weren’t useful information. It seems likely that, as AI research progresses, this will extend to more and more areas of cognition. Rather than failing because they are inferior to humans, machines will fail because they are superior.

      You might say we could make an AI that learns what kinds of things a human might know, and even responds with answers like “I am from Europe, we don’t memorize the US presidents in school”. But why would we want that? Not only would it mean spending significant resources on an AI that deliberately underperforms (we could even build a nuclear reactor controller program that has pride, personal problems and sometimes comes to work hung over :-p), it would also prioritise deceiving humans over helping them. If machines are failing the test because they do their job well, then the test is broken. Anyway, making a neural network that is indistinguishable from a human is trivial; people have been doing it for thousands of years, often by accident. (Obligatory xkcd.) The idea that humans are the pinnacle of cognitive ability seems to undermine itself by existing.

      Suggestions for follow-up articles: evaluate the system’s ability to document code, summarise news articles, regularly tweet about some process it is monitoring, and write short stories.

      1. 3

        It seems odd to criticize Turing tests on the mere hypothesis that machines could fail them by doing their job too well, when they clearly don’t. This simple test just illuminates how far we still have to go. The day we see a machine fail a Turing test because it was too clever, this argument might stand a chance.

        1. 2

          Yeah, the whole concept of the article is very pop-science clickbait. Robots already do the “conversational” work of telephone customer service (for example), and it’s moot whether anybody is fooled, since the whole thing is so scripted to begin with. People just accommodate. The end is especially bad, where the author says ELIZA was the prior “state of the art”. Such bullshit! As if Wolfram Alpha didn’t exist? Or that IBM thing, Watson? Or even Siri? We just don’t bother to pretend that they aren’t machines. That’s the major difference.

        2. 2

          The author provided GPT-3 with training data where all the questions have definite answers, and the answer is always “X is Y”.

          Unless I missed something, there was no training data where the answer is, “That makes no sense.”, or “I don’t know.” or “Actually, the U.S. did not exist as such in 1700, so there was no president.”

          Is it any wonder that GPT-3 followed suit and answered all the questions the same way, with the best approximation?

          I wouldn’t expect any more from a human either if their knowledge base were somehow wiped clean, e.g. a young child.

          If you were training a child in this manner, you’d probably get similar results.

          Also, there was no opportunity for re-training. When you’re teaching a child, they’re bound to get some answers wrong, and then you would correct them, and they would also learn from that.

          No such opportunity was provided here, though I don’t know if that is technically possible with GPT-3.
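          To illustrate the point about the examples, here is a minimal sketch of a few-shot prompt that also demonstrates non-answers such as “I don’t know” or a correction of the question’s premise. The helper name and the example Q/A pairs are hypothetical, the call that sends the prompt to a model is omitted, and this does not show whether GPT-3 would then reliably decline to answer.

          ```python
          def build_prompt(examples, question):
              """Assemble a few-shot Q/A prompt; the model is expected to
              continue the pattern after the final 'A:'."""
              blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
              blocks.append(f"Q: {question}\nA:")
              return "\n\n".join(blocks)

          # Hypothetical examples mixing a plain answer with non-answers.
          EXAMPLES = [
              ("What is the capital of France?", "Paris."),
              ("Who was president of the United States in 1700?",
               "The United States did not exist in 1700, so there was no president."),
              ("How many eyes does the sun have?",
               "That makes no sense. The sun does not have eyes."),
              ("What did I eat for breakfast today?", "I don't know."),
          ]

          print(build_prompt(EXAMPLES, "How many eyes does a blade of grass have?"))
          ```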

          1. 3

            The author provided GPT-3 with training data where all the questions have definite answers, and the answer is always “X is Y”.

            The training data for GPT models is just massive amounts of English-language text from the Internet, isn’t it? I’m not sure how that’s consistent with “questions that have definite answers”: most of the training text wouldn’t be in the form of any kind of question, because most English sentences are not questions.

            1. 2

              “Training data” is the wrong term here; this is better called “prompt data”, which is used to “set the stage” for GPT’s predictions.

            2. 2

              I’m unsure whether GPT-3 can respond like that, although it would be interesting to add. Another option would be some sort of autoencoder framework that lets the network detect when it is responding to something it has never really seen before. Uber has a very interesting write-up about doing that.
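              A minimal sketch of the reconstruction-error idea, under loudly stated assumptions: instead of a trained neural network it uses a linear autoencoder fitted in closed form with an SVD (the optimal linear autoencoder under squared error spans the top principal components), on toy 3-D data, with a hypothetical class name and a 99th-percentile threshold. It is not the method from the Uber write-up, only the general principle that inputs unlike the training data reconstruct poorly.

              ```python
              import numpy as np

              class LinearAutoencoder:
                  """Encode inputs to k dimensions and decode back (hypothetical helper).

                  Fitted in closed form via SVD rather than gradient descent."""

                  def __init__(self, k):
                      self.k = k

                  def fit(self, X):
                      self.mean = X.mean(axis=0)
                      # Rows of vt are principal directions, sorted by explained variance.
                      _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
                      self.components = vt[: self.k]  # shape (k, d)
                      return self

                  def reconstruct(self, X):
                      codes = (X - self.mean) @ self.components.T  # encode to k dims
                      return codes @ self.components + self.mean   # decode back to d dims

                  def novelty(self, X):
                      """Per-row squared reconstruction error."""
                      return ((X - self.reconstruct(X)) ** 2).sum(axis=1)

              rng = np.random.default_rng(0)
              # Toy "familiar" data: 3-D points lying close to the plane z = x + y.
              xy = rng.normal(size=(500, 2))
              train = np.column_stack([xy, xy.sum(axis=1) + 0.01 * rng.normal(size=500)])

              ae = LinearAutoencoder(k=2).fit(train)
              threshold = np.quantile(ae.novelty(train), 0.99)

              familiar = np.array([[1.0, 1.0, 2.0]])  # lies on the plane
              novel = np.array([[1.0, 1.0, -5.0]])    # far off the plane
              print(bool(ae.novelty(familiar)[0] > threshold))  # False
              print(bool(ae.novelty(novel)[0] > threshold))     # True
              ```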

            3. 2

              Q: How many eyes does the sun have? A: The sun has one eye.

              How poetic. I love it!

              There’s a fun book, “The Most Human Human” by Brian Christian, in which the author tries to ace a Turing test. Amusingly, one of the contestants was judged to be a computer because she knew so much Shakespeare trivia.