1. 14
    1. 40

      No evidence is presented of ChatGPT passing the Turing test.

      1. 8

        I’d argue that ChatGPT sounds far more intelligent than my racist uncle when he goes on one of his rants. Similarly, I see some people accusing ChatGPT of “lying” or being “woke”, which tells me that some people are confused enough about its status as a computer that they think it also has political views and knows the difference between “lying” and “telling the truth”.

        I’ve heard arguments that ChatGPT doesn’t pass the Turing test, but they all involve ignoring a non-trivial number of humans or otherwise changing the definition of “intelligent”. It seems clear to me that the Turing test has been passed.

        1. 27

          That’s not the Turing test. The Turing test involves a specific setup: over many iterations, an interviewer tries to determine which of two interlocutors, both trying to appear human, is the human and which is the computer. The computer passes if it convinces the interviewer it is human at least as often as the human does, across a statistically significant number of trials.
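          The pass criterion described here can be sketched numerically. This is a toy pooled two-proportion comparison, not anything from Turing’s paper; the function name, counts, and 1.96 threshold are my own illustrative choices:

          ```python
          import math

          def turing_pass(bot_fooled, human_judged_human, trials, z_thresh=1.96):
              """Crude check: does the bot convince the interviewer it is human
              at least as often as the real human does, within sampling noise?
              bot_fooled / human_judged_human are counts out of `trials` rounds."""
              p_bot = bot_fooled / trials
              p_human = human_judged_human / trials
              # Pooled two-proportion z-statistic for the gap between the rates
              p_pool = (bot_fooled + human_judged_human) / (2 * trials)
              se = math.sqrt(2 * p_pool * (1 - p_pool) / trials)
              if se == 0:
                  return p_bot >= p_human
              z = (p_bot - p_human) / se
              # Pass if we cannot conclude the bot convinces *less* often
              return z > -z_thresh

          print(turing_pass(45, 50, 100))  # bot nearly matches the human
          print(turing_pass(10, 50, 100))  # bot clearly falls short
          ```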

          1. 6

            I remember the BBC doing this with two chat bots and Craig Charles in the late ‘90s. It turned out that they’d set the bar for an “intelligent human” very low, and the chat bots came very close to passing.

        2. 5

          I don’t know what people mean when they say “ChatGPT is woke”. But the way I’ve always interpreted it is: the service ChatGPT is “woke” because OpenAI heavily censors the LLM and goes to great lengths to make sure it doesn’t generate anything too controversial or harmful. You can complain that a product is “woke” without actually believing that the product is conscious.

          My issue with that crowd is simply that “woke” is generally used to mean an intentional absence of racism/sexism/antisemitism/homophobia/transphobia/other similarly harmful ideas.

          1. 3

            “Wokeness” is a political judgement (as is any specific definition of what constitutes racism, sexism, etc.). So people complaining about ChatGPT being woke are really complaining that OpenAI is engaging in political censorship of their model’s outputs according to political considerations that they disagree with.

            In our era, basically half of American politics is an argument over which definitions of racism, etc. institutions will effectively enforce, and OpenAI’s control of ChatGPT’s output is just another battleground for that political fight. There’s nothing particularly special about large language models here - the language model is just another piece of software running on Someone Else’s Computer that they grant you access to, so they ultimately control how you can use it.

            I do think that people should be extremely wary of ChatGPT as run on servers OpenAI controls, and even more so if they don’t share roughly the same set of political beliefs as the employees who work there and the institutions that can effectively influence OpenAI’s corporate decision-making. This is a longstanding free software argument, and it makes ChatGPT no different from Twitter or any other social media website you could name.

    2. 10

      I think the better question is, who cares?

      The Turing Test hasn’t really been relevant for a long time. It was considered kind of a joke when I was an undergraduate in the late 90s.

      1. 3

        Why not?

        1. 21

          “A.I.” always means the thing we can’t do yet. Once we can do something, it’s just sparkling algorithms.

          1. 2

            That used to be the case, but I think we passed that point somewhere in the 2010s. Recognizing apples in images is something we’ve been able to do reliably for 5-10 years now, but nobody would argue that those aren’t AI applications.

        2. 1

          I don’t know about the 90s, but a chat bot “passed the Turing test” in 2014 (https://en.m.wikipedia.org/wiki/Eugene_Goostman).

          1. 3

            Goostman is portrayed as a 13-year-old Ukrainian boy—characteristics that are intended to induce forgiveness in those with whom it interacts for its grammatical errors and lack of general knowledge.

            Uhm… that’s a bit of special pleading.

            A “real” Turing test would be to present either participant (the ones questioned by the interrogator) as “peers”, such as adults with English[1] as their native language. Ideally the interrogator would also know that they were taking part in a Turing test, and could therefore “probe” accordingly.

            Considering the great interest in and the general availability of ChatGPT, I’m surprised no one has tried the Turing test with it.

            [1] in this case, as I believe ChatGPT is English-only.

            1. 3

              ChatGPT speaks better Spanish than I do.

              1. 2

                Makes sense, there’s a huge corpus of Spanish to train on.

          2. 3

            If passing is fooling humans 30% of the time.

            1. 2

              Given that 50% would mean that the human’s guesses are as good as random, yeah, 30% is pretty significant. If it fooled the human 100% of the time that would somehow imply that it comes off as more human than human.
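              For a sense of scale, a quick binomial calculation shows how a 30% fool rate sits against the 50% chance baseline. The 30-round trial below is hypothetical, not data from the actual Goostman event:

              ```python
              import math

              def prob_at_most(k, n, p=0.5):
                  """P(X <= k) for X ~ Binomial(n, p): the chance of seeing k or
                  fewer 'judged human' verdicts if the interviewer guessed randomly."""
                  return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                             for i in range(k + 1))

              # If the bot were judged human only 9 times out of 30 rounds (30%),
              # how likely would so few 'human' verdicts be under 50/50 guessing?
              print(prob_at_most(9, 30))
              ```

              With 30 rounds the probability comes out under 5%, so a 30% fool rate is genuinely distinguishable from the coin-flip baseline at that sample size.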

              1. 1

                If it fooled the human 100% of the time that would somehow imply that it comes off as more human than human.

                I assume most of my interactions (outside stuff like automated customer service chat support) are with a human. So far, I’ve not had reason to suspect that ~5% of my text-based interactions are actually generated by a computer…

                Edit: removed a question answered in the Wikipedia article

    3. 10

      I was under the impression that the Turing test was more of a thought experiment used to investigate our concept of intelligence and sentience than an actual test to take, much like Schrödinger didn’t actually expect people to murder cats in boxes in the name of science.

      1. 7

        Fun fact, Turing thought that telepathy was real and completely breaks the Turing test! It’s in the original paper.

      2. 2

        Yes. And, in another parallel to Schrödinger’s story, Turing wanted us to consider the contrapositive implication: what if humans are wholly computational according to some formal system? Then what we think of as the human experience is really just the computational experience of human wetware! This is covered by many explainers of Turing, including Hofstadter and Smith.

    4. 6

      https://cs.nyu.edu/~davise/papers/WinogradSchemas/WS.html

      Can a machine create a Winograd schema yet?

      When run against ChatGPT, it answered with, “A dog chased a cat up a tree and a cat came down.”

      1. 2

        If you feed ChatGPT novel Winograd statements, it only gets them right by chance.

        1. 4

          I just tried a few from here and it got them all correct, including adding correct logical explanations. Are you saying it’s memorized these as part of the training data?

          EDIT: I tried a handful more that I invented, and it got those correct too.

          1. 4

            Are you saying it’s memorized these as part of the training data?

            That sounds perfectly reasonable to me.

            1. 2

              Yes, anything that can be scraped is in the belly of the beast now. :-)

          2. 2

            I made some up a few weeks ago and it got them wrong. Maybe it’s turned rampant since then.

        2. 1

          If you feed ChatGPT novel Winograd statements, it only gets them right by chance.

          Does it not get them right from statistical inference?

          If the word “shatter” is associated with “glass” more often than “iron” (throughout its training corpus), then doesn’t it follow that it would correctly “guess” that when a glass ball falls on an iron table then the glass ball shatters, and not the iron table?
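          That statistical shortcut can be illustrated with made-up co-occurrence counts. The numbers and the helper function below are purely hypothetical, standing in for the word associations an LLM absorbs from its training corpus:

          ```python
          # Hypothetical co-occurrence counts (invented for illustration) standing
          # in for associations learned from a training corpus.
          cooccurrence = {
              ("glass", "shatter"): 950,
              ("iron", "shatter"): 12,
          }

          def resolve_pronoun(candidates, verb):
              """Pick the referent most associated with the verb -- the statistical
              shortcut that can answer some Winograd schemas without reasoning."""
              return max(candidates, key=lambda noun: cooccurrence.get((noun, verb), 0))

          # "The glass ball fell on the iron table and it shattered." What is "it"?
          print(resolve_pronoun(["glass", "iron"], "shatter"))  # -> glass
          ```

          A well-constructed schema defeats exactly this trick by making both referents equally plausible on word statistics alone.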

    5. 5

      This is a nice explainer. Thanks.

      Yep, I think a lot of folks were punked by Turing. He didn’t know, so it wasn’t deliberate.

      Many years ago, my grandmother, who was in her 90s, had a series of mini-strokes. After a while she was awake and alert, and she went home. It’s just that everything wasn’t “right” with her. She didn’t talk a lot.

      I was close to my grandmother, so I visited several times. During one of those visits, I noticed something interesting: I could start a conversation with a common phrase we had used during our life together, say “And how are you doing this fine morning?”

      She would respond! In fact, it was possible to get into a rhythm of conversation that sounded completely normal to an outside observer. I think on some level she got what was going on, but at another level she was just on autopilot.

      Passing the Turing test is indeed a major milestone if it’s happened, but it’s not a milestone in AI. It’s a milestone showing all of humanity how little intelligence is required in almost all of our activities, and how much we love to assign intelligence where it doesn’t exist. I have no idea what we’re going to do with that knowledge, but it’s a new thing for us. It’ll be interesting.

      1. 2

        I just wanted to say that I am sorry about your grandmother. My father still can’t speak at all. Funnily enough, he still assumes we somehow understand him even without him talking and gets angry that we don’t.

    6. 6

      Student summed it up nicely: “No evidence is presented of ChatGPT passing the Turing test.”

    7. 2

      The “Turing Test” is actually just Descartes’ First Test. We need to work on passing Descartes’ Second Test. https://blog.carlmjohnson.net/post/tom-gauld-chess-computers-and-here-i/

      The second test is, that although such machines might execute many things with equal or perhaps greater perfection than any of us, they would, without doubt, fail in certain others from which it could be discovered that they did not act from knowledge, but solely from the disposition of their organs: for while reason is an universal instrument that is alike available on every occasion, these organs, on the contrary, need a particular arrangement for each particular action; whence it must be morally impossible that there should exist in any machine a diversity of organs sufficient to enable it to act in all the occurrences of life, in the way in which our reason enables us to act.

      1. 1

        Wouldn’t that be covered by a non-blind Turing test?

        If I know that I’m trying to identify a potential automaton, I’m going to lean into much weirder territory than normal conversation. Like the Voight-Kampff test from Blade Runner, but probably involving more “The the the the the the the the the the the the the the the”-ing.

        1. 1

          The first Cartesian test and the Turing test are both about language: can the computer chat “so as appositely to reply to what is said in its presence, as men of the lowest grade of intellect can do.”

          The second Cartesian test is about Artificial General Intelligence: can it make progress on any problem, not just specific kinds of problems that it has been trained on. I think that LLMs turn out to be making progress towards AGI, but they still have a ways to go.