1. 5
    1. 6

      I’ve heard that much of the “Sparks of AGI” paper (HORRIBLE cringeworthy title) cannot be replicated.

      1. 7

        It seems really irresponsible to make such a broad and sweeping claim in the title. Is clickbait fair game in academia now?

        1. 5

          Let us be honest: that bird had already flown when an entire field named itself artificial intelligence and, later, machine learning.

        2. 1

          yes

      2. 3

        This was addressed in the video. If I understood correctly, they had access to a pre-release of GPT-4, and as it received more safety training, its ability to answer some questions was negatively impacted. (Such as its ability to draw SVG unicorns.)

      3. 2

        Any place I can read about the efforts to replicate it?

        1. 3

          There’s not been a directed effort to replicate it. Just random individuals have tried out bits of the paper and found that it doesn’t produce the documented results. More generally, reproducibility of scientific results on LLMs is kind of in a crisis right now because the state of the art is a for-profit company’s asset. With time, things may become more scientifically hygienic, but we gotta make do with what we got right now.

    2. 3

      After many, many hours experimenting with GPT-4, I’m convinced that it does have some properties of emergent general intelligence.

      That does not mean it is as smart as a human, that it is sentient, or even that it is useful - just that it can’t be explained as reciting rote text with word changes.

      But it can do basic logic and manipulation of concepts. It can follow a story and explain a joke, even completely original ones. It can read a news article and answer questions about it. It can come up with original ideas - or at least ideas I certainly can’t find anywhere using Google.

      Again, I do not think it’s a good idea to say it’s sentient or anything of the sort. It doesn’t even have a memory outside its very short context window. It is easy to confuse with misdirection. It doesn’t behave anything like a human in many situations.

      But on the other hand, we have no other non-human entities that can manipulate concepts and express them with language (other than perhaps some primates or dolphins), so it’s worthy of deep and intense study.

    3. 3

      Disappointing to see the promotion of such obvious con artists, but given the state of the field of ‘AI’, where the real work is usually buried and liars flourish, it is unsurprising.

      1. 2

        Would be very interested in reading a summary of the con-artistry employed, either as a comment or a blog post.

        1. 4

          Been mulling over how to reply to this, but I realised that I’m not quite sure what you are asking. E.g., are you asking for a history of the field’s political economy, its major research strands, and how those strands have presented themselves to funders and the public (a big question that I may take a stab at writing up some day, but not this day)? Or are you asking about the specific con artistry in the ‘Sparks of AGI’ stuff (more achievable; I might feasibly be able to write up a useful answer, but still probably not soon)? Or some other thing?

        2. 3

          Well, they call it “AI”, but it’s actually just predicting the next token.

          1. 3

            And? If you’re just going to go about repeating the stochastic parrot argument without expanding on it, then frankly you’re demonstrating less of that I in AI than ChatGPT.

            The interesting thing about GPT-x is that despite its architecture, it does appear to be able to perform basic reasoning; even spatial reasoning has arisen in it to an extent. Saying it’s just “predicting the next token” is underselling it by far.

            What’s the difference between the intuitive leaps needed by humans using mathematical notation on paper to derive a mathematical proof step by step, and “predicting the next token”?

            1. 8

              Two differences:

              1. No sources on me, but I’ve read a few different cases where it gives the correct answer for a problem and a wildly wrong answer for a slight perturbation of the problem, like telling it to sum “chickens” instead of “dollars”.
              2. I’ve personally seen that it gives very different answers if you fiddle with the temperature and penalty parameters. It’s less that it “does reasoning” and more that, in the right parameter regions and on the right questions, prediction and reasoning look identical. When you move outside that region it breaks down. (Rough sketch of what I mean below.)
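
              Something like this, assuming the pre-1.0 `openai` Python library; the model name, prompt, and parameter values here are only illustrative, not what I actually ran:

              ```python
              # Send the same prompt at several temperatures with a fixed
              # frequency penalty and compare the answers that come back.
              import openai

              openai.api_key = "YOUR_API_KEY"  # placeholder

              PROMPT = "A farmer has 3 chickens and buys 4 more. How many chickens does she have?"

              for temperature in (0.0, 0.7, 1.5):
                  resp = openai.ChatCompletion.create(
                      model="gpt-4",
                      messages=[{"role": "user", "content": PROMPT}],
                      temperature=temperature,   # higher = more random sampling
                      frequency_penalty=0.5,     # discourages repeated tokens
                      presence_penalty=0.0,
                  )
                  print(temperature, resp["choices"][0]["message"]["content"])
              ```
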
              1. 4

                For some reason your comment reminded me of Clever Hans.

            2. 4

              What’s the difference between the intuitive leaps needed by humans using mathematical notation on paper to derive a mathematical proof step by step, and “predicting the next token”?

              The difference is that they are completely different. So, part of the issue is that glossing the thing as ‘predicting the next token’ can be deliberately misinterpreted as ‘predicting the next token of the correct answer’, which would be something that at least has some kind of meaning grounding (albeit a very strange one). A better gloss would be ‘producing an answer that has the correct vibes’, which makes it a bit more obvious that there is no meaning grounding but that it is quite good at producing things that will fool people.

              The interesting thing about GPT-x is that despite its architecture, it does appear to be able to perform basic reasoning; even spatial reasoning has arisen in it to an extent.

              This is completely false. All of the publicly available examples that people point to to demonstrate this lack ‘construct validity’, and many of the examples have the problem of testing on the training set. If you actually understand what an LLM is and how it works, then you know that there is no ‘reasoning’ happening. No ‘concepts’. No ‘understanding’. Nothing like that at all.

            3. 1

              Maybe it doesn’t help that there are multiple versions of ChatGPT but people often refer to it without including the version number. I’ve only ever used the one at https://chat.openai.com/ but I don’t know which one that is. Presumably it isn’t ChatGPT-4 - the one that this paper is about - as it isn’t very good. I mean, it’s a bit better than Eliza, but it’s not a huge leap ahead. Is there a website where I can have a go with the “Sparks of AGI” version?

    4. 2

      I’ve been holding off subscribing to ChatGPT-Plus because I already have a pretty good system that uses GPT-3.5-Turbo via the official API. The video you shared pushed me to finally take the leap, and I must say that I am impressed with the level of responses I get.

      The cap of 25 messages every 3 hours is too limiting, though.

      1. 3

        Apparently a big benefit of ChatGPT-4 is the larger context window, which enables you to “teach” the system by correcting it. I tried to do this to teach it the TLA+ proof language but was largely unsuccessful.
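
        The idea, roughly, is to keep your corrections in the running messages list so they stay in context for later turns. A minimal sketch, assuming the pre-1.0 `openai` Python library; the model name and the TLA+ content are placeholders:

        ```python
        # Keep the whole exchange, including corrections, in the messages
        # list, so the model can use them on later turns. This is where the
        # larger context window is supposed to help.
        import openai

        messages = [
            {"role": "system", "content": "You are a TLA+ proof assistant."},
            {"role": "user", "content": "Write a TLAPS proof that Init => TypeOK."},
        ]

        reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
        answer = reply["choices"][0]["message"]["content"]
        messages.append({"role": "assistant", "content": answer})

        # "Teach" by correcting: the correction becomes part of the context
        # for every subsequent question.
        messages.append({
            "role": "user",
            "content": "That step needs a BY DEF TypeOK clause; please redo it.",
        })
        reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
        print(reply["choices"][0]["message"]["content"])
        ```
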

        1. 2

          That’s a shame it didn’t already know TLA+!

          1. 2

            It does, just not the proof part of the language.

      2. 1

        Use the developer API; it’s 3 cents per thousand tokens and you can pick your own system message.
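
        A minimal sketch of what that looks like with the pre-1.0 `openai` Python library (prices and model names change, so check the docs; the system message here is just an example):

        ```python
        # Call the chat completions endpoint directly, with your own system message.
        import openai

        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system",
                 "content": "You are a terse assistant that answers in bullet points."},
                {"role": "user", "content": "Summarise the Sparks of AGI paper."},
            ],
        )
        print(response["choices"][0]["message"]["content"])
        ```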