1. 36
    1. 32

      Friends and I have been playing with ChatGPT for the last couple of days, and a few things come to mind:

      • It has a decently good “understanding” of Elixir, perhaps on par with an intern or junior who can paraphrase all the blog posts they’ve recalled.
      • It will generate code with the right “shape” for some things, for example Nix flakes, but it gets the details wrong.
      • If you start playing games with it to try and circumvent blocks, you can usually do so pretty easily.
      • It is capable of turning at least high-level English descriptions of “I want an app with thus-and-such data model, give me the boilerplate codegen to run” into usable steps. It isn’t capable of spotting the “smart” or “experienced” way of doing things _unless_ those are common knowledge on Stack Overflow or some equivalent in its training corpus.
      • It does okay at spotting and explaining some classes of code bugs, or parroting reasonable design considerations.
      • It is extremely easy to “taint” the results. I had it pretending to give a lecture on a 40-opcode VM for running GPT, and because it was a long session and required multiple “and then you continue” prompts from me, I had the chance to see it react in its replies to subtle suggestions in the “continue” prompts. One prompt, for example, said something like “and then you get excited for the next part talking about things that improve performance for the VM implementation” and like 3/5 of the next opcodes were “CACHE” or something like it.
      • It will give bogus information (say, asking it to generate certs or perform certain ciphers).

      My friends’ and my takeaways were that it could probably be an okay rubber-ducking tool and a neat way of seeding a “I’m curious about X, but I don’t know who to start asking questions to” session. It is also extremely easy to gaslight yourself with it, because the human brain will pattern-match and anthropomorphize all the things. It almost got me once or twice when I was trying to get it to divulge information about its host–you get caught up in the story it’s weaving, and unless you keep a strong amount of skepticism you will fool yourself into thinking it’s doing something it isn’t.

      The singularity isn’t here yet, and our jobs are safe for another decade, but we sure do now have a neat interactive assistant for searching Stack Overflow–and that’s really the other thing I noticed: I think most of the impressive results this thing gives are more a function of the sheer quantity of shit we as programmers have memoized via sharing than any sort of actual reasoning.

      Have fun with it, but be careful.

      1. 17

        I think one of the important take-home lessons from ChatGPT is that a lot of programming tasks require far more repetitive boilerplate than they should. The system can easily solve them because it has large numbers of examples with tiny differences to look at.
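
        To make “repetitive boilerplate” concrete, here’s a toy Python illustration of my own (not ChatGPT output): the hand-rolled class is the kind of near-identical code the model has seen thousands of copies of, and the dataclass version shows how mechanical it is.

        ```python
        from dataclasses import dataclass

        # Hand-rolled boilerplate: every codebase has dozens of classes like this,
        # differing only in field names -- exactly the pattern a language model has
        # seen countless near-copies of.
        class PointByHand:
            def __init__(self, x, y):
                self.x = x
                self.y = y

            def __repr__(self):
                return f"PointByHand(x={self.x!r}, y={self.y!r})"

            def __eq__(self, other):
                return isinstance(other, PointByHand) and (self.x, self.y) == (other.x, other.y)

        # The same thing with the boilerplate generated for you.
        @dataclass
        class Point:
            x: int
            y: int
        ```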

        1. 2

          Lisp Master Race rises again. Who cares about macros? All I care about is minimizing the stupid boilerplate.

          How well does ChatGPT do with a lisp?

      2. 4

        Thanks for mentioning that it’s only quite good at spotting some bugs. Based on what I have been seeing on Twitter and Discord, I have been assuming that it is very good at spotting bugs. But if it only spots some bugs, it’s not going to be nearly as good as a proofreading (or, I guess, static analysis?) tool, right?

        1. 8

          That’s the thing that’s annoying about it. It does such a good job for a lot of things that it lulls you into a false sense of security when you still need to be double-checking its work. IIRC it missed a format string injection but caught the generally bad usage of C strings.
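
          Here’s roughly what that class of bug looks like, sketched in Python rather than the original C (a toy example of my own with made-up names, not the code it reviewed):

          ```python
          # Format injection, Python flavour: the user-supplied template is used as
          # the format string, so it can reach into anything passed as an argument.
          CONFIG = {"secret_key": "hunter2"}  # hypothetical server-side secret

          def render_greeting(template: str, user: str) -> str:
              # BUG: `template` comes from the user but is treated as the format string.
              return template.format(user=user, config=CONFIG)

          print(render_greeting("Hello {user}!", "alice"))         # intended use
          print(render_greeting("{config[secret_key]}", "alice"))  # injection: prints the secret
          ```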

          I think it could be a force-multiplier to replace junior-level boilerplate generation in tasks where writing the code is slower than verifying/eyeballing it; my caution is that for sensitive stuff you still have to do the work yourself.

      3. 3

        Hm, my thoughts exactly. I had it generate Dockerfiles and k8s manifests. It’s good as a generic scaffolding tool, but that’s about it.

      4. 2

        I asked it to generate valid boards for Sokoban, and then solve them. It did a decent job of generating boards with valid goals. It gave a list of valid moves, but did not solve them, unfortunately. :)

        It was able to get pretty close to writing a FizzBuzz implementation for a bytecode VM it wrote, even after I told it to use indirect threading instead of a switch statement. Was too late to ask about a garbage collector for the system.
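
        For anyone curious, here’s a rough Python sketch of my own (not ChatGPT’s actual output) of the same idea: a tiny bytecode VM running FizzBuzz, dispatching through a table of handler functions instead of a switch-style if/elif chain, which is about the closest Python gets to indirect threading.

        ```python
        PUSH, DUP, MODI, ADDI, LEI, JNZ, JMP, PRINTN, PRINTS, HALT = range(10)

        def run(program):
            stack, pc = [], 0

            # One small handler per opcode; each returns the next program counter.
            def op_push(arg, pc):   stack.append(arg); return pc + 1
            def op_dup(arg, pc):    stack.append(stack[-1]); return pc + 1
            def op_modi(arg, pc):   stack[-1] %= arg; return pc + 1
            def op_addi(arg, pc):   stack[-1] += arg; return pc + 1
            def op_lei(arg, pc):    stack[-1] = int(stack[-1] <= arg); return pc + 1
            def op_jnz(arg, pc):    return arg if stack.pop() else pc + 1
            def op_jmp(arg, pc):    return arg
            def op_printn(arg, pc): print(stack.pop()); return pc + 1
            def op_prints(arg, pc): print(arg); return pc + 1

            # The dispatch table: opcode number -> handler, no if/elif "switch".
            table = [op_push, op_dup, op_modi, op_addi, op_lei,
                     op_jnz, op_jmp, op_printn, op_prints, None]

            while True:
                op, arg = program[pc]
                if op == HALT:
                    break
                pc = table[op](arg, pc)

        FIZZBUZZ = [
            (PUSH, 1),                           #  0: i = 1
            (DUP, None),                         #  1: top of loop
            (MODI, 15), (JNZ, 6),                #  2-3: i % 15 != 0 -> check 3
            (PRINTS, "FizzBuzz"), (JMP, 18),     #  4-5
            (DUP, None), (MODI, 3), (JNZ, 11),   #  6-8: i % 3 != 0 -> check 5
            (PRINTS, "Fizz"), (JMP, 18),         #  9-10
            (DUP, None), (MODI, 5), (JNZ, 16),   # 11-13: i % 5 != 0 -> plain number
            (PRINTS, "Buzz"), (JMP, 18),         # 14-15
            (DUP, None), (PRINTN, None),         # 16-17: print i itself
            (ADDI, 1),                           # 18: i += 1
            (DUP, None), (LEI, 100), (JNZ, 1),   # 19-21: loop while i <= 100
            (HALT, None),                        # 22
        ]

        run(FIZZBUZZ)
        ```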

    2. 12

      it did as well as I did when I took it in 2018 💀

      1. 4

        In a way, AI makes me feel incredibly stupid and unskilled.

    3. 8

      This is interesting, but I can’t help but note that ChatGPT can’t solve extremely basic logic problems, like:

      For the input:

      Complete the following mathematical statement: “Given two parallel lines A and C, and two parallel lines B and D, lines A and B meet at this many points:”

      I got the output:

      It is not possible to complete this mathematical statement as stated because parallel lines do not meet at any points. By definition, parallel lines are lines that are in the same plane and do not intersect. Therefore, lines A and B cannot meet at any points.

      There are many valid answers to this question (“I need more information”; “zero, one, or infinity”; etc.), but a flat “zero” is not one of them: nothing in the statement relates A to B, so they could be parallel (zero points), intersect at one point, or even be the same line. I wouldn’t let someone who even occasionally made a mistake like that anywhere near my code. I think ChatGPT is completing code just like it completes any other language, rather than demonstrating any actual understanding of the problems at hand.

      1. 6

        That is correct. GPT does not work with meanings or semantics; it processes purely at a syntactic level.

        1. 3

          Syntactic transformations are Turing complete. What’s the difference between a sufficiently advanced syntactic transformation and meaning?

          DNA also processes purely at a structural level, and yet it eventually gave rise to meaning-processing machinery.

          1. 4

            The difference is that GPT’s syntactic processing is extremely shallow, compared to even simple logic machines from the 60s.

            1. 4

              I’m not sure I agree that the processing is shallow. Well, maybe.

              The processing is quite deep in the following senses: it uses a multi-layer neural network that recognizes increasingly high-level conceptual information about the words it’s reading, it can handle significant long-range dependencies between words, and it draws on an extremely large data set.

              But in another sense the processing is shallow, in that it just goes text in => text out; it can’t stop and think about something before talking.

              1. 3

                I generally agree with your description.

                My use of ‘shallow’ was in the context of logical computation. For example, it can do a bit of very simple arithmetic, but breaks the second it gets a little complicated. But I think it’s also shallow in a more colloquial way. For example, if you ask it to write a song, you will get something plausible, but not much more sophisticated than a common pop song.

                1. 2

                  I agree with both meanings of “shallow”; I disagree that this is more shallow than the machines of the 60s.

                  1. 4

                    Well, consider that any computer, even really old ones, can parse an arbitrarily long and complex arithmetic expression and calculate it with extreme precision. They can’t write poems, but they have logical depth that GPT is sorely missing.
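
                    To put “logical depth” in perspective: exact evaluation of arbitrarily nested arithmetic takes only a few dozen lines of classic recursive descent, well within reach of very old hardware. A rough Python sketch of my own (not tied to any particular historical system):

                    ```python
                    from fractions import Fraction
                    import re

                    TOKEN = re.compile(r"\s*(\d+|[()+\-*/])")

                    def tokenize(src):
                        src = src.strip()
                        pos, out = 0, []
                        while pos < len(src):
                            m = TOKEN.match(src, pos)
                            if not m:
                                raise SyntaxError(f"bad input at {src[pos:]!r}")
                            out.append(m.group(1))
                            pos = m.end()
                        return out

                    def evaluate(src):
                        toks = tokenize(src)
                        i = 0

                        def peek():
                            return toks[i] if i < len(toks) else None

                        def take():
                            nonlocal i
                            tok = toks[i]
                            i += 1
                            return tok

                        def atom():                 # atom := number | '(' expr ')'
                            tok = take()
                            if tok == "(":
                                val = expr()
                                assert take() == ")", "missing ')'"
                                return val
                            return Fraction(tok)    # exact rational arithmetic

                        def term():                 # term := atom (('*'|'/') atom)*
                            val = atom()
                            while peek() in ("*", "/"):
                                op, rhs = take(), atom()
                                val = val * rhs if op == "*" else val / rhs
                            return val

                        def expr():                 # expr := term (('+'|'-') term)*
                            val = term()
                            while peek() in ("+", "-"):
                                op, rhs = take(), term()
                                val = val + rhs if op == "+" else val - rhs
                            return val

                        return expr()

                    print(evaluate("(1 + 2*3) / (7 - 4)"))  # 7/3, exactly
                    ```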

                    1. 2

                      Sure, but humans can’t even do that with a computer’s reliability; that’s what I mean. I agree that GPT doesn’t have this capability; I’m not sure if it’s a load-bearing element of “human or superhuman cognitive ability”.

                      1. 2

                        No, but we can do a LOT of it. Just very very slowly.

          2. 3

            Syntactic transformations are Turing complete

            What do you mean by this?

            What’s the difference between a sufficiently advanced syntactic transformation and meaning?

            The difference between GPT and the way people process language is that GPT doesn’t know that our 3-dimensional universe exists. Its entire universe is just streams of text. It’s just playing chess with a million pieces - it cannot know what any of these pieces represent because it doesn’t experience them.

            1. 2

              However, in a sense neither can we. After all, our brain likewise only interacts with electrical impulses, not unmediated reality. GPT is more distant from the universe than we are, because it only accesses human-preprocessed descriptions, but it’s a difference of degree not kind, IMO.

              Another way to look at GPT is as a practical experiment on the philosophical question of whether one can in fact use words to describe the meaning of colors to a congenitally blind person. It turns out, surprisingly well?

              What do you mean by this?

              Any computation can be expressed in terms of a system of rewriting strings. Quoth WP: “As a formalism, string rewriting systems are Turing complete.”
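
              A toy example to make that concrete (the rule set is my own illustration, not from the WP article): unary addition carried out by nothing but blind find-and-replace.

              ```python
              # A tiny string rewriting (semi-Thue) system: computation as pure syntax.
              RULES = [
                  ("+1", "1+"),  # slide the '+' to the right, past one '1' at a time
                  ("+", ""),     # once nothing follows it, the '+' disappears
              ]

              def rewrite(s: str) -> str:
                  """Apply the first matching rule until no rule matches."""
                  while True:
                      for lhs, rhs in RULES:
                          if lhs in s:
                              s = s.replace(lhs, rhs, 1)  # one leftmost replacement per step
                              break
                      else:
                          return s  # fixed point: no rule applies

              print(rewrite("111+11"))  # -> 11111  (3 + 2 = 5, in unary)
              ```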

    4. 6

      One of its mistakes is:

      Question 1, point 1: fails to call getPoints or goalReached on a level object (it tries to access a levels array which doesn’t exist)

      An interesting thing about this problem space is you could cheaply try compiling the output and feeding the error back in an automated way. Ditto for linters or other non-AI tools. You could also ask the machine to write a test (or the human writes one) and report back if and how it fails. Someone’s experiments with Advent of Code show AI models can work with feedback like that.

      Those wouldn’t pass the spec for the task here (which is to pass a test that humans pass closed-book), but it does suggest there are strategies outside of the models themselves to make models better at real-world tasks.

      Also, that’s using tools built to help humans. In current deployments, AI shows some non-human-like failure modes like producing runs of highly repetitive code or blending two potential interpretations of the input text (notably, unlike a human, AI can’t, or at least doesn’t usually, seek clarification about intended meaning when it’s not sure). That suggests there might be other AI-specific improvements possible, again outside of making the bare model better at getting the right result the first time.
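
      A sketch of the compile-and-feed-back loop from the first paragraph could be as small as this; `ask_model` is a hypothetical stand-in for whatever completion API you’d call, and this only catches syntax errors, not failing tests:

      ```python
      import os
      import py_compile
      import tempfile

      def ask_model(prompt: str) -> str:
          """Hypothetical stand-in for whatever code-completion API you'd call."""
          raise NotImplementedError

      def generate_until_it_compiles(task: str, max_tries: int = 3) -> str:
          prompt = task
          for _ in range(max_tries):
              code = ask_model(prompt)
              fd, path = tempfile.mkstemp(suffix=".py")
              with os.fdopen(fd, "w") as f:
                  f.write(code)
              try:
                  py_compile.compile(path, doraise=True)  # syntax-level check only
                  return code
              except py_compile.PyCompileError as err:
                  # Feed the compiler's complaint straight back into the next prompt.
                  prompt = f"{task}\n\nYour previous attempt failed to compile:\n{err}\nPlease fix it."
              finally:
                  os.remove(path)
          raise RuntimeError("model never produced code that compiles")
      ```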

      1. 12

        An interesting thing about this problem space is you could cheaply try compiling the output and feeding the error back in an automated way.

        I spent much of today “pair programming” with ChatGPT. Just feeding the error back will sometimes help. But not at all consistently. ChatGPT struggles with conditional branching, with non-determinism, and a number of other things. But I was eventually able to talk it through things: http://www.randomhacks.net/2019/03/09/pair-programming-with-chatgpt/

        The kind of things that only rarely work:

        • Feeding ChatGPT error messages.
        • Explaining that it has a bug in a particular expression.

        Things that work surprisingly well (paraphrased):

        • Restarting the session when it starts to get confused.
        • “Let’s try writing that string parsing in Python.” (Not C. sscanf is no way to live.)
        • “Here are some example inputs and outputs.”
        • “Please rewrite the parsing function using regular expressions.” (See the sketch after this list.)
        • “Show me an efficient formula to calculate X. Now translate that formula to a Python function with the signature Y.”
        • “OK, now that the Python version is working, translate it to idiomatic Rust.”
        • “Please write a bunch of unit tests for function X.”
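
        As an example of the “rewrite the parsing with regular expressions” request, the output tends to have roughly this shape (my own reconstruction, not ChatGPT’s verbatim output):

        ```python
        import re

        # Parse lines like "2022-12-07 14:03:59" into a tuple of integers.
        TIMESTAMP = re.compile(r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})")

        def parse_timestamp(line: str) -> tuple[int, ...]:
            m = TIMESTAMP.fullmatch(line.strip())
            if m is None:
                raise ValueError(f"unrecognized line: {line!r}")
            return tuple(int(g) for g in m.groups())

        assert parse_timestamp("2022-12-07 14:03:59") == (2022, 12, 7, 14, 3, 59)
        ```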

        It’s really more like mentoring a very diligent junior programmer than anything else. You have to be specific, you have to give examples of what you want, and you have to realize when it’s getting in over its head and propose a better approach. But it’s willing to do quite a lot of the scut work, and it appears to have memorized StackOverflow.

        I suspect that there is some CoPilot-adjacent tool that would be one hell of a pair programmer, given a human partner that knew how to guide it. Or at least, such a thing will exist within 5 years.

        But the more I played with ChatGPT, the more I felt like there was a yet-to-be-closed “strange loop” in its cognition. Like, I’m pretty sure that if you asked ChatGPT if there were better ways to parse strings than lots of split and if, it could probably suggest trying regular expressions. I’d give you even odds it can explain the Chomsky hierarchy. But when it gets stuck trying to fix an underpowered parsing function, it can’t recursively query itself about better parsing techniques and then recursively feed that answer back into its own prompt. At least not consistently. I need to close the “strange loop” for it.

        When it’s clever, it’s terribly clever. When it fails, it sometimes has most of the knowledge needed to do better.

        I figure that we’re still several breakthroughs away from general-purpose AI (happily). But I think it’s also a mistake to focus too much on the ways ChatGPT breaks. The ways in which it succeeds, sometimes implausibly well, are also really interesting.

        1. 1

          Yes! I rarely see somebody notice this.

          It feels like the GPT family is a (super?)humanlike implementation of about half of a human cognition.

      2. 2

        An interesting thing about this problem space is you could cheaply try compiling the output and feeding the error back in an automated way.

        It exists! There’s a neat little open-source CLI program called UPG that does exactly this (using Codex IIRC). I started a new thread for it over here: https://lobste.rs/s/0gi7bi/upg_create_edit_programs_with_natural