It seems like this post has an unstated motivation for getting rid of source code that I’m not seeing? Is it that it’s hard to type, or that in theory a neural network “shouldn’t need” to have source code? (and I’m not sure I agree with or understand the latter claim)
I think this summarizes the whole problem:

Today, GPT-based systems are being used to produce source code as an intermediate representation. We review and then feed that generated code into the build process. It’s worth considering whether this is necessary. … Human validation is currently a key part of the process and source code is a traceable medium.
So I agree that source code is a common language for humans and machines to collaborate. Sure, if machines can do 100% of the work, MAYBE they would choose some other representation, but that’s not even clear to me. They would probably just reuse compilers and interpreters as is.
(I also don’t see them doing 100% of the work any time soon – there’s a danger in extrapolating exponentials; e.g., 20 years ago people wrote that Google would become conscious, etc.)
The LLMs seem to be quite adept at dealing with source code. Fundamentally they deal with syntax, and through some magic process some fuzzy and flawed notion of semantics sometimes arises. So it’s not clear to me why we’d want to get rid of the syntax.
I think there’s a possible fallacy in thinking of LLMs as traditional computing systems, where you might view source code as unnecessary; they are more like a different kind of computation, one that likes syntax.
For example, if the problem is naturally modelled in Python, an LLM is probably more likely to directly generate a correct Python solution than to generate, say, the equivalent and correct assembly code. And if the problem is naturally modelled in Erlang, it will probably want to write Erlang code.
I think it relates to some mathematical notions of program compression and length, which exist independently of whether humans or LLMs are manipulating the program.
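As a toy illustration (my own example, not from the post): the Python below is a handful of tokens, while a correct assembly-language equivalent would run to dozens of instructions – far more opportunities for a token-by-token generator to slip up.

```python
# Toy illustration of program length: the same computation expressed
# at a high level. A correct x86-64 version would need dozens of
# instructions (loop setup, registers, addressing modes), i.e. a much
# longer program in the compression/length sense.
def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

print(dot([1, 2, 3], [4, 5, 6]))  # 32
```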
Maybe a shorter way of saying this is that LLMs are trained on programs that humans wrote, which have syntax.
So I’d expect them to be better at using that syntax than using a language that no human has ever used. (Aside from the hugely important point that humans also have to collaborate on the code. Why would anyone make their own job harder?)
Gary Marcus used the phrase “king of pastiche”, which seems accurate – https://garymarcus.substack.com/p/how-come-gpt-can-seem-so-brilliant
Post author here.
It’s funny, my motivation was really to try to figure out whether there was any possible scenario under which source code might disappear. I like source code a lot, really.
Another thing I was thinking of is that Dark just killed their structured editor, you know the kind where you can only write valid syntax. From their blog post:

As you might know, in Darklang-classic, you wrote code using a “structured editor”. This is a non-freeform editing experience that our users have rated somewhere between “Ok I guess” and “probably the worst part of Darklang”.

As well as no longer being important in a world of generated code, the old editor’s code was pretty awful, and no one was really excited about saving it. While we’ll always remember the good times we had with the structured editor, long story short, a few of us took it round back and shot it in the head last month.
And the reason was AI!!!
LLMs like dealing with text – they are wizards at it. They will not like using your custom editor for structured source code!
https://lobste.rs/s/elifoa/how_does_ai_change_programming_languages
https://blog.darklang.com/gpt/
I’ve been making these M data structures x N operations arguments on the blog, with respect to text as a narrow waist.
https://www.oilshell.org/blog/2022/02/diagrams.html
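To make that concrete, here’s a toy sketch (hypothetical names, not from any real codebase): with text as the common interface, M producers and N consumers need only M + N adapters rather than M × N pairwise integrations.

```python
# Toy sketch of the "M data structures x N operations" point:
# every producer emits plain text and every consumer accepts plain
# text, so all M * N combinations work with only M + N adapters.

producers = {  # M producers, each rendering rows to text
    "csv": lambda rows: "\n".join(",".join(r) for r in rows),
    "tsv": lambda rows: "\n".join("\t".join(r) for r in rows),
}

consumers = {  # N consumers, each operating on text
    "count_lines": lambda text: len(text.splitlines()),
    "shout": lambda text: text.upper(),
}

rows = [["a", "b"], ["c", "d"]]
for pname, produce in producers.items():
    text = produce(rows)  # the narrow waist: a plain str
    for cname, consume in consumers.items():
        print(f"{pname} -> {cname}: {consume(text)!r}")
```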
Extremely important additions to the hourglass diagrams: LLMs, which also consume and produce text.
So yeah text is here to stay – it is a medium for humans and machines to collaborate, just as it’s a medium for humans and humans to collaborate.
https://platform.openai.com/tokenizer
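For example, OpenAI’s tiktoken library (which, as far as I know, implements the same encodings as that page) shows that a snippet of source code is just a short stream of text tokens to the model. A minimal sketch, assuming tiktoken is installed:

```python
# Minimal sketch using OpenAI's tiktoken library (pip install tiktoken):
# to the model, source code is just another stream of text tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

code = "def add(x, y):\n    return x + y\n"
tokens = enc.encode(code)

print(len(tokens), tokens)         # token count and ids
print(enc.decode(tokens) == code)  # True: the text round-trips exactly
```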