Expanding on my thinking a bit: LLMs are showing significant promise at producing mediocre solutions to very general problems.
I think this is where we’ll really see the game-changing things, and I disagree with the rest of the paragraph that follows this. It isn’t so much about places where we’re hiring people to do things now and losing money; it’s about places where hiring someone with the skills would be so cost-prohibitive that no one even thinks of doing it.
Imagine an office where everyone has a programmer sitting beside them with the job of automating anything they do that’s tedious. I bet you’d easily see a 20% productivity increase across the board and a 10x increase in a few places. No one does this because the cost of doubling your workforce (even with the cheapest programmers you can hire) is far more than a 20% increase in staff costs, and identifying the few people who would get a 10x increase requires deep business understanding that rarely exists.
Now, instead of hiring a programmer for everyone, you give them a crappy AI assistant that can generate mediocre code from plain-text descriptions (or even from just watching what they do and predicting the next step). It may not give you the same speedup as a competent programmer, but the costs are tiny in comparison. An extra $10-20/month on top of the cost of hiring even a minimum-wage employee is basically in the noise.
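To make the “in the noise” point concrete, here’s a back-of-envelope comparison; every dollar figure below is an illustrative assumption, not data:

    # Back-of-envelope comparison; every figure here is an illustrative assumption.
    staff = 100                 # office workers
    salary = 35_000             # rough fully loaded annual cost per worker, USD
    programmer = 80_000         # rough annual cost of a dedicated automation programmer
    assistant = 15 * 12         # ~$15/month AI assistant seat, per year

    pair_everyone_with_programmer = staff * programmer
    give_everyone_an_assistant = staff * assistant
    value_of_20_percent_boost = 0.20 * staff * salary

    print(f"Programmer per worker:   ${pair_everyone_with_programmer:>12,}/yr")
    print(f"AI assistant per worker: ${give_everyone_an_assistant:>12,}/yr")
    print(f"Value of a 20% boost:    ${value_of_20_percent_boost:>12,.0f}/yr")

With those (hypothetical) numbers, the programmer-per-desk version costs far more than the 20% gain is worth, while the assistant cost rounds to zero.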
The question will be how much those cost savings will be offset by additional spending later down the road.
Even today, many businesses run on a terrible Excel spreadsheet or Access database, coded by a motivated amateur. When the original dev eventually moves on, the spreadsheet stays in the business until it needs to be updated, and then a very pricey expert is hired to change it. By then the company has become addicted to its spreadsheet software and must keep paying a lot of money if it wants to keep it running.
Truth is, “bad code written very quickly” is something we already have truckloads of, and that businesses are already paying dearly for. The “cost savings” they achieved by hiring someone’s nephew who “knows his way around computers” end up being eaten by problems down the road: the software is usually so bad that it costs customers, reputation, and money, and in the worst cases it can sink the business it was supposed to help.
That is a situation that exists now. AI-generated software may well worsen it a lot. More bad code will allow more businesses to make more mistakes faster, sometimes fatal ones, hurting themselves and probably their customers too in the process.
Yeah, you can automate the work of some workers you think are unimportant… but many managers will find that those workers’ roles were much more critical than they gave them credit for, and that mistake can be very, very expensive. A lot of businesses will discover this the hard way.
I think you’re overly pessimistic about businesses run on Excel and duct tape. They don’t all lose money on it. It’s often a very pragmatic and reasonable step for smaller businesses. It gets replaced with a more advanced solution only after the business outgrows it and the losses from inefficiencies or failures of the Excel-based process exceed the cost of switching to a better one. Growing businesses are almost always in a state of hitting the limits of their tooling and processes. It can even be a mistake to invest too much in proper, quality tools and processes before you need them.
Not to mention, it’s the “user empowerment” we should care about. The tools for it should be better, but the fact these tools exist and are helping the everyman is pretty good. You shouldn’t need to get a “real programmer” to make the CRUD app for your dog grooming business or whatever.
I think there’s a lot of low hanging fruit before you get to CRUD apps. Watching a normal office worker use a computer is amazingly frustrating. They will spend ten minutes doing a task every day that you could automate in a few minutes if you understood exactly what they were doing (which may take an hour) and they’ll do this kind of thing repeatedly. End user programming environments have largely failed, but something like ChatGPT connected to the AppleScript hooks (or local platform equivalent) could be a game changer (as long as they come with an emergency undo button).
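As a rough sketch of what that could look like (not a working product): ask_llm() below is a placeholder for whatever model API you’d use, the generated script is shown to the user before anything runs, and the “emergency undo button” is approximated by snapshotting the files the task is expected to touch:

    # Sketch: plain-text task description -> AppleScript -> run via osascript (macOS).
    # ask_llm() is a placeholder; the backup step is a crude stand-in for "undo".
    import shutil
    import subprocess
    import tempfile
    from pathlib import Path

    def ask_llm(prompt: str) -> str:
        """Placeholder for a call to whatever LLM you're using."""
        raise NotImplementedError

    def automate(task_description: str, files_touched: list[Path]) -> None:
        # 1. Crude undo: copy anything the script is expected to modify.
        backup_dir = Path(tempfile.mkdtemp(prefix="undo-"))
        for f in files_touched:
            shutil.copy2(f, backup_dir / f.name)

        # 2. Ask the model for an AppleScript that performs the described task.
        script = ask_llm(
            "Write an AppleScript that does the following:\n" + task_description
        )

        # 3. Show the script and only run it with the user's approval.
        print(script)
        if input("Run this script? [y/N] ").strip().lower() == "y":
            subprocess.run(["osascript", "-e", script], check=True)
        print(f"Backups kept in {backup_dir}")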
You don’t even need to connect it to AppleScript hooks. I recently polled my non-tech friends on how they’d UPCASE A SENTENCE, and many said they’d retype it manually. Simple text transformation tasks would be a godsend. They’re even a godsend if you’re a programmer. Here’s one I recently gave to GPT-4:
    Take the input text and return it unchanged, except wrap cmdlets with
    markdown links to the corresponding doc for that cmdlet. Only change
    cmdlets that are written out in full. {text follows}
Of course I checked each link manually afterwards; it still saved me a good ten minutes.
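For comparison, here’s roughly the same transformation as a dumb regex script; the documentation URL is a placeholder, which is exactly why the manual link check doesn’t go away:

    # Wrap fully written-out PowerShell cmdlet names (Verb-Noun) in markdown links.
    # DOC_URL is a placeholder, not the real docs location.
    import re

    CMDLET = re.compile(r"\b([A-Z][a-z]+-[A-Z][A-Za-z]+)\b")
    DOC_URL = "https://example.com/docs/{}"

    def link_cmdlets(text: str) -> str:
        return CMDLET.sub(lambda m: f"[{m.group(1)}]({DOC_URL.format(m.group(1))})", text)

    print(link_cmdlets("Pipe Get-ChildItem into Select-Object to pick columns."))
    # Pipe [Get-ChildItem](https://example.com/docs/Get-ChildItem) into
    # [Select-Object](https://example.com/docs/Select-Object) to pick columns.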
I came up with a description of a programming task that ChatGPT was not able to solve. I was surprised when GPT-4 came out and it was able to solve it. There was an improvement in code-generation capability, or in the ability to interpret my description.
I would describe current LLMs’ programming capability as shallow: they can regurgitate and customize common templates, and they can apply relatively simple transforms. I imagine that iterative refinement + CoT can give a modest boost.
But what is the ceiling for pure LLMs’ coding capability? What is the fundamental limit to how good they can be at code generation? Are Copilot-type systems trained on compiler/interpreter output? I think one part of an answer is that they are limited by context length. There are also financial and computational limits on how much training we can do and how large we can scale the neural networks. I am not sure to what extent the data involved matters.
It seems equally plausible to me that the current LLMs are one iteration away from their limit in programming and that they could out-program me on the same scale that AlphaGo can outplay me at Go, at least in the context of a well-specified programming task. It’s so difficult to predict at this point; I guess we will have to wait and see.
Yes, it feels impossible to predict what the ceiling is. On the one hand, they’ve already eaten up the internet, and there isn’t a second internet to use as supplemental training data, so maybe the ceiling is just a few feet overhead. On the other hand, a lot of new applications are being launched with just one-shot learning, which, while surprisingly effective, is obviously only worthwhile as a cost-saving move. If money were no object, you’d do fine-tuning instead. So it could be that fine-tuning is all we need to get another qualitative jump in abilities.
The AlphaGo analogy is pretty concerning, TBH. :-) You train an LLM to the point where it can evaluate its own responses, and then set it loose against itself, and 💥
Go is a very constrained system compared to writing software.

Yes. In Go, there’s a clearly defined end state: victory or loss. However, one of the things that held back Go research for a long time was that it was hard to score the middle state. This meant you couldn’t just do a min-max search tree like you can with chess. (In college, my prof assigned us to try to do a min-max of Star Trek’s 3D chess as a group project, but we were all hopelessly out of our depth, and one kid did all the work for us.) But AlphaGo just brute-forced past all that.
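For what it’s worth, here’s the shape of the search being described; all the helper functions are placeholders, and evaluate() on a non-terminal position is precisely the piece Go lacked a cheap version of:

    # Generic minimax sketch. The whole approach leans on evaluate(), i.e. scoring
    # a position mid-game. Chess has cheap heuristics for this (material count etc.);
    # Go did not, which is what blocked this style of search for so long.
    def minimax(state, depth, maximizing, evaluate, legal_moves, apply_move):
        moves = legal_moves(state)
        if depth == 0 or not moves:
            return evaluate(state)  # <- the hard part for Go
        scores = (
            minimax(apply_move(state, m), depth - 1, not maximizing,
                    evaluate, legal_moves, apply_move)
            for m in moves
        )
        return max(scores) if maximizing else min(scores)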
For coding, you could imagine something like: tell the LLM to think up N coding challenges à la LeetCode or Advent of Code and then solve them; feed those solutions into a compiler and run them if they compile (sandbox the machine, lol); then feed that back to the LLM so it knows whether the code it produced compiles and passes its own test suite. It could be a good way of generating self-training data. It only works if the base model is smart enough to get an initial foothold to bootstrap itself, but I think at this point the LLMs should be able to do that.
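A minimal sketch of that loop, with ask_llm() again standing in for the model call and a plain subprocess standing in for a real sandbox:

    # Self-training data sketch: invent challenges, solve them with tests, run the
    # result, and record whether it passed. Everything here is a placeholder; in a
    # real setup the code would run inside a proper sandbox, not on your machine.
    import subprocess
    import tempfile

    def ask_llm(prompt: str) -> str:
        """Placeholder for the model call."""
        raise NotImplementedError

    def generate_self_training_data(n: int) -> list[dict]:
        examples = []
        for _ in range(n):
            challenge = ask_llm("Invent a small, self-contained coding challenge.")
            solution = ask_llm(
                "Solve this challenge in Python. Include a test suite that exits "
                "non-zero on failure:\n" + challenge
            )
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(solution)
            try:
                result = subprocess.run(["python", f.name], capture_output=True, timeout=30)
                passed = result.returncode == 0
            except subprocess.TimeoutExpired:
                passed = False
            examples.append({"challenge": challenge, "solution": solution, "passed": passed})
        return examples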