IMO programmers who think that AI cannot help them aren’t being creative enough in how they use it. I don’t use ChatGPT to write whole programs for me; I use it for things like getting implementation details of third-party libraries.
Yes, but on the flip side, I think for most programmers it’s not even a 10% improvement in productivity. It’s an occasional two-hour task cut down to 10 minutes of back and forth with the bot.
…followed by 90 minutes of going out to confirm what the bot said.
What makes it good for CSS is that you can instantly see when it’s completely full of crap and not working at all. For tasks without clear testing conditions, it’s very dangerous, e.g. the insecure POSTing on GitHub’s Copilot demo page.
I’ve found it really variable and I can easily see people considering it a complete game changer or a total waste of time, depending on where their day-to-day work falls on the spectrum of things I’ve tried.
For knocking together some JavaScript to do something that’s well understood (and probably possible to distill from a hundred mostly correct StackOverflow answers), it’s been great. And, as someone who rarely writes JavaScript, it’s a great way to find out how some APIs have changed since I last looked. Using an LLM here let me do things in 10 minutes that would probably have taken a couple of hours without it. If you are working in a space where a lot of other people live but you typically don’t, especially if you jump between such spaces a lot and so don’t have time to build up depth of expertise, it’s a great tool for turning breadth of experience into depth on demand.
I tried it for some things in pgfplots, a fairly popular LaTeX package (but still part of a niche ecosystem). It consistently gave me wrong answers. Some were close enough to right that I could figure out how to do what I wanted from them, a few were right, and a lot were very plausible-looking nonsense. For fairness, I used DuckDuckGo to try to find the answer while it was generating its response. In almost all cases, I was about the same speed with or without the LLM when I was able to solve the problem at all. Some things I was unable to solve either way (for example, I had a table column in bytes and wanted to present those numbers with the binary, base-2 prefixes - Ki, Mi, and so on - and I completely failed). I probably wasted more time with plausible-but-wrong answers here than I gained overall, because I spent ages trying to make them work where I’d probably have just given up without the LLM. If you’re doing something where there’s only a small amount of data in the training sets, you might get lucky or you might not. I can imagine a 10% or so improvement if the LLM is fast.
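(For reference, the logic I wanted is trivial outside LaTeX; here is a rough JavaScript sketch of the target output - the part I failed at was getting pgfplots to apply it to a table column.)

    // Binary (IEC) prefixes: 1 KiB = 1024 bytes, 1 MiB = 1024 KiB, and so on.
    function formatBytes(n) {
      const units = ['B', 'KiB', 'MiB', 'GiB', 'TiB'];
      let i = 0;
      while (n >= 1024 && i < units.length - 1) {
        n /= 1024;
        i++;
      }
      return `${n.toFixed(i === 0 ? 0 : 1)} ${units[i]}`;
    }

    console.log(formatBytes(1536));       // "1.5 KiB"
    console.log(formatBytes(3221225472)); // "3.0 GiB"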
I’ve also tried using it to help with systems programming tasks and found that it routinely introduces appalling security holes of the kind I’d expect in example code (which routinely omits error handling and, in particular, the kind of error handling that’s only necessary in the presence of an active attacker). Here, I spent far more time auditing the generated code than I’d have spent writing it from scratch. This is the most dangerous case because, often, the generated code was correct when given valid input, so non-adversarial testing would have passed. Writing adversarial tests, watching them fail, and tracking down the source of the bugs was a huge pain. In this scenario, it’s like working with an intern or a student: something you do not to be more productive yourself, but to make them more productive in the longer term. As such, the LLM was a significant productivity drain.
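To give a made-up illustration of the pattern (sketched in Node rather than the code I was actually reviewing): a file-serving handler like the one below works for every friendly URL you’d write in a test, and the only thing stopping an attacker from reading arbitrary files is the path check that generated code reliably leaves out.

    const http = require('http');
    const fs = require('fs');
    const path = require('path');

    const ROOT = path.resolve('./public');

    http.createServer((req, res) => {
      // Fine for valid input like /index.html ...
      const requested = path.join(ROOT, req.url.split('?')[0]);

      // ... but without this check, /../../etc/passwd walks straight out of
      // ROOT. (A real handler also has to deal with percent-encoding.) This is
      // exactly the error handling that only matters against an active attacker.
      const resolved = path.resolve(requested);
      if (resolved !== ROOT && !resolved.startsWith(ROOT + path.sep)) {
        res.writeHead(403);
        return res.end('Forbidden');
      }

      fs.readFile(resolved, (err, data) => {
        if (err) {
          res.writeHead(404);
          return res.end('Not found');
        }
        res.end(data);
      });
    }).listen(8080);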
I find that LLMs really shine when you give them all the context needed to do their task and rely on the “grammatical” understanding they learned. Relying on their training corpus somehow being good enough to generate good code is a crapshoot, and a proper time sink. But asking one to take the single unit test I wrote out and extend it to the 8 extra edge cases I specify? Spot on. Asking it to transform the Terraform to use an iterator and a variable instead of the hardcoded subnets? Right there. I like writing the first version, or designing the DSL that can then be transformed by the LLM. You don’t see many of these approaches around, but that’s where all the stochastic underpinnings really work. Think of it as human-language-driven DSL refactoring. Because its output is quite self-consistent, it will often be “better” than what I would do, because my stamina is only so large.
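To show the shape of the unit-test ask (the names here are hypothetical; the test uses Node’s built-in node:test): I write the one test out in full, then just list the edge cases and let the model clone the pattern.

    const test = require('node:test');
    const assert = require('node:assert/strict');
    const { parseDuration } = require('./parse-duration'); // hypothetical module

    // The single hand-written "seed" test:
    test('parses hours and minutes', () => {
      assert.equal(parseDuration('1h30m'), 90 * 60 * 1000);
    });

    // From here the prompt is just a list: "0m", "90m", missing unit, negative
    // values, garbage input -- one test per case, in exactly the shape above.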
I do use LLMs to generate snippets of code and have a pretty good flair for “ok, this probably doesn’t exist”, but even then, I get proper test scaffolding and maybe a hint of where to look in the manual or, even better, which API I actually should implement. It’s a weird thing to explain without showing it. I was very skeptical of using LLMs to learn something (in this case, Emacs and Emacs Lisp) where I don’t know much and I knew the training corpus would be haphazard, but it turned out to be the most fun I’d had in a long time.
Honestly, it would sell me if, instead of trying to give me the answer, it provided links to various sources that should help me out.
“Maybe you should check out pages 10-15 of this paper.” or “This article seems to achieve part of your goal [x], and this one shows how to bring them together [y]”
The problem is that it assumes it can give me an answer better than the original source, and while sometimes that’s true, it’s often not.
I’m sure I could learn to prompt it in a way that would give me these types of answers, though.
This is typical of a certain take by experienced programmers: they give the model some kind of gotcha “you have to get it right” task, approach it with no experimental framework (say, trying different approaches, being cognizant of the model used, of the context size, of prompting techniques), and call their one-off 10-minute afternoon experiment an actual informed take.
What LLMs do brilliantly is transform one form of language into another, boilerplate and dullness and rote regurgitation of patterns included. And that’s most of my day job, not opining about loop invariants and binary searches. What’s transformative is what this capability enables. I don’t mind correcting a slightly off implementation of an API or adding a few unit tests by hand. What I mind is spending 4 days wading through a spec and transforming it into a clean TypeScript API, then writing a mock server, then a CLI tool and a cleaned-up spec. An LLM does that better than I would, in the time it takes me to make a cup of coffee.
This in turn means that I can actually spend those 4 days thinking about the problem I am trying to solve (or whether it is even worth solving).
I would say it’s useful enough to be a tool in my toolbox, but looking over my past interactions with ChatGPT 3.5, it misses the mark and wastes my time pretty frequently. Here are my recent uses of it:
Asked it to improve a remove-text-inside-parens function I wrote. It did, but then it wrote test cases that were blatantly wrong, e.g. “Hello (World)” => “Hello World”. I’d give that a B, because the actual code was okay and it was obvious where it was screwing up.
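For the record, the function is roughly this (a reconstruction, not my exact code), which is why the generated expectation was so obviously off: stripping “(World)” should leave “Hello”, not “Hello World”.

    function removeParens(s) {
      // Drop any parenthesised group plus the space before it.
      return s.replace(/\s*\([^()]*\)/g, '');
    }

    console.assert(removeParens('Hello (World)') === 'Hello');
    console.assert(removeParens('a (b) and (c) d') === 'a and d');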
Asked it to write code to make a table of contents for an HTML document. It wrote a lot of wrong code. It was mostly helpful for breaking my writer’s block and getting me kickstarted, but I didn’t really use much, if anything, that it wrote. For example, it wrote this, which tries to use whitespace to indent the ToC:
html += `${indent}<li><a href="#${id}">${text}</a></li>\n`;
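The whitespace in that string does nothing once the HTML is rendered; what it presumably should have produced is nested lists, something along these lines (a sketch of the idea, not the code I ended up with):

    // Build a nested <ul> ToC from h1-h3 headings; nesting, not whitespace,
    // is what actually indents the entries when rendered.
    function buildToc(doc) {
      const root = doc.createElement('ul');
      const stack = [{ level: 1, list: root }];

      doc.querySelectorAll('h1, h2, h3').forEach((heading, i) => {
        const level = Number(heading.tagName[1]);
        const id = heading.id || (heading.id = `toc-${i}`);

        // Pop back up until the top of the stack is shallower than this heading.
        while (stack.length > 1 && level <= stack[stack.length - 1].level) {
          stack.pop();
        }

        const li = doc.createElement('li');
        const a = doc.createElement('a');
        a.href = `#${id}`;
        a.textContent = heading.textContent;
        li.appendChild(a);
        stack[stack.length - 1].list.appendChild(li);

        // Open a nested list so deeper headings indent; leaf headings end up
        // with an empty <ul>, which is harmless for a sketch.
        const sub = doc.createElement('ul');
        li.appendChild(sub);
        stack.push({ level, list: sub });
      });

      return root;
    }

    // Usage: document.body.prepend(buildToc(document));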
Someone learning how to program asked a question about filtering a list in a Slack I’m in. Someone else answered first, but for fun I asked ChatGPT and it gave the same answer. A+ on that one.
Asked it to translate a short Go function to JS and then make the JS idiomatic. It did a very good job here.
CSS question: total waste of time.
Convert a long Python script into Go: totally failed here, because the Python was too long and it kept confusing itself about what it had and hadn’t converted.
Write some Go code using httptest.NewServer: failure. I dunno, maybe close enough to be helpful, but not better than just reading the docs.
Convert a recursive function to non-recursive: it did a good job of the coding task, but added unasked-for commentary that said “Go’s slice is more efficient for small and medium-sized stacks, and doesn’t require manual memory management like a linked list would”, which is nonsense.
Another CSS failure.
So, in summary: it’s good at cranking out well known code snippets and converting from one language to another. It’s really, really bad at CSS.
With GPT-3.5, pretty much 95% of the time there’ll be something wrong with the code or the transformation I want it to do. With GPT-4, I’ll surprisingly enough get decently working small prototypes, and pretty much on-point transformations for most prompts. With some prompt engineering, I can get 3.5 to do specific transformations with a good chance of success. I design every tool around having the user not just in the loop but at the center of the interaction, so having “wrong” things come out is not usually a problem.
Same here. I have talked to people that are extremely skeptical of using LLMs for coding, but they expect to give a simple prompt and get a full application going.
I use ChatGPT daily and I can get it to produce what I need; I don’t expect it to be intelligent. I just use it to manipulate text at a very high level.
I’m trying to make this catch on: “It’s a kaleidoscope for text.”
One thing I really want to use LLMs for, if my work ever OKs their use on our code base, is generating documentation comments, especially the YARD/JavaDoc/PyDoc kind.
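Roughly this kind of thing, to be concrete (JSDoc here, and a made-up function, but the YARD/JavaDoc/PyDoc versions have the same boilerplate-heavy shape, which is exactly what an LLM is good at churning out):

    /**
     * Merges user-supplied settings over the built-in defaults.
     *
     * @param {Object} defaults - Built-in default settings.
     * @param {Object} overrides - User-supplied settings; wins on conflicts.
     * @returns {Object} A new object; neither argument is mutated.
     */
    function mergeConfig(defaults, overrides) {
      return { ...defaults, ...overrides };
    }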
I also find it useful in situations where there’s a lot of documentation available, but a dizzying API surface area. LLMs can help narrow it down to the functionality I need very quickly.