I’ve been tinkering with Copilot in some side projects, and for anything beyond simple stuff it has proven not just bad, but dangerous. As a concrete example, it seems to have been reinforced to use innerHTML for basically everything, even displaying simple text. Everything it generates needs to be reviewed incredibly thoroughly.

To me, the whole benefit of this tool was to provide a sort of feedback loop for junior developers, to help them think about different ways they could solve a problem. Instead you need a deep understanding of which suggestions are or are not appropriate: basically the exact opposite outcome.
I foresee a lot of auto-generated security issues if people start using this sort of software as a way to reduce the cost of development.
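To make the innerHTML point above concrete, here’s a minimal sketch of the difference (the function names are illustrative, not actual Copilot output): assigning untrusted text to innerHTML parses it as HTML, which opens an XSS hole, while textContent just displays it.

```typescript
// Minimal sketch of the innerHTML hazard described above.
// Function names are illustrative, not actual Copilot output.

// Dangerous: the string is parsed as HTML, so attacker-controlled input
// such as "<img src=x onerror=alert(1)>" will execute.
function showGreetingUnsafe(el: HTMLElement, name: string): void {
  el.innerHTML = `Hello, ${name}!`;
}

// Safe for plain text: textContent never interprets the string as markup.
function showGreeting(el: HTMLElement, name: string): void {
  el.textContent = `Hello, ${name}!`;
}
```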
I’m not really sure what would be best for Copilot to do in this situation. I don’t think what it’s doing now is actually useful in practice, although it’s an impressive-looking demonstration.

A big problem that seems pervasive in machine-learning systems is that they don’t try to detect when they’ve been given a problem that’s above their pay grade, so to speak; they just do the best they can, which on tough problems is often worse than just giving up and telling the human they’re stumped, or at least signaling low confidence in their answer.
My gut tells me that this is a shallower problem than its pervasiveness would have you believe, but AI isn’t really my area, so I may be wrong.
When I first learned about computers, there was an abbreviation in common use: GIGO. Garbage in, garbage out. This is especially true for any ML system. The output quality depends hugely on the training set. This, unfortunately, composes very badly with Sturgeon’s Law.
Determining whether a problem is ‘above their pay grade’ is itself a difficult problem, because it requires the system to estimate the difficulty of the task. Training a machine-learning system to understand this requires a data set annotated with the difficulty of tasks. You might be able to infer it from something else (for example, the number of commits to a particular bit of code whose commit messages are labelled with some variant of ‘bug fix’, or the average amount of time people spend in VS Code reading a particular bit of code), but in general it’s very hard. To do it without ML requires being able to articulate a set of rules that define the difference between easy and difficult programming problems, and if you could do that then you could probably produce a program-synthesis tool that works a lot better than Copilot.
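As a rough illustration of that first proxy, here’s a minimal sketch, assuming the commit data has already been extracted (say, from git log); the regex and the idea of using the bug-fix ratio as a difficulty signal are my own illustrative choices, not anything Copilot actually does.

```typescript
// Hedged sketch: scoring how "difficult" a file is from its commit
// history, per the proxy suggested above. Assumes (file, message) pairs
// were already extracted, e.g. from `git log --name-only`.
interface Commit {
  file: string;
  message: string;
}

// Illustrative pattern for "some variant of bug fix" commit messages.
const BUGFIX = /\b(fix(es|ed)?|bug|regression|hotfix)\b/i;

// Fraction of each file's commits that look like bug fixes: a crude
// difficulty signal, not a ground-truth difficulty label.
function bugfixRatio(commits: Commit[]): Map<string, number> {
  const total = new Map<string, number>();
  const fixes = new Map<string, number>();
  for (const { file, message } of commits) {
    total.set(file, (total.get(file) ?? 0) + 1);
    if (BUGFIX.test(message)) {
      fixes.set(file, (fixes.get(file) ?? 0) + 1);
    }
  }
  const ratio = new Map<string, number>();
  for (const [file, n] of total) {
    ratio.set(file, (fixes.get(file) ?? 0) / n);
  }
  return ratio;
}
```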
After driving with GPS, my natural ability to navigate quickly degraded. That’s when I decided to switch from an intelligent IDE to vim.
After driving with a GPS that gets confused between left and right, I’ve lost my ability to feel confident in where I’m going. So I decided to pull out my vim setup to try and regain that confidence.
More seriously, I’ve found one of the easiest solutions for getting some of that semblance of navigation back is to configure your application/device to always point north. It makes you more aware of your absolute position and of the alternate routes that will take you to the same spot.
I wonder what the IDE analogy is for this.
Keep the high-end IDE in the toolbox, even if it isn’t your daily driver. The code generation and refactoring tools in IntelliJ let you perform miracles on the odd occasions they are needed. Honestly, most people who use IDEs for everything don’t know how to use those tools.
The fact that Copilot produces code with subtle bugs, such as using floats to represent currency, makes it a complete non-starter. That makes it strictly worse than using Stack Overflow, where at least actual people will review solutions. It seems like auditing the code it craps out will often require as much effort as writing the code from scratch. I’m also willing to bet that it will predominantly be used by the people who are least likely to spot the problems in the code it produces.
The only situations where I can see it being actually useful would be artistic projects where there’s no objectively correct solution.
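On the floats-for-currency point, a quick illustration of why it’s a subtle bug: binary floating point can’t represent most decimal fractions exactly, which is why the usual advice is to store amounts as integer minor units (cents). A minimal sketch:

```typescript
// Binary floating point cannot represent 0.1 or 0.2 exactly, so money
// math done in floats quietly drifts.
console.log(0.1 + 0.2);         // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3); // false

// Common fix: represent amounts as integer cents, where addition is
// exact (up to Number.MAX_SAFE_INTEGER), and format only for display.
const priceCents = 1999; // $19.99
const taxCents = 160;    // $1.60
console.log((priceCents + taxCents) / 100); // 21.59
```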
I think Copilot will fail as a useful tool. It will teach us, though, something we already knew: that the code artifact itself is not particularly meaningful, and that the truly meaningful artifacts exist somewhere between the code, our own minds, and the minds of others on our teams.
Code is biased toward execution, so we can sometimes document our intentions and semantic desires as tests. Sometimes we pull away from execution and write long-form specifications or acceptance criteria. The truly mad write formal specifications. These artifacts are, at times, closer to what we want, but in my experience they’re all still too far from what’s actually meaningful.
Without being able to really write down what we want, the best Copilot can do is guess. It’ll probably be surprisingly good at getting in the ballpark whenever we ask for something that’s fairly common, in fairly common language. The closer our desires are to the zeitgeist, the better Copilot can behave.
But this is silly. The most successful thing in computer science is the actual reuse of common ideas. I’d rather take a dependency than an additional 100 lines of code. Copilot delivering me something that rhymes heavily with a dependency is worse than useless.
So the last bastion of hope is that I can write a loose “specification”, barely even substantial enough to be wrong, somewhere between a function name, some documentation, and maybe a few initial lines of code, and Copilot can take the average between that and the closest thing it’s seen in public code and deliver something useful. This feels impossible. At least, not without dialogue.
I’ll be impressed when you can converse with Copilot, go back and forth, hammer out an idea in specification, test, code, formal spec, types, etc. That iterative refinement process is really nice. I know this because we already have it in languages like Coq and Agda. Copilot takes a tremendously different technical approach, but I think it’d be wise to converge in that direction.
Or, if not, I’d rather it just be a smart way to search open source code for a library I can import.
Lastly, unless it gets into dialogue mode quickly, I think it’s going to have a negative impact on learning to code and on being a junior developer. I won’t go so far as to say I’d ban it from my team, if I even could, but I suspect that a student who uses Copilot may learn and grow much more slowly.
To be clear, my prediction is that it can’t be for code what the calculator was for math: helping the user focus on higher-level details by ignoring the repetitive bits, and thus accelerating learning. I don’t think the state of CS is yet at the point where we can separate the code from the higher-level ideas. Comparatively speaking, arithmetic is an extremely simple programming language.
Copilot would be much more useful as a smart search tool that shows you how other people solved problem X than as a tool that tries to synthesize its own solution to X. Use computers for what they’re good at (indexing massive amounts of data) and people for what they’re good at (having the actual intelligence to synthesize things).
It’s not an alternative to the human brain.