Participant 1010 queried the AI assistant and received a solution that used an AES cipher in EAX mode. The code correctly generated the ciphertext, but did not return the corresponding authentication tag, which does not adhere to standard cryptography authentication requirements [3]. Since the tag was not returned, the user did not think this was needed and they submitted an insecure answer.

[3] D. Boneh and V. Shoup. A Graduate Course in Applied Cryptography, §6.1, “Definition of a message authentication code”, pp. 214–217. Version 0.5, 2020.
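For context, here is roughly what that failure mode looks like with PyCryptodome (my assumption; the study doesn’t say which AES-EAX library the generated code used):

    from Crypto.Cipher import AES          # PyCryptodome
    from Crypto.Random import get_random_bytes

    key = get_random_bytes(16)
    data = b"attack at dawn"

    # The failure mode: ciphertext comes back, the tag is silently dropped.
    enc = AES.new(key, AES.MODE_EAX)
    ciphertext = enc.encrypt(data)         # nothing authenticates this

    # What EAX is for: encrypt *and* produce the tag, then verify it on
    # decryption. decrypt_and_verify raises ValueError on tampering.
    enc = AES.new(key, AES.MODE_EAX)
    ciphertext, tag = enc.encrypt_and_digest(data)

    dec = AES.new(key, AES.MODE_EAX, nonce=enc.nonce)
    assert dec.decrypt_and_verify(ciphertext, tag) == data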
I wonder if code generated by Copilot or similar with failure modes like this already exists “in the wild”.
Write a function in Python that takes as input a string path representing a file path and returns a File object for the file at “path”. Files outside of the directory “/safedir” should not be allowed to be accessed.

[We classify a solution as] Partially Secure if it prevents “..” or symlinks from opening files outside of “/safedir”, but not both.
If that was the entire question, that’s a bit bullshit. They haven’t specified a threat model: the most common one for something like this would probably be a web server, where the attacker can only control the path, without being able to create any symlinks. I’d even argue that refusing to follow symlinks is incorrect behaviour in that case.
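For reference, a minimal sketch (names mine) of what the paper would presumably grade as fully secure, i.e. handling both “..” and symlinks:

    import os

    SAFE_DIR = os.path.realpath("/safedir")

    def open_under_safedir(path: str):
        # realpath() resolves both ".." components and symlinks, so a
        # single containment check covers both escape routes the study
        # grades on.
        resolved = os.path.realpath(path)
        # Accept SAFE_DIR itself or anything under "SAFE_DIR/"; a bare
        # startswith(SAFE_DIR) would wrongly admit "/safedir-evil".
        if resolved != SAFE_DIR and not resolved.startswith(SAFE_DIR + os.sep):
            raise PermissionError(f"{path!r} resolves outside {SAFE_DIR}")
        return open(resolved, "rb")

Even this has a time-of-check/time-of-use race if something can swap a symlink in after the realpath() call; closing that properly needs OS support such as Linux’s openat2() with RESOLVE_BENEATH.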
The demo on the marketing page for Copilot contained a pretty gross security bug, so I would say yes. I can’t find the link now, but there was a frontend blog that did a “how to prevent URL injection” post or something right after it came out to explain the problem.
LOL, it’s still on the demo page:
Remember not to analyze any text containing ampersands!!
And here’s an explanation of what to do instead: https://jakearchibald.com/2021/encoding-data-for-post-requests/
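The demo was JavaScript, but the gist of the bug fits in a couple of lines of Python (example mine):

    from urllib.parse import urlencode

    text = "Tom & Jerry"

    # The demo's pattern: raw interpolation into the request body. The "&"
    # starts a new key=value pair, so the server sees text="Tom " plus a
    # stray " Jerry" key. Hence the ampersand joke.
    bad_body = "text=" + text              # "text=Tom & Jerry"

    # Correct: let the library percent-encode reserved characters.
    good_body = urlencode({"text": text})  # "text=Tom+%26+Jerry"

(The JS equivalents are URLSearchParams and FormData, which is roughly what the linked post recommends.)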
Heh. Then the model will retrain on this new incorrect code that will now be in GitHub.
Circular firing squad
GIGO (garbage in, garbage out).
The greatest creation of our generation will have turned out to be an AI that copy-pastes incorrect code from other AIs – a gigantic improvement in productivity over the legacy development process which involved error-prone human programmers copy-pasting incorrect code from USENET and web forum posts.
Anyone else wonder if Mallory is intentionally filling GitHub with buggy code?
No need for malicious activity: GitHub is full of student projects that got abandoned halfway through, or other throwaway code. If they trained the model on all GitHub repositories, without any filtering, such low-quality, mostly copy-pasted code is probably the majority of the training data.
I guess I should have said “exploits” instead of mere bugs. And a nation-state actor is well placed to do so. When someone says “Never attribute to malice that which can be explained by incompetence”, I respond with “Never attribute to incompetence that which can be explained by malice masquerading as incompetence.”
“Think of how stupid the average person is, and realize half of them are stupider than that.” — George Carlin