Something I’ve been wondering about. If you were to use comments from one of the many MS Windows source code leaks to prompt Copilot to generate the related code, could you interpret that as MS making that code available to you to use under whatever license you choose (or whichever license Copilot allows for)? Seeing as it’s both their product, and their source code…?
EDIT: What I’m saying is that, if we’re concerned about Open Source code being “laundered” by this method, there’s certainly plenty of leaked closed source software that could be “laundered” equally well. What we need is an open source project with sufficient backing and funding (and therein lies the biggest catch) to integrate Copilot-generated code that corresponds directly to proprietary (but leaked) code into its code base, and wait for the legal action.
This might be particularly interesting for ReactOS and Wine.
That is an interesting idea! I really hope someone with the resources to do this does it. The OSS community really needs to have more legal protections.
Note for anyone writing articles like this: Framing it as a copyleft problem is probably unhelpful. Almost all F/OSS licenses have an attribution clause. Even without such a clause, fair use / fair dealing often requires attribution. In jurisdictions that have a notion of ‘moral rights’, you may find that you have a legal obligation to attribute quoted work, even if you have the right to perform the quoting. I suspect that this whole field is going to be fun for copyright lawyers for years to come.
That said, my biggest concern with something like Copilot is patent liability. There is code on GitHub that implements patents and comes with an explicit patent grant covering derived works of the open-source code. If the legal argument is that the output of Copilot is not a derived work, then it does not get the patent license. If it is a derived work, then you must abide by the terms of the copyright license.
This is quite difficult to test because a lot of projects have patent grants that aren’t explicit about the set of patents that they cover. The copyright license (or part of the license) covers all of the code in the repo, but the patent grant applies only to parts that happen to infringe patents owned by a contributor to the project. Given a snippet of code, it’s fairly easy to find the GitHub project that it came from and the copyright license covering it (determining whether it meets the bar for originality to merit copyright protection at all is harder). It’s much harder to determine whether it infringes a patent.
Note that, unlike copyright, patents cover the code even if it is a completely independent invention. I suspect that something like Copilot will make it easier to accidentally infringe patents: there are an infinite number of ways of solving any given problem, a finite number of sensible ones, and a smaller number of patented ones. If your ML model is trained on a set that includes patented techniques, it’s more likely to generate a patented implementation. The really interesting test case will be whether this meets the bar for wilful infringement. In the US, damages are much higher if you knew that you were infringing a patent. If the code is word-for-word identical to published patented code, it’s probably difficult to prove to the satisfaction of a non-technical jury that you didn’t know about the patent. ‘A magical machine helped me write code and it happened to generate identical code to this source file that the patent owner released’ will probably be interpreted much the same way as ‘the devil made me do it’.
If I were a patent troll, I’d start creating GitHub projects with very well-commented implementations of patents that I owned and a source-visible (not F/OSS) license, wait for Copilot to start inserting them into other people’s projects and then start filing lawsuits.
I don’t think an elaborate plan is needed. If project B is copying enough code from project A, either via a human or via an AI, it should be apparent that the code has been copied, given a timeline of events.
I wonder what Stallman would say about liberating all of the code.
Personally, I don’t think anyone can stop Microsoft on this now. It’s not an exact parallel, but it’s somewhat similar to what Google did when they scanned all the books. That was a clear, direct copyright breach, if I’m not mistaken, and they got away with it. What Microsoft is doing with Copilot is arguably less of a breach.
In any case, it appears that stealing all the knickers really does lead to profit.
No, it was fair use, as ruled by the US Court of Appeals for the Second Circuit (see https://web.archive.org/web/20151023043336/http://www.ca2.uscourts.gov/decisions/isysquery/ba7a8b55-1f21-4e93-b3e0-e12001eb6193/1/doc/13-4829_opn.pdf); the US Supreme Court then declined to review that decision (see http://www.supremecourt.gov/orders/courtorders/041816zor_2co3.pdf). What GitHub does with Copilot is also fair use, as far as IP lawyers have publicly commented. If it’s fair use, it’s not stealing by definition.
Yes, I did not write that clearly enough. What I wanted to say is that it looked like a clear and direct copyright breach: you take a copyrighted work, and you copy it straight up. Copilot is not even making a copy.
My point was, it looks to me like this is going to pass without any consequences for Microsoft.
Copyright law, like patent law, is being grossly overblown today. It has long since ceased to have anything to do with the original idea of providing authors and composers with a livelihood. Today, it primarily stands in the way of the dissemination of works and the public presentation of most artists (besides the few super famous). For this reason I was never a member of a collecting society, and I distribute all my music, for example, under Creative Commons. I get more from people distributing and listening to the music than from the few paltry dollars that would barely pay for a week’s vacation. Books are also becoming more popular because of Google.
If Microsoft wants to take a stance against copyright, they’re free to release all their source code.
Without any legal consequences, sure, but this has cost them credibility.
It probably did, but the target audience is relevant here, I think. For example, I don’t think it cost them a lot of credibility with me personally; they didn’t have any.
I don’t mean to say I don’t believe in “the new Microsoft”. I do; they’re changing. But the fact that they’re changing the way they operate doesn’t mean that they’re changing their goals, ideals, or morals. Yes, they’re adjusting to the times, but the end goal is still profit, and the allowed means still include extermination.
I also don’t want to say that I predicted that they’d be stealing code when they bought GitHub. I had a few thoughts about how things might go wrong, but I didn’t have the time to think about it a lot (nor do I now, aside from casual comments).
The only thing I’m trying to say is that this behavior does not surprise me, and for that company to lose credibility, it would have had to have some in the first place.
But as I said at the start, I’m not the target audience anyway. Millions of .NET, Java, React/Angular developers are, including experienced people who don’t carry the biases from past decades that I hold. The governments that MS bribes their way into are. The corporations are. And those, I don’t think they mind that much. I don’t think it’s going to cost Microsoft a lot of credibility. They will just move on.
I publish a lot of open source code and I have absolutely no problem if it is used this way. Every day I benefit from the fact that millions of texts are just so accessible on the Internet and searchable via Google. It is absolutely ok if society also profits from my contributions, even if I would not use a service like Copilot.
My point exactly, I don’t think a lot of people mind this, and therefore Microsoft didn’t lose much credibility with Copilot.
Has it? What makes you say that?