IANAL, but I have read a lot about this, and it is a very tricky subject legally.
To simplify:
- Imagine all the code Copilot was trained on is GPL-licensed.
- We have a universal function `isInfringing(code)` that has access to all GPL code and returns `true` if `code` infringes some of it.
For a given prompt, if `isInfringing(copilot(prompt)) == false`, we cannot claim Copilot is infringing GPL code, even though it was trained on GPLed code.
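To make the thought experiment concrete, here is a toy Python sketch. Everything in it is made up for illustration: `GPL_CORPUS` is a two-snippet stand-in for "all GPL code", `copilot` is a stub that returns canned completions, and `is_infringing` just looks for a verbatim run of 8 tokens. Real infringement is a legal judgment, not a string match, so treat this as a proxy at best.

```python
import re

# Hypothetical stand-in for "all GPL code": a tiny in-memory corpus.
GPL_CORPUS = [
    "int add(int a, int b) { return a + b; }",
    "for (int i = 0; i < n; i++) { total += values[i]; }",
]


def tokens(code: str) -> list:
    """Split code into identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def is_infringing(code: str, window: int = 8) -> bool:
    """Toy oracle: True if any run of `window` tokens also appears in the corpus.

    Verbatim overlap is only an assumed proxy for infringement.
    """
    emitted = tokens(code)
    runs = {tuple(emitted[i:i + window]) for i in range(len(emitted) - window + 1)}
    for source in GPL_CORPUS:
        src = tokens(source)
        for i in range(len(src) - window + 1):
            if tuple(src[i:i + window]) in runs:
                return True
    return False


def copilot(prompt: str) -> str:
    """Stub model: canned completions keyed by prompt (purely illustrative)."""
    canned = {
        "add two ints": "int add(int a, int b) { return a + b; }",
        "multiply two ints": "long mul(long x, long y) { return x * y; }",
    }
    return canned.get(prompt, "")


if __name__ == "__main__":
    for prompt in ("add two ints", "multiply two ints"):
        print(prompt, "->", is_infringing(copilot(prompt)))
```

The point of the sketch is that the check is applied per output: the second prompt yields code with no overlapping run in the corpus, so the oracle returns `False` for it no matter what the stub was "trained" on.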
So the problem starts here: would the piece of code Copilot emits also be infringing if you had written it yourself?
> So the problem starts here: would the piece of code Copilot emits also be infringing if you had written it yourself?
Why does everyone in these discussions bring up "if a human made it"? A generative AI operates far faster than anyone who has ever existed or ever will, and a person who is aware of the license and respects it will probably write something more sensible to avoid plagiarism.
Now, is having dozens, hundreds, or thousands of humans replaced by a machine that makes money for some for-profit company really fair? Even if they were a non-profit, as someone pointed out, the people who created the content that feeds the weights aren't receiving a penny! They have already made money with it, they will make more, and that is upgrading, and will keep upgrading, the state of generative AI.
For sure, legal battles over people copying permissively licensed code should exist, but that feels like a different discussion.
Because the discussion is about what is 'legal', and laws only apply to humans. On the ethical side of the discussion, I tend to agree with you. But that is also a complicated subject: 'fair' in general is complicated, and all this GPL/AGPL stuff was born out of it. Hosting GPL code as SaaS is legal but not 'fair', for example.
From my understanding of a blog post by GitHub last year, they are planning to launch a tool that finds code similar to what Copilot emits. That implies Copilot does not mix multiple sources for a single function, but derives it from one code block it found with similar functionality (or maybe bigger blocks with similar functionality, IDK).
If Copilot indeed derives a function (or a functional block) from a single source, it might plainly violate the license of the repository it derives the code from.
There are many questions, and nothing is clear cut. The only thing I know is, I will never use that thing.
What about BSL, SSPL, or other source-available (for your eyes only) licenses? Copilot harvests all public repos, regardless of their licenses.