
This is a bit tricky, because at least in the U.S., I don't believe it's a settled question in law yet. Some of the other posters on here have said that the resulting model isn't covered by GPL--that's partially true, but the provenance of the data, and the rights to it, definitely matter. A good example of this was the Everalbum settlement, where the company was forced to delete both the data and the trained models that data was used to generate, due to lack of consent from the users from whom the data was taken[1]. Since open source code is, well, open, it's definitely less of a problem for permissively-licensed code.

That said, copyright is typically assigned to the closest human to the activation process (it's unlikely that Github is going to try to claim the copyright to code generated by Copilot over the human/company pair-programming with it), but since copyleft in general is pretty specific to software, afaik the way that courts interpret the legality of using code licensed under those terms in training data for a non-copyleft-producing model is still up in the air.

Obligatory IANAL, and also happy to adjust this info if someone has sources demonstrating updates on the current state.

[1] https://techcrunch.com/2021/01/12/ftc-settlement-with-ever-o...




> The case debates the legal right for Google to use copyrighted books in its training database in order to train its Google Book Search algorithm

That's not even remotely the same thing.


until the legal position is clear you'd have to be insane to allow output from this process to be incorporated into your codebases

imagine if the output was ruled as being GPLv2, then having to go through a proprietary codebase trying to rip out these bits of code

it would be basically impossible



