> If you had a hermetically sealed code base that just happened to coincide line for line with the codebase for GCC, it would still be a copy.
That's not what the law says [1]. If two people happen to independently create the same thing they each have their own copyright.
If it's highly improbable that two works are independent (e.g. the gcc code base), the first author would probably go to court claiming copying, but their case would still fail if the second author could show that their work was independent, no matter how improbable.
It is true that if two people happen to independently create the same thing, they each have their own copyright.
It is also true that in all the cases I know about where that has occurred, the courts have taken a very, very, very close look at the situation and required extensive evidence before being convinced that there really wasn't any copying. It was anything but a "get out of jail free" card; it was in fact difficult and expensive, in proportion to the size of the works in question, to prove to the court's satisfaction that the two things really were independent. Moreover, in all the cases I know about, the works weren't actually identical, just really, really close.
No rational court could ever conclude that someone who produced a line-by-line copy of gcc had independently come up with it. The probability of that is one out of ten to the power of "doesn't even remotely fit in this universe, so forget about it". The bar to overcoming that is simply impossibly high, unlike two songs that happen to have similar harmonies and melodies, given how exponentially more constrained the space of "simple song" is compared to that of a compiler suite.
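The order-of-magnitude claim can be made concrete with a back-of-the-envelope calculation. Both numbers below (a million-line codebase, and a very generous 1-in-10 chance of an independent author writing any given line identically) are assumptions for illustration, not measurements:

```python
import math

# Assumed figures, chosen only to illustrate the scale of the argument:
lines = 1_000_000      # size of a large codebase, in lines
p_per_line = 1 / 10    # generous odds of independently matching one line

# P(match every line) = p_per_line ** lines; work in log10
# to avoid floating-point underflow.
log10_p = lines * math.log10(p_per_line)
print(f"probability ~ 10^{log10_p:.0f}")
```

Even with these absurdly generous per-line odds, the probability is around ten to the minus one million: not just improbable, but "millions of orders of magnitude" beyond anything physically realisable.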
All of this is moot for the purposes of LLMs, because it's almost certain that the LLMs were trained on the code base, and their output is therefore "tainted". You can't do this with humans either: clean-room design requires separate people for the specification and the implementation.
That's the "but their case would still fail if the second author could show that their work was independent, no matter how improbable" part of the post you're responding to.
One out of ten to the power of "forget about it" is not improbable, it's impossible.
I know it's a popular misconception that "impossible" means a strict, statistical, mathematical 0, but if you try to use that in real life it turns out to be pretty useless. It also tends to bother people that there isn't a bright shining line between "possible" and "impossible" like there is between "0 and strictly not 0", but all you can really do is deal with it. Wherever the line is, this is literally millions of orders of magnitude on the wrong side of it. Not a factor of millions, a factor of ten to the millions. It's not possible to "accidentally" duplicate a work of that size.
Thank you for providing a reference! I certainly admit that "very similar photographs are not copies", as the reference states. And certainly physical copying qualifies as copying in the sense of copyright. However, I still think copying can happen even if you never have access to a copy.
I suppose a different way of stating my position is that some activities that don't look like copying are in fact copying. For instance, it would not be required to find a literal copy of the GCC codebase inside the LLM somehow in order for the produced work to be a copy. Likewise, if I specify that "Harry Potter and the Philosopher's Stone is the text file with hash 165hdm655g7wps576n3mra3880v2yzc5hh5cif1x9mckm2xaf5g4" and someone else then uses a computer to brute-force a hash collision, I suspect the result would still be considered a copy.
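The thought experiment can be sketched in a couple of lines. The excerpt string and the choice of SHA-256 below are my own stand-ins; the hash in the comment above is not a real digest of anything:

```python
import hashlib

# A digest "specifies" a text without containing any of it.
excerpt = "Harry Potter and the Philosopher's Stone (stand-in text)"
digest = hashlib.sha256(excerpt.encode()).hexdigest()
print(digest)  # 64 hex characters identifying the text

# Brute-forcing a second preimage for a full 256-bit digest would take
# on the order of 2**256 attempts, which is why this is a thought
# experiment about the legal question rather than a practical attack.
```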
I think there is a substantial risk that the automatic translation done in this case is, at least in part, copying in the above sense.
I fully agree with you. (A small information-theory nitpick with your example: the hash and program together would have to be at least as long as a perfectly compressed copy of Harry Potter and the Philosopher's Stone. If not, you've just invented a better compressor and are in the running for a Hutter Prize[1]! A hash and "decompressor" of the required length would likely be considered to embody the work.)
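The pigeonhole argument behind the nitpick is easy to demonstrate: a hash much shorter than the work cannot uniquely pick it out, because distinct inputs must eventually share a digest. Truncating SHA-256 to 16 bits (an artificial choice, just to make collisions cheap to find) shows this directly:

```python
import hashlib

def short_hash(s: str) -> str:
    # 4 hex characters = 16 bits; far too short to identify anything uniquely.
    return hashlib.sha256(s.encode()).hexdigest()[:4]

seen = {}
collision = None
for i in range(70_000):  # more than 2**16 inputs guarantees a collision
    s = f"candidate-{i}"
    h = short_hash(s)
    if h in seen:
        collision = (seen[h], s)
        break
    seen[h] = s

print(collision)  # two different strings sharing one 16-bit hash
```

A full-length cryptographic hash avoids collisions only in practice, not in principle; the hash-plus-decompressor pair that pins down one specific work must, as the comment says, carry at least as much information as the compressed work itself.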
It's an interesting case. As I understand it, there is an ongoing debate within the AI research community as to whether neural nets are encoding verbatim blocks of information or creating a model which captures the "essence" or "ideas" behind a work. If they are capturing ideas, which are not copyrightable, it would suggest that LLMs can be used to "launder" copyright. In this case, I get the feeling that, for legal clarity, we would both say that the work in question (or works derived from it) should not be part of the training set or prompt, emulating a clean room implementation by a human. (Is that a fair comment?)
I've no direct experience here, but I would come down on the side of "LLMs are encoding (copyrightable) verbatim text", because others are reporting that LLMs do regurgitate word-for-word chunks of text. Is this always the case though? Do different AI architectures, or models that are less well fitted, encode ideas rather than quotes?
Edit: It would be an interesting experiment to use two LLMs to emulate a clean room implementation. The first is instructed to "produce a description of this program". The second, having never seen the program, in its prompt or training set, would be prompted to "produce a program based on this description". A human could vet the description produced by the first LLM for cleanliness. Surely someone has tried this, though it might be a challenge to get an LLM that is guaranteed not to have been exposed to a particular code base or its derivatives?
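The proposed experiment can be sketched as a pipeline. Everything here is hypothetical: `describe_model` and `implement_model` stand in for calls to two separately trained models (no real API is implied), and the canned strings are placeholders for model output:

```python
def describe_model(program: str) -> str:
    # LLM A: sees the original program, emits a functional spec only.
    return "SPEC: read two integers from stdin and print their sum"

def implement_model(spec: str) -> str:
    # LLM B: sees only the vetted spec, never the original code.
    return "a, b = map(int, input().split())\nprint(a + b)"

def clean_room(program: str, vet) -> str:
    spec = describe_model(program)
    if not vet(spec):
        raise ValueError("spec failed human cleanliness review")
    return implement_model(spec)

# 'vet' models the human check that the spec leaks no literal code.
new_program = clean_room("<original program text>",
                         vet=lambda s: s.startswith("SPEC:"))
print(new_program)
```

The hard part, as noted, is not the pipeline but the guarantee that the second model has never been exposed to the original work or its derivatives.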
Because FIDO2 is not enough for non-tech-savvy people.
The main issue is potential confusion about what transaction they’re actually signing. For example, a malicious browser extension can pretend the site sends money to X while actually sending it to Y.
The European PSD2 directive mandates that the 2FA scheme must let the user see what they’re about to sign. At the very least, that includes the amount and part of the recipient’s IBAN. FIDO2 doesn’t have that.
It’s the reason I own a device that looks like this [0]. Without it, I wouldn’t be able to transfer money at all due to the lack of banking apps that work on Linux phones.
In this case, wouldn't FIDO2 only be used to log into the bank's website, not to sign individual transactions? (Corresponding to Mode2 in the Wikipedia article you provided?) Would this "mode2" only usage be allowed under European law, given that there is no transaction involving an amount of money taking place?
Banks used to give us those RSA tokens for securely logging in to the web UI, but then discovered they could cut costs, since everyone has a smartphone from one of two vendors.
No doubt. At least with FIDO2, people can provide their own hardware key, and get real security rather than a rolling number generated by a compromised algorithm [1].
Without vaccination, it killed 12.9% of the people who were infected, mostly older people and people with multiple pathologies (e.g. hypertension).
That’s 12.9% of hospital inpatients. All estimates I’ve seen for infection fatality rate — that is, mortality rate among all those infected — place it around 1–2%.
It doesn’t kill 13% of people infected, only about 1%. Just look at the number of cases reported compared to the number of deaths. That paper was reporting 13% mortality rate among those admitted to the hospital, not among all those infected.
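The denominator confusion is easy to see with made-up round numbers (illustrative only, not real case counts):

```python
# Illustrative round numbers, not real epidemiological data.
deaths = 13_000
hospitalised = 100_000     # cases sick enough to be admitted
all_infected = 1_300_000   # assumed: most infections never reach hospital

cfr_hospital = deaths / hospitalised  # the "13%" headline figure
ifr = deaths / all_infected           # roughly 1% of all infected

print(f"hospital CFR: {cfr_hospital:.0%}, IFR: {ifr:.0%}")
```

Same number of deaths, very different rates: the 13% figure divides by hospitalised cases, the infection fatality rate divides by everyone infected.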
Don't conflate the Internet with Social Media. Social media is a service, just like FTP. The death of social media will not mean the death of the Internet. There's an argument that reducing social media use, by age verification or other means, will lead to a more free Internet due to reduced power of gatekeepers.
The performance of a human is inherently limited by biology, and the road rules are written with this in mind. Machines don't have this inherent limitation, so the rules for machines should be much stronger.
I think there is an argument for incentivising the technology to be pushed to its absolute limits by making the machine 100% liable. It's not to say the accident rate has to be zero in practice, but it has to be so low that any remaining accidents can be economically covered by insurance.
At least in the interim, wouldn't doing what you propose cause more deaths, if robot drivers are already less harmful than humans but the rules demand a stricter standard than that? (I can see the point in making rules stronger as better options become available, but by that logic, shouldn't we already be moving towards requiring robots and outlawing human drivers if that's safer?)
This thread makes me realise that the old Telequipment D61 Cathode Ray Oscilloscope I have is worth hanging on to. It's basically a CRT with signal conditioning on its inputs, including a "Z mod" input, making it easy to do cool stuff with it.
[1] https://lawhandbook.sa.gov.au/ch11s13.php?lscsa_prod%5Bpage%...