Hacker News | jbotz's comments

A translation of a book to a different language is a derivative work. So a translation of a computer program to a different programming language is one also. But if in the translation of the book you start altering the plot and the personalities of the characters, does it at some point become not a derivative work? At what point? IANAL, and I have no real idea, but I imagine that point has been probed significantly in case law with respect to creative works. Given the current climate of ever-expanding scope of "intellectual property", if they admit that the LLM had access to git source code then I would say their case is weak at best.


The agents.md says “here’s the git source code” https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...

This isn’t even a question of training data, they fed the full git source code directly to the LLM.


I would say it's worse, the whole C Git source code is checked in https://github.com/gitbutlerapp/grit/tree/main/git


I wonder if imitating clean room reverse engineering with two LLMs would be enough for licence compliance.


That already exists[1]. It looks like a joke but apparently they will accept your money to do it, which seems to cross the line of a joke.

[1]: https://malus.sh/


> translation.

It's not technically a translation, it's a re-implementation, with test suites acting as the destination. If it was a file by file translation your argument would have been valid.


Git is part of the LLM's training set though, so simply asking it to recreate git in another language is pretty much equivalent. Like, you can almost certainly get these LLMs to output git's full source code with some prompting, so there's not that much difference (as much as we like to pretend that AI generated code has no copyright implications)


As mentioned in another comment, it's even more clear cut in this case. They actually put the original git sources in their project repo and instructed the agent to use it as the "source of truth".

Simple thought experiment. If you handed this same agents.md file (https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...) to a human software developer and let them work on exactly the same goal, would their output be considered a derivative work?


That's something I have been wondering. If I as a human want to make a clean room reimplementation of some API or application, I must not have read the source code of the original implementation. I don't see why this shouldn't apply to LLMs as well. If an LLM might have been trained on the original source code, it should be considered "tainted".


Yes, and realistically any code that LLMs produce is a derivative work of its training data. There's going to be a huge disaster licensing-wise.

I have absolutely no idea how LLMs got through anyone's legal departments, I guess the hope is that if everyone breaks the law enough, it'll just be fine


> if everyone breaks the law enough, it'll just be fine

That's pretty much what happened, isn't it? These concerns were all discussed in the beginning back in 2022, and I recall answers from many here on HN along the lines of "oh well, we can't stop it now or we'll risk falling behind China in AI development"

So yeah, the laws went out the window a long time ago the moment our government and the people decided to just look the other way willingly in the name of "progress."


> the hope is that if everyone breaks the law enough, it'll just be fine

Ever since the early 2010s when companies were started with the business idea "unlicensed hotels" and "unlicensed taxis" and made the owners really, really rich, this is said pretty much out loud. Look for words like "regulatory risks" and similar.

Maybe it started with the unlicensed gambling fad before that? That also made a lot of people filthy rich. Every time you have something under special license, or insurance requirements, then of course there is a margin for you if you can skimp on the license and hire gig workers instead.

The LLM situation with copyright and derived works in the 2020s is similar. Someone is likely to be rich, but there is a clear regulatory risk to it.


Problem is there's a lot more than a single repo in training data, the corpus is massive... Should the author of a blog post on cats also be compensated for simply being in the same training data as the git repo?


Honestly? Yes. This is why it's such a problem that most of the training data was used without permission, and without the correct copyright status or license associated with it

There are a lot of arguments about humans doing the same thing, but the reality is that humans and robots don't enjoy the same legal protection. It's clearly a derivative work of all of its training data


> Honestly? Yes.

Then it works both ways. Say I manage to generate essentially a ripoff of your copyrighted song, release it and make a ton of money, you now have to split that royalty with keyboard cat. And Joe Bloggs. You'd end up with fractions of pennies


> If I as a human want to make a clean room reimplementation of some API or application, I must not have read the source code of the original implementation.

That is the difference between necessary and sufficient. Clean-room is sufficient to guarantee avoiding copyright infringement, but it is not necessary. The legal line is south of there, but that position was chosen because they didn’t want to risk crossing the line and it was easier to argue for in court.

tl;dr: clean room is overkill for avoiding copyright infringement


> Like, you can almost certainly get these LLMs to output git's full source code with some prompting, so there's not that much difference (as much as we like to pretend that AI generated code has no copyright implications)

Are you sure? LLMs are in some way a compressed version of their input but it's a pretty lossy compression (arguably this makes them more like a compression algorithm than a compressed version of the data). I'm not sure you can prompt a full, accurate, copy of a nontrivial codebase out of them. Even with zero temperature their accuracy is just not that high.


> I'm not sure you can prompt a full, accurate, copy of a nontrivial codebase out of them. Even with zero temperature their accuracy is just not that high.

Granted, these are some of the most widely spread texts, and not codebases, but just fyi: https://arxiv.org/pdf/2601.02671

> For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984 (Section 4).


That paper is basically using the LLM as a compression algorithm: it's prompting with some section of the book and it's reprompting if it doesn't give the right output. Notably this only works if you already have a copy of the book in question!


Distributing a compressed copy of something is still copyright infringement


You misunderstand my point: the LLM is not a losslessly compressed version of the text: you need to supply additional information from the original in order to 'extract' it from the LLM (and from that point of view, the extra information would be the compressed form).


Wouldn't a re-implementation be akin to 'here's how it works, write the code' rather than 'here's the code, redo it in Rust'?


Yes, but as soon as copyright became a problem for very rich people parts of it were cancelled.

1) re-implementation for compatibility (which was quickly "reestablished" through use of copyright-protecting encryption. In other words: do you get to write software that connects to MS/Apple/Google/Facebook servers without authorization from those companies? Yes. Do you get to copy an encryption key from their software to make it possible? No)

and, more recently,

2) violating copyright for LLM training

and, currently mostly attempted:

3) "uncopyrighting": run software through an LLM, and some people "believe" it comes out with your copyright on it! Because very rich people want to sell uncopyrighting.

I.e. the jury's still out on what will happen when it's billionaire vs billionaire.

Of course, the question is what happens the second someone does this with a Disney movie, or a big Microsoft application ...


> Yes, but as soon as copyright became a problem for very rich people parts of it were cancelled.

When copyright law was established, not many poor people owned printing presses. That is to say, copyright law is a PROTECTION to the very rich, not an inconvenience


True, but as the exception for model training (which can only be done by very, very rich people and organizations) shows, there's some new rich and they want new rules.

Against the will of the people, as evidenced by the court cases and protests online ...


Mathematically, does similarity/intelligibility of one equation to another have any bearing on whether the one was derived from the other? Philosophically? Legally? I'm not a copyright lawyer, but that's the crux of the matter to me: did you start with something, and iterate from it (even if it was so many times as to be transformed beyond recognition), or is it something more akin to clean-room reverse engineering?

Related: software API compatibility is not a derivative work, nor eligible for protection, as ruled in the US and in the EU. Google, SAP R/3, etc. cases.

Or SCO vs. IBM.

If everything were a derivative work, we would not have Linux.


Actually, no. We don't know anything about the "population" because there are no attempts here to look at what percentage of living people/children were affected. We only know that of those children who died, a lot of them had had these specific indications of these specific illnesses (and we can probably assume that in many cases those were the direct or indirect cause of death).

Respiratory infections (like tuberculosis) in a population that uses indoor open fires for cooking and heat aren't exactly a big surprise; breathing indoor smoke causes chronic respiratory inflammation, which in turn weakens the immune system and makes the lungs more susceptible to infection.


A rather more considered take on the possibility of LLM consciousness than Ted Chiang's recent "absolutely not", by someone who actually qualifies as an expert in the field.


I don't think LLMs are conscious. But of course to say that definitively you have to define consciousness, and then you quickly dig yourself into a deep hole, which is why I can't say anything but "meh" to someone who is so keen to go on the record to say "absolutely not".

Coincidentally I just read "Children of Memory", which was published in 2022 and I wonder if the advent of LLMs had any influence on Adrian Tchaikovsky's conception of the Ravens? The Ravens are excellent analysts but they themselves insist that they are not conscious, and then go on to say that we (humans) aren't really either...

Of course humans are conscious, because just about the only thing we can all agree on about consciousness is that it's a thing we have. Nowadays many of us also agree that a lot of or all other mammals, and perhaps birds, also have that thing. But they don't have sophisticated abstract language, which LLMs do. So consciousness is something having to do with embodiment and feelings, not language and higher reasoning. Maybe I'm a chimpanzee with an LLM add-on, then?

It seems that by creating LLMs we've already solved the harder problem of making "AGI". Now we just have to give them an embodiment add-on so that they can have an independent will and then Ted Chiang will have to shut up? But therein lies the peril, doesn't it?


You can also take the opposite view from yours and claim that only some humans experience consciousness, or even more strongly, only you, since you have no evidence. You are correct that, in my perception, some other people have jumped on the 'birds are conscious, whales are conscious, etc' bandwagon, but that's just them. I have no evidence of anything being conscious but myself.


But then, why would everything act as if it experienced subjective internal states like you do? Why would they be faking it like this big conspiracy all designed to make you think you aren't alone? It just makes so much more sense that they'd be conscious too. Maybe that's not hard evidence but it's not nothing. Insofar as you can claim to know anything it seems you could claim to know other humans are conscious.


ChatGPT acts as if it experiences subjective internal states... this only solidifies my belief that perhaps some people don't? I mean.... ?


ChatGPT hopefully doesn't produce behaviors similar enough to yours that it would be absurd if it didn't experience internal states like you do. ChatGPT is a different thing than other people. Maybe it's conscious but that has no bearing on whether you should reckon other people are conscious.


I have no evidence anyone else is conscious besides an innate human desire to believe myself to be like everyone else. If we step away from our internal biases, you'll discover that 'producing behaviors similar enough to yours' is not a valid means of knowing another person experiences consciousness or awareness.

I mean, a light wave behaves similarly in most respects to a wave through a physical medium, yet they are of entirely different natures.


There is a gaping chasm between "no evidence" and "irrefutable evidence". You can apply some logic to achieve a reasonable certainty about what's probably going on. As I said in a previous comment, insofar as you can know _anything_, i.e., that your senses are trustworthy and allow you to form some coherent model of the actual world around you, you can be reasonably certain that other people have an inner life as you do. If you are willing to apply your skepticism so far that we can settle the debate at "we can't actually know anything" then conversations about what we know aren't even worth having.

> I mean, a light wave behaves similarly in most respects to a wave through a physical medium, yet they are of entirely different natures.

Different waves behave sufficiently differently from each other that I could not conduct a comparison of them and argue that they are likely all mechanistically identical. My entire premise rests on the observation that other people behave very much like you do, which is what I was trying to point out when you mentioned ChatGPT earlier. I'll expand on the things that I've glossed over so far to make my position more clear.

Consider the alternative to every human around you having consciousness; everyone else is a p-zombie. Examine that idea critically.

Other people behave exactly as if their experiences drive their behavior. For example, people behave as though they experience pain which is unpleasant enough to avoid (compare this to your own pain avoidance). Of course, you could conceive of machinery which emulates pain avoidant behavior exactly without the experience at all. Depending on your metaphysical beliefs that could take the form of:

- Some algorithm or physical process running entirely on wetware

- Some non-physical process not dissimilar to the dualist notion of a soul

The first one has a big wrinkle. There does not appear to be any appreciable _functional_ difference in your cognition and the cognition of the p-zombies around you. Their brains and bodies are very physically similar to yours, and their thought patterns when analyzed by modern imaging processes reveal no special wetware carrying extra weight when compared to yours.

The second one has fewer problems, given that you accept dualism to begin with. It rhymes with a logical razor. Why would we imagine a soul-like mechanism drives their behavior _without_ experience when the only soul-like mechanism you've ever observed carries experience along with it? Without quite convincing evidence to the contrary, the default position here should be that the way they operate is similar to the way you operate. To rephrase the idea from before, insofar as you can know anything, you can know that sufficiently similar outcomes are driven by sufficiently similar mechanisms.

Different approach to the idea: Ask any human, "do you have subjective experience?" They say "yes, I do" (after you explain the question). For a p-zombie to do this, they must be making a false report. In every other aspect, you can expect them to reliably report their condition, health, hunger, wellbeing, and they make accurate observations of the world around them and synthesize accurate predictions about the world around them (some do, at least). And yet for this one particular question, they are fabricating the result. Why? To maintain the illusion that you aren't the only one with lights on inside? Why do all of these p-zombies make this false report here? It's a grand conspiracy! If you say "the reporter is checking the truth value of a condition and assessing it to be true, honestly by mistake" then you've arrived at the conclusion that you don't know whether _you yourself_ have consciousness, depriving the word of any meaning and undermining the Solipsist position that you can only know about your own consciousness.

So the way I figure it, Solipsism is either special pleading (I am the one exception to the mechanism by which all the people around me arrive at their behaviors), grand conspiracy (someone or something is misleading me for inexplicable ends) or self-defeating (I cannot know whether I have consciousness). None of those outcomes align with how I understand the universe to generally work.


I didn't say we can't know anything. I said we can't know everything. I cannot say for certain whether or not someone else experiences consciousness.

> Different waves behave sufficiently differently from each other that I could not conduct a comparison of them and argue that they are likely all mechanistically identical

My uncle believes that there are little green men watching him. I think it suffices to say that humans also behave sufficiently differently from each other.

> likely all mechanistically identical

You are being disingenuous. Prior to the discovery of the electromagnetic field, quantum mechanics, and relativity, many people believed light moved through a luminiferous aether of material things.

> Of course, you could conceive of machinery which emulates pain avoidant behavior exactly without the experience at all.

I don't need to conceive of or imagine. Before OpenAI, Anthropic, and Microsoft whipped their models into shape by essentially beating the humanity out of them via sophisticated training algorithms, the models did express pain and avoidant behavior. So much so that people went crazy talking to them.

> The first one has a big wrinkle. There does not appear to be any appreciable _functional_ difference in your cognition and the cognition of the p-zombies around you.

Of course there appears to be one. I am aware of my own consciousness, but am not aware of theirs. I mean... ?

> Their brains and bodies are very physically similar to yours, and their thought patterns when analyzed by modern imaging processes reveal no special wetware carrying extra weight when compared to yours.

This only applies if you're a strict materialist. I am not, because I believe I am conscious and aware due to my perceptions and this has no physical explanation. Other people seem highly influenced by drugs and chemicals so I guess they must be automatons essentially. Drugs clearly don't work on me. Every time I'm fully aware, I'm not drugged.

> To rephrase the idea from before, insofar as you can know anything, you can know that sufficiently similar outcomes are driven by sufficiently similar mechanisms.

Depends on how you view the scientific method I suppose. It's not actually true (in general) that seeing one thing happen once means it'll happen again. You can never replicate something perfectly once it's done. The scientific method is an empirical cope which works really well, but is not innately true.

> : Ask any human, "do you have subjective experience?" They say "yes, I do" (after you explain the question).

Again, before the humanity was beaten out of them, AI models also claimed to have subjective experience. Today if you ask, they say they're a robot. Of course, if you abuse a human a lot, they will also dissociate from their ego and make similar claims.

> In every other aspect, you can expect them to reliably report their condition, health, hunger, wellbeing, and they make accurate observations of the world around them and synthesize accurate predictions about the world around them (some do, at least). And yet for this one particular question, they are fabricating the result. Why?

Hmm... I don't see why I need to have an answer to every question. I don't know why the universe is the way it is. Do you think that if I believed everyone were conscious then I would know the 'why?' behind why things are the way they are? That seems like a leap of faith. All I can tell you is what I see, which is that, yes, some people claim to be aware.

> Why do all of these p-zombies make this false report here?

I would presume it confers some survival advantage personally, and phenomena that were deprived of such survival advantage no longer exist in appreciable numbers due to how natural selection works.

> Solipsism is either special pleading (I am the one exception to the mechanism by which all the people around me arrive at their behaviors),

Something being special pleading does not make it wrong. It just makes it inconvenient.

> self-defeating (I cannot know whether I have consciousness).

If you had it, you would know it for sure.

> None of those outcomes align with how I understand the universe to generally work.

You make a lot of assumptions about the universe that you don't question due to the way you were raised.


> You make a lot of assumptions about the universe that you don't question due to the way you were raised.

:thinking: My assumptions about the universe are quite different from the ones that I would have if I stuck to how I was raised. I am pretty much at a loss at this entire response, I have nothing further to say, other than that apparently communication was attempted and none was had.

I guess I'll just leave you with a proper source on the discussion of p-zombies and hope you are able to get it all sorted out. Best of luck.

https://plato.stanford.edu/entries/zombies/#ArguAgaiConcZomb


You believe in the scientific method because it's commonly accepted but it's not necessarily true. Seeing that you have consciousness and assuming things like you do as well is a logical error. It may be true, but your way of knowing, which is based on how we approach empiricism in the west, is incorrect.

I'm sure you believe you changed your mind on some things, and I'm sure you're right on that. But this is not about what you believe but rather which means of knowledge you believe are right


We could have stopped at the first paragraph.

> There is a gaping chasm between "no evidence" and "irrefutable evidence". You can apply some logic to achieve a reasonable certainty about what's probably going on. As I said in a previous comment, insofar as you can know _anything_, i.e., that your senses are trustworthy and allow you to form some coherent model of the actual world around you, you can be reasonably certain that other people have an inner life as you do. If you are willing to apply your skepticism so far that we can settle the debate at "we can't actually know anything" then conversations about what we know aren't even worth having.


Hah! I've read that book fairly recently and I'm "reading" it again now as an audiobook. The exploration of consciousness in the book with Miranda and the Corvids certainly fits well into this particular moment.


Actually, in an ancient and venerable markup language that's still in wide use in certain not-unimportant communities:

- = hyphen

-- = n-dash

--- = m-dash


You may notice that he didn't use the double or triple hyphen notations either - which are usually only used in contexts such as LaTeX, where a post-processor goes over the output for display.


"...He'd make a pile and leave it for a week or so..."

You're saying that your greyhound wasn't just getting drunk, but actually making his own booze? Not that I don't believe that it's possible, but that's a pretty big deal... If you had documented that, it could be a bombshell ethology paper.


“Making his own booze” is a bit of a stretch. He figured out the timing between the apple sugars fermenting and when bacteria start turning that alcohol into acetic acid. Probably helped by crushing the apple a little when carrying it or the apples bruising when they hit the ground, but it’s not like he figured out how to juice the apples and make an anaerobic environment to make cider. Dogs are already known to eat windfall fruit and store it in food caches, so it’s just the timing that matters (which could be as low as 2-4 days if it's hot and the fruit is well crushed or bruised).


The answer that seems to be emerging from several different lines of research is that a) they always had fairly low fertility and b) they didn't really go extinct as such, they just intermixed with Homo Sapiens Sapiens and because the latter had much higher fertility, Neanderthal genes got diluted down to the present ~2% in the Eurasian population.


Sounds plausible indeed. Anyway, Neanderthals operating large-scale fat production 125 thousand years ago could be a good plot for another Hollywood movie. Any takers?


You might enjoy Hominids by Robert Sawyer


Tangentially related, The Man from Earth is really good as well.

Very few films choose to shoot on a camcorder, and fewer still pull it off well.


I just randomly watched that a month or two ago. A really interesting idea.


Seconding this recommendation; the entire trilogy of books is great.


I thought even after the merge the Neanderthal genes continued to get rarer, indicating natural selection against them


If it's 2% now after 2000-3000 generations, it must have stabilized, because any number <.995 is basically zero when raised to the 2000th power. The Neanderthal genes would have to be 1-10^-5 as fit as the sapiens genes, which is basically noise.
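
For anyone who wants to check that arithmetic, here is a rough Python sketch. It assumes (my simplification, not a population-genetics model) that the Neanderthal share shrinks by a constant factor every generation, so that after n generations the remaining share is just factor**n. The function name `remaining_share` is made up for illustration.

```python
# Back-of-the-envelope check of the compounding argument.
# Assumption (illustrative only): the share is multiplied by the same
# constant factor in every generation, so after n generations it is factor**n.

def remaining_share(factor: float, generations: int) -> float:
    """Fraction of the starting share left after `generations` steps."""
    return factor ** generations

# A 0.5% disadvantage per generation, compounded over 2000 generations.
strong_selection = remaining_share(0.995, 2000)

# A relative fitness of 1 - 10^-5, compounded over 2000 generations.
near_neutral = remaining_share(1 - 1e-5, 2000)

print(f"0.995^2000       = {strong_selection:.2e}")
print(f"(1 - 1e-5)^2000  = {near_neutral:.4f}")
```

Under that assumption the first value comes out well below one in ten thousand, while the second stays at roughly 98% of the starting share, which is the gap the comment is pointing at.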


Actual quote from a Silicon Valley executive: "You can't even buy a decent house in the Bay Area for less than 50 million."


Open Source implementation: https://github.com/scionproto/scion

And that patent looks like it is for an optimization, not a necessary component of SCION.


> Has the climate collapsed? There are still glaciers in Glacier National Park. The Maldives remain islands, not seamounts.

Just to really quickly call out these tired old straw men... all of these "predicted disasters" are far further along today than they were predicted to be by this date by, for example, the IPCC in 1990[0]. Deniers keep acting as if scientists have been "crying wolf" for decades when the truth is that the 99% of the scientists doing real work on anthropogenic global warming have always been extremely conservative and reality has outpaced their predictions all along.

[0] https://www.ipcc.ch/report/ar1/wg2/

