Getty Images bans AI-generated content over fears of copyright claims (theverge.com)
305 points by baptiste313 on Sept 21, 2022 | 383 comments



Reading between the lines of this, it sounds to me like Getty is preparing a copyright claim against the AI companies:

1. They seem to be of the opinion that the copyright question is open.

2. Their business stands to lose substantially as a result of such models existing.

3. It would be a bad look for them to make a claim whilst simultaneously accepting works from the models into Getty.

4. At least some of their watermarked content seems to have been included in the training data of the OpenAI model: https://news.ycombinator.com/item?id=32573523

If I'm correct about that, they will probably not settle, as their business is likely worth substantially more than any feasible settlement arrangement.


I've seen a lot of confidence on HN and other tech communities that a court would never rule that training an AI on copyrighted images is infringement, but I'm not so sure. To be clear, I hope that training AI on copyrighted images remains legal, because it would cripple the field of AI text and image generation if it wasn't!

But think about these similar hypotheticals:

1. I take a copyrighted Getty stock image (that I don't own, maybe even watermarked), blur it with a strong Gaussian blur filter until it's unrecognizable, and use it as the background of an otherwise original digital painting (sketched in code after this list).

2. I take a small GPL project on GitHub, manually translate it from C to Python (so that the resulting code does not contain a single line of code identical to the original), then redistribute the translated project under a GPL-incompatible license without acknowledging the original.
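
A minimal sketch of hypothetical 1 in Python, assuming Pillow is installed (the file names are hypothetical):

    from PIL import Image, ImageFilter

    # hypothetical case 1: blur a stock photo until it is unrecognizable,
    # then use it as the background layer of an otherwise original painting
    src = Image.open("getty_stock_photo.jpg")  # hypothetical file name
    background = src.filter(ImageFilter.GaussianBlur(radius=60))  # strong blur
    background.save("painting_background.png")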

Are these infringements?

In both of these cases, a copyrighted original work is transformed and incorporated into a different work in such a way that the original could not be reconstructed. But, intuitively, both cases feel like infringement. I don't know how a court would rule, but there's at least some chance these would be infringements, and they're conceptually not too different from distilling an image into an AI model and generating something new based on it.


It’s not about reconstruction, it’s about the notion of a “derivative work”. Translating a work would absolutely be derivative (consider the case of translating a literary work between languages: this is a classic example of a derivative work). Blurring a work but incorporating it would nonetheless still be derivative, I think.

The challenge with these models is that they’ve clearly been trained on (exposed to) copyrighted material, and can also demonstrably reproduce elements of copyrighted works on demand. If they were humans, a court could deem the outputs copyright infringement, perhaps invoking the subconscious copying doctrine (https://www.americanbar.org/groups/intellectual_property_law...). Similarly, if a person uses a model to generate an infringing work, I suspect that person could be held liable for copyright infringement. Intention to infringe is not necessary in order to prove copyright infringement.

The harder question is whether the models themselves constitute copyright infringement. Maybe there’s a Google Books-esque defense here? Hard to tell if it would work.


> If they were humans, a court could deem the outputs copyright infringement

I'm not sure I understand how this is self-evident. The closest equivalent I can see would be a human who looks at many pieces of art to understand:

- What is art and what is just scribbles or splatter?

- What is good and what isn't?

- What different styles are possible?

Then the human goes and creates their own piece.

It turns out, the legal solution is to evaluate each piece individually rather than the process. And, within that, the court has settled on "if it looks like a duck and it quacks like a duck..." which is where the subconscious copying presumably comes in.

I don't know where courts will go. The new challenge is AI can generate "potentially infringing" work at a much higher rate than humans, but that's really about it. I'd be surprised if it gets treated materially different than human-created works.


> The new challenge is AI can generate "potentially infringing" work at a much higher rate than humans, but that's really about it

The other challenges are: (i) the model isn't a human who can defend themselves by explaining their creative process; it's a literal mathematical transformation of the inputs, including the copyrighted work. (And I'm not sure "actually the human brain is just computation" defences offered by lawyers are ever likely to prevail in court, because if they do, that opens much bigger cans of worms in virtually every legal field...) (ii) the representatives of OpenAI Inc who do have to explain themselves are going to have to talk about their approach to licenses for use of the material (which in this case appears to have been to disregard them altogether). That could be a serious issue for them even if the court agrees with the general principle that diffusion models or GANs are not plagiarism.

And possibly also (iii) the AI has ridiculous failure modes, like reproducing the Getty watermark, which make the model look far more closely derived from its source data than it actually is.


It's worth pointing out that the problem, in this scenario, is for the creator (ie, the human running the algorithm). They will need to determine whether a piece might violate copyright before using it or selling it. That seems like a very hard problem, and could be the justification for more [new] blanket rules on the AI process.


Proving artwork you created is free from all copyright issues is similarly impossible, but in practice isn’t an issue. So, I don’t see any AI specific justification being relevant.


How common is it for an artist to accidentally generate a work that resembles an existing work?


I can't speak for digital art because I'm not an artist but I can say it's extremely common for original music to (often accidentally/subconsciously) include melodies or pieces of melodies from other music.


That's more like two paintings sharing the same color scheme or using the same brand+color of paint but still being unique. The performance of those melodies and structures is what makes a song unique and creative.


Adam Neely the YouTuber has produced several videos about recent legal cases where musicians sue because of some superficial similarities. In most cases the higher courts in the US recognize that copying is part of the normal creative process and is not infringement unless it is a blatant rip-off.


This is exactly my thinking. If the court finds somebody guilty of infringing on a human-made piece of digital art, the response is to punish the human, not to ban or impose limits on Photoshop.

At risk of stretching the analogy, you don’t charge the gun with murder…


Except, of course, the human has very little control over what the AI outputs in txt2img scenarios, at least in terms of whether the output would match the definition of copyright infringement of someone else's work. img2img is kinda different -- I think you could make a much stronger case for derivative work there. So you have a tool that can randomly create massive liability for you, and you can't know whether it's done so until someone sues you.


The humans are the cause of it happening in the first place. Maybe the gun analogy is not such a stretch: if you pull the trigger, you own the consequences.

Reasonable fair use principles could distinguish personal and R&D use from commercial use.


I think it's more common with music. Some musician goes to a foreign country and hears an obscure local song. 20 years later the musician has completely forgotten about the song and the trip. One day a catchy melody appears in the musician's head out of the blue, and the musician completes the song and adds lyrics. The song gets famous, later reaches the foreign country, and everyone accuses the musician of plagiarism.


Reminds me of a recent scandal involving Adele, where she is accused of plagiarizing a Brazilian composer: https://english.elpais.com/usa/2021-10-19/toninho-geraes-vs-...


>I'm not sure I understand how this is self-evident. The closest equivalent I can see would be a human who looks at many pieces of art

...and then gets told "Hey, go and paint me a copy of that Andy Warhol piece from memory".

The model might not violate the copyright, but its output is a derivative work if the copyrighted works are included in the training set.


That would be over-fitting which is certainly a failure mode of ML. But it's still a failure mode, not an inherent property.


Note that being a derivative work doesn't instantly make something 'fair use' for the purposes of copyright. You typically still need 'adaptation' permission from the copyright holder to make a derivative work, so you can't make 'Breaking Bad: The Musical' by recreating major scenes in a play format, at least not without substantially changing it[0].

For the purpose of fair use, copyright.gov has an informative section titled "About Fair Use" which details what sort of modifications and usage of a copyrighted work would be legal without any permission from the copyright holder https://www.copyright.gov/fair-use/#:~:text=a)(3).-,About%20...

0: https://en.wikipedia.org/wiki/Say_My_Name!_(Musical)


It’s the opposite of what you’re suggesting in the first sentence. Something being a derivative work is a fairly clear sign that it’s not fair use.


It's kind of a combination of these two positions: something being a derivative work means that it requires an excuse under copyright law, whether that be a license, or a fair use defense (which may be fact-specific), or some other justification. Without such a justification, the derivative work is treated as a copyright infringement.

This is a result of 17 USC §106(2), which says that one of the things that "the owner of a copyright [...] has the exclusive rights to do and to authorize" is "to prepare derivative works based upon the copyrighted work", unless an exception (including fair use) applies.

https://www.law.cornell.edu/uscode/text/17/106


> If they were humans, a court could deem the outputs copyright infringement

Why? I am a human, I can learn the composition of stock photos and use the idea in my work. There is no copyright infringement, or there is simply no way to prove it.

The connection needs to be really strong to consider something as derivative work.


> consider the case of translating a literary work between languages: this is a classic example of a derivative work

This is a classic example of "you need a license on the original material to sell it" (unless it is public domain)

You also need the rights if you are translating from a pre-existing translation in a different language (i.e. you're translating a Japanese book from the authorized French translation).

Translating an opera does not automatically give you rights over the resulting derivative work, and it does not fall under the umbrella of fair use.

source: my sister is a professional translator

EDIT: technically you'd need to acquire the rights even if you translate it by yourself, for fun, and show it to someone else.


> The challenge with these models is that they’ve clearly been trained on (exposed to) copyrighted material, and can also demonstrably reproduce elements of copyrighted works on demand. If they were humans, a court could deem the outputs copyright infringement, perhaps invoking the subconscious copying doctrine (https://www.americanbar.org/groups/intellectual_property_law...).

Every single human has been exposed to copyrighted material, and probably can reproduce fragments of copyrighted material on demand. Nobody ever writes a book or paints a picture without reading a lot of books and looking at a lot of paintings first. For a "subconscious copying" suit to apply, you need to demonstrate "probative similarity" - that is, similarity to copyrighted material that is unlikely to be coincidental.

In other words - it's not clear to me that the situation with AI is any different than with a human, or that it presents new legal challenges. If it looks new, it is new.


> Blurring a work but incorporating it would nonetheless still be derivative, I think.

You have to show that the original work forms a substantial part of the new work.

If your background is a small portion of the new image, and the blur makes it difficult to see what the original was, and any other blurred image would've done the same job, then I would argue that the final new painting does not constitute a derivative work.

A similar argument could be made for AI models. The model is trained on billions of images. None of these images is individually substantial in the final output, even if in some part of the output you can trace a derivative element. For example, if an author used 1 million books, and copied every nth word from each book and merged them to produce a final book (which happens to be coherent), I would argue that the author did not infringe copyright on any of the 1 million source books.
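
A toy sketch of that every-nth-word construction in Python (the books/ directory and the choice of n are made up for illustration):

    from pathlib import Path

    # take every nth word from each source book and merge the samples
    n = 100
    merged = []
    for path in sorted(Path("books").glob("*.txt")):  # hypothetical book files
        words = path.read_text(encoding="utf-8").split()
        merged.extend(words[::n])  # a tiny, scattered sample of each book

    # no single source contributes a substantial, recognizable portion
    print(" ".join(merged[:50]))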


>> Translating a work would absolutely be derivative (consider the case of translating a literary work between languages: this is a classic example of a derivative work)

True for literature, which can be copyrighted. Not true for logic. Code that performs the same function but is written independently can't, by definition, violate copyright. It could violate a patent, but it's much harder to patent code.

However, in the AI case, the models themselves do nothing even approaching such translation. They fully incorporate what are essentially just compressed representations of existing works into their weights.

I'd guess if any single copyright holder could show that their work could be substantially retrieved from the model with the right query string, they'd have a reasonable claim on royalties from the entire model, or else the whole model would have to be thrown out.


Yes, I expect that if you ask the model for "Getty images photo of [famous person] doing [thing Getty Images has only one photo of that person doing]" you might well get the original photo out.


Should be easy to try. My guess is that it most likely won't.


You won't get the exact image, because the model doesn't perfectly memorize all inputs, but you'll likely get something so close that everyone would consider it a derivative work.

This is very similar to getting Copilot to spit out Carmack's fast inverse square root code with the right prompt.


There are some interesting thought experiments around this:

1. You do the same thing but don't use "getty" in the prompt.

2. You do the same thing with "getty" in the prompt on a model NOT trained with Getty images

3. You ask a human photographer to do the same thing and you show him a Getty image

4. You ask a human photographer to do the same thing but without showing him a Getty image (but he's presumably seen many in the past).

If 1 and 2 produce images that are also similar to a Getty image, where do we stand? I imagine it's likely that a trained model can learn "getty-like" without any actual Getty images to make 2 happen.

And currently IP law treats humans and AIs the same (in the sense that infringement rules don't distinguish between them), so would you consider 3 and 4 to be similar to 1 and 2?


For 1, there's probably a case where a picture that exactly matches the prompt is very famous, so maybe the right prompt would manage to effectively select an image licensed by Getty. The GPLed fast inverse square root routine was an example of this kind of thing: one good match and the model finds it. Finding such a case might or might not be possible.

For 2, you can't find what isn't there, so something random would come out.

For 3 or 4, I suppose you could pay paparazzi to stalk the celebrity and try to produce a similar original shot, and this might be impossible depending on the prompt (for example, "photo of [person] at [event] on [date]"), but if it's possible it has absolutely no resemblance to 1 or 2. The photographer would produce an original work, unless your #3 contractor tries to remove the watermark and pass off the Getty image as their own, which would be idiotic.

There is no particular "Getty Images" style, other than their quality requirements; they are a huge company that acquires and licenses a ton of pro photography. There's no such thing as "getty-like".

So, no, only option 1 might possibly produce a problematic derivative work.


> For 2, you can't find what isn't there, so something random would come out.

It's not a search engine. It can synthesise things that it hasn't seen to some degree (assuming it knows the elements that make up the request).

The tricky bit would be to teach it "Gettyness" without showing it Getty images, but I think that's entirely possible. Getty images aren't astonishing examples of unprecedented originality. So it just needs to know a) the celebrity, b) the action, and c) what people mean when they ask for something that looks like a Getty image.

EDIT - I answered in a rush and realise you made a similar point to me about "Gettyness" - which makes it even harder for me to understand why you think it would be a violation.

How many different ways can George Clooney eat a burrito in Times Square?


Suppose that there's a really iconic pic of George Clooney eating a burrito in Times Square, with a "Getty Images" watermark on it, in the training set. It's perfectly framed, has a somewhat comic expression, and a bit of the contents spilled on his shirt. It seems quite possible that the model could produce an image that isn't identical to this but so much so that people assume it's a copy, maybe produced by an artist based on the picture. Worse, it will have most of the "Getty Images" watermark right on the image.

If that doesn't convince you that there might be an issue I'll just stop here.


Both hypotheticals are likely infringement. The first example may be considered de minimis, but the courts hate using those words, so they might just argue that you didn't blur it enough to be unrecognizable or that it could be unblurred.

However, the thing that makes AI training different is that:

1. In the US, it was ruled that scraping an entire corpus of books for the purpose of providing a search index of them is fair use (see Authors Guild v. Google). The logic in that suit would be quite similar to a defense of ML training.

2. In the EU, ML training on copyrighted material is explicitly legal as per the latest EU copyright directive.

Note that neither of these apply to the use of works generated by an AI. If I get GitHub Copilot to regurgitate GPL code, I haven't magically laundered copyrighted source code. I've just copied the GPL code - I had access to it through the AI and the thing I put out is substantially similar to the original. This is likely the reason why Getty Images is worried about AI-generated art, because we don't have adequate controls against training data regurgitation and people might be using it as a way to (insufficiently) launder copyright.


Searching books is different than generating books and selling them.


> To be clear, I hope that training AI on copyrighted images remains legal, because it would cripple the field of AI text and image generation if it wasn't!

To be clear, there's no law banning training an AI. There are laws for what you can do with other people's stuff.

In short, maybe the AI field would indeed be crippled if they no longer freely take input from others without asking permission and/or offering compensation. And maybe that's far, far from a bad thing.


That's true, but AI models trained on copyrighted images already exist and can't just be removed from the internet, and their output will often be indistinguishable from that of "clean" models. What I fear is a kind of legal hazard that would make even the possibility that AI had been used anywhere in a work radioactive.

Imagine another hypothetical: I create a derivative work by running img2img on another artist's painting without their permission. Whether the AI model in question contains copyrighted content or not, this is probably infringement.

Now suppose that, instead, I create an original work, without using img2img on someone else's art. But, as part of my process, I use AI inpainting, with a clean AI model, so that the work has telltale signs of AI generation in it.

And then suppose an artist I've never heard of notices that my painting is superficially similar to theirs--not enough to be infringement on its own, even with a subconscious infringement argument. But they sue me, claiming that my image was an img2img AI-generated derivative of theirs, and the AI artifacts in the image are proof.

With enough scaremongering about AI infringement, it might be possible for a plaintiff to win a frivolous lawsuit like this. After all, courts are unlikely to understand the technology well enough to make fine distinctions, and there's no way for me to prove the provenance of my image! If it becomes common knowledge that AI models can easily launder copyrighted images, and assumed that this is the primary reason people use AI, then the existence of any AI artifacts in a work could become grounds for a copyright lawsuit.


It’s completely ridiculous to believe that copyright claims are unenforceable because you ran it through an ML transformation engine.

I hope Getty sues and wins. Train your datasets on your own data! This is mass IP theft.


Be careful not to watch any copyrighted films through your wetware engine. The copyright owner could ask for all copies to be removed from wet storage, plus fair damages. I wouldn't wish that on anybody.

https://en.m.wikipedia.org/wiki/Monkey_selfie_copyright_disp... Slater claims the copyright (the monkey and the Canon EOS 5D DSLR definitely don’t have the copyright).


Nonsense. Observations about certain characteristics of a copyrighted work are not covered under that work's copyright. If I take a copyrighted book and produce a table of word frequencies in that book, no serious person would claim that the author's copyright domain extends to my table.
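
A minimal sketch of that in Python ("book.txt" stands in for any copyrighted text):

    import re
    from collections import Counter

    # a word-frequency table records facts about the work, not the work itself
    text = open("book.txt", encoding="utf-8").read().lower()
    freq = Counter(re.findall(r"[a-z']+", text))

    print(freq.most_common(10))  # e.g. [('the', 9421), ('and', 5312), ...]
    # nothing here lets you reconstruct even one sentence of the original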


Everyone on HN keeps pretending that ML transformation = human inspiration. This is really funny - we don't have AGI, but we have an AGI-like capability to avoid copyright. It seems the only place where human rights and AI rights are matched is where it most benefits AI research. How interesting.

A for-profit computer program != a human being.


Can you articulate a meaningful (and more importantly legally provable) difference? Both human brains and these sorts of AI programs are intractable black boxes. Maybe you think computer programs lack some sort of divine spark, but even if we accept that it seems to me that it's not a given that humans apply their divine spark every time they create something either.


That’s a bizarre definition.

So because they’re both black boxes we suddenly treat them both the same legally?

I don’t have to prove that an ML transform is equivalent to a human being. Divine spark isn’t necessary to protect you from copyright - we have laws that determine what you need to do and those laws have allowed IP to exist as a profitable area for a century.

Here's a simple question - if I were to take an image from an artist on ArtStation and announce my for-money Warhammer tournament with it, without paying him, I'd be violating his IP rights.

But if I build an ML engine I apparently can take 100 of his images, produce similar images and charge money for it. Magic!

Explain to me - how exactly did the artist suddenly lose his IP rights?

I submit it is up to ML researchers to prove this isn’t the biggest IP land grab in history. Not for me to prove that somehow humans are different. We know humans are different and we’ve codified in law just how much new IP needs to be different in order to avoid copyright. That’s good enough for humans.

But somehow ML enthusiasts want to say that if I feed an ML all the Disney movies and get it to make a derivative work, it's somehow protected? I can't wait to see all the Mickey Mouse movies made by AI. Guess what - that's never going to happen. If you think Disney or any other IP empire will let that go, I don't know what to tell you. It's absolute madness to say that anything that goes into an ML transformation is uncopyrightable. Or that somehow I have no rights over my artwork because I don't have the ability to sue OpenAI into oblivion like Disney does? Because that's literally what we're saying - the artists from ArtStation and Getty had their images used exactly because it was believed they wouldn't be able to sue - thanks to distinctions like your own!

Disney won't let that stand; they would sue and push for annihilation. But hey, the 100,000 artists on ArtStation? F** them, they can't sue us. Let's use their shit.

That’s basically where we’re at.

Let's work out an example. This AI can be fed all of Lucian Freud's works. Then it can produce Lucian Freud-like artwork. And in this process, where all that is missing is Lucian Freud's own signature, it could potentially make a virtual replica of one of his most famous paintings! Thus I would have a copy of a Lucian Freud! But completely copyright free.

Are you for real? This cannot, absolutely cannot be allowed to happen. It is the biggest data theft in history, larger than Facebook or anything, to basically say that any human data, when run through an ML transformation engine, is no longer the property of the human being who produced it but of the ML engineer. This is absolute madness if you consider the second-order and third-order effects.

Essentially all human data that can be automated through ML would be automated to the gain of only the ML engineers involved. The intellectual property of millions of human beings producing the data would be nothing but compost. This will lead to an unsustainable situation, where no one but ML engineers will be able to make any profit in the world. Human beings must be compensated for their data, or we’re headed straight into a dystopia.


>But if I build an ML engine I apparently can take 100 of his images, produce similar images and charge money for it. Magic!

You can pay a human artist to produce "similar" images too, and as long as they aren't too similar it's fine.

You seem to be conflating - repeatedly - exact copies of a work with merely copying a style. Family Guy had an episode where Brian and Stewie visit a Disney universe, and everything was animated in the style of a classic Disney film. A bunch of humans did indeed watch a bunch of Disney films, and make a new animation based on what they saw. Did Disney - a notoriously litigious company - sue? Of course not. Copying a style is not infringement! I don't see why it's any different if an AI program does that.


> A bunch of humans did indeed watch a bunch of Disney films, and make a new animation based on what they saw. Did Disney - a notoriously litigious company - sue? Of course not.

Why would Disney sue themselves? Disney owns Family Guy.

In a world where Disney did not own it, Disney would probably sue if they used a cartoon mouse. "Style" may not be covered by copyright, but characters likely are. A human animator would know enough not to make a parody cartoon mouse look too Mickey-like, but an AI wouldn't know to avoid that.


Parody is a well known copyright exemption.

I just don’t see how you can go - this is just like a human being, let me make a million dollars out of it.


It wouldn't extend to your table, but if you used that table to generate a book of your own, it's very possible a serious person would claim the author's copyright extends to the generated book.


There is only one way a frequency table could return the original work; all other reconstructions would be gibberish. It's like Huffman tables in compression: the table itself is not equal to the original data.


Is the copyright still not applicable if your encoding table has rules for reconstructing the text from it? No serious person would argue that copyright doesn't extend to the encoded version of the book and prevent me from profiting off it.

I believe the same applies to AI generation of text/images. Just because you're encoding the data in statistical models doesn't mean that it's not encoded.


> I've seen a lot of confidence on HN and other tech communities that a court would never rule that training an AI on copyrighted images is infringement, but I'm not so sure. To be clear, I hope that training AI on copyrighted images remains legal, because it would cripple the field of AI text and image generation if it wasn't!

Regardless of the copyright status of the training data, which really is unresolved, the copyrightability of the output of AI is questionable at best. There's no way to monetize the generated images as stock art that isn't at risk of a court ruling pulling the rug from under it.


AI-produced art is still human-made, as a person does the job of engineering a prompt and selecting from the generated images. The copyrightability of such work is unlikely to ever seriously be in question.


This is not as obvious as you may think. This is closely related to the "monkey selfies" copyright claim issues. The court didn't seem to agree with the photographer who said "I own the copyright of those pictures because I configured the camera by myself and put it there, the monkey only pushed the button."


The court didn't rule on the monkey selfies copyright claim. The copyright owner just ran out of money to pay for a lawyer and gave up fighting against Wikimedia.


That's not so obviously clear cut. Can the model produce identical output from the same simple prompt?


But that's exactly what happens. AI isn't randomness; it's a set of predefined calculations. The randomness is in the seed/starting point, which for e.g. Stable Diffusion is given as user input, resulting in perfect reproducibility.
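
A minimal sketch of that reproducibility, assuming the Hugging Face diffusers library (the model id and prompt are placeholders):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5"  # placeholder model id
    ).to("cuda")

    prompt = "a lighthouse at dusk"  # placeholder prompt

    # same prompt + same seed -> the same image, run after run
    a = pipe(prompt, generator=torch.Generator("cuda").manual_seed(42)).images[0]
    b = pipe(prompt, generator=torch.Generator("cuda").manual_seed(42)).images[0]
    # a and b are pixel-identical given identical model, settings, and hardware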


I should have been clearer - that was rhetorical to point out that if you and I use the same prompt and pick the same resultant image, it's harder to claim either of us have copyright. This is a gray area. The tools themselves could introduce some stochastic aspect so that outputs are never identical, also.

None of this leads to an obviously clear cut legal position wrt copyright.


Your (and the other person's) set of inputs drives a mathematical function that derives the same output.

If we were all honest, we'd give up the idea of copyright altogether where ML is concerned. You can't get much closer to "It's just math, man" than what it is currently.


I think we are saying the same thing, roughly.

Of course, "it's just math, man" isn't precisely a legal argument, either.


Vector art is just math, yet you can get copyright on it.


It might be hard to argue that a prompt is original enough to be covered by copyright. Shorter, simpler prompts might be ruled unoriginal.


This is what I would like to believe… but needs to be proven before I stake my business on it.


> because it would cripple the field of AI text and image generation

I don't disagree with this statement. But arguably, it's becoming clear that these fields exist, on an economic level, as a means for already powerful corporations and technocrats to gain ownership over and repurpose the labor of previous generations for profitable automation (not simply to create C-3PO or something).

Not unlike all those other non-digital areas of the economy (e.g. railroads were built on the blood, sweat, and tears of previous generations and they are now owned by a small few, likely unrelated to the descendants of the laborers who built them).

For some ML-related commentary on this subject, see e.g. https://nathanieltravis.com/2022/08/01/ai-research-the-corpo...


> I've seen a lot of confidence on HN and other tech communities that a court would never rule that training an AI on copyrighted images is infringement, but I'm not so sure.

Indeed, and your examples are intended to point to gray areas. But a much more problematic (for the user) example is: some Dall-E-like program spits out seemingly original images, but 0.1% are visibly near-duplicates of copyrighted images, and these form the basis of a lawsuit that costs someone a lot of money. Copyright in general tends to use the concept of provenance - knowing the sequence of authors and processes that went into the creation of the object [1] - and naturally AI makes this impossible.

The AI training sequence either muddies the waters hopelessly or creates a situation where the trainer is liable to everyone who created the data. And I don't think the question will be answered just once. The thing to consider is that anyone can create an "AI" that just spits out stock images (which is obviously a copyright violation), and so the court would have to look at the details involved, and neither the court nor the AI creator would want that at all.

ianal... [1] https://serc.carleton.edu/serc/cms/prov_reuse.html


> To be clear, I hope that training AI on copyrighted images remains legal, because it would cripple the field of AI text and image generation if it wasn't!

Temporarily, yes. However it wouldn’t be the worst thing if they were forced to work on sample-efficiency. And I maintain that if they want a large dataset of images they can collaborate with Twitter and Instagram to add an image license option to the image uploader, and provide a license option that explicitly allows this kind of use.

AI needs sample efficiency research and if they actually had to get consent for their images there are a lot of artists who wouldn’t feel like their work was being ripped off.

It's probably better if this kind of use is broadly permitted, but I don't think it's the disaster some think it would be if they were forced to get consent. There are already millions of openly licensed images in several collections online (Creative Commons, Wikimedia Commons), and if they forced people to provide licenses on sites like Twitter we could actually have open datasets for this instead of these kinds of grey areas of privately scraped datasets.


There's a literal arms race behind the scenes in AI right now.

I think it's very unlikely that corporate IP claims that could substantially hold back progress in domestic AI development will end up being successful.


I don't know if there's a clear answer to (1), but with respect to (2) I believe there is precedent that the copyright owner would have a strong case if they had reason to believe the Python library author had seen their work and that it played a role in the translation to Python. You can't look at a GPL work and literally transcribe it to launder the license. Companies have tried. You can implement a similar idea in a clean room, though.


Honestly, as much as I am rooting for AI "art" (still not sure about that term here), I can see how Getty would easily have a claim in court if the AI was indeed trained on some of their images AND they can prove it somehow. If that's not derivative then I don't know what is. Maybe a special niche could be carved out for people who are only researching and experimenting and not really "selling" or profiting from the resulting images. If they're right, it would seem they could bury a watermark in their images that identifies them as Getty's (or just whomever's), notes that they're copyrighted, and states that no permission is given to use them for training AI. Maybe I just don't know enough about how the algorithms work though shrug


Unfortunately for Getty et al., "feels like" is worth exactly $0.

Any court that understands the technology at even a lay level has no path to find infringement applying existing precedent.

NB "style" is not protected.


I do understand the technology at at least a lay level (unless your Scotsman is true by your conclusion), and it seems to me that the idea that it's "style" and not "permuting the input data" is one that seems to be a postulate, not a fact.


For case 2, translating a novel from, say, English to Japanese, still requires permission of the holder of the copyright to the English version, even though the resulting novel "does not contain a single line ... identical to the original".


It might technically be infringement, but the proof is in the pudding. It may be very hard to prove a specific image (or set of millions of images) were used in training.


> 1. They seem of the opinion that the copyright question is open.

I'm surprised it has taken this long, to be honest. I've seen generated images with the blurred Getty watermark on them.


It's not that the watermark is on them per se, but that the model tried to emulate an image it had seen before which had a watermark on it. Imagine showing a child a bunch of pictures with Getty watermarks on them, then they draw their own, with their own emulation of the watermark. They don't know it's a watermark, they don't know what a watermark is, they just see this shape on a lot of pictures and put it on their own. That's essentially what's going on.

The model is only around 4GB, and it was trained on ~5B images. At 24-bit depth, that'd be ~786 KB of raw data per image, which would be about 3.5 petabytes of uncompressed information in the full training set. Either the authors have invented the world's greatest compression algorithm, or the original image data isn't actually in the model.
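
The arithmetic, as a back-of-the-envelope sketch in Python (512x512 RGB assumed):

    images = 5_000_000_000                # ~5B training images
    bytes_per_image = 512 * 512 * 3       # 24-bit color: ~786 KB raw per image
    raw_total = images * bytes_per_image  # ~3.9e15 bytes, ~3.5 pebibytes

    model_bytes = 4 * 1024**3             # ~4 GB of weights
    print(raw_total // model_bytes)       # roughly a 900,000:1 ratio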

So, I think the argument is: if you look at someone else's (copyrighted) work, and produce your own work incorporating style, composition, etc elements which you learned from their work, are you engaged in copyright infringement? IANAL but I think the answer is "no" - you would have to try to reproduce the actual work to be engaging in copyright infringement, and not only do these models not do that, it would be extremely hard to get them to do so without feeding them the actual copyrighted work as an input to the inference procedure.


> It's not that the watermark is on them per se, but that the model tried to emulate an image it had seen before which had a watermark on it. Imagine showing a child a bunch of pictures with Getty watermarks on them, then they draw their own, with their own emulation of the watermark. That's essentially what's going on.

The blurred watermark is what makes it obvious they used Getty's (copyrighted?) images to train the model.


I understand that, but why would using copyrighted images to train a model be any more illegal than studying copyrighted paintings in art school? Copyright doesn't prevent consumption or interpretation, simply reproduction.


If a student studied an old master in school and produced a painting inspired by that old master that included a copy of the old master's signature, this would be more indication of intent to defraud than if they didn't include the signature.

Copyright can be fairly flexible in interpreting what constitutes a derivative work. The Getty watermark is evidence that an image belongs to Getty. If someone produces an image with the watermark and gets sued, they could say "your honor, I know it looks like I copied that image, but let's consider the details of how my hypercomplex whatsit works..." and then the judge, say a nontechnical person, looks at the defendant and says "no, just no, the court isn't going to look at those details; how could the court do that?". Or maybe the court would consider it only if you paid 1000 neutral lawyer-programmers to come up with a judgement, at a cost of millions or billions per case.


What if it was not the copied signature of the old master, but a new one with a similar style and placed in a similar spot on the painting, but with the name of the student instead and looking blurry/a bit different? Because that's what's happening here, and that doesn't sound quite like fraud.

Another scenario: what if I create a painting of a river by hand in acrylic and also draw a Getty-watermark-looking thing on top, in acrylic? As for why, I would put it there as an integral part of the piece, to allude to how corporations got their hands on even the purest things that have nothing to do with them, with the fake watermark in acrylic symbolizing that. You can make up any other reason; this is just the one I thought of as I was writing this. It won't look exactly like the real Getty watermark - it will be acrylic and drawn by hand, so pretty uneven, with the colors off and way less detail. Doesn't feel like fraud to me.


My argument isn't really about whether this is morally fraud. Maybe the device is really being "creative" or maybe it's copying. The question is whether the thing's supposed originality can be defended in court.

> What if I create a painting of a river by hand in acrylic and also draw a Getty-watermark-looking thing on top, in acrylic? As for why, I would put it there as an integral part of the piece, to allude to how corporations got their hands on even the purest things that have nothing to do with them, with the fake watermark in acrylic symbolizing that.

A human artist might well do that and make that defense in court. For all anyone knows, some GPT-3-derived thing might go through such a thought process also (though it seems unlikely). However, the GPT-3-derived thing can't testify in court concerning its intent, and that produces problems. And it's difficult for anyone to make this claim for it.

Edit: Also, if instead of a single work (of parody), you produced a series of your own stock photos, used the Getty Watermark and invited people to use them for stock photo purposes, then your use of the copyrighted Getty Watermark would no longer fall under the parody exception for fair use.


I'm not a lawyer and I can't say how existing copyright law applies to this situation, but, how is taking images and feeding them into an ML model different from taking library code and including it in your software?

In both cases, you take a series of bytes (the image data / the library source code) that is ultimately crucial to the functioning of your software, combine it with your own original code you wrote (training / compilation), and end up with a new output (the trained model / the binary executable) that is distinct from any of the original sources.

If you use a GPL'd library in your software, then it's uncontroversial to say that you have to follow the terms of the GPL. You can't say "well actually, the compiler is just reading your source code and learning what sort of binary it should produce, just like a human learns by studying source code, so I actually don't have to follow your licensing terms". No one would buy that. You clearly used that library, so you have to obey whatever terms come along with it.

Why is it fine to ignore the licensing terms for image data you incorporate into your software, but not third-party source code that you incorporate?


> If you use a GPL'd library in your software, then it's uncontroversial to say that you have to follow the terms of the GPL. You can't say "well actually, the compiler is just reading your source code and learning what sort of binary it should produce, just like a human learns by studying source code, so I actually don't have to follow your licensing terms". No one would buy that. You clearly used that library, so you have to obey whatever terms come along with it.

What if I read the code, understand its concepts, and re-implement another library that provides similar functionality without directly linking to the original repository? That is not an infringement, and it's actually how the open source community has always operated - like MariaDB to MySQL, or any project that markets itself as an 'open source alternative' to some commercial software.

I would argue the diffusion models are really good; it is possible that they capture the essentials of drawing, learning no differently than a human does. Put another way, they master the imaging process at a fundamental level.


> What if I read the code, understand its concepts, and re-implement another library that provides similar functionality without directly linking to the original repository? That is not an infringement

I agree, that's fine.

The analogous situation with image generators would be if the companies that trained the models had a human artist look at every image in their dataset, paint a unique but similar image, and then feed all of those images into the model, so that no copyrighted images were used in training without permission. But that's obviously not what they did. They just fed in the images unaltered, without getting permission.


> What if I read the code, understand its concepts, and re-implement another library that provides similar functionality without directly linking to the original repository? That is not an infringement

It might be… which is why, famously, Wine developers don't look at leaked Windows code.


A better metaphor would be copying and then claiming that you created the original art. Studying can explain the subject of study, but it does not replace it. Producing a painting based on previous paintings can create further problems.

In art it is very common for an author to revisit their own work and make several paintings of the same subject. Artists make several attempts to conquer a painting; they draw studies or use different mediums. Photographers reproduce a portrait 20 years later to see how the subject changed. If you insert an AI image in the middle and copyright it, then any later painting of the same subject, even by the original author, would become derivative of the AI image that replaced it. This could even go so far as to exclude artists from revisiting their most successful works.


Because the copyright holder has granted you the right to look at paintings and hasn't granted you the right to store them on your server to perform the mathematical transformations necessary to facilitate an adaptation-on-demand service.

Even if it was plausible to believe the mechanics of how human brains process art was particularly similar to a diffusion model or GAN, I don't see "but human brains are deterministic functions of their inputs too" as being a successful legal argument any time soon. You'd have to throw out rather more of the legal system than just copyright if those arguments start to prevail...


> Because the copyright holder has granted you the right to look at paintings and hasn't granted you the right to store them on your server to perform the mathematical transformations necessary to facilitate an adaptation-on-demand service.

You just described how modern browsers cache images.

The diffusion model is revolutionary at scale, but that doesn't mean it is doing anything drastically different from what is allowed right now, e.g. any AI-based image beautifying/denoising filter; just the scale changes everything.


The fact that watermarked images are available for free doesn't necessarily mean you can do whatever you want with them. It depends on what the Getty licence on watermarked images says exactly. I'm pretty sure it doesn't include something like "you can use these pictures as data in automated processes". They are (I guess) available "for your eyes only".


IIRC a court already stated that you can crawl the publicly available net and use what you find there; it was after that company doing face recognition, wasn't it?


A world where we can't use copyrighted material to update neural network weights is a world where we can buy books but not read them...


You can train it but not for commercial purposes. Nobody cares what you do at home, but if you want to use someone else's work to make money they will come knocking for their cut.


Does O'Reilly ask for a percentage of a software engineer's income after they've read their book of Perl recipes?


O'Reilly sells their books to software engineers with the intent for them to use the information to further their knowledge and apply it in a commercial setting.

The images in Getty are provided with the intent to be used only as a catalogue for purchasing corresponding images without watermarks.

The difference in intent is very clear, and a judge would make a distinction between these.


Is it allowed to sell a book that is a collection of pages from copyrighted books? Paragraph 1 is from a Stephen King novel, paragraph 2 is from A Storm of Swords, and so on? I am not a copyright attorney, but that sounds like a violation to me.


If you have legally obtained copies of the relevant novels, according to the first sale doctrine you should be allowed to cut them up, staple the first chapter of one novel to the second chapter of another and the third chapter of the next, and then sell the result.

But the authors have the exclusive right to making more copies of their work, so if you'd want to make a thousand of these frankensteinbooks, you would need to get a thousand copies of the original books.


No one is contesting the fact that images whose copyright is owned by Getty were used to train the model.

The contested issue is whether training a model requires permission from the copyright holder, because for most ways of using a copyrighted work - all uses except those where copyright law explicitly asserts that copyright holders have exclusive rights - no permission is needed.


I think you'd struggle to argue that the Getty watermark was a general style and composition principle and not a distinct motif unique to Getty (and in music copyright cases, the defence of plagiarising motifs inadvertently frequently fails).


From the model's perspective, it's not a distinct motif, that's the thing (and, it struggles quite a lot to reproduce the actual mark). The model doesn't have any concept of what a "watermark" is. As far as it's concerned, it's just a compositional element that happens to be in some images. Most "watermarks" Stable Diffusion produces are jumbles of colorized pixels which we can recognize as being evocative of a watermark, but which isn't the actual mark.

A quick demo: I fed in the prompts "a getty watermark", "an image with a getty watermark", and "getty", and it spat out these: https://imgur.com/a/mKeFECG - not a watermark to be seen (though lots of water).

I was then able to generate an obviously-not-a-stock photo containing something approximating a Getty watermark, with the prompt "++++(stock photo) of a sea monster, art": https://imgur.com/a/mNC6XtQ - the heavily forced attention on "stock photo" pushes the model to say "okay, fine, what's something that means stock photo? I'll add this splorch of white that's kinda like what I've seen in a lot of things tagged as stock photos", and it incorporates that into the image as a way to satisfy the prompt.

We can easily recognize that as attempting to mimic the Getty watermark, but it's not clearly recognizable as the mark itself, nor is the image likely to resemble much of anything in Getty's library.


> From the model's perspective, it's not a distinct motif, that's the thing (and, it struggles quite a lot to reproduce the actual mark). The model doesn't have any concept of what a "watermark" is.

The court delivers the judgement, not the model.

If courts can find against musicians whilst accepting they 'unconsciously' plagiarised key elements of a song in their own completely different song played by different musicians, based on maybe hearing it in the background somewhere, then they can certainly find against the creators of a model whose dependency on the Getty IP it ingested is strong and obvious enough that it outputs reasonably close approximations of Getty watermarks.


IMO, it is very difficult to prove:

1. Whether that watermark is recognizable enough to be a Getty watermark, or just something whose shape vaguely looks like a watermark.

2. Where that watermark is coming from.

From how the model is trained, it is possible that the model considers the watermark itself a style of the picture and mimics it. But it would be mission impossible to trace the inspiration back to a particular work.


It's not even necessarily trying to emulate any particular image it's seen before; it may just decide 'this is the kind of image that often has a watermark, so here goes.'


> Either the authors have invented the world's greatest compression algorithm, or the original image data isn't actually in the model.

AI and finding the best compression algorithm for an input are essentially the same problem.
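
A standard way to make that equivalence concrete (an information-theory sketch, not from the thread): an ideal arithmetic coder driven by a predictive model spends about -log2(p) bits on a symbol the model assigned probability p, so better prediction is literally better compression.

    import math

    # code length under a predictive model equals its log loss (in bits)
    def code_length_bits(probs):
        # probs: the model's probability for each symbol that actually occurred
        return sum(-math.log2(p) for p in probs)

    print(code_length_bits([1 / 256] * 1000))  # uniform over bytes: 8000 bits
    print(code_length_bits([0.9] * 1000))      # confident correct model: ~152 bits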


First time I generated an image that had some iStockPhoto watermarks on it I started getting really uncomfortable.


A little delay at $50k an infringement can go a long, long way.

I'm not sure what the statute of limitations is for infringement, but I bet it is two years or more.


In the US it seems to be 5 years for criminal infringement, or 3 years for civil actions, according to Title 17, Chapter 5, Section 507: https://www.copyright.gov/title17/92chap5.html#507


There are two separate open US legal questions:

1) Does the use of copyrighted images in the OpenAI training model make the output a copyright infringement? This is not yet settled law. If I read a thousand books and write my own book, it's not an infringement, but if I copy and paste one page from each of a thousand books it would be.

2) Can an OpenAI generated image be copyrighted? Courts have ruled that an AI cannot hold a copyright itself, but whether or not the output can be copyrighted by the AI's "operator" depends a lot on how the AI is perceived as a tool vs. as a creator. Nobody would argue that Picasso can't hold a copyright because he used a brush as a tool, but courts have ruled that a photographer couldn't hold the rights to a photo that a monkey took with his equipment. The ruling here will probably stem from whether the AI is ruled to be a creator itself, which the human just pushes a button on, vs. a tool like a brush, which needs a lot of skill and creativity to operate.


Yeah, I think it might actually be a good thing to have some of the copyright questions settled one way or the other.


> if I copy and paste one page from each of a thousand books it would be.

It almost certainly would not be infringement. One page of text out of an entire work is very small. Amount and substantiality of the portion used in relation to the copyrighted work as a whole is one of the factors considered when making a fair use defense. This hypothetical book would also have zero effect on the potential market for the source books.

AI-generated images generally won't infringe on the training images because they don't substantially contain portions of the sources. If a generated image happened to be substantially similar to a particular source image, it could also affect the potential market for that source image. But it's also likely that there are tons of human-made images that are coincidentally also substantially similar and they're not infringing on each other; they're also probably in the AI's training set so good luck trying to make the case that your image in the training set is the one being infringed. On top of that, if the plaintiff could win, the actual market value of the source work is relevant to damages and that value is likely almost nothing so congratulations, you tied up the legal system just to get one AI-image taken down.

BTW, using all the images to train the AI is not itself infringement; I can't say there was no wrong-doing in the process of acquiring the images but using them to train the AI was not infringing on the copyrights of those images.

https://www.copyright.gov/fair-use/


A few seconds of a song is also very small, and yet it only takes a few notes in some cases for a court to find someone guilty of copyright infringement.

> On top of that, if the plaintiff could win, the actual market value of the source work is relevant to damages and that value is likely almost nothing so congratulations, you tied up the legal system just to get one AI-image taken down.

The Pirate Bay trial set a precedent that people can be found guilty of copyright infringement in the general sense, rather than for a specific case of a copyrighted work. The lawyers for the site tried to argue that the legal system should have been forced to first go to court over a specific work, with a specific person in mind who did the infringement, but the judges disagreed. According to the court, it was enough that infringement was likely to have occurred somewhere and somehow. A site like Getty could make the same argument: that infringement of their images is likely to have occurred somewhere, by someone, and the court could accept that as fact and continue on that basis.


Different domains of copyrightable material have different norms. The music industry, in response to sampling, has established that even small, recognizable snippets have marketable value and can therefore be infringed upon (it's also rife with case law featuring, in my opinion, bad wins by plaintiffs). For photographic images, collage is already an established art form and is generally considered transformative and fair use of the source images.

I'm not familiar with any Pirate Bay case; if you are referring to this one [0], it was in a Swedish court and I'm not familiar with Swedish copyright law. However, the first sentence says the charge was promoting infringement, not that they were engaged in infringement themselves. I don't think that's relevant to what I was replying to, but it could be very relevant to Getty Images's decision: if AI-generated content is infringing, they don't want to be accused of promoting infringement. There's undoubtedly already infringement taking place on Getty Images, but likely at such a small scale that the organization itself is not put at risk.

[0] https://en.wikipedia.org/wiki/The_Pirate_Bay_trial


It is the correct case, but the "promoting" might be a bit of a translation issue. They were found guilty of aiding and enabling infringement, but, as I describe above, not for any specific infringement of a specific work, rather for the act of infringement in a general sense.

There have been cases where "enabling" infringement was argued by US prosecutors, but I don't know what the legal status of that argument is. I have heard lawyers argue that nothing in copyright law explicitly forbids enabling, but that was mostly around 2010.


It will be interesting to see whether the norms change.

It's my faint memory that for music, when it was just some kids looping breaks on vinyl printed in the hundreds or low thousands, the sampling was fine or at least not obviously an issue; but then copyright-holders saw that those artists and their management had started bringing in real money, and the laws were clarified.


I predict they'll lose because any of the existing contenders floats effortlessly over the 'transformativity' hurdle.

While I'm worried about the impact of widely deployed AI on commercial artists, musicians etc. and don't think many developers have really come to grips with the implications and possibilities for all fields, including their own, I feel nothing but amusement at the grim prospects of commercial image brokers who have spent years collecting rent on the creativity of others.


It seems clear that such training of AIs requires copying an image onto a computer system in which the training algorithms are performed. Maybe that fits in Fair Use (I doubt it: it's commercial and harms the original creators) but it certainly doesn't fit in Fair Dealing (in UK).

I certainly, personally, approve of weak copyright laws that allow for things like training AIs without getting permission; neither the USA, the EU, nor the UK seems to have such legislation, however.

This is all personal opinion and not legal advice, nor related to my employment.


How is this any different from training artists? Other art is copied into their brains and they transform it to create something new.


Yes, but a computer is not a person - and copyright law is not very friendly to machine generated output.


> It seems clear that such training of AIs requires copying an image onto a computer system in which the training algorithms are performed

This is how web browsers work. If you go to https://www.gettyimages.nl/ your browser will copy the images from their server to your computer.
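To make that concrete, here's a rough sketch in Python (using the requests library) of what "going to a page" entails:

    import requests  # pip install requests

    # A browser does essentially this for the page and for every <img> on it:
    # copy bytes from the server into local memory (and usually a disk cache).
    html = requests.get("https://www.gettyimages.nl/").text
    # Each image the page references is then fetched the same way. At the byte
    # level, "viewing" an image and copying it are the same operation.

Transient copies like this are exactly the kind of thing copyright law has already had to carve out room for.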


(IANAL) Use of copyrighted materials for AI training is explicitly allowed without permission in Japan law since 2019. https://storialaw.jp/en/service/bigdata/bigdata-12


The AI doesn't actually need the image. It needs a two dimensional array that represents the pixel values. I am sure there are some very clever ways to get around that hurdle if that is where the bar is set.


Well, if you can create the array without using the image then you're golden; but if you're not using the image then you're not training on the image. If you are using the image, then you're deriving (at least) a representation from the image.

IME, limited as it is, courts don't take kindly to overly clever attempts to differentiate things from what everyone agrees they are; like 'the file isn't the image, so I can copy that', judges aren't stupid.


I'm curious how a "two dimensional pixel array" doesn't correspond to a picture.


It's not a pirated image/film/book/music - it's just a (very long) array of numbers!


> It needs a two dimensional array that represents the pixel values.

That's an image…

Are you saying that if I shut down the screen and the image isn't shown, I can copy it and send it around because it isn't an image but some numbers?
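For anyone who doubts that, a minimal sketch (Python with PIL/numpy, assuming some photo.jpg on disk):

    from PIL import Image  # pip install pillow numpy
    import numpy as np

    img = Image.open("photo.jpg")   # "the image"
    arr = np.asarray(img)           # "just an array of numbers"
    print(arr.shape, arr.dtype)     # e.g. (512, 512, 3) uint8

    # The round trip is lossless: the array *is* the image.
    Image.fromarray(arr).save("copy.png")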


They realized immediately that AI has reached a disruptive point for the stock photography industry, just like digital composition changed the rules years ago.

Probably their value in the advertisement production chain is going to become close to zero in a few years, and they will try to stop it, or at least slow it down.

But once we have open-source models released to the public, I cannot see how local legislation can have any impact at all.


They'll still be staggeringly rich, their wealth level just won't be automatically accelerating any more. If they litigate over it I doubt jurors will feel much of their pain.


I don't think this would be a criminal trial, so it wouldn't go before a jury at all.


I’m going against the grain but we should also recognize that it takes real effort and time to professionally take stock photographs and Getty has a lot of consignments. They travel, they go to junkyards, they search for subjects to take photos of. Catalog them. All this isn’t free and these people would need to put food on the table.

This isn’t similar to “Elevator operators were obsolete after they had a control panel and elevators became reliable for people to feel safe. So their jobs must go”.

Here, it is more like “High quality stock photography would become scarce”. There are free stock photography resources but IMO Getty’s photos are on another level.

That said, Getty has a history of being ridiculously protective and is a litigation powerhouse. They have a lot of lawyers.


I don't think most stock image users really care all that much. They just want a picture of someone pointing at a whiteboard which AI will generate pretty well. And then people with a little more talent will use AI plus their knowledge of good photography to tweak and tune the output to generate high quality stuff at a fraction of the cost of going out and taking real photos.


In addition, Getty won't want to be liable for hosting something on which a third party holds copyright and hasn't assigned it to the submitter. They'll have plenty of liability and they have deep pockets.


Is it copyright infringement to experience copyrighted material and make new art based on those experiences? Of course not. The end here is inevitable. Even if some really backward thinking judgements go through, eventually it will wash out. In 20 years AI will be generating absurd amounts of original content.


> Is it copyright infringement to experience copyrighted material and make new art based on those experiences?

Yes, covered under "derivative work": https://www.copyright.gov/circs/circ14.pdf

> A derivative work is a work based on or derived from one or more already existing works. Common derivative works include translations, ..., art reproductions, abridgments, and condensations of preexisting works. Another common type of derivative work is a “new edition” of a preexisting work in which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work.

> In any case where a copyrighted work is used without the permission of the copyright owner, copyright protection will not extend to any part of the work in which such material has been used unlawfully. The unauthorized adaptation of a work may constitute copyright infringement.


This is being downvoted, so I'm worried my understanding is incorrect. Could anyone explain how this is wrong? IANAL.


So by this logic, Deadly Premonition and Mizzurna Falls are derivative works of Twin Peaks?


"inspired by"


They should sue; it would clarify things a bit, and the matter would be settled for years to come.

What is certain is that AI-powered image creation is here to stay.


But this will decide whether it's democratised or locked-up by those who own large bodies of training data.


Stable Diffusion is already out in the world. The cat is out of the bag.


Yah, but think of Napster eventually getting usurped by Spotify. The danger is that it's legally no longer possible to update the models (which are very expensive to train), and we end up with only Disney having a copyright hoard large enough to train decent models, let alone good ones...


They are indeed hard to train, but groups like EleutherAI have already basically crowdsourced training LLMs successfully, so it's absolutely doable for non-corporate/academic/government entities to train high-end models like this.


The models are expensive to train right now, but I suspect in 10 years, anyone with a multi gpu rig could train the equivalent of Stable Diffusion.


But in ten years the state of the art will be something better than Stable Diffusion.


China.


Should’ve said, everywhere except the US. I’d point to classics like Luxembourg or Sweden?


I first thought that if Getty were to sue anyone, it would be those actually publishing and distributing works derived from copyright material (i.e. end users) rather than the providers of mere tools.

But I wonder if they could make the argument that the ML model itself is a "derivative work" (I don't agree that it is, but I can see the case being made). That would be a heck of a court case and resurface a lot of the "illegal number" (https://en.wikipedia.org/wiki/Illegal_number) stuff again.


Most of the usage I've seen or even tried myself is like "Sonic the hedgehog doing a kickflip". It kind of makes it obvious, in my opinion, that yeah, this is pretty much not right. Even worse, I'm seeing things like "Sonic the hedgehog artstation in the style of (artist xyz)". Is it just ripping artists' images from ArtStation without explicit permission?


It's my understanding that a lot of branded material (like "Sonic the Hedgehog") was filtered out of the training data so that the copyright challenges were limited to small holders who can't fight back, instead of large holders like Sega and Disney.

So any prompts expecting copyrighted characters are going to end up weird because of the lack of training data.


I believe the more important issue is that material generated through AI is not copyrightable by the author of such images.

If you are an artist you can't claim any copyright on what you're generating.

If you're not the copyright holder, it follows that you can't sell it, and you can't complain if someone else copies it (verbatim) and sells it.


The criterion for a work being copyrightable is literally the slightest touch of creativity, and I'm quite certain that writing a prompt and selecting a result out of a bunch of random seeds would qualify for that.

Fully automated mass creation would get excluded, as would any attempt to assign that copyright to a non-human entity, but all the artwork I've seen generated by people should be copyrightable - the main debatable question is whether it infringes on the copyright of the training data.

On a different note, I'd argue that the models themselves (the large model parameters, as opposed to the source code of the system) are not copyrightable works, being the result of a mechanistic transformation, as pure 'sweat of the brow' (i.e. time, effort and cost of training them) does not suffice for copyright protection, no matter how large.


> The criteria for a work being copyrightable literally is the slightest touch of creativity

If a prompt is copyrightable, that's a problem. Because it's just words. Recipes should be in the same league then.

If I can get the same output with a slightly different prompt, how would you protect your works?

If I copy your output, how can you protect your works, given that the output depends on something not copyrightable? (as per your statement, which I agree with)

Look at it this way: If I make something out of a Spirograph, is it a copyrightable work?


5. Getty's pictures were likely trained on without permission [0]. So they might well be considering a lawsuit.

[0] https://news.ycombinator.com/item?id=32573523


Why would it be illegal to train a model on their free samples? I thought their business was paying to remove the watermark?


The "free samples" are still copyrighted by the artist. Adding a watermark to it doesn't remove the copyright and arguably, adding the copyright doesn't even create a new work.

Their business is hosting, indexing, and managing the licensing for art that has been submitted to them and licensed to another party.


And they're available for public consumption at the website, albeit at reduced quality.

Are the images part of the distributed data set? I thought it was values/coefficients that manifest from the algorithmic analysis of the source image?
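My mental model is that training only ever touches the weights, something like this toy sketch (PyTorch; purely illustrative, not the actual diffusion objective):

    import torch

    image = torch.rand(3, 8, 8)                    # stand-in training image
    model = torch.nn.Linear(3 * 8 * 8, 3 * 8 * 8)  # stand-in "network"
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    noisy = image + 0.1 * torch.randn_like(image)
    loss = (model(noisy.flatten()) - image.flatten()).pow(2).mean()
    loss.backward()   # gradients flow to the weights only
    opt.step()        # weights get nudged slightly...
    del image, noisy  # ...and the image itself is discarded

So the image influences the coefficients but is never stored as such - or is that wrong?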


Yes... ish.

On one hand, if you take "this is the size of the net" and divide it by the number of training images, it's a rather small amount of storage per image.

On the other hand, when I was playing with stable diffusion on the command line following the instructions of https://replicate.com/blog/run-stable-diffusion-on-m1-mac

python scripts/txt2img.py --prompt "wolf with bling walking down a street" --n_samples 6 --n_iter 1 --plms

I got: https://imgur.com/a/N1OufD1

Now, you tell me if there's a copyrighted image encoded in that data set or not.


You have the version that filters out NSFW images based on keywords. The code literally replaces images it thinks are NSFW with Rick Astley. Copyright aside (yes, it's probably wrong to hard-code an image of Rick Astley into the actual Stable Diffusion git repository), that image is not contained in the weights of the model.

- edit - please god tell me this is not an elaborate rick roll :)


It's not... though if

    stable-diffusion % python scripts/txt2img.py --prompt "Rick Astley Never Gonna Give You Up" --n_samples 1 --n_iter 1 --plms
is such that it triggers NSFW sometimes, then... I'm... let's say "confused" about what entails NSFW prompts.

(digging through scroll back)

    Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
    Sampling:   0%|                                           | 0/1 [00:00<?, ?it/sData shape for PLMS sampling is (1, 4, 64, 64)             | 0/1 [00:00<?, ?it/s]
    Running PLMS Sampling with 50 timesteps
    PLMS Sampler: 100%|| 50/50 [03:23<00:00,  4.06s/it]
    Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed.:00,  3.99s/it]
    data: 100%|| 1/1 [03:29<00:00, 209.20s/it]
    Sampling: 100%|| 1/1 [03:29<00:00, 209.20s/it]
    Your samples are ready and waiting for you here: 
    outputs/txt2img-samples 
Apparently you're right... though the "black image" is a poor description of the image.


Yeah, I see noisy images in your output; it may just be glitching. I may have poorly described how it works because I'm not fully sure; it may be an NSFW image-detection model rather than something keyed off the prompt. Either way you can disable it in code; I tried it.
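From memory (function name may be slightly off), the stock check_safety() in scripts/txt2img.py runs a CLIP-based NSFW classifier and swaps flagged images for the hard-coded replacement, so a no-op stub skips it entirely:

    # Replace the original check_safety() in scripts/txt2img.py with a no-op.
    # It must keep the same return shape: (images, per-image NSFW flags).
    def check_safety(x_image):
        return x_image, [False] * len(x_image)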


That's a very interesting result. Did you happen to capture the seed for either of those first two images? It would be interesting to try to reproduce.


Alas no. And I haven't been able to tickle it again in the right way to get those images out.

The invocation of that run is still in my scroll back:

    (venv) shagie@MacM1 stable-diffusion % python scripts/txt2img.py --prompt "wolf with bling walking down a street" --n_samples 6 --n_iter 1 --plms
    Global seed set to 42
    Loading model from models/ldm/stable-diffusion-v1/model.ckpt
    Global Step: 470000
    LatentDiffusion: Running in eps-prediction mode
    DiffusionWrapper has 859.52 M params.
    making attention of type 'vanilla' with 512 in_channels
    Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
    making attention of type 'vanilla' with 512 in_channels
That's the only spot I see the seed mentioned and then it goes on with lots of other logging but nothing seed related that would indicate a way to reproduce it.

---

(late edit) you can fairly accurately (so far 1 image out of 20) get that image out with the prompt "Rick Astley Never Gonna Give You Up"
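For what it's worth, that "Global seed set to 42" line is the script's default seed, and if I remember the flags right you can pin it explicitly, e.g.:

    python scripts/txt2img.py --prompt "wolf with bling walking down a street" --n_samples 6 --n_iter 1 --plms --seed 42

Rerunning the identical command on the same hardware should then be deterministic.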


I'm thus far unable to reproduce it.

Given:

Rick Astley Never Gonna Give You Up Steps: 20, Sampler: PLMS, CFG scale: 7, Seed: 4231695436, Size: 512x512, Batch size: 2, Batch pos: 0

I ran a couple of batches of 32 (64 images total): https://imgur.com/a/74IbCuD

(The images with the nonsensical but obvious Impact font that was learned from memes are quite funny, though)

If you can get a full set of parameters (size, sampler, seed, prompt, cfg scale) then I should hopefully be able to reproduce your results, though.


So what? Me drawing the exact same image based on a Getty image doesn't violate anything, and everything the AI is doing is massively derivative, so I don't see how they could possibly have a case.


Creating a drawing based on an image clearly falls under the existing definition of a derivative work.

The "what is a model" and "what is the copyright status of the output of the model" are questions that have yet to be settled from the legal standpoint.

That Getty has images available for viewing with a watermark, and that the watermark is sort of reproduced in some model-generated results, suggests that the model was trained on images that were not licensed the way the people who created the model claimed.

I'll also point to the "I created images from Stable Diffusion that are clearly the cover image from 'Never Gonna Give You Up'" example above, which suggests that images aren't as impossible to extract from a model as one would believe.

Copyright and derivative works are ultimately the domain of humans looking at laws, not deterministic machines. The case is argued by humans and before humans. A lawyer can and will make the case that the model itself is a derivative work, and that the images it produces can be identified as mechanical modifications of existing works and are therefore derivative themselves - just as a photograph of a painting is a derivative work of the painting.

If the output of the ML model can be identified as having major copyrightable elements from an existing original work, then it is derivative - no matter how it got there.

So, returning to your question. If you draw the exact same image based on an image hosted and licensed by Getty - it certainly will be a derivative work and violate copyright.


Getty already ruined Google images by not letting them link directly to the image, I’m going to be massively pissed off if they try and ruin this ecosystem as well.


[flagged]


So funny how ppl are defending Getty when they don’t know their history…


It's always weird to see the contrast between HN's reaction to copyright questions about text/image generation, and HN's reaction when it's code generation.

When a model is trained on 'all-rights-reserved' content like most image datasets, the community say it's fair game. But when it's 'just-a-few-rights-reserved' content like GPL code, apparently the community says that crosses a line?

Realistically, this tells me that we need ways for people to share things along the lines of all the open-source licenses we see.

You could imagine a GPL-like license being really good for the community/ecosystem: "If you train on this content, you have to release the model."


> When a model is trained on 'all-rights-reserved' content like most image datasets, the community say it's fair game. But when it's 'just-a-few-rights-reserved' content like GPL code, apparently the community says that crosses a line?

A) This is just taking divided opinion and treating it like a person with a contradictory opinion (as others have noted).

B) Nothing about GPL makes it "less copyrighted". Acting like a commercial copyright is "stronger" because it doesn't immediately grant certain uses is false and needs to be challenged whenever the claim is made.

C) If anything, I suspect image generation is going to be practically more problematic for users - you'll be exhibiting a result that might be very similar to a copyrighted training image, might contain a watermark, etc. If you paste a big piece of GPL'd code into your commercial source code, it won't necessarily be obvious once the thing is compiled (though there are whistleblowers etc., don't do it).


> Nothing about GPL makes it "less copyrighted". Acting like a commercial copyright is "stronger" because it doesn't immediately grant certain uses is false and needs to be challenged whenever the claim is made.

GPL says "you have a license to use it if you do XYZ." The alternative is "you have no license to use it." How is that not strictly "stronger?"


The GPL is as strong as a commercial license in the sense that the conditions it does specify are exactly as legally binding as those of a commercial license.


Right, but a license is more permissive than no license.


If this was a website for artists instead of programmers you'd see the exact opposite pattern. Unfortunately, people only seem to care when it threatens their own livelihood, not when it threatens that of the people around them.


I don't care either way. If an AI can do your programmer job, it means you aren't using your brain enough.

An AI can probably do half of my day job because it's stupidly repetitive. Leadership imposes old ways, and they dismiss anything "new" (i.e. newer than 2005). For example, writing all these high-level data pipelines and even web backends in C++ despite having no special performance need for it. Even though I'm not literally copy-pasting code, I'm copy-pasting something in my mind only a little higher-level than that, then relying on some procedural and muscle memory to pump it out. It's a skill that anyone can learn, just takes time. If I didn't have side projects, I'd forget what it's like to think about my code.

Some old-school programmers complain about kids with high-level languages doing their job more efficiently, so they work on lower-level stuff instead. It's been that way for decades. Now AI is knocking on that door. But before AI, C was the high-level thing, and we got compiler optimizations obviating much of the need for asm expertise, undoubtedly pissing off some who really invested in that skillset. If I'm working on something needing the performance guarantees of C or Asm, and the computer can assist me, I'm all for it. Please take this repetitive job so I can use my brain instead.

And the copyright thing is just an excuse. Programmers usually don't give a darn about copyright other than being legally obligated to comply with it. So much of programming is copy-paste. GPL had its day, and it makes less and less sense as services take over. GPL locks small-time devs out of including the code in a for-profit project but does nothing to stop big corps from using it in a SaaS. The biggest irony is how Microsoft not only uses GPL'd code for profit but also ships WSL, all legally.


The average HN denizen gets so pissed off when I ask them to save their comment rejoicing in the inevitability of the elimination of my entire field, and take it back out when next year's descendant of Copilot gives them the same horrible sinking feeling that these things give me.


If Copilot can replace my job, I think that job should be replaced by Copilot. I don't think that saving jobs should be a reason to hinder progress. I hope that what I contribute to my company is more than whatever future version of Copilot can create, but if not I will try to find another career.

Will I be sad if I lose my job to AI/ML? Yeah, probably, but at some fundamental level that's why I've always tried to keep myself up to date with stuff that's harder to automate.


People don't like being told no.

The vast majority of all-rights-reserved content is either not licensable, or not licensable at a price that anyone would be willing to pay or can afford. Ergo we[0] would much rather see more opportunities to use the work without needing permission, because we will never have permission.

When getting permission is reasonable then people are willing to defend the system. And code is much more likely to be licensable than art.

I still think the "Copilot is GPL evasion" argument is bad, though.

[0] As in the average HN user


That's a good way to look at it: The difference between "no" and "yes if XYZ."

Maybe it's that people respect "yes if XYZ" more than "no" because there's some path to yes that way. In that case, it really does speak to the need for some open-ish text and image licenses, like "you can use my image in a model if you share the model with me."


> When a model is trained on 'all-rights-reserved' content like most image datasets, the community say it's fair game. But when it's 'just-a-few-rights-reserved' content like GPL code, apparently the community says that crosses a line?

I don't think this is right. I think different people have different views, and you're just assuming that the same people have contradictory views.


I'm not assuming that, but I see how it could read that way, since I'm being fast and loose with the language. The community (anthropomorphizing the blob again) definitely empirically reacts very differently to the two topics.


To me the difference is this: https://news.ycombinator.com/item?id=27710287

It's possible for generation models to perfectly memorize and reproduce training data, at which point I view them as a sort of indexed, slightly-lossy compression. But that's almost never happening with image generation: the models are too small to memorize billions of pictures, so they can't produce copies.

Stable diffusion 1.4 has been shrunk to around 4.3GB, and has around 900 million parameters.

I don't know how big Copilot is, but a relatively recently released 20 billion parameter language model is over 40GB. ( https://huggingface.co/EleutherAI/gpt-neox-20b/tree/main ) GPT-3, according to OpenAI, is 175 billion parameters.

It's possible there are some images in there you can pull out exactly as is from the training data, if they were to appear enough times, like I suspect the Mona Lisa could be almost identically reconstructed, but it would take a lot of random generation. I'm trying it now and most of the images are cropped, colors blown out, wrong number of hands or fingers, eyes are wrong, etc.
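The arithmetic is worth doing explicitly (Python; the training-set size is my assumption, roughly LAION-scale):

    model_bytes = 4.3e9                 # ~4.3 GB checkpoint, per above
    train_images = 2e9                  # assumed ~2 billion training images
    print(model_bytes / train_images)   # ~2.2 bytes per training image

A couple of bytes per image leaves no room for general memorization; only images that recur many times in the training set (the Mona Lisa, famous album covers) can get anywhere near being stored.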


that's bc hn has a lot of gpl zealots who want special rules because they believe a viral license is a "fundamental good" and closing stuff off via copyright is the opposite. i don't agree and think viral licenses suck, but it's not an unpopular opinion on here.


Is there a difference here?

With code, you literally copy it and put it in a device and fail to provide the source, breaking the GPL.

With a copyrighted set of images, you scan those and break them down into some set of data and then never need to actually copy the images themselves -- my understanding anyway.

Does that set of data contain copied works? Or is it just a set of notes about the works that have been viewed by the AI?


The difference seems like semantics. You're taking copyrighted data, encoding it, then deriving a work from the output of that encoded data.

If I compress copyrighted works and redistribute them as my own, I'm technically breaking them down into some set of data and never actually distributing the copyrighted work, right?
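A trivial sketch of why "it's been broken down into data" isn't a defense by itself (Python, assuming some novel.txt):

    import zlib

    original = open("novel.txt", "rb").read()  # a copyrighted work
    blob = zlib.compress(original)             # now "just a set of data"
    assert zlib.decompress(blob) == original   # ...that yields the work back bit-for-bit

The legally interesting question is whether a trained model is more like this, or more like notes about what was seen.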


> You could imagine a GPL-like license being really good for the community/ecosystem: "If you train on this content, you have to release the model."

The issue with that is that it is perfectly legal to ignore the license, if you use the copyrighted work in a transformative way.

It doesn't matter what the license says, if it is legal to ignore the license.


That seems to be the status quo, but if huge companies benefit from the electorate's work and use that to put them out of jobs, I wouldn't be surprised if the law changes.


There's no community consensus on either of them, so I'm not sure how you're trying to draw a contrast. There are people who think both are wrong, people who think both are right, and everything in between.


Email this morning:

AI Generated Content

Effective immediately, Getty Images will cease to accept all submissions created using AI generative models (e.g., Stable Diffusion, Dall‑E 2, MidJourney, etc.) and prior submissions utilizing such models will be removed.

There are open questions with respect to the copyright of outputs from these models and there are unaddressed rights issues with respect to the underlying imagery and metadata used to train these models.

These changes do not prevent the submission of 3D renders and do not impact the use of digital editing tools (e.g., Photoshop, Illustrator, etc.) with respect to modifying and creating imagery.

Best wishes,

Getty Images | iStock


What if someone uses the Stable Diffusion editor plugin for photoshop, etc?


I don't see how that's an issue. Using photoshop doesn't automatically allow you to post the images. You can post the images IF it was not made/edited by an AI model, regardless of if photoshop was used.

If you use the Stable Diffusion plugin, then you're using stable diffusion, and therefore can't post the image. It doesn't matter at all that you used the photoshop plugin.


A lot of Photoshop's built-in tools could arguably qualify as "AI models" (think things like content-aware fill!). I don't think Getty would say that using them would disqualify your image, but isn't that essentially the same thing, except ultra high-powered?


Except they're not. Content aware fill uses only your image as the input and runs space-time video completion. Since it doesn't have to train on datasets, it's not part of the conversation.


LOL how will they even know? Are they going to ask for every .psd and read the entire edit history? what's to stop you making images all day in SD, and when you find one you like, just tracing over it in a new layer?


Ignoring the philosophical ambiguity, their announcement specifically says this policy change doesn't apply to pictures made with Photoshop.


It’s funny because, I can make an image and they wouldn’t know it’s A.i generated.


You can also plagiarize someone else's work and submit it as your own, and they won't notice. But if they find out, it'll be removed and there will be some punitive action. I assume it's the same enforcement model in this case.


>You can also plagiarize someone else's work and submit it as your own, and they won't notice.

Or just take an image that is already in public domain and just slap a gettyimages® watermark on it...



That's Getty's business model.


Why is this voted down? It's true. How can this be in actuality prevented?

Changing times. Not even imagery or art can be trusted. A whole range of human creativity based occupations is approaching the border of redundancy.

You can argue that they may never arrive at that border, but there is no arguing that artists are, for the first time in the history of humankind, approaching that border of redundancy.

And what does this mean for every other human occupation? The AI may be primitive now, but this is still just the beginning, and it is certainly possible that we are approaching a future where it produces things BETTER than what a human produces.


> How can this be in actuality prevented?

By abolishing copyright and making crediting the author a habit. There is no way copyrights can survive the AI. There is no way human civilization will let 200 year old legal practices hold back technological advancement.


> There is no way human civilization will let 200 year old legal practices hold back technological advancement.

Human civilization has kings and queens, and laws based on the sayings of ancient prophets.

Instead of trying to figure out what "human civilization" will accept, figuring out what current wealthy capital-owners will accept will be more predictive.


Even if wealthy capital-owners get AI banned where they can, the countries where they don't hold power will get ahead by not banning it.


I think there's a strong case for arguing that we may actually completely ban AI and any sufficiently strong ML algorithm. AI hasn't even realised a millionth of its potential yet, and it's already running rings around humans (cf. algorithmic feeds driving political conflicts online). I think potentially it will cease to be tolerated and be treated a bit like WMDs.


> we may actually completely ban AI and any sufficiently strong ML algorithm.

Who is 'we'? The US, or maybe Anglo-American countries, may do it. Many countries won't, and those that don't will get ahead of everyone else.


At what cost? Perhaps the WMD comparison continues to work here.


How are you going to un-invent it? Will this involve confiscating GPUs or criminalizing owning more than 1 of them? The thing is much like the problem of gun control in a warzone; weapons are just not that hard to make especially if you have a surplus of parts.


First time I've heard that line of thinking. How and when do you think that'll happen?


Okay, so there's a sense in which AI essentially destroys knowledge culture by performing a reductio ad absurdum on it.

Examples:

1) Social content. We start with friend feeds (FB), they become algorithmic, and eventually are replaced entirely with algorithmic recommendations (Tiktok), which escalate in an AI-fuelled arms race creating increasingly compulsive generated content (or an AI manipulates people into generating that content for it). Regardless, it becomes apparent that the eventual infinitely engaging result is bad for humans.

2) Social posting. It becomes increasingly impossible to distinguish a bot from a human on Twitter et al. People realise that they're spending their time having passionate debates with machines whose job it is to outrage them. We realise that the chance of someone we meet online being human is 1/1000 and the other 999 are propaganda-advertising machines so sophisticated that we can't actually resist their techniques. [Arguably the world is going so totally nuts right now because this is already happening - the tail is wagging the dog. AI is creating a culture which optimises for AIs; an AI-Corporate Complex?]

3) Art and Music. These become unavoidably engaging. See 1 and 2 above.

This can be applied to any field of the knowledge economy. AI conducts an end-run around human nature - in fact huge networks of interacting AIs and corporations do it, and there are three possible outcomes:

1) We become inured to it, and switch off from the internet.

2) We realise in time how bad it is, but can't trust ourselves, so we ban it.

3) We become puppets driven by intelligences orders of magnitude more sophisticated than us to mine resources in order to keep them running.

History says that it would really be some combination of the above, but AI is self-reinforcing, so I'm not sure that can be relied upon. We may put strong limits on the behaviour and generality of AIs, and how they communicate and interact.

There will definitely be jobs in AI reeducation and inquisition; those are probably already a thing.


> We become puppets driven by intelligences orders of magnitude more sophisticated than us to mine resources in order to keep them running

What do you think corporations are?


Well, typically not cleverer than most people, until you combine them with AI.


I am beginning to feel like the Butlerian Jihad may have had the right idea.


Because it's a legal thing, not a practical thing. If they're preparing a lawsuit, they want to show the court that they forbid people from uploading AI-generated images. It's a rule without real enforcement.


>Why is this voted down? It's true. How can this be in actuality prevented?

It's no different than submitting a plagiarized essay to a teacher. Yeah, it's often hard to detect/prevent such submissions and you could even get away with it. But if you get caught, you'll still get in trouble and it will be removed.


This is different from plagiarism, because with plagiarism there is an original that can be compared against: a specific work and person against whom an infraction was committed.

In AI-produced artwork, the artwork is genuinely original and possibly better than what other humans can produce. No one was actually harmed. Thus it actually offers true value, possibly better value than what a human can produce.

It displaces humanity and that is horrifying, but technically no crime was committed, and nothing was plagiarized.


You asked why someone was downvoted for laughing about how AI-generated content is hard to detect/can still be submitted, and then asked, "How can this be in actuality prevented?". The comparison I was making was the comparison to the process of submitting plagiarized content, not whether or not AI-generated content and plagiarized works are the same thing.


>No one was actually harmed. //

In a couple of years, unchecked, many artists will be out of work and companies like Getty will be making a lot less revenue. That's "harm" in legal terms.

On a previous SD story someone noted they create recipe pictures using an AI instead of using a stock service.

These sorts of developments are great, IMO, but we have to democratise the benefits.


That's like me creating a car that's faster and better than other cars and more energy efficient.

Would it be legal to ban Tesla because it harms the car industry through disruption? AI-generated art is disrupting the art industry by creating original art that's cheaper and, in the future, possibly better. Why should we ban that?

By harm I mean direct harm: theft of ideas and direct copying. Harm through creating a better product is different; morally I think there is nothing wrong with that.

Practically we may have to do it, but this is like banning marijuana.


Yeah, the law isn't morality.

>Why should we ban that? //

Well, we have copyright law, so I was starting there, though I'd personally be pretty interested in a state that made copyright very minimal. The question is what type of encouragement we want in law for the creative arts; I doubt any individual would create the copyright law that lobbying has left us with, but equally I think most people would want to protect human artists somewhat when AIs come to eat their lunch and want to make a plate out of the human artists' previous work [it's like a whole sector is getting used to train their replacement, but they're not even being paid whilst they do it].


Copyright doesn't apply. Nothing was copied.

The AI was trained in the same way you train your brain when you look at something. Nothing is copied. Nothing immoral was done. No law was broken.


When I look at something I create an image of it on my retina. When a computer "looks", it creates an image somewhere. The former is allowed; the latter comes under copyright scrutiny. Things like caching images to show you a webpage have been addressed by copyright law (through precedent), and this will be similarly addressed.


The image is not saved in a neural network. No identical image can be extracted from the network.


Yes, it's a derivative that relies on use of the copyright works. You can't create a NN without using a copy of the work, so copyright applies -- there might be a Fair Use exception in USA but the outputs compete with the original creators of the works and so IMO courts are likely to rule it as non-Fair Use.


>Yes, it's a derivative that relies on use of the copyright works.

Almost every idea on the face of the earth is a derivative of something else. This includes ideas from a Human brain so applying such laws is inconsistent.

>but the outputs compete with the original creators of the works

All art competes with other art. And all art is derivative of other art other things.


They can ask that you assert, under substantial penalty of assuming Getty's potential liabilities, that it was not AI generated.


And force them to prove that you lied AND that it actually infringes copyright AND there are actual damages. GL, brah.

Or we can just not use Getty anymore. GANNY Images is born. Only AI, no artists, and have that entity sue Getty for anything similar under same theory.

There is always a counter-play.


Doesn't matter, as far as they are concerned that's on you for breaking the terms and conditions. They're just 'making an effort' for legal reasons.


You can make an image and have it be AI generated (say using Photoshop's content aware fill), and it will be allowed. They are drawing a pretty arbitrary line.


Their legal worry probably makes sense, but my suspicious mind also feels like it's in their long-term interest not to open Pandora's box too much by letting AI art in. Isn't one of Getty's competitive advantages the relationships it has with (I imagine) hundreds of thousands of artists? If they let AI art in, that historic artist relationship suddenly means less (because a lot more people can now contribute), and they may end up competing against new and emerging low-cost AI art marketplaces. Not sure, just speculating about future scenarios and about not eroding one's own competitive moat.


The US Copyright Office asserts that AI generated images can’t be copyrighted. Getty lives and dies by copyright and artificial scarcity/control of image rights.

For stock images and non current/news events, Stable Diffuison and its successors are the future.


No they do not. They assert that the AI can't be the "author", it has to be a human.

It's exactly the same as trying to assign copyright to your camera. Even though it generated the image from photons, it was the human pressing the button that mattered.


https://www.smithsonianmag.com/smart-news/us-copyright-offic...

> An image generated through artificial intelligence lacked the “human authorship” necessary for protection

> Both in its 2019 decision and its decision this February, the USCO found the “human authorship” element was lacking and was wholly necessary to obtain a copyright, Engadget’s K. Holt wrote. Current copyright law only provides protections to “the fruits of intellectual labor” that “are founded in the creative powers of the [human] mind,” the USCO states. In his most recent appeal, Thaler argued this “human authorship” requirement was unconstitutional, but the USCO has proven unwilling to “depart from a century of copyright jurisprudence.”

Lots of posts on the topic here: https://hn.algolia.com/?q=copyright+office+ai


Take a look at the decision: https://www.copyright.gov/rulings-filings/review-board/docs/...

The person was still trying to get the AI marked as the owner. In fact, the application "does not assert that the Work was created with contribution from a human author," a framing the office went along with but did not actually agree or disagree with.

So it still says nothing about whether a human can hold copyright over an image they used an AI to make. It is another example of the AI itself being rejected as a copyright holder.


A distinction without a difference, since this was just someone who was hoping to be the beneficial owner of an AI with an enforceable copyright interest. Recall the failure of the photographer who allowed monkeys to play with his camera equipment, leading one of them to take a selfie photo that became famous.

The photographer asserted copyright on the basis that he had brought his camera there, befriended the monkeys, and set his equipment up in such a way that even a monkey could use it and get a quality image, but his claim to authorship was rejected and so he was unable to realize any profit from selling the photo - although I'm sure he made it up on speaking tours telling the story of how he got it.

To be sure, AI created art is done in response to a prompt provided by a human, but unless that human has done all the training and calculation of weights, they can't claim full ownership on the output from the model. There's a stronger case where a human supplies an image prompt and the textual input describes stylistic rather than structural content.


"Is an AI-created work copyrighted and owned by the AI?" and "Is an AI-created work copyrighted and owned by a human?" and " Is an AI-created work copyright infringement?" are three separate questions.

Just because the answer is no to the first doesn't mean the answer is no to the second, and the third is even more distinct. That's why this is an open question that educated lawyers make guesses about.


I feel like recent discussion of how much creativity it takes to produce good prompts for AIs like Stable Diffusion promotes the argument that a human is sufficiently involved in the process (sometimes!) to warrant a claim to ownership of the copyright. If they trained the AI then I think it would be a no-brainer in favour of the human being a creative input into the creation of the work.

Just my personal opinion.


Doesn't seem to apply to Stable Diffusion and Dall-E because there is substantial human work involved - picking and evolving the prompt and selecting the best result. Sometimes it's also collaging and masking. Maybe it could apply to making "variations" where you just have to click a button. But you still have to choose the subject image on which you do variations, and to pick the best one or scrap the lot.

It's completely different from a monkey stealing your camera and shooting some pictures. AI art depends on the intent and taste of the human, except when it's an almost copy of a training set example, but that can be filtered by software.

And if we look up from art, there are other fields that stand to benefit from AI image generation. For example, illustration for educational content, or ideas for products - shoes, clothes, cars, interior designs, maybe even cosplay costume design. It could be used to quickly create training data on a very specific topic, or to create de-biasing datasets (super balanced and diverse by design). A specially tuned version could function as a form of mood-inducing therapy - or a Rorschach test. As a generative model of images it is of interest for AI research. So copyright's gotta be weighed against all the other interests.


What if you were to... I dunno, create a community that lets in any anonymous person and talks about art. Generate Stable Diffusion prompts from their conversations. Then add an upvote system, so that rather than having an individual pick particular results, the best results generally filter to the top. You could even have the "upvotes" be based on dwell-times or something like that.


> a monkey stealing your camera

It's really not different, because while everyone remembers it that way, the photographer went to great lengths to facilitate the monkey selfie.

https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...


Advisedwang summarizes it below, but here is the key line from the ruling:

https://www.copyright.gov/rulings-filings/review-board/docs/...

"the author of the Work was identified as the “Creativity Machine,” with Thaler listed as the claimant alongside a transfer statement: “ownership of the machine.”

USCO is maintaining that computers/AI cannot hold copyright, only humans can.

If Thaler were to submit again with his own name, I wholeheartedly expect that they would accept it as long as they have no reason to believe it's been previously copyrighted or is not "sufficiently creative". Is that copyright legitimate? Someone would have to then challenge it in court!


In that case the human asserted he had absolutely 0 involvement in the matter, less than even using buttons on a camera.

If you say "I directed the computer" or "I gave the AI a prompt", that would be enough to support your copyright claims.


Wasn't there an issue a few years back about a photo taken by a monkey was non-copyrightable?


Yes and the ruling GP is citing is basically about that scenario but with a computer instead of a monkey.

The monkey pressed the buttons on the camera without human direction, and so was the "author". Since the monkey is not a human, no copyright existed.

In this case, the human claims the computer had no input at all, it did it by itself, hence no copyright. However we all know that computers can't do things by themselves in the same way as a monkey. The USCO accepts whatever explanation you give them though, so they had to go by the stated facts.

In any case where the human operator of the computer actually does want copyright of the images, all they have to do is say "I setup the computer to generate the images" and they will own the copyright.


“I setup the monkey to take the picture”



You still need computing power to generate those images, so definitely room for commercial activity there. Getty could precompute billions of images and enlarge their inventory.


Sure, but why should anyone waste time browsing their inventory when they can just make up their own?


How are you going to sell stock images if you can't copyright them??


One could sell ML API credits to generate the images. For sure, industry revenue would decline if, once an image is generated, you can't lock those specific bits up behind copyright.


Any marketplace will eventually deal with low quality at scale without serious intervention. Amazon. Github when there's financial incentive for commits. eBay. Your local flea market. etc


My guess is that they will want to exploit their archive to train their own commercial model. And to integrate AI into their existing product to modify their images.


Haha, Getty images should be planning a big pivot strategy, not worrying about AI content. I predict that within 5 years we will have this all refined to the point where we can generate almost any image we want, to our liking.

If you're shopping for a stock photo, you only have what's available. I've looked for things before, and sometimes you have lots of options which just aren't quite what you want. So you take "good enough". AI can already generate "good enough" with some prompt and parameter practice.


1. Train AI model using your massive library of stock images and offer image generation to your massive existing customer base.

2. Sue or C&D all other image generation services that can be shown to be using models trained in part with your images.

3. Profit!

It's really not a far step from "search for existing images with these keywords" to "generate image from these keywords/prompt". There's also the option to start with stock images, pick the closest, and have AI refine it to better match the user's vision, so now instead of "good enough" you have "perfect."
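That refinement flow already exists in rough form. A hedged sketch with the diffusers library (parameter names are from memory and vary across versions; older releases called the image argument init_image):

    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image
    import torch

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    # Start from the closest stock hit and nudge it toward the brief.
    init = Image.open("closest_stock_hit.jpg").convert("RGB").resize((512, 512))
    result = pipe(
        prompt="person pointing at whiteboard, warm office lighting",
        image=init,      # older diffusers versions: init_image=init
        strength=0.4,    # low strength = stay close to the original
    ).images[0]
    result.save("refined.png")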


> It's really not a far step from "search for existing images with these keywords" to "generate image from these keywords/prompt". There's also the option to start with stock images, pick the closest, and have AI refine it to better match the user's vision, so now instead of "good enough" you have "perfect."

This is what I expect to see soon, from many competing providers. There are more than enough free images to train from such that Getty doesn't really have an advantage (imo).

Unfortunately for them, I suspect they will follow the "big laggard corporation" playbook and use protectionist strategies instead of evolutionary. So they'll be late, and they'll lose significance.


> Getty images should be planning a big pivot strategy

They very well might be, with step 1 of the plan being "delay".


It does seem like much of Getty's business is redundant if you can trivially generate "photograph of person laughing while eating a bowl of salad"

Maybe the new business is "reasonably high degree of trust that if it's a Getty image it's not an AI fake" for news outlets and the like who want to sell trustworthiness
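And the "trivially generate" part really is already about this simple, at least as a sketch (diffusers library; model-access details omitted):

    from diffusers import StableDiffusionPipeline
    import torch

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    img = pipe("photograph of a person laughing while eating a bowl of salad").images[0]
    img.save("stock_replacement.png")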


But if OpenAI's ability to create a photograph of a person laughing while eating a bowl of salad is due to mashing up these photos that were mainly from Getty Images, then Getty and its photographers would have a reasonable claim.


No, they wouldn't. Getty and its ilk would go to court and try to argue that they owned the concept of a portrait photograph if they could get away with it. They won't be able to produce specific images that match generated ones well enough to impress a jury at the emotional level. Their abstract arguments about model construction won't land because they're too abstract and, frankly, boring; the defense will just bore the jury slightly less by putting some neural-network nerd on the stand to say the opposite, and the jury will shrug and go with its gut.


How different[0] is that from a painter viewing half a dozen Getty images, then repainting them in combination to near (though not pixel-perfect) detail? Afaik[1] the hypothetical painter has not committed infringement on the source works and has put sufficient creative effort into the new one that it would be copyrightable.

[0]Granted it differs a little, but not by that much either.

[1]IANAL


Morally it seems to basically be the same to me, but in practice it's different. The AI training is done by big corporations that are traceable and have big pockets that can be sued whereas "inspiration" cannot be taken to court even if you feel like you were stolen from.


> "inspiration" cannot be taken to court even if you feel like you were stolen from.

Never doubt the ability of people's greed to force someone else into a courtroom.

Just look at the 'blurred lines' case where someone was inspired by a musical genre to write a song that didn't copy anyone's music but was still ordered to pay millions because he copied the general "feel" or "vibe" of someone else's work. (https://abovethelaw.com/2018/03/blurred-lines-can-you-copy-a...)

I imagine a lot of AI generated images might copy the "feel" or "vibe" of something in its training data.


Ohhh good callout.

I also wonder, if Getty wins, whether this will spur the creation of a new army of artists manually creating facsimiles of existing art expressly for the purpose of AI training.


The difference is that it's mechanical. Copyright is intended to protect creators, and it's unclear whether this type of fair use properly balances the original creators' needs against the ML-based creators' needs.


Your argument is that "painting is transformative enough", and I suspect that wouldn't hold up in court.

If you want an example of a painting that might be transformative enough, here is Ryder Ripps https://www.artsy.net/artwork/ryder-ripps-sup-1 who referenced copyrighted photos and recreated them in this "wavy style".


The history of law and courts have decided that "different things can be treated differently."


There are a lot of things (animals, buildings, locations, etc) that I'd bet I've only ever seen in (copyrighted) Getty images, but I could probably also paint new representations of from memory.

Would Getty also have a claim against me?


So, where is the line?

Diffusion models, Midjourney: not accepted. Okay that was the easy part.

What if I use an AI-powered sharpening tool like Sharpen AI? Technically that's adding "AI generation" to the image. What about Photoshop neural filters? What if I just extend or "touch up" an image using DALL-E or Midjourney and still have the original image, but with slight additions?

Probably what they mean is "majorly created with AI", but then how is "majorly" defined?


It's not just splitting hairs either. Training datasets also come into play with tools like resolution upscalers.


Yeah, I'm curious too. My first thought was the recent post about the Blender plugin for using Stable Diffusion to generate textures. If you made an elaborate 3D render, but most of your textures were AI-generated, is that excluded? The wording suggests it would be banned, but it's not detailed enough to be sure.


They are hedging their bets. Effectively the only time this will get resolved is if someone takes an AI artist to court for alleged infringement. Then and only then will the law be clarified wrt AI art.

Until then, Getty is betting their business on the parts of copyright law they know well to reduce risk.


Until paint was produced commercially during the Industrial Revolution (circa 1800), painters had to make their own paints by grinding pigment into oil.[1]

Photography drove painting deeper towards abstraction. [2]

I'm not unsympathetic but the AI revolution might be a similar revolution despite fiddling with code currently being much less pleasant than flinging industrially mass-produced paint from mass-produced tools in a sunlit studio. At least in the medium term, someone will still have to manage the machine.

[1] http://www.webexhibits.org/pigments/intro/paintings4.html

[2] http://www.peareylalbhawan.com/blog/2017/04/12/how-the-inven...


This is purely a PR move. They don't ban AI content because of fears of legal challenges, but because they see their entire business model falling to pieces. Why would anyone license images from them when they can instead generate any image for free? They only ban AI images to generate PR around "fears of legal challenges", in the hope that the message that AI-generated content could be a legal risk will stick in people's heads.


I suspect that Getty makes most of their money from licensing current interest photos to news orgs.

I don't think that particular business is going anywhere (I hope; the last thing we need is faked-up news images).

I think ShutterStock may be more threatened.


> The creators of AI image generators say the technology is legal...

Which means nothing really, they always make this claim, whether it's correct or not. There's too strong an ethos of an "ask for forgiveness, not permission" in the tech world.


>There's too strong an ethos of an "ask for forgiveness, not permission"

What's the alternative though? Waiting for permission could take decades while your competitors are eating your lunch.


The only winning move is not to play


The tech world shows us that the winning strategy is to just go ahead and do what you want, ideally with consistent financial backing, and pay our way out of any issues that arise.


Yet another example of when you have money, normal rules no longer apply to you.


That helps, but simply doing something that nobody has anticipated is often sufficient. If enough people like what you did, technicalities often fall by the wayside.


There is criminal and there is civil. Just because something is illegal (speeding, jaywalking) doesn't mean I'm some sort of psychopath for ignoring those laws - I make a calculation about how likely it is I'll be caught vs. time saved.

Similarly a business looks at a penalty and makes a judgement. This isn’t some insane immoral concept.


>There is criminal and there is civil.

Okay, but everything you said after that has nothing to do with that statement.

You choosing to break laws has nothing to do with criminal vs civil. It has everything to do with where your moral compass points. You choose to break rules because you've decided to do that based on whatever moral integrity you do/don't have.


Sure, but many laws are ridiculous and immoral. See slavery, segregation, anti-women laws, etc. People love to get on their moral high horse when it suits them ("Business bad!") but conveniently ignore it in other cases.


the only truly illegal thing is to not have enough money


I'm not a copyright lawyer or anything, but the way I look at it is that the big concern is that you can't easily prove that a particular AI output is not just a memorized copyrighted training example. So even if we assume that it is perfectly allowable to train your model on unlicensed images, that doesn't protect you if your model spits out a carbon copy (or something close enough to be infringing) of a copyrighted image.

A similar concern exists for things like Copilot, but it feels even harder to detect in the image domain.


Yeah, it's pretty easy to get Stable Diffusion to spit out images which are blatantly recognizable as slight distortions of real photographs or product images. I think "medieval market square" was a prompt which got me variations of one photo of a European city.

It's sophisticated software, but the analogy with teaching a human artist really doesn't hold. Ultimately it's making complex mashups, and the copyright law around that is not straightforward.


Great.

So who is now starting the Getty competitor that does accept these or (even better) accepts these and makes them available via CC license?

With good curation and tagging (easier since you have the text prompt that generated the content), you could probably disrupt Getty's entire business model in half a year.


This.

AI imagery is here to stay and will get better every day.

A service should either embrace it or they will lose a significant portion of their users/customers to another who does support AI content. While most of the AI content is not ready for prime time in terms of coherence and resolution, it's just a matter of time that it reaches (and quickly surpasses) traditional methods.


But it's not copyrightable. I guess you can lie and say you created it, but you didn't; it's computer generated. You created it no more than you created your house because you picked the layout and paint colors. There is no money in non-copyrightable computer-generated images.


The question of whether it's copyrightable is completely up in the air right now. To my mind, emitting it into the Creative Commons somewhat side steps the question.

Besides, there's all sorts of use cases where copyright is less relevant. Advertising agencies care less if their artwork gets copied because it puts it in front of more eyeballs.

> There is no money in non-copyrightable computer-generated images.

I'm a game developer but not an artist. If a computer can generate the artwork for my game, I completely get to sidestep paying an artist for that work. That's huge value. I would pay a subscription service to a database of AI generated content even if it couldn't be copyrighted.


The computer can generate artwork for your game, but only after training on other people's artwork that they spent effort and time creating. I think auto-generating art from other people's creations and hard work is wrong personally, unless they have specifically granted that permission when sharing their work. You are basically side-stepping paying for artists by laundering their work in my eyes.

If you trained only on things that granted you permission to do so, or if you bought a bunch of art works to train on this would be really cool.

I am interested in seeing how this turns out legally. Copyright law is intended to protect humans and their intellectual property, and these "AI" or "ML" systems are not humans, so I am not so confident that generating work from other people's is going to be legal.


> I think auto-generating art from other people's creations and hard work is wrong personally, unless they have specifically granted that permission when sharing their work.

I think this is going to be one of the more interesting aspects of any attempt to ban this sort of technology, because a system like Stable Diffusion is just a denoiser attached to an image fitness algorithm that has been trained on a bunch of input. And we certainly can't make training such algorithms illegal; if we do, YouTube loses the ability to police uploads to its site overnight, because such image recognizers can also be used to detect likely instances of copyright infringement.
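To make "just a denoiser" concrete, the core loop looks roughly like this. This is a toy sketch in Python; every name in it is illustrative, not the actual Stable Diffusion API:

    import numpy as np

    def generate(denoiser, prompt_embedding, steps=50):
        x = np.random.randn(64, 64, 4)               # start from pure noise
        for t in reversed(range(steps)):
            # The trained network predicts which part of x is noise,
            # conditioned on the text prompt.
            predicted_noise = denoiser(x, t, prompt_embedding)
            x = x - (1.0 / steps) * predicted_noise  # peel a little noise off
        return x  # a separate decoder turns this into pixels

    # Dummy stand-in so the sketch runs; a real model replaces this.
    out = generate(lambda x, t, p: 0.1 * x, prompt_embedding=None)
    print(out.shape)  # (64, 64, 4)

The prompt conditioning plays the "fitness" role: the denoiser is steered toward outputs that match the text. Nowhere does it paste stored images together.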


Bullshit.

I made a new image using a computer program that I was legally licensed to use.

The program might be Corel Draw.

It might be Blah Blah Diffusion Pro Plus.

Either way, I made the image and I own the copyright, unless some other contract was made between myself and the program's owner or my employer.


A machine operator does not own the copyright on the parts his machine stamps out even though he puts in inputs. GM's engineers can own the copyright on a car they design in CAD.

If you put creative inputs into a tool, the result is copyrightable (a car's design). If all you did was say "give me XYZ widget" (in this case 'give me a picture of a frog holding an umbrella under a rainbow'), you only gave instructions for generating a widget; you did not create art.


If someone asks me to make/paint/draw a picture of a 'frog holding an umbrella under a rainbow', I would be basing my new work on all the 'art' and images that I have seen in the past. I might even search for related content on the internet for inspiration!

So long as I don't copy a previous, copyrighted image 'too much' (this is fuzzy and maybe should be quantified legally in our bright, digital future), I can claim copyright on the new image.

Since, so far, non-human things are not allowed to hold copyright, a human can claim copyright over works created by non-human things, that the human owns or controls. It's even easier to reason about if the human and non-human thing 'collaborate' on the final creative product. So, if I fiddle around with my inputs (prompts) into Super Diffusion Power Plus Gold Edition, then we (the software and I) collaborated. And I own the output that I chose (curated) as the best one.


>If you put creative inputs into a tool, the result is copyrightable (a car's design). If all you did was say "give me XYZ widget" (in this case 'give me a picture of a frog holding an umbrella under a rainbow'), you only gave instructions for generating a widget; you did not create art.

Does this still hold true if you worked through hundreds of variants of 'give me a picture of a frog holding an umbrella under a rainbow', generating dozens or hundreds of images for each version of your prompt, ultimately putting in hours or days (or weeks) of work to produce a single image depicting the perfect umbrella-wielding frog you were originally envisioning? This isn't even considering inpainting, masking, and other image-in-image manipulations these models can also do to get you closer to the results you're trying to create.

For artists using AI, this seems to be the more common case and is a far cry from the idealistic one-prompt-and-done that usually gets used for an example in these conversations.


Yes. Museum curators/gallery owners do not own the copyrights to the pictures they select. What you are describing puts the computer in the position of 'anonymous creator' as far as Title 17 is concerned.


Do you have citations for a ruling that AI generated art is not copyrightable? To my knowledge, such a ruling has not been made, so we can at best make wild guesses at what the courts will find.

I don't think this particular wild guess is on the right track; the creative input given to the AI generator was the prompt. There is also an argument to be made that, in the same sense a photograph can be copyrighted even though it's just a single still image from a moment that occurred around it, an AI-generated artwork can be copyrighted because the artist performed the creative act of retaining it. In essence, they pulled it from the soup of possible outputs and held it up as one worth noting, as a photographer pulls an image from the soup of possible moments and framings.


But the photographer took the picture, determined the framing and composition. A copyright owner for AI would be claiming to own their contribution, the prompt, but not the automated machine generated portion, the image.

Their pulling it from a pile is not an act of creation, and therefore does not qualify. You have to CREATE a work, not critique/curate it.

At best you have an 'anonymous work' by Title 17.

An “anonymous work” is a work on the copies or phonorecords of which no natural person is identified as author.

The fact that it is a curated 'anonymous work' does not make it somehow more copyrightable.


The picture was there in electromagnetic flux; the photographer just got in the way of the photons.

I think a similar argument can be made that in the infinite space of stable diffusion solutions, asking for one is just getting in the way of the noise. If a photograph is art, a human saying "this shaped noise is worth showing people" is art.

Does the equation of copyright change if the artist says "I want a boat on a calm sea" and starts from a white triangle and blue rectangle? If not, there must be some point of human contribution of information between "words into the stable diffuser" and "using Photoshop on a canvas starting blank" where the work becomes copyrightable. Where is the line?

> A copyright owner for AI would be claiming to own their contribution, the prompt, but not the automated machine generated portion, the image.

If I take a blank canvas, set color to [200, 10, 10], and click in the corner with the paint bucket, my contribution is five parcels of information (compressible to one, the color), and I get to claim copyright on the whole ruddy square that results.

Especially if I name it "Ruddy Square" and sell it to the MoMA.


Maybe. Everything from back when I was a real person and dealt with copyright, patent, and trademark lawyers tells me otherwise, but I know this from the tech industry side/tech industry lawyers and not art specifically. My reading of Title 17 tells me otherwise. But maybe you are right. And maybe museum/gallery owners actually own the copyright of the works they 'find' and display, especially if the gallery gave the artists 'prompts' for what they wanted the art to contain.

In your case, you would not be able to copyright that piece of art. You can't own colors or dimensions. You could copyright an installation of the art piece, so that no one else could display a [200,10,10] colored piece of art with the same dimensions you used and call their installation 'Ruddy Square', but that is the only copy protection you would be given.


> And maybe museum/gallery owners actually own the copyright of the works they 'find' and display, especially if the gallery gave the artists 'prompts' for what they wanted the art to contain.

In general, not if those prompts were given to a human who does the actual work. Unless that human was contracted under a work-for-hire agreement, in which case absolutely yes.

> You can't own colors or dimensions

Interesting. I was mistaken and in the specific case of a square of a particular color, you are quite right. https://www.owe.com/resources/legalities/legalities-15-is-a-...

This surprises me because I've observed you can own zero volume for a duration in the music space. I'm not sure why the visual space would differ, but the interpretation of copyright is extremely path-dependent law, so I shouldn't be surprised if they do.

https://edition.cnn.com/2002/SHOWBIZ/Music/09/23/uk.silence/

That having been said, I suspect that while simple shapes are not copyrightable because they fail a uniqueness and common-usage test, the output of a stable diffusion run does not fail such a test, being a unique artifact that has never been seen before.


The creator would still own the copyright and have to assign it to the person that hired them. It's the same as when we software engineers get our names on patents and then assign them to our employers. I'm surprised more people on HN are not familiar with how intellectual property rights work.


What you have just described matches no mechanical process I've undergone getting my name on a patent my employer owns. At no time did I receive a patent that I then assigned to my employer. Such assignment is included in my employment contract.

I think that may be hair-splitting over the process; the point is that it's possible to write a contract where work done by someone else has its copyright assigned to a contracting employer. But honestly, that entire point is less interesting than the question of how many quanta of work one has to put into a mechanism-facilitated process to be able to claim copyright of the result.

So paint-bucketing one square is insufficient for unrelated reasons of originality (the inability to copyright a shape). But Piet Mondrian's "Composition with Red, Blue, and Yellow" (1930) was just a few squares and lines and was copyrightable. So clearly, it doesn't take too many horizontal and vertical black lines and full-block fills to create original art.

If I write a small script and put it in the public domain to generate Mondrian-like output and hand it to you, and you run it twelve times and pick your favorite, is there any reason you couldn't copyright that one? What's the important difference between picking the output of the script you ran on your hardware and drawing a few grids and paint-bucket-filling yourself? Is it not two paths to the same result: a novel Mondrian-style image that you created? How much intention is needed to make it copyrightable vs. how much random-algorithm output?


Patents are assigned to inventors. How can you legally not create the patent under the inventors' names and then assign the patent to the company? Maybe my company's lawyers were just doing CYA, and granted, I took much of the patent stuff on the lawyers' word, but I don't think the process you describe is legal. The fact that my company owns the fruit of my labor has nothing to do with the legal requirement of an inventor's name being on a patent. Your refusal to acknowledge that these laws require an inventor/artist is the core of my argument.

If you generate images in the manner you describe you are nothing more than a machinist plugging in coordinates and generating widgets, not an artist using tools. The generating program meets the 'anonymous artist' portion of the relevant Title code in that case, and you can not copyright it. The office might mistakenly give you copyright, but I don't think it would hold up to a challenge.


I wanted to thank you for this interaction because I've learned quite a bit about the boundary layer on copyright. I think I see what you're saying in this topic.

It looks like the copyright office is perfectly willing to grant copyright on work that uses an AI generator for even a substantial portion of it (https://arstechnica.com/information-technology/2022/09/artis...) but not the whole thing. I suspect there will be a series of cases in the not too distant future to make the boundary line clearer.


Also there was no need to call bullshit. You're the reason the internet sucks now.


Assume a painter with a good visual memory has observed many scenes around the world. They are very knowledgeable about physical objects, shapes, and visual styles. They have the ability to draw very realistic scenes of whatever we ask, in whatever style we want. We pay the person (or they might offer this as a courtesy), and they paint and give us exactly what we want, with all the rights to sell the image.

It's the same here: the "someone" is replaced by computer software that "learned" in a similar fashion and is able to draw what we ask it to draw. Unless it's directly sampling some copyrighted work as-is, whatever it creates can be copyrightable.

Whether the painter allows this is another story.


Of course it is. Following your logic, no photograph taken with a camera could be copyrighted. After all, the subject you saw through your camera already existed; you merely recorded the photons with a sensor - digital or analog, it doesn't matter.


Your response lacks any merit. The photographer framed the picture, chose the focus, etc. Those are artistic inputs, not simply 'give me a street scene'. More apt: a photographer can't claim as art a picture he googled with 'give me a street scene'. He was not an artistic creator in that case. Read Title 17 instead of pulling arguments out of thin air.


I think the copyright stuff is a ruse to distract from the real thing... they just don't want AI artwork.

I asked this same question on here a few days ago when another site blocked AI artwork. I realized that the more that AI artwork gets blocked, the more that this is an opportunity to provide exactly that! Niches make great business models.

Someone responded...

https://lexica.art/

While not quite exactly what I was thinking, I think it is a good first start.


What would be the use-case of an AI-art library? The AI is the library.


img2img?

Sharing generative text / parameters?

Not just a library, but a community.


It's inevitable there will be a AI Stock Photo competitor, but I don't envy the legal battles that'll follow. Getty, Adobe, iStock, etc are almost guaranteed to sue, and it'll be an expensive and long process.

I look forward to these stock photo sites dying - image copyright is a royal PITA, since discovering a copyright problem almost always means paying shakedown notices from DMCA bot lawyers after the fact when you make mistakes.


Why would you search a Getty competitor for AI generated images when you can just roll your own?


It's faster to sift through pre-generated images than to build novel ones.


As a designer who searches stock photography for work fairly often, I'm not so sure about that. Plus the minor time savings might well be offset by the cost savings and licensing freedoms of rolling your own depending on the generating engine.


Getty's business relies on the legal framework of copyright, and how it enables control (and sale) of the licensing of copyrighted material. And they're saying: nope - AI output is so ambiguous w.r.t. copyright and licensing of the inputs (when it's not flagrantly in violation, as with recreating our watermarks), that we want to steer totally clear of this.

When HN has discussed Github's Copilot [1] for coding, it seems like the role of copyright and licensing isn't discussed in much detail [2] (with some exceptions awhile back [3, 4]).

Do you think there is a software-development analog to Getty (I mean a company, not FSF), saying "no copilot-generated code here"? Or is the issue of copyright/licensing/attribution even murkier with code than for images?

[1] https://github.com/features/copilot/

[2] https://hn.algolia.com/?dateRange=all&page=1&prefix=false&qu...

[3] https://news.ycombinator.com/item?id=32187362

[4] https://news.ycombinator.com/item?id=31874166


It is very easy to see hypocrisy in the FurAffinity statement: "Human artists see, analyze and even sample other artists' work to create content. That content generated can reference hundreds, even thousands of pieces of work from other artists that they have consumed in their lifetime to create derivative images," ... "Our goal is to support artists and their content. We don't believe it's in our community's best interests to allow AI generated content on the site."


It's most of all not in the best interest of GI's bank account when people learn that they can generate any image they desire - for free.


generated images =/= created

AI generated images have the potential to be art in the eyes of the beholder, but let's not pretend that generation is the same as the mental, physical, and spiritual flow state that goes into painting or drawing a piece.


I don't think that's what the parent is doing. They're pointing out the hypocrisy of claiming AI art is copying copyrighted works because human artists are trained in similar ways. That's not making a claim about whether or not AI art is "real" art.


Why not? Human artists do exactly the same thing - combine learned patterns into new compositions.


Human artists create with intent. Statistical image generation throws paint at a million walls and keeps the handful that are statistically close to images tagged with words in a prompt.

That's not the same thing, and there's a reason why all of those generated images seem... off.


I think we may be talking about two different concepts regarding creation of art.

Absolutely humans (and myself, I'm a professional illustrator) use a mental patterns to come up with ideas.

The physical difference in AI generation is the lack of butt-in-chair time of the flow state. Painting/drawing/rendering art is not just mindless time to be compressed; it's a mental/physical/emotional/(and some would say spiritual) flow state with a lot of "input" abstractions beyond the patterns. Things like the creative's personal mood, personal past experiences, recent discussions with friends, recent texts they read ... those all fold into it. I wouldn't trade that flow state for the world, and it absolutely leaves fingerprints in my creations.


So you're saying what disqualifies AI is that it's a lot faster than humans at doing the same task?


That’s definitely part of it, yeah. There’s other factors too, but that’s obviously one of the big ones.

So what? FurAffinity’s stated goal with the ban is to protect human artists. Obviously banning something that undermines human artists is a step towards that goal. If you want a place to show your AI art, there are plenty of other sites that will welcome you.


Humans don't usually do stroke for stroke copies of paintings. Or pixel for pixel sampling of photos, unless they get rights to the sources.


Neither does the AI, so what’s the point?

Yes, if you look hard enough you’ll find some. But that’s true on either side.


When humans copy verbatim, even only partially, there are consequences unless it's fair use.


If someone notices. There's no guarantee that anyone will, even the person doing it. I've certainly done my fair share of verbatim copying -- something I only realised weeks later, if ever.


Neither does AI. They don't operate in pixel space but in latent space, which is analogous to a mental model; the neural networks that do this even have a lot in common with how our visual cortex works. The conversion to pixels only happens in the last step, once the concept has been generated as a latent representation. They're doing the same thing human designers do, just orders of magnitude faster.
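To make the latent-space point concrete, here's a minimal sketch assuming HuggingFace's diffusers library; the model ID and the 0.18215 scaling constant are Stable Diffusion v1 conventions and may differ by version:

    import torch
    from diffusers import AutoencoderKL

    # The denoiser never touches pixels: it manipulates a 4x64x64 latent.
    vae = AutoencoderKL.from_pretrained(
        "runwayml/stable-diffusion-v1-5", subfolder="vae")
    latents = torch.randn(1, 4, 64, 64)

    with torch.no_grad():
        # Only this final VAE decode produces pixels.
        image = vae.decode(latents / 0.18215).sample
    print(image.shape)  # torch.Size([1, 3, 512, 512])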


When your business sees an existential threat on the horizon you have two options – be at the forefront of the change and get ahead of your competitors and any new entrants by adopting the change yourself, or stall/threaten/litigate/raise prices/lower prices and otherwise hold on to your business model at all costs. Those in the latter group don't survive very long.


What about images that are a hybrid, i.e. human composed but using AI elements? Similar to Warhol and photography. I'd imagine a large percentage of artwork will use this approach, and Getty will have to evolve their stance to stay relevant.


I wonder if it applies to photoshop neural filters, or any kind of ML touchup like an instagram face filter. (Probably not but the line seems awfully thin and arbitrary.)


One big difference is: what was it trained on? It is unlikely that a little Photoshop filter, ML or not, was trained on billions of images.


If AI-generated images are of better quality than the alternatives, Getty or other orgs rejecting AI won't suppress its rise.

Rather, alternate markets for these images will arise quickly, and people will flock there, leaving Getty behind.


I'm looking at this too. The kneejerk reaction over the last few weeks has been for incumbent communities to start discriminating against AI art due to a standard of creative merit that they can't articulate. Whereas the AI art communities are rapidly iterating with what is a very creative process, that is just new and unfamiliar. I fully expect them to refine and articulate what that is. Controlling Stable Diffusion with variables is honing one's tools, seeding it with a sketch is very creative. Having tools for faster processing is no different than other content creation and what artists do. I think these communities will make their own places very quickly. Some people will monetize, others are just more able to fulfill their creative vision which I'm more a fan of than gatekeeping how much discipline is necessary to do art.


Once copyright rules for AI are ironed out, I imagine Getty will join in (or not be able to, if copyright law goes that way). Other commenters in this post have made points about the courts viewing AI output as non-copyrightable, but prompt-based AI works might be considered copyrightable. I think Getty just wants someone else to handle that inevitable legal battle first.


The impression that I’m getting from a lot of the comments here (and a lot of past discussions about AI art on HN) is that tech people view the art industry as a “challenge”, and they want to use machine learning tools to “defeat” it - either because they just want to demonstrate the sheer power of these tools, or because they think artists are irrational for thinking that there’s something special about human art and they want to prove them wrong, or what have you.

I can’t think of any other way to explain the persistent desire to keep forcing AI art into spaces where it’s not wanted, or the repeated discussions about loopholes in the rules, how AI art can avoid detection, etc.

I suppose the comparison I would make is to chess. Computer assistance is strictly forbidden in chess tournaments - it’s cheating. Both the players and the spectators want to see matches played between two humans, without computer interference. You could devise clever ways of getting around the rules and cheating (there’s a big cheating scandal rocking the chess world as we speak), but no one would praise you for doing this. They would just think you were being a jerk.

Similarly, there will always be people who want to create communities centered around human art, simply because of the mere fact that it was made by a human and they want to see what human skill is able to accomplish without AI assistance.


I'm simply deeply interested in knowing where this all goes. I'm not exactly forcing AI art into places where it's not wanted, but I am deeply interested in knowing where we will end up with all of this.

I have basically no prediction, but it feels like we're on the cusp of some very powerful tools. I don't know if it will end up just being a novelty, or if it will replace current jobs, or if it will simply create powerful tools for humans to make new art with. Just no clue.

In addition to this deep curiosity of what the future holds, I'm also very interested in it philosophically. I run a daily word game website, and every day's puzzle comes with an image that I generated for that day's words when you solve the puzzle. I often get people emailing in, asking who the artist was, because the day's image spoke to them, they found it beautiful.

It always feels like I'm giving them such a let down when I give them the answer. They were looking for a human connection. The most valuable thing isn't the technical skill; it's the connection to another human who generated this image (though the technical skill is a demonstration of the artist's commitment and is another thing to connect with). Finding out that a human didn't generate the image is a lead balloon. I find this a very interesting, salient example in the "what is art?" and "what is meaning?" categories.

It also brings me back to the tools question. Because, in many cases, ultimately there is a human connection to be made. I had an idea, and I worked with an AI to realize it. Sometimes it took quite a bit of work on prompts. Not nearly as much as it would take a technical artist, and I didn't get as exact an output as I had in my head.

But if they want a connection to another human imagining something that speaks to them, it's there, I imagined it and then a computer realized it. It's not as meaningful as a piece that an artist committed numerous hours to after a lifetime of honing the craft, but it's not devoid of meaning either. And I don't have any art skills to speak of, so in fact it enabled some human connection that wasn't possible before.


There are different sorts of creative exercises. I do think there is something interesting going on in the contrast between generative art and generative programming responses, but it's also true that most creative output is work and isn't done for the joy of doing it. Getty Images exists to provide quick, easy-to-use stock images, not to plumb the depths of human existence. Photorealistic painting or abstract art will remain interesting, and some people will want to know a work was done by a human, just as cameras, Photoshop, and digital art in general haven't destroyed the market for paintings; but they have pretty much destroyed the market for portrait painters. Idk what the future will bring, but it probably won't be a bunch of humans hired to paint the goings-on at red carpets, and it probably will involve AI doing more of what we might call creative work.


That reads a lot like pretending that progress can be held back.


Is it “holding back progress” to ban the use of engines in chess tournaments?


I always got a weird spidey sense from Getty Images and similar stock photo sites, and this just solidifies it. They exist in this limbo between open and closed, somehow finding a way to greatly enrich themselves while simultaneously not enriching their content contributors. Same for sites that put published papers behind paywalls, and even news sites that block access to articles unless the user goes to the enormous effort of (gasp) opening a new private window in the browser.

I don't know how, but that stuff all has to end. We've got to move past this tragedy of the commons that's happening with copyright. We all suffer just so a handful of greedy rent seekers can be the gate keepers and shake us down.

I had hoped that we'd have real micropayments by now so people could each spend perhaps $10 per month and distribute those funds to websites they visit and content they download, a few pennies at a time. Instead we somehow got crypto pyramid schemes and NFTs.

I'm just, I'm just, I don't even know anymore! Can someone explain it? Why aren't the people with the resources solving this stuff? Why do they put that duty onto the hacker community, which has to toil its life away in its parents' basement on a shoestring budget as yet another guy becomes a billionaire?

I'm just so over all of the insanity. How would something like this AI ban even be enforceable? Whatever happened to fair use? Are they going to spy on everything we do now and send the copyright police? Truly, what gives them the right? Why are we paying these people again?


I love the micropayments idea for consumers, but I suspect it would be really hard to get businesses on board, similarly to how music companies lost money when moving from albums to individual songs, and now streaming.


Oh ya, good point. I hadn't considered the long tail problem, how democratizing music production counterintuitively killed the recorded music business so performing onstage is the only real way to make money now.

I feel like that problem is only going to spread to all industries, so that the only work that will be compensated is manual labor.

Someday we'll have to evaluate whether running the rat race is the best use of human potential. Is that premature right now? Will it be in 10 years? I dunno, but I feel like now is the time to be solving this, not in some dystopian future where we spend our entire lives just working to make rent.


I've actually been working on a stock photos site built around Stable Diffusion for some of these exact reasons. https://ghostlystock.com/ is the first version, but we're adding a bunch of useful features to make it more useful for people to find legitimately useful stock images.


So it's just for copyright reasons.

I was wondering whether there were authenticity issues at stake? For example you can imagine someone wanting a stock image of "New York Skyline" and using an AI generated image that looks right but actually contains elements not in the skyline. This could undermine trust in Getty, which would be something they'd want to avoid.


Getty isn't alone in banning AI art, but they're doing it for different reasons than most.

Lots of art sites are currently being flooded by subpar AI generated garbage. If humans curate AI output and upload only that one-in-a-thousand good looking output, that is fine.

Instead we have bots uploading one image every few minutes, auto-generated from some randomly selected tags. Mostly the tags are wrong and the art should maybe instead be tagged such things as "grotesque" and "body-horror".


I feel like the easiest thing to do would be to declare that entirely AI-generated images are public domain, because a human didn't have enough of a hand in making them (and only humans and groups of humans can hold a copyright), and there's not enough of any one training image in the output to say that it contains a recognizable segment of any of the images it was trained on, even assuming the training images were all copyrighted.


Getty at this point should be feeling extremely threatened by AI-generated images. They may not fully replace Getty, but they will take a large chunk of its business.


> The creators of AI image generators say the technology is legal...

I'd bet that said creators are not lawyers, nor (well-)advised by lawyers, nor even able to cite substantial law nor case law to back up that "say".

And with the extremely low bar for calling something "AI" these days - how close to "kinda like Google image search, but with a random filter or two applied" might a low-budget "AI" get?


Makes sense, like banning electricity in candle stores.


I don't mean to be dismissive, but there's no doubt there'll be plenty of other places to host images like this.

Do Stable Diffusion and the like automatically add some kind of steganographic code to images so they can be automatically detected, e.g. and not added to future training sets? Obviously this could be removed deliberately, but it would prevent the vast majority of cases.


Is it a big issue if some AI generated pictures are included in training datasets?

They would be cherry-picked by humans (we don't share the bad pictures) and generated from other/older models.


Stable diffusion does: https://github.com/CompVis/stable-diffusion#reference-sampli...

Seems like a good idea to me with almost no downsides. If sites start discriminating against images based on that watermark, though, that's going to incentivize people to turn it off.
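For reference, the mechanism in those scripts is the invisible-watermark package, which hides a fixed byte string in the frequency domain of the image. Roughly (exact calls may vary by version):

    import cv2
    from imwatermark import WatermarkDecoder, WatermarkEncoder

    bgr = cv2.imread("generated.png")  # the library works on BGR arrays

    encoder = WatermarkEncoder()
    encoder.set_watermark("bytes", "StableDiffusionV1".encode("utf-8"))
    marked = encoder.encode(bgr, "dwtDct")  # invisible to the eye

    decoder = WatermarkDecoder("bytes", 136)  # 17 chars * 8 bits
    print(decoder.decode(marked, "dwtDct"))   # b'StableDiffusionV1'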


Many people, if not most, use one of the many forks with the watermarking and the aggressive NSFW filter removed from the code. Graphics card RAM and speed are precious.


I haven't looked into exactly what it does, but it seems like there's a (hidden?) watermark: https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a...


By default, Stable Diffusion watermarks images. However as it's open source, it's obviously trivial to remove it.


And thus the AI auto-filtering arms race has begun.

New filters will appear to detect AI-generated images, then new models will be trained to bypass the filter, then upgraded filters will detect the images made by the new model, etc...

In the end, though, it's likely that we won't be able to distinguish between a real image and an AI-generated image; it's only a question of time.


I read a paper some time ago, where somebody ran a facial recognition algorithm on the output of a GAN face generator. It found lots of images in the training data that looked strikingly similar.

In other words, one reason AI images look so good is that they look a lot like actual images.

Thing is, for the love of me, I can't find the paper anymore. Does anyone know it?
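The approach would look roughly like this (a hedged sketch; embed stands in for whatever face-recognition embedding model the paper used):

    import numpy as np

    def nearest_training_match(generated_img, training_imgs, embed):
        # Cosine similarity between the generated face's embedding and
        # every training face; a hit near 1.0 suggests the GAN
        # effectively memorized that training image.
        g = embed(generated_img)
        g = g / np.linalg.norm(g)
        best_idx, best_sim = -1, -1.0
        for i, img in enumerate(training_imgs):
            e = embed(img)
            sim = float(np.dot(g, e / np.linalg.norm(e)))
            if sim > best_sim:
                best_idx, best_sim = i, sim
        return best_idx, best_sim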


This question of copyright legalities and AI-generated media reminds me of the "Monkey selfie copyright dispute" from a few years back: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...


This is the pre-news story to the inevitable: “AI generated content bans Getty Images by forcing it into early retirement”


The real question is: Why would we need Getty Images or other stock photo providers if we can have AI-generated content?


I'm not sure how they'd enforce it while they still accept other digital art.

But it seems a good decision, both on their stated concern about copyright (it's all a synthesis of the training sets, but who knows how much?), and also that AI art is effectively an infinite output stream that would very rapidly swamp all human output.


Getty could in theory just ask for a "fair" share (however large or small that might be) in any AI company that uses Getty's images to train its models. It could be a battle not for extinction, but for new market share and control over AI companies.


Depending on the prompt, it is certainly possible to generate images that are watermarked. I got some istock watermarks on a couple of images last I tried and I wasn't using istock or anything related as part of the prompt.


Can two images be identical if both are generated using the same prompt and model combination?

If all images generated are unique, I fail to see how copyright can ever be enforced.
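For what it's worth, they can be identical: sampling is deterministic once you pin the random seed, so the same prompt + model + seed + sampler settings reproduces the same image (modulo minor GPU nondeterminism). A sketch assuming HuggingFace's diffusers library and a hypothetical model ID:

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

    def render(seed):
        gen = torch.Generator("cpu").manual_seed(seed)
        return pipe("medieval market square", generator=gen).images[0]

    a, b = render(42), render(42)
    # a and b are pixel-identical; change the seed or any sampler
    # setting and they diverge.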


Meanwhile China doesn't care and is creating super apps unencumbered.


Interesting that they will still accept fully computer generated images, most of them created with a massive amount of help from AI-assisted algorithms in Photoshop and the like.


Seems like a statement of intent without any details. How do they define 'AI-generated'? And do they plan to automatically detect it, and how?


This is a legal distinction, not a technical filter. They are telling their users that it breaks their terms of service to upload "AI-generated" images. So if you upload an image and it turns out, through a legal challenge, that Getty learns it was made by AI (whatever that means) - they can just walk away.


I guess images generated by computer systems that use if/else technology and mathematical vectors. /s

That's probably how lawyers will put it.


Of all the example images Verge could have chosen for this article, why did they choose the one they chose? Apologies if this is too OT.


I can't wait until I can sue everyone that's ever used copilot because MS trained its corpus on my code


It would probably be easier to sue MS, because there you know they used your code.


A visualization of the announcement by a neural network:

https://twitter.com/illubots/status/1572620909885669378

(I'm building an illustration agency for robot brains, aka neural networks. So far, I have 3 robots who can consistently draw in their unique style. This is by illustration robot Jonas)


That's a cool idea, wish you luck!


These are really cool


Business opportunity right there...


Ya you're right, I always seem to go to the negative instead of recognizing opportunity. Companies that set these backwards-looking policies should be aware that someone else can always come along and eat their lunch. I'm always amazed that they think they can just get away with it without consequence!


So... stealing from humans, good. Stealing from robots, bad.

You had your chance, meatbags!


Getty is being the Blockbuster of the generated-art age.


Getty opening a market opportunity to competitors


Getty, skating to where the puck was.


Next week: Effective immediately, Getty Images will cease operations citing loss of their entire business model due to AI.


How would they know?


the innovator's dilemma



