you're forgetting one important alternative: to just not use and/or not do something. nobody asked them to scrape anything. nobody asked them to scrape copyrighted works. they could've just not done the shady thing, but they made that choice to do it, all by themselves. and one can just avoid using something with questionable data ethics and practices.
they clearly show through their actions that they think they can do anything with any data that's out there, and put it all out. why anyone would entrust them or their systems with their own data to 'assist' with, I really don't get.
and even though it's an 'open source' project, that part may just be soliciting people to do work for them, to help enable their own data collection. it's gonna run somewhere, after all. in the cloud, with monetized compute, just like any other AI project out there.
I personally see your view on this as a complete and total failure to understand how humans and society/culture actually work.
Your mind exists in a state where it is constantly 'scraping' copyrighted work. Now, in general, the limitations of the human mind keep you from accurately reproducing that work, but if I were able to look at your output as an omniscient being, it is likely I could slam you with violation after violation where you took stylistic ideas from copyrighted work.
RMS covers this rather well in 'The Right to Read'. Pretty much any model that puts hard ownership rules on ideas and styles leads to total ownership by a few large monied entities. It's much easier for Google to pay some artist for data that goes into an AI model. Because the 'Google AI' model is now more culturally complete than other models that cannot see this data, Google entrenches a stronger monopoly in the market, generating more money with which to outright buy ideas and further monopolize the market.
you bring up the human mind as if that'd somehow explain, excuse, or absolve how broken and poorly planned AI systems are. unlike the human mind, memories, or thoughts, these systems are very real, and operate with data that is precise, definite, and not uncertain in its existence, at every point of their actions. minds don't create a definitive list of every work they encounter, in precise detail; they don't create models that incorporate all of those works into a single entity with comprehensive accessibility; and they don't enable others (thousands, millions, at a hugely disproportionate scale) to use those models, directly, to create yet more and more artifacts. all of these things are definite, tangible, transferable data. minds are pretty much none of these things. but it doesn't matter what minds are, because this is about AI, and AI is a thing that exists, and it can and should be examined on its own merits, without sliding the conversation into 'well, what about brains'. nothing about them. it's not about brains. it's about existing AI systems. don't slide.
licenses aren't limited to being 'only monetary', 'pay me to use this, otherwise don't'. some licenses exist to enable distributing things freely while offering protection of attribution and protection against misuse (just check out the CC licenses). it would be nice if those things were respected, but they aren't, because there's no mechanism built in that discerns the licenses, because they don't care. this is not just an attack on 'big bad commercial entities that hold copyrights on works'; it's an attack on people who try to protect the works they give away for free from misuse. (and on those people who just naively put their work out there. yes, they may be naive in not choosing licenses (which, as we see, wouldn't protect them against scraping that ignores licenses completely), but they may end up being exploited nonetheless and all the same, and they definitely don't deserve to be victim-blamed when it can be made very clear who/what the perpetrator of the exploitation is, in a very tangible way, with a data trail (direct, definite presence in datasets).)
those people who knowingly built systems that ignore any copyrights, any licensing, truly aren't the "good guys" who are "battling corporatist copyright systems", even though they'd probably very much like you to believe that, as they so desperately try to avoid being grilled on copyright issues.
the 'standing up to capitalism, monopolies, etc.' framing is dysfunctional in itself, as the resulting AI systems are very monetizable, and are monetized. SD, despite putting on airs as 'combating monopolies' (in tech, in research), has spread so wide and far, and is now used in such a myriad of projects (with varying commercialization), that they're the ones who should be questioned on whether they themselves are a monopoly in image generation algorithms. "but it's free!", yes, that's how things spread, and then they try to upsell you on compute, limited access, or hot new algorithms as they dominate the market. they are perpetuating the same flavors of capitalism and monopolism, making the same 'capture the market' moves (offer a product for free, upsell, 'premium features and upgrades', aggressive undercutting and displacement of existing players in existing markets, etc.). those 'hot and new' companies are truly not better. you cannot give google the side eye for offering a free product and capturing markets while turning a blind eye to SD offering a free product and capturing markets.
ask RMS directly what he'd think of this blatant ignoring of licenses, and whether he'd give his blessing to the continued operation of systems that pretend licenses just don't exist, instead of using a 25-year-old story as some kind of cover/excuse, like it's ancient scripture.
It would be interesting to extend this criticism to the entire tech ecosystem, which has been built on unsolicited scraping, and which extends to many of the companies that are funding the company that hosts this very forum. We'd grind to a complete halt.
Considering the benefit of a model that can be downloaded, and hopefully run on-premises one day, I don't care too much about their copyright practices being imperfect, especially in this industry.
You can only keep a genie bottled up for so long, and if you don't rub the lamp, your adversaries will.
With something as potentially destabilizing as AGI, realpolitik will convince individual nations to put aside concerns like IP and copyright out of FOMO.
The same thing happened with nuclear bombs: it's much easier to be South Africa, choosing to dispose of them when you end up not needing them, than to be North Korea or Iran trying to join the club late.
The real problem is that the gains from any successes will be hoarded by the people who acquired them by breaking the law.