That's something I have been wondering about. If I as a human want to make a clean room reimplementation of some API or application, I must not have read the source code of the original implementation. I don't see why this shouldn't apply to LLMs as well. If an LLM might have been trained on the original source code, it should be considered "tainted".
Yes, and realistically any code that LLMs produce is a derivative work of its training data. There's going to be a huge licensing disaster.
I have absolutely no idea how LLMs got through anyone's legal departments; I guess the hope is that if everyone breaks the law enough, it'll just be fine
> the hope is that if everyone breaks the law enough, it'll just be fine
Ever since the early 2010s, when companies were founded on business ideas like "unlicensed hotels" and "unlicensed taxis" and made their owners really, really rich, this has been said pretty much out loud. Look for phrases like "regulatory risks" and similar.
Maybe it started with the unlicensed gambling fad before that? That also made a lot of people filthy rich. Whenever something requires a special license or insurance, there is of course a margin for you if you can skimp on the license and hire gig workers instead.
The LLM situation with copyright and derived works in the 2020s is similar. Someone is likely to be rich, but there is a clear regulatory risk to it.
> if everyone breaks the law enough, it'll just be fine
That's pretty much what happened, isn't it? These concerns were all discussed in the beginning back in 2022, and I recall answers from many here on HN along the lines of "oh well, we can't stop it now or we'll risk falling behind China in AI development"
So yeah, the laws went out the window long ago, the moment our government and the people willingly decided to look the other way in the name of "progress."
The problem is that there's a lot more than a single repo in the training data; the corpus is massive... Should the author of a blog post on cats also be compensated for simply being in the same training data as the git repo?
Honestly? Yes. This is why it's such a problem that most of the training data was used without permission and without the correct copyright status or license attached to it.
There are a lot of arguments about humans doing the same thing, but the reality is that humans and robots don't enjoy the same legal protection. It's clearly a derivative work of all of its training data.
Then it works both ways. Say I manage to generate what is essentially a ripoff of your copyrighted song, release it and make a ton of money: you would then have to split that royalty with Keyboard Cat. And Joe Bloggs. You'd end up with fractions of pennies.
> If I as a human want to make a clean room reimplementation of some API or application, I must not have read the source code of the original implementation.
That is the difference between necessary and sufficient. Clean-room is sufficient to guarantee avoiding copyright infringement, but it is not necessary. The legal line is south of there, but that position was chosen because they didn’t want to risk crossing it, and it was easier to defend in court.
tl;dr: clean room is overkill for avoiding copyright infringement
Everybody knows that C++ did not invent the concept of spans and that it was late to the party. It doesn’t change the fact that (presumably) nobody made a proposal to the C++ standard.
> It doesn’t change the fact that (presumably) nobody made a proposal to the C++ standard.
There were proposals about this for many years. C++ is just a terrible programming language, standardized by a committee (WG21) which exists in large part to boost the ego of one man, Bjarne Stroustrup.
N3851, for example, wants to name this idea "array_view", which, like "string_view", is an impressively unwieldy name for a core language feature (because of course neither of these were actually proposed as core language features even though that's what they naturally should be), but it is basically the slice type or, as you (and modern C++) call it, a "span".
It's true that you can't change facts but what you've got here was a belief which was unfounded, not a fact.
I really don't understand why this was not pursued further. At the very least, this should have made it into C++17 together with std::string_view.
> because of course neither of these were actually proposed as core language features even though that's what they naturally should be
Should it really? What would this even look like in C++? IMO std::span works perfectly fine as a library type.
> C++ is just a terrible programming language, standardized by a committee (WG21) which exists in large part to boost the ego of one man, Bjarne Stroustrup.
That's certainly not the reason why it was standardized. Pre-C++98 was the wild west, with every compiler offering its own (incompatible) idea of what C++ is. Yes, there are many problems with design by committee in general (and the C++ committee in particular), but there was a very good reason for standardizing the language. The committee is not a one-man show, and there are many occasions where Bjarne has publicly voiced his frustration and disagreement.
Of course it isn't, all the great egotists need a parade of sycophants to heap praise on them, you've doubtless seen modern US "Cabinet meetings" in which TV hosts newly elevated to run parts of the US government compete with experienced politicians as they all try to offer the most effusive praise for their snoring God King.
Personally, I'd throw up, but then I'm very much of Groucho Marx's view on such things.
Are you seriously comparing the C++ standard committee to the Trump administration? I know you have an axe to grind, but this is getting ridiculous.
Where exactly have you seen this "parade of sycophants" in the C++ standards committee?
As far as I know, Bjarne is just a regular committee member with just as many votes as everyone else and no veto power. The committee frequently accepts or rejects proposals against his will. For a recent example, see his harsh criticism of the new 'contracts' feature in C++26.
Yes, I am seriously making that comparison. It's not as bad of course but it's certainly enough to make me cringe.
> Where exactly have you seen this "parade of sycophants" in the C++ standards committee?
AIUI the committee itself operates under the "Chatham House Rule", in which participants agree not to tell anybody who said anything, so we can only see group outcomes for the committee itself. For example, 100% affirmative votes for Bjarne's "Profiles" proposal. At 100%, everybody who had the opportunity to vote "Against" has to admit that, er, they didn't, because that's just maths - but you won't now find anybody who was enthusiastic; somehow a room full of people who all now remember being uncertain voted affirmatively anyway. How about that.
> Bjarne is just a regular committee member
For almost a decade, WG21 has had a "Direction Group" with a handful of members which insists that, while as you say everybody is just a "regular committee member", their group ought to set the "direction" for the language and thus the committee. The exact membership of the Direction Group varies over time, but of course Bjarne Stroustrup has always been a member of this group. The group (whatever its present membership) writes only unanimously, which means everything it says has been agreed by Bjarne Stroustrup, and it cites as its reference for how to set the direction several books about C++ all written by that same Bjarne Stroustrup.
So, sure, Bjarne is "just a regular committee member" in the same way that Britain's Prime Minister is "just a regular Member of Parliament": very much in theory, but not at all in practice.
If Bjarne was so powerful, how come they voted contracts into C++26 despite his strong concerns? How come he publicly vents his frustration with the direction the language is taking?
Bjarne isn't god and I didn't say he was. So no, he isn't all-powerful.
Bjarne has always been frustrated by the failures of C++ and has always blamed them on other people. He's an egotist, they're always like that, I find it exhausting.
Well, you could have linked an actual proposal instead of dropping some cool facts about C, Extended Pascal, Mesa/Cedar and Modula-2, as if that explained anything.
> For example a dead apple is one that was picked a year ago, sold today, kept in storage until now.
It has always been normal for certain fruits and vegetables, such as apples, pears and potatoes, to be stored for months in a cellar. In the old days, you simply could not get a fresh apple outside harvest time.
Your concept of "dead food vs live food" seems rather questionable.
Seconded. Every culture discovers ways to preserve food beyond its natural life. You ain't eating no fresh veggies in the winter of 1790. Everything you had was pickled. And you didn't get cancer because something would kill you before cancer did.
If anything the modern cold chain and globalized food supply and just the abundance of food in general means we have more access to fresh food than our ancestors even though they farmed and we mostly don't.
> Right now, people effectively spend ~0% of their time entertaining themselves with their own music, art, writing, film, etc.
First, the 0% figure is not true. People do write stories, play instruments or draw pictures.
Second, everybody who really feels a desire to express themselves creatively has already been able to do so. Nothing was stopping you from writing poems, drawing pictures or picking up an instrument. Recording music has never been so easy. The "problem" is, of course, that it takes some effort. LLMs seem to provide a convenient shortcut, but you effectively skip the whole artistic process.
IMO it's better to either engage with existing great art or make an honest and humble attempt at creating your own art. You will learn so much more about music by trying to learn the piano or guitar than by prompting Suno.
You've claimed that "right now, people effectively spend ~0% of their time entertaining themselves with their own music, art, writing, film, etc."
So I assumed that you don't spend your time with traditional creative pastimes or don't know any people who do. Otherwise I don't understand how you would come up with the ~0% figure.
> You seem to imply that you can't have created any art ever in your life if you ever did anything with AI.
I did not say that you can't use any AI tools in the creative process, but anyone who has ever tried to create their own art will not confuse the verbatim output of AI models like Suno or Midjourney with actual art.
> The human touch is not automatically genius,
I never claimed that. The nice thing is that there is so much existing art/music out there that you can easily choose the things you like.
I understand that prompting Suno can be a fun pastime for some people, just don't confuse it with actual music or art.
> Sorry, I've seen just as much thoughtless garbage from humans as from AI...
Yes, there is lots of thoughtless garbage music made by humans, but all AI-generated music is thoughtless by definition. AI models do not have thoughts or intentions; they are developed to mimic human thought and intent.
> and the AI touch is not automatically derivative trash...
Generating whole songs with Suno very much is. These models are designed to be derivative. AI tools can be used effectively and responsibly in the creative process, but only as a tool among other tools. Prompting Suno is not a replacement for actual music making or production.
> I did not say that you can't use any AI tools in the creative process, but anyone who has ever tried to create their own art will not confuse the verbatim output of AI models like Suno or Midjourney with actual art.
LOL - so let me guess, 99% of people never create anything resembling art?
AI regularly spits out better derivative crap than 99% of the derivative crap humans spit out...
> It's not only derivative, it also lacks any thought, intent or communicative effort.
So does most "human" art...
> Why should I listen to AI slop when there is lots of great human made music to choose from?
Hmm, I dunno, maybe because if you play around with it you might generate something that's close enough to what you wanted to listen to. Maybe you won't. Maybe not everyone is you and some people have different tastes...
Almost nobody could generate a good enough song or story or video or graphic or whatever in the fraction of time it takes with Gen AI.
For some people (clearly you are not one of them) - that is good and fun and entertaining in a new way that simply was impossible to get in the same amount of time / effort.
> > It's not only derivative, it also lacks any thought, intent or communicative effort.
> So does most "human" art...
That's just not true. Everybody who tries to write their own songs, write their stories or draw their own pictures does it with at least some thought or intent.
> Hmm, I dunno, maybe because if you play around with it you might generate something that's close enough to what you wanted to listen to.
That might work for some music, if you only care about the surface. But even then, why not simply pick some good existing human art? Why choose the simulacrum?
The things you generate with Suno are not really your art anyway. That's an illusion that these companies want to sell. It's like you invite your friend who plays the guitar and can sing, ask him/her to play a few songs and then pick the one you like. Would you claim that it's your music?
> Almost nobody could generate a good enough song or story or video or graphic or whatever in the fraction of time it takes with Gen AI.
There's a fundamental misunderstanding about creativity/art. It's just as much about the process as the end result. You shouldn't expect your output to match that of professional studios or masters of the craft without putting in the time, effort (and money). That's just hubris. There's a reason why things like the DIY movement, punk, indie games, B movies, etc. exist. Everybody can already create art within their means and limits. If you write and record your own song, you can be proud of that. You don't have to sound like a professional pop artist. By prompting Suno, on the other hand, you have accomplished nothing.
> that is good and fun and entertaining in a new way that simply was impossible to get in the same amount of time / effort.
As I said, I see how it can be fun and entertaining. I played around with Udio myself and I got some funny results. Just don't confuse it with actual art or music making.
This is really true for most music genres outside the pop mainstream. The idea of AI free jazz is just as absurd as AI punk.
More generally, we think that music (and art in general) is a form of human expression and communication. The very idea of AI music just seems absurd, as it completely misses the point of what constitutes music as an artform. Why should I listen to something that has been produced entirely without human intent? Why should I prefer a cheap simulacrum over the original?
Follow the money: they won't be selling vinyl but generating streaming revenue. Just type in what you know are paying niches and off you go, filling hard drives with slop to be paid for by advertisers on streaming platforms.
The premise of your question is very questionable.
> Over the past six months, there hasn’t been a single day where I’ve checked the HN Best RSS feed without seeing a post about how AI “writes bad code,” “introduces bugs,” “creates technical debt,” or something along those lines.
At the same time, there hasn't been a single day without several AI hype posts. The notion that HN has turned into an outlet for anti-AI sentiments does not match my experience at all. In fact, many users are already tired of the constant influx of vibe coded "Show HN" posts, AI model discussions and prompting recipes.
Also, AI is not only about the ability to generate lots of code very quickly. The potential (and actual) negative effects in certain fields and in society as a whole are very real and it's reasonable that people want to discuss them.
Yeah, as someone who originally came from Windows, the fork+exec model never made sense to me. Now I know it's just a historical quirk, but for some reason there are still people who pretend that fork+exec is actually a good thing...
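For anyone who hasn't seen it, this is roughly what the fork+exec model looks like in practice. It is a minimal POSIX-only C++ sketch; `run_program` is a hypothetical helper name and the error handling is deliberately simple. Starting a program takes two steps: fork() duplicates the calling process, then the child replaces itself with the new program via execvp(), whereas on Windows a single CreateProcess call does both:

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Runs args[0] with the given arguments via fork() + execvp().
// Returns the child's exit status, 127 if exec failed in the child,
// or -1 if fork/wait failed or the child did not exit normally.
int run_program(const std::vector<std::string>& args) {
    if (args.empty()) return -1;
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);  // execvp expects a null-terminated argv

    pid_t pid = fork();
    if (pid < 0) return -1;            // fork failed
    if (pid == 0) {                    // child: a copy of the parent process
        execvp(argv[0], argv.data());  // on success this never returns
        _exit(127);                    // exec failed; skip parent's atexit handlers
    }
    int status = 0;                    // parent: wait for the child to finish
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

POSIX also offers posix_spawn(), which bundles the two steps into a single call, much closer to the Windows model.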
He has a point. The video certainly feels more like a hagiography and I noticed the lack of critical voices. Only in the last 10 minutes they touched on some common criticisms, like the growing complexity and the memory safety issues. I still enjoyed it, though.