The reality is that for the most part, any corpus created after 2022 is going to...

alganet · on May 17, 2025

I'd say 2007 or so.

There wasn't any known active AI back then, but statistics on popular ideas and internet content were already a thing, and speech pollution based on those assessments had already started to spread fast, produced by hand.

Sure, a lot of good content has come out since then. But the amount of garbage... it's immense and very difficult to sort out automatically.

The major issue is that this garbage then _became_ the norm. Only people who lived back then can remember what it was. For new folk, it looks just like a generational shift. However, it is quite obvious that some aspects of this shift were... unnatural (in the sense of not being spontaneous cultural manifestations).

lazystar · on May 17, 2025

And I'm sure someone from the '90s would say the same about '97.

https://en.m.wikipedia.org/wiki/Eternal_September

alganet · on May 17, 2025

I am not talking about an influx of newcomers.

Pay attention.

I mentioned explicitly that I see what happened as distinct from a natural generational shift.

There are many phenomena from that era that support what I am saying. For example, the first massive political campaign to leverage the internet as its primary vehicle.

creshal · on May 17, 2025

Not sure why you're getting downvoted. Content farms have been a thing for a long time, and many a spam website used crappy Markov chains to generate even more "content". Anything that could be marketed by a company had its search results drowned in hand-crafted bland marketing slop, and even before ChatGPT got popular, searching for things like recipes (or, god forbid, generic Windows error messages) was a nightmare. And a lot of that garbage is in LLMs' training data.

alganet · on May 17, 2025

> Not sure why you're getting downvoted

I don't know either. My guess is that they're angry because I am not angry about the things that they want me to be angry about. It has happened before.