Hacker News | new | past | comments | ask | show | jobs | submit | iterateoften's comments

People did and still do the same with electricity and magnets. Both incredibly useful despite it.

This is funny because it’s a silly topic, but I think it shows something seriously wrong with LLMs.

The goblins stand out because it’s obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because it’s not as obvious.

Absolutely terrifying that OpenAI is just casually tossing out that such subtle training biases were hard enough to contain that a fix had to be added to the system prompt.


> Absolutely terrifying that OpenAI is just casually tossing out that such subtle training biases were hard enough to contain that a fix had to be added to the system prompt.

May I introduce you to homo sapiens, a species so vulnerable to such subtle (or otherwise) biases (and affiliations) that they had to develop elaborate and documented justice systems to contain the fallouts? :)


We’re really not that vulnerable to such things as a species, because we as individuals all have our own minds and our own sets of biases that cancel out and get lost in the noise. If we all had the exact same bias then it would be a huge problem.

I hear you but of course history is full of examples of biases shared across large groups of people resulting in huge human costs.

The analogy isn’t perfect, of course, but the way humans learn about their world is full of opportunities to introduce and sustain these large correlated biases: social pressure, tradition, parenting, education standardization. Not all of them are bad, of course, but some are, and many others are at least as weird as stray references to goblins and creatures.


> If we all had the exact same bias then it would be a huge problem.

And may I introduce you to "groupthink" :))


Now imagine that every opinion you have is automatically and fully groupthinked, and you see the problem with training up a big AI model that has a hundred million users.

The problem does exist when using individual humans but in a much smaller form.


> The problem does exist when using individual humans but in a much smaller form.

And may I introduce you to organized religion :)


That's still a lot smaller!

Make a major religion where everyone is a scifi clone of one person including their memories and then it'll be in the same ballpark of spreading bias.


Doesn't that depend on the biases in question? Many argue that homogeneous societies do many things better, and part of homogeneity is sharing the same set of biases.

And what do you think society/culture is?

It's a set of biases installed in people, whose purpose is mostly to replicate themselves.

Humans are MORE susceptible than LLMs, because LLMs' biases are easily steered to something else, unlike most humans'.


> We’re really not that vulnerable to such things as a species, because we as individuals all have our own minds and our own sets of biases that cancel out and get lost in the noise.

[Citation Needed]

If you have a species-wide bias, people within the species would not easily recognize it. You can't claim with a straight face that "we're really not that vulnerable to such things".

For example, I think it's pretty clear that all humans are vulnerable to phone addiction, especially kids.


> people within the species would not easily recognize it

[Citation Needed]

Sorry, but I had to. There are easy counterexamples of true, species-wide biases that we're fully aware of: optical illusions, cognitive biases, cultural universals (community-sanctioned relationships/marriage, inheritance, ceremonial treatment of the dead). What we don't have are universal biases towards believing specific facts or stories.


None of those things are easily recognized, though. They're not universals. A term like "cognitive biases" generally requires a college-level education.

If you go to a tribe in the middle of the rainforest, would they be able to explain those concepts? Of course not.

Plus, I already gave an example of a species-wide bias at the end of my comment: phone addiction for kids. I'm clearly not saying it's impossible for a human to spot a bias, but rather... how many 5-year-old kids recognize that phone addiction is a bad thing?


An LLM is a computer program, which isn't a human. You wouldn't excuse a calculator being occasionally wrong because humans sometimes get manual calculations wrong too.

> An LLM is a computer program, which isn't a human. You wouldn't excuse a calculator being occasionally wrong because humans sometimes get manual calculations wrong too.

Ah, now we're getting technical. An LLM is a non-deterministic/probabilistic computer program, not a calculator. Keeping that in mind is critical when using an LLM. Expecting deterministic behavior from an LLM is an example of what's known as a 'category error'. [1]

[1] https://en.wikipedia.org/wiki/Category_mistake


Mandatory reading on that topic: www.anthropic.com/research/small-samples-poison

We're probably not noticing a LOT of malicious attempts at poisoning major AI's only because we don't know what keywords to ask (but the scammers do and will abuse it).


I think it's extraordinarily telling that people are capable of being reflexively pessimistic in response to the goblin plague. It's like something Zitron would do.

This story is wonderful.


I feel at least partially responsible. I would often instruct agents to "stop being a goblin". I really enjoyed this story too, though.

We do not have the complete picture.

Doesn't seem that surprising or terrifying to me. Humans come equipped with a lot more internal biases (learned in a fairly similar fashion), and they're usually a lot more resistant to getting rid of them.

The truly terrifying stuff never makes it out of the RLHF NDAs.


We ought to be terrified, when one adjusts for all the use-cases people are talking about using these algorithms in. (Even if they ultimately back off, it's a lot of frothy-bubble opportunity cost.)

There are a great many things people do which are not acceptable in our machines.

Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.


>Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.

You might if that was the best an autopilot could be. Have you never used a bus or taken a taxi?

The vast majority of things people are using LLMs for isn't stuff deterministic logic machines did great at, but stuff those same machines did poorly at or straight up stuff previously relegated to the domains of humans only.

If your competition also "just zones out sometimes" then it's not something you're going to focus on.


Humans also take a lot of time to produce output, and do not feed into a crazy accelerationist feedback loop (most of the time).

The next loop

Lol, I respect Karpathy a lot, but this is such an obvious, in-your-face idea that it is laughable to put someone’s name on it.

What’s next “karpathy investing” where ai in a loop builds a portfolio?


I believe Karpathy himself called it autoresearch, not the Karpathy Loop, but in a vacuum of names around AI it seems to be very easy to meme-drop a name, and then come the influencer efforts to cool-name and normalize it. See "vibecoding".

I'd go a step further and say that sort of loop is probably the first thing most people who play around with agent harnesses try, pretty much the first "Hmm, what should I do now?" thing that pops into people's head.

It's less the idea and more the simplicity of it. It's a distillation of something that works and lets newer practitioners get their feet wet before moving on to more complex implementation.

Actually having a harness for it is nice though, vs. having to prompt it iteratively yourself.

Call it a K-loop please. Where ai in a k-loop builds a portfolio

I'm left delighted to find out something new, but also left wanting to know how to use it.

Like, if I'm 75% on the green transition, how do I use this information?


Send it to a significant other, then discuss your differences. Will provide you with a new in-joke.

Yeah, seems like nonsense advice. Have a code word that was never recorded? I don’t see how that would prove anything. The point of these systems is that they can convincingly say stuff you never said.

The idea is that the attacker doesn't know the codeword. If the attacker finds out about the codeword then the attacker could indeed fake it. Hence why you shouldn't say/write it in recordings or chat messages.

Zen meditation for an hour staring at a wall is a marathon that at the end results in a semi-psychedelic state for me.

Exercising and sitting meditation are two related but seriously different things. Which is why there are many other types of meditation to practice (walking, working, silent, etc.), but Zen mostly considers sitting and looking at a wall the OG.


Unless you know exactly why paper-trading sims are so hard to backtest in practice, it’s silly to make arguments about why your paper-trading sim works.

It’s insanely easy to make a trading algo profitable on historical data.


Overfitting on historical data is a real risk and defo a concern (there have been lots of learnings lately). The backtest wasn't naive: fundamentals used filing dates, not period-end dates, to avoid look-ahead, and scoring was validated out-of-sample using walk-forward testing rather than just optimised in-sample (the GA used 5 temporal folds and walk-forward used 25 rolling out-of-sample windows).
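For anyone unfamiliar with the term, a minimal sketch of what walk-forward testing means (the actual strategy, window sizes, and data are hypothetical placeholders, not the commenter's setup): optimise on a training window, then score only on the unseen window that immediately follows, and keep rolling forward.

```python
# Sketch of rolling walk-forward splits. Parameters are illustrative;
# a real setup would also handle calendar alignment, gaps, etc.

def walk_forward_windows(n_bars, train_size, test_size):
    """Yield (train_range, test_range) index pairs so each test
    window is strictly out-of-sample relative to its train window."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window

# Example: 1000 bars, 250-bar train, 30-bar test -> 25 rolling windows
windows = list(walk_forward_windows(1000, 250, 30))
```

Aggregating the scores across all 25 test windows (rather than one in-sample fit) is what guards against the overfitting the parent comment describes.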

You don’t need it for every item. Just low-frequency, high-profit goods.

It’s also that agents and ML reach local maxima unless external feedback is given. So your wiki will reach a state and get stuck there.

Here is an interesting thing.

> "The LLM model's attention doesn't distinguish between "instructions I'm writing" and "instructions I'm following" -- they're both just tokens in context."

That means all these SOTA models are very capable of updating their own prompts. Update prompt. Copy entire repository in 1ms into /tmp/*. Run again. Evaluate. Update prompt. Copy entire repository ....

That is recursion; like Karpathy's autoresearch, it requires a deterministic termination condition.

Or have the prompt / agent make 5 copies of itself and solve for 5 different situations to ensure the update didn't introduce any regressions.
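The update-evaluate recursion described above can be sketched as a simple loop. `run_agent` and `evaluate` are hypothetical stand-ins for a real harness, and the prompt-mutation step is a placeholder; the point is the deterministic termination condition (a score threshold plus a hard iteration cap).

```python
# Sketch of the self-updating prompt loop. run_agent(prompt) -> output,
# evaluate(output) -> score in [0, 1] are assumed to be supplied by the
# harness; neither is a real API.

def improve_prompt(prompt, run_agent, evaluate, target=0.9, max_iters=10):
    best_prompt, best_score = prompt, evaluate(run_agent(prompt))
    for _ in range(max_iters):            # hard cap: guarantees termination
        if best_score >= target:          # good-enough threshold
            break
        # Placeholder mutation; a real loop would have the model rewrite it.
        candidate = best_prompt + "\nBe more careful with edge cases."
        score = evaluate(run_agent(candidate))
        if score > best_score:            # keep only strict improvements
            best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

Without both the threshold and the cap, a loop like this can spin forever on a plateau, which is the non-termination risk the comment is pointing at.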

> reach local maximima unless external feedback is given

The agents can update themselves with human permission. So the external feedback is another agent plus the selection bias of a human. It is close to the right idea. I, however, am having huge success with the external feedback being the agent itself. The big difference is that a recursive agent can evaluate performance within a confidence interval rather than chaos.

