HN2new | past | comments | ask | show | jobs | submitlogin

Is there something I can write here that will cause you to send me your bitcoin wallet?


There probably is, but you're also probably not smart enough (and probably no one is) to figure out what it is.

But it does happens, in very similar circumstances (twitter, e-mail) very regularly.


Many technically adept people on HN acknowledge that they would be vulnerable to a carefully targeted spear phishing attack.

The idea that it would be carried out beginning in a post on HN is interesting, but to me kind of misses the main point... which is the understanding that everyone is human, and the right attack at the right time (plus a little bad luck) could make them a victim.

Once you make it a game, stipulating that your spear phishing attack is going to begin with an interesting response on HN, it's fun to let your imagination unwind for a while.


The thing is, an LLM agent could be subverted with an HN comment pretty easily, if its task happened to take it to HN.

Yes, humans have this general problem too, but they’re far less vulnerable to it.


Yes, I agree. My point was more about the current way we do LLM agents where they are essentially black box that act on text.

By design it can output anything given the right input.

This approach will always be vulnerable in the ways we talk about here, we can only up the guardrails around it.

I think one of the best ways to have truly secure AI agents is to do better natural language AIs that are far less blackbox-y.

But I don't know enough about progress on this side.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: