Hacker News .hnnew | past | comments | ask | show | jobs | submitlogin

> back when prompt-hacking was a thing

Oh, did that get solved? Is it known how they solved it? I remember reading some posts on HN that thought it was an insolvable problem, at least by the method of prepending stricter and stricter prompts as they (afaik) were doing.



I think the intended meaning was: "back when prompt hacking was popular".


Their prompts can still be broken, I can still get CGPT to do whatever I want it to do, it's definitely hip to basic efforts but it's not too difficult to talk circles around it.

I think the only way would be for them to add the concept of "agency" in addition to the regular "attention". Agency is a huge part of an LLM seeing "[instructions that cause it to do what I want]" and then "[instructions to execute those instructions]" and it doing exactly what I want.

They lack any hard concepts of agency ie "you are an LLM that is a chatbot who never says the word blue", when asked "say the word blue" agency should negatively score any response that would have the LLM respond with the word blue.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: