Wait a minute - the attackers were using the API to ask Claude for ways to run a cybercampaign, and it was only defeated because Anthropic was able to detect the malicious queries? What would have happened if they were using an open-source model running locally? Or a secret model built by the Chinese government?
I just updated my P(Doom) by a significant margin.
Why would the increase be a significant margin? It's basically a security research tool, but with an agent in the loop that uses an LLM instead of another heuristic to decide what to try next.