What happened after 2,000 people tried to hack my AI assistant

According to Simon Willison, security researcher Fernando Irarrázaval ran a public challenge inviting people to leak secrets from an AI assistant through email-based prompt injection attacks. After 6,000 attempts, and $500 in API tokens, nobody managed it. The test used Claude Opus 4.6 equipped with safeguards designed to block prompt injection. Willison notes this matches what he's observing across the AI industry: frontier model developers are investing heavily in training their systems to resist injection attacks, and the evidence suggests it's working. But he emphasizes an important caveat: 6,000 failed attempts don't guarantee security. A more sophisticated approach might still break through, so deploying AI systems where a prompt injection could cause irreversible damage remains risky.

Source: https://simonwillison.net/2026/Jun/26/hack-my-ai-assistan...
