AI Safety Is Testing the Wrong Environment

According to August Murr, writing on LessWrong, AI safety research may be focusing on the wrong problem. Most research treats AI as something you talk to, testing for jailbreaks and harmful outputs in a chat interface. But if AI systems eventually govern institutions or make decisions with authority over people, the risks change in kind: a jailbroken chatbot produces bad text, while a compromised AI governor produces broken institutions. Murr argues that safety researchers should build minimal governance structures now (small, deliberately flawed systems in which an AI has real responsibilities to a community) and then break them intentionally to understand how governance fails under pressure. The point isn't that AI is ready for this yet. It's that now, while the stakes are low, is exactly when we should be designing and testing the guardrails for later.

Source: https://www.lesswrong.com/posts/sMNNWxF3ZEti3XXx3/ai-safe...
