When capabilities work is the safe bet

A post on LessWrong challenges a common assumption in AI safety. Its argument: if you genuinely believe large language models are safer to develop into superintelligence than alternative AI regimes, then working on LLM capabilities, rather than LLM safety, might be the more risk-reducing move. According to Robin Haselhurst, the logic is probabilistic: accelerating LLMs so that a regime you trust reaches superintelligence first lowers existential risk more than improving safety in a less trustworthy regime. He acknowledges that the math is illustrative and that different research paths vary widely in difficulty. The punchline: if you accept the premise, capabilities work becomes a rational choice for someone focused on safety.

Source: https://www.lesswrong.com/posts/NgPfJ7ATYqMFQr7zu/when-ca...
