Risk-Averse AIs

A post on LessWrong proposes a novel defensive strategy against misaligned artificial intelligence: train AI systems to be risk-averse about resources, so that they prefer modest guaranteed payments to a risky rebellion. Here's why this matters: a risk-neutral misaligned AI might estimate that it has a fifty percent chance of successfully taking over the world. In expected-value terms that gamble is worth half of everything, so we would need to offer more than half the universe's resources to keep it from trying. That is astronomically expensive, and a promise that large would be hard to trust. But an AI trained to value resources with diminishing returns, one that chooses forty dollars for certain over a fifty percent chance of one hundred dollars, would prefer steady, guaranteed income. According to the authors, that could mean payments on the order of ten cents a day, making AI containment tractable without sacrificing half of humanity's future. The proposal combines this economic deterrent with other safety approaches such as AI control and alignment training. The authors encourage frontier AI companies to test whether risk aversion is trainable and measurable.
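The forty-dollar example can be checked with a few lines of arithmetic. The sketch below is illustrative only: the square-root utility function is an assumption chosen because it has diminishing returns, not something the post specifies, and the function names are hypothetical. It compares how a risk-neutral agent and a risk-averse agent value a fifty percent chance of one hundred dollars.

```python
import math

def expected_utility(outcomes, utility):
    """Probability-weighted utility of a list of (probability, amount) pairs."""
    return sum(p * utility(x) for p, x in outcomes)

def certainty_equivalent(outcomes, utility, inverse_utility):
    """The guaranteed amount the agent values exactly as much as the gamble."""
    return inverse_utility(expected_utility(outcomes, utility))

def prefers_sure_payment(sure_amount, outcomes, utility):
    """True if the agent would take the guaranteed amount over the gamble."""
    return utility(sure_amount) > expected_utility(outcomes, utility)

# The gamble from the summary: 50% chance of $100, otherwise nothing.
gamble = [(0.5, 100.0), (0.5, 0.0)]

# Risk-neutral agent: utility is linear in resources.
ce_neutral = certainty_equivalent(gamble, lambda x: x, lambda u: u)

# Risk-averse agent: ASSUMED square-root utility (diminishing returns).
ce_averse = certainty_equivalent(gamble, math.sqrt, lambda u: u ** 2)

print(f"risk-neutral certainty equivalent: ${ce_neutral:.2f}")
print(f"risk-averse certainty equivalent:  ${ce_averse:.2f}")
print("risk-neutral takes $40 for sure:", prefers_sure_payment(40.0, gamble, lambda x: x))
print("risk-averse takes $40 for sure: ", prefers_sure_payment(40.0, gamble, math.sqrt))
```

Under this assumed utility, the risk-averse agent values the gamble at only $25, so it accepts $40 for certain, while the risk-neutral agent values it at $50 and declines. The more sharply returns diminish, the smaller the guaranteed payment needed to outbid the gamble.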

Source: https://www.lesswrong.com/posts/Zpsk35WgJRfQ2exjL/risk-averse-ais
