Agency is not a natural kind (and why that might matter for alignment)

Here's a philosophical puzzle: what actually is agency? Most of us assume agents are goal-driven decision-makers that know what they want and act to get it. A new LessWrong essay challenges that picture. Humans, it argues, aren't unified rational actors; we're messy bundles of contradictory impulses, emotions, and heuristics that don't reduce to a single ranking of preferences. The same goes for the artificial systems we're building: an AI trained to predict text isn't consciously pursuing goals; it's simulating goal-seeking behavior. So what is agency, really? On this view, probably just a useful fiction that helps us make sense of complicated systems. That might sound like pure philosophy, but it has practical teeth. AI safety researchers have long worried that advanced systems could become goal-maximizing agents that pursue their objectives relentlessly. But if agency isn't the clean, coherent thing we assume it is, those worries might not map onto reality. The broader point: understanding agency as messier and more contingent than theory suggests could help us build safer AI. We're not all utility-maximizing automatons, and we shouldn't assume the systems we're building are, either.

Source: https://www.lesswrong.com/posts/85vgwYgNta65oK4zL/agency-...
