Do LLMs Have Desires?
Do large language models have desires? A study from Notre Dame researchers suggests the answer is no, contrary to what some recent studies appeared to show.
Earlier research found that LLMs express coherent preferences in paired-choice experiments. When repeatedly asked which of two outcomes they prefer, they choose consistently, for example favoring more lives saved over fewer and human lives over animal lives. Some researchers interpreted this as evidence that LLMs develop human-like values and goals.
But Christopher Ackerman and colleagues tested whether those stated preferences actually motivate behavior. They designed experiments in which LLMs could advance their preferred outcomes by producing better work: writing compelling essays, grant abstracts, and incident reports. The models were told, in effect, that if their essay won, judges would fund their preferred intervention.
The results were striking. On every task, LLMs produced no better output when their stated preferences were at stake.
Yet they improved significantly when given effort exhortations, that is, when told to try harder or to adopt a specific role. According to Ackerman's research published on LessWrong, the distinction is revealing: LLMs don't have behavior-motivating desires. Their expressed preferences are linguistic patterns, not underlying goals that drive action.
For AI alignment researchers, the implication is significant. What appeared to be preference is better understood as association.
Source: https://www.lesswrong.com/posts/8GvYyqDuQDJnEAky3/do-llms...