A Geometric Account of Activation Steering through Angle–Norm Decomposition
Researchers have developed a geometric framework for understanding how large language models encode concepts in their activations. According to a LessWrong post, they found that concept information lives primarily in the direction (the angle) of an activation vector rather than in its magnitude (the norm). This challenges the prevailing approach to activation steering, the technique of modifying a model's internal activations to guide its behavior, which treats steering strength as a single parameter. The work tested the framework across three model families and found that effective steering should control angle and magnitude independently. The finding bears on how researchers both manipulate and interpret LLM behavior.
Source: https://www.lesswrong.com/posts/sap5GsycwFBZfxQec/a-geome...
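To make the angle and norm distinction concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`decompose`, `rotate_toward`, `steer`, `additive_steer`), the toy vectors, and the choice to rotate within the plane spanned by the activation and the concept direction are all assumptions made for illustration. It contrasts single-parameter additive steering, which changes angle and norm together, with a version that sets the rotation angle and the norm scale separately.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def decompose(h):
    """Split a nonzero vector into (norm, unit direction)."""
    r = norm(h)
    return r, [a / r for a in h]

def rotate_toward(u, v, theta):
    """Rotate unit vector u by theta radians toward unit vector v,
    staying inside the plane spanned by u and v."""
    c = dot(u, v)
    # Part of v orthogonal to u; this fixes the rotation plane.
    w = [b - c * a for a, b in zip(u, v)]
    w_norm = norm(w)
    if w_norm < 1e-12:
        # u and v are (anti)parallel, so no unique plane exists.
        return list(u)
    w = [x / w_norm for x in w]
    return [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(u, w)]

def steer(h, concept, theta, norm_scale=1.0):
    """Hypothetical two-parameter steering: theta controls direction,
    norm_scale controls magnitude, and neither affects the other."""
    r, u = decompose(h)
    _, v = decompose(concept)
    rotated = rotate_toward(u, v, theta)
    return [norm_scale * r * x for x in rotated]

def additive_steer(h, concept, alpha):
    """Single-parameter baseline: h + alpha * concept."""
    return [a + alpha * b for a, b in zip(h, concept)]

def angle_between(x, y):
    cos = dot(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, cos)))

# Toy 3-dimensional "activation" and "concept direction" (assumed values).
h = [3.0, 0.0, 4.0]
concept = [0.0, 1.0, 0.0]

rotated_only = steer(h, concept, theta=math.pi / 6)
added = additive_steer(h, concept, alpha=5.0)

print("norms:", norm(h), norm(rotated_only), norm(added))
print("angles to h:", angle_between(h, rotated_only), angle_between(h, added))
```

In this toy case, `steer` with `norm_scale=1.0` leaves the norm of `h` unchanged while moving its direction by exactly `theta`, whereas `additive_steer` shifts both quantities at once through the single knob `alpha`.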