Power Laws in NNs: A Possible Mechanism for Inductive Bias towards Sparse Representations

A new post on LessWrong explores why trained neural networks develop power-law distributions: heavy-tailed statistics in which extreme values occur far more often than a bell curve would allow. Rather than keeping the Gaussian-like weight distributions one might expect, trained networks accumulate power-law structure over the course of training. The author connects this to Heavy-Tailed Self-Regularization (HTSR) research, which suggests something striking: generalization performance may be predictable from the structure of a network's weight matrices alone, without evaluating the network on test data. The deeper claim is that power-law distributions may be the mechanism steering networks toward sparse, factored representations, that is, the discrete, interpretable features that researchers have observed in practice. In other words, networks do not drift toward smooth, continuous patterns. On this account, the math pushes them toward chunked, modular representations.
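To make "examining a network's weight matrix structure" concrete, here is a minimal sketch of one way to measure how heavy-tailed a weight matrix's spectrum is. It is not the procedure from the post or from the HTSR papers: the Hill estimator is swapped in as a simple, standard tail-index estimator, and the function names (`weight_spectrum`, `hill_tail_index`), the cutoff `k`, and the random matrices are all assumptions made for illustration.

```python
import numpy as np


def weight_spectrum(W):
    """Eigenvalues of W^T W / N (N = number of rows), via squared singular values."""
    n_rows = W.shape[0]
    singular_values = np.linalg.svd(W, compute_uv=False)
    return singular_values ** 2 / n_rows


def hill_tail_index(values, k):
    """Hill estimator of the tail index alpha, where P(X > x) ~ x**(-alpha).

    Uses the k largest values; k is a hypothetical cutoff the caller must choose.
    A smaller alpha means a heavier tail.
    """
    x = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    if not 0 < k < len(x):
        raise ValueError("k must satisfy 0 < k < len(values)")
    if x[k] <= 0:
        raise ValueError("the (k+1)-th largest value must be positive")
    # Average log-excess of the top k values over the (k+1)-th largest value.
    return k / np.sum(np.log(x[:k] / x[k]))


rng = np.random.default_rng(0)

# Sanity check on synthetic data with a known tail: rng.pareto draws from a
# Lomax distribution, and adding 1 gives a classical Pareto with minimum 1.
pareto_sample = rng.pareto(2.0, size=20_000) + 1.0
print("Hill estimate on Pareto(alpha=2) sample:", hill_tail_index(pareto_sample, 2_000))

# Stand-in for a layer's weights: a random Gaussian matrix, not a trained one.
W = rng.normal(size=(512, 256))
print("Hill estimate on Gaussian-matrix spectrum:", hill_tail_index(weight_spectrum(W), 25))
```

With a real model, `W` would be replaced by a trained layer's weight matrix. The estimate depends on the choice of `k`, so it is best read as a rough diagnostic rather than a precise measurement.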

Source: https://www.lesswrong.com/posts/QR4AgigAopdz9LvQk/power-l...
