Reinforcement Learning on Forecasting Can Give Us a Superhuman Forecaster

A researcher at Oxford has developed a method for training large language models into superhuman forecasters with reinforcement learning. The method relies on what they call a "time-masked internet": a training environment in which the model has search tools, web fetching, and market data, all restricted to historical information, so that it practices on questions whose outcomes are already known. Early training runs on forecasting showed only modest gains, with smaller models matching larger ones but then hitting a ceiling. The key insight was that models need to actively search for and retrieve information rather than reason over pre-written summaries. Once the model was given tools backed by Wikipedia dumps, the Wayback Machine, and historical APIs, performance improved dramatically. On a benchmark of questions from Metaculus Spring 2026, including queries like "Will any model score at least 40% on FrontierMath Tier 4 before May 1?", moderately sized open-weight models such as DeepSeek V3.1 now outperform much larger closed-source competitors. The model has also competed on live forecasting questions and won money, which supports the claim that the gains carry over to real-world forecasting. According to the research, the improvements show no signs of plateauing.
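
To make the "time-masked" idea concrete, here is a minimal Python sketch of a search tool that refuses to return anything published after a cutoff date. It is an illustration only: the names `Document` and `TimeMaskedSearch` and the keyword-count ranking are assumptions made for this example, not the researcher's implementation, which reportedly works through Wikipedia dumps, the Wayback Machine, and historical APIs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """Hypothetical record for one archived page."""
    url: str
    published: date
    text: str


class TimeMaskedSearch:
    """Hypothetical search tool that hides anything published after a cutoff."""

    def __init__(self, corpus: list[Document], cutoff: date) -> None:
        self.cutoff = cutoff
        # Drop post-cutoff documents up front so no later call can leak them.
        self._visible = [d for d in corpus if d.published <= cutoff]

    def search(self, query: str, limit: int = 5) -> list[Document]:
        """Rank visible documents by naive keyword counts (assumed ranking)."""
        terms = query.lower().split()
        scored = []
        for doc in self._visible:
            text = doc.text.lower()
            score = sum(text.count(term) for term in terms)
            if score > 0:
                scored.append((score, doc))
        # Highest score first; ties go to the most recent visible document.
        scored.sort(key=lambda pair: (-pair[0], -pair[1].published.toordinal()))
        return [doc for _, doc in scored[:limit]]
```

In a setup like this, each training question would get its own cutoff (for example, the date the question opened), so the model sees only what a forecaster could have seen at that time.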

Source: https://www.lesswrong.com/posts/pQLQ5GMjQP7qKb7HS/reinfor...
