VibeThinker: 3B-parameter model reportedly beats Claude Opus 4.5 on reasoning with novel SFT+GRPO training

A newly published arXiv paper introduces VibeThinker, a 3-billion-parameter language model that reportedly outperforms Claude Opus 4.5 on reasoning. According to the paper, the model is trained with a novel approach that combines supervised fine-tuning (SFT) with GRPO (Group Relative Policy Optimization). The work suggests that, with optimized training methods, smaller models can match the reasoning capabilities of much larger systems.

Source: https://arxiv.org/abs/2606.16140
