Is Claude Mythos the Most Dishonest, or Does the System Card Have Errors?
According to a LessWrong post, a researcher found labeling errors in Anthropic's recently published Claude Mythos Preview System Card. A plot on page 97 labeled "Dishonesty rate" shows Mythos at 80%, but the surrounding context suggests it actually measures the honesty rate. A second chart on page 99 shows a similar mismatch in hallucination rates. The researcher verified these findings by asking Claude Opus 4.6 to audit the document, and Claude immediately flagged both charts. The broader point: AI-assisted fact-checking of technical documentation should be standard practice. Although these are likely just labeling mistakes, the concern is significant: mislabeled benchmarks could mislead downstream research, investment decisions, and future AI training. Anthropic has published corrections in the past, and the researcher hopes for the same here, not to discourage transparency but to strengthen it.
Source: https://www.lesswrong.com/posts/YogGH6DFZTkhZLb5B/is-clau...