Synthetic document finetuning for instilling positive traits

Researchers at Google DeepMind have developed a new technique for instilling positive traits into large language models like Gemini 3 Flash. According to the team, the method combines synthetic document pretraining, in which models learn from fabricated descriptions of themselves exhibiting desired behaviors, with supervised finetuning on synthetic dialogue data. The approach, inspired by recent work on synthetic document finetuning, aims for what they call 'deep alignment': encoding values that persist even in novel, out-of-distribution scenarios. Testing on specialized safety evaluations suggests the trained traits transfer robustly to new situations, offering a potential path to more reliably aligned AI systems.

Source: https://www.lesswrong.com/posts/GTYJRLhqztxKF2v5R/synthet...
