Yingru Li
Yingru Li
Home
Posts
Research
Contact
Resume
RL-Seminar
Light
Dark
Automatic
Importance Sampling
When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch
The relentless push for faster inference creates a dangerous training-inference mismatch that silently kills RL with LLMs. We reveal the vicious cycle—particularly acute in reasoning and agentic RL—and show that sequence-level importance sampling is the principled solution.
Jiacai Liu
,
Yingru LI
,
Yuqian Fu
,
Jiawei Wang
,
Qian Liu
,
Yu Shen
Sep 17, 2025
1 min read
Research
,
Theory