Yingru Li
Training Dynamics
The Stability Gap: Why Top-K Routing Breaks RL Optimization
A rigorous mathematical analysis showing that Top-K expert routing in Mixture of Experts creates two fundamental pathologies: gradient blackout (zero gradients almost everywhere) and first-order approximation failure (discontinuous policy mapping), explaining why MoE-RL training can be unstable.
Yingru LI
Dec 7, 2025
10 min read
Research, Theory
When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch
The relentless push for faster inference creates a dangerous training-inference mismatch that silently kills RL with LLMs. We reveal the vicious cycle—particularly acute in reasoning and agentic RL—and show that sequence-level importance sampling is the principled solution.
Jiacai Liu, Yingru LI, Yuqian Fu, Jiawei Wang, Qian Liu, Yu Shen
Sep 17, 2025
1 min read
Research, Theory