Yingru Li
Learning Rate Scheduling
Beyond Precision: Why Training-Inference Mismatch is an Optimization Problem and How Simple LR Scheduling Fixes It
RL training for LLMs is notoriously unstable. While recent studies attribute this instability to the training-inference mismatch introduced by hybrid engines, we show that it is not merely a static numerical issue but a dynamic problem coupled with the model’s optimization trajectory. We propose a specialized learning rate (LR) scheduler that decays the LR as gradient noise rises, using a surge in response length as a reliable early indicator of impending instability.
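To make the scheduling idea concrete, here is a minimal sketch of a length-triggered LR decay. It is not the authors' scheduler. The class name `LengthTriggeredLRScheduler`, the exponential-moving-average (EMA) baseline, the `surge_ratio` threshold, the multiplicative `decay`, and every default value are illustrative assumptions. The sketch only shows the general mechanism of treating a sudden jump in mean response length as an early warning and shrinking the step size in response.

```python
from typing import List, Optional


class LengthTriggeredLRScheduler:
    """Hypothetical sketch: decay the LR when mean response length surges.

    Class name, parameters, and defaults are illustrative assumptions,
    not the authors' scheduler.
    """

    def __init__(self, base_lr: float = 1e-6, min_lr: float = 1e-7,
                 surge_ratio: float = 1.5, decay: float = 0.5,
                 ema_beta: float = 0.9) -> None:
        self.lr = base_lr
        self.min_lr = min_lr
        self.surge_ratio = surge_ratio  # length / baseline ratio counted as a surge
        self.decay = decay              # multiplicative LR decay per surge step
        self.ema_beta = ema_beta        # smoothing factor for the length baseline
        self.baseline: Optional[float] = None

    def step(self, mean_response_length: float) -> float:
        """Record this batch's mean response length and return the LR to use."""
        if self.baseline is None:
            # The first observation only seeds the baseline.
            self.baseline = mean_response_length
            return self.lr
        if mean_response_length > self.surge_ratio * self.baseline:
            # Treat a sudden length jump as an early warning of rising
            # gradient noise and shrink the step size (never below min_lr).
            self.lr = max(self.min_lr, self.lr * self.decay)
        # Exponential moving average of recent response lengths.
        self.baseline = (self.ema_beta * self.baseline
                         + (1.0 - self.ema_beta) * mean_response_length)
        return self.lr


if __name__ == "__main__":
    sched = LengthTriggeredLRScheduler()
    lengths: List[float] = [500, 510, 505, 900, 1000]
    lrs = [sched.step(n) for n in lengths]
    print(lrs)
```

Because the baseline is an EMA, slow and steady growth in response length is absorbed without decaying the LR, while an abrupt surge triggers a decay. In a PyTorch loop, one would write the returned value into each `optimizer.param_groups[i]["lr"]` before calling `optimizer.step()`.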
Yaxiang Zhang, Yingru Li, Jiacai Liu, Ziniu Li, Jiawei Xu, Qian Liu
Dec 20, 2025
Research, Theory