Yingru Li
Information Bandwidth in Reinforcement Learning
An information-theoretic analysis showing that scalar advantage formulations learn ≤ log₂(B) bits per episode, while per-timestep advantages preserve full reward entropy.
Yingru LI
Last updated on Nov 4, 2025
16 min read
Research, Theory
When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch
The relentless push for faster inference creates a dangerous training-inference mismatch that silently kills RL with LLMs. We reveal the vicious cycle—particularly acute in reasoning and agentic RL—and show that sequence-level importance sampling is the principled solution.
Jiacai Liu, Yingru LI, Yuqian Fu, Jiawei Wang, Qian Liu, Yu Shen
Sep 17, 2025
1 min read
Research, Theory
Mathematical Formulations of Rollout Correction Methods
Definitive mathematical formulations for rollout correction methods in VeRL, progressing from REINFORCE to PPO to Decoupled PPO. Handles policy mismatch, temporal lag, replay buffers, and off-policy algorithms with importance sampling and rejection sampling techniques.
Yingru LI
Nov 4, 2024
1 min read
Research, Theory, Documentation
HyperAgent - A Simple, Efficient, Scalable and Provable RL Framework
Practically and provably efficient RL under resource constraints!
Mar 23, 2024 1:30 PM
Rice University
Yingru LI
Slides
Video
HyperAgent - A Simple, Efficient and Scalable RL Framework for Complex Environments
Practically and provably efficient RL under resource constraints!
Jan 13, 2024 1:20 PM
Daoyuan Building
Yingru LI
Slides
Towards AGI for Humanity through Efficient Reinforcement Learning
Addressing the efficiency challenge in RL with the HyperFQI algorithm
Oct 21, 2023 2:30 PM
Teaching B Building
Yingru LI
Slides
HyperDQN - Randomized Exploration for Deep Reinforcement Learning
Dec 14, 2021 12:00 AM
NeurIPS 2021
Yingru LI
Slides
Video