Reinforcement Learning

Information Bandwidth in Reinforcement Learning

An information-theoretic analysis showing that scalar advantage formulations learn ≤ log₂(B) bits per episode, while per-timestep advantages preserve full reward entropy.

Yingru LI

Oct 1, 2025 16 min read Research, Theory

When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch

The relentless push for faster inference creates a dangerous training-inference mismatch that silently kills RL with LLMs. We reveal the resulting vicious cycle, which is particularly acute in reasoning and agentic RL, and show that sequence-level importance sampling is the principled solution.

Jiacai Liu, Yingru LI, Yuqian Fu, Jiawei Wang, Qian Liu, Yu Shen

Sep 17, 2025 1 min read Research, Theory

HyperAgent - A Simple, Efficient, Scalable and Provable RL Framework

Practically and provably efficient RL under resource constraints!

Mar 23, 2024 1:30 PM Rice University

Yingru LI

HyperAgent - A Simple, Efficient and Scalable RL Framework for Complex Environments

Practically and provably efficient RL under resource constraints!

Jan 13, 2024 1:20 PM Daoyuan Building

Yingru LI

Towards AGI for Humanity through Efficient Reinforcement Learning

Addressing the efficiency challenge in RL with the HyperFQI algorithm

Oct 21, 2023 2:30 PM Teaching B Building

Yingru LI

HyperDQN - Randomized Exploration for Deep Reinforcement Learning

Dec 14, 2021 12:00 AM NeurIPS 2021

Yingru LI