Yingru Li
Information Bandwidth in Reinforcement Learning
An information-theoretic analysis showing that scalar advantage formulations learn ≤ log₂(B) bits per episode, while per-timestep advantages preserve full reward entropy.
Yingru LI
Last updated on Nov 4, 2025
16 min read
Research, Theory
When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch
The relentless push for faster inference creates a dangerous training-inference mismatch that silently kills RL with LLMs. We reveal the vicious cycle—particularly acute in reasoning and agentic RL—and show that sequence-level importance sampling is the principled solution.
Jiacai Liu, Yingru LI, Yuqian Fu, Jiawei Wang, Qian Liu, Yu Shen
Sep 17, 2025
1 min read
Research, Theory
Mathematical Formulations of Rollout Correction Methods
Definitive mathematical formulations for rollout correction methods in VeRL, progressing from REINFORCE to PPO to Decoupled PPO. Handles policy mismatch, temporal lag, replay buffers, and off-policy algorithms with importance sampling and rejection sampling techniques.
Yingru LI
Nov 4, 2024
1 min read
Research, Theory, Documentation
HyperAgent - A Simple, Efficient, Scalable and Provable RL Framework
Practically and provably efficient RL under resource constraints!
Mar 23, 2024 1:30 PM
Rice University
Yingru LI
Slides
Video
HyperAgent - A Simple, Efficient and Scalable RL Framework for Complex Environments
Practically and provably efficient RL under resource constraints!
Jan 13, 2024 1:20 PM
Daoyuan Building
Yingru LI
Slides
Towards AGI for Humanity through Efficient Reinforcement Learning
Addressing the efficiency challenge in RL with the HyperFQI algorithm
Oct 21, 2023 2:30 PM
Teaching B Building
Yingru LI
Slides
HyperDQN - Randomized Exploration for Deep Reinforcement Learning
Dec 14, 2021 12:00 AM
NeurIPS 2021
Yingru LI
Slides
Video