Yingru Li
Yingru Li
Home
Posts
Research
Contact
Resume
RL-Seminar
Light
Dark
Automatic
Posts
Information Bandwidth in Reinforcement Learning
An information-theoretic analysis showing that scalar advantage formulations learn ≤ log₂(B) bits per episode, while per-timestep advantages preserve full reward entropy.
Yingru LI
Last updated on Nov 4, 2025
16 min read
Research
,
Theory