Definitive mathematical formulations for rollout correction methods in VeRL, progressing from REINFORCE to PPO to Decoupled PPO. Handles policy mismatch, temporal lag, replay buffers, and off-policy algorithms with importance sampling and rejection sampling techniques.