Optimistic Thompson Sampling for No-Regret Learning in Unknown Games
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
Divergence-Augmented Policy Optimization