Yingru Li
Bandit
Optimistic Thompson Sampling for No-Regret Learning in Unknown Games
Many real-world problems involving multiple decision-makers can be modeled as an unknown game characterized by bandit feedback. …
Yingru Li, Liangqi Liu, Wenqiang Pu, Zhi-Quan Luo
Provably scalable and near-optimal Thompson sampling via hypermodel with applications in decision-making language agents
A hypermodel gives an efficient incremental approximation of the posterior (uncertainty quantification) over complex models as more data arrive, without relying on conjugacy; index sampling gives approximate posterior sampling for data-efficient sequential decision-making. This approach is provably superior to ensemble sampling and Langevin Monte Carlo.
Yingru Li, Jiawei Xu, Zhi-Quan Luo
No-Regret Learning in Unknown Game with Applications
Aug 23, 2022 2:00 PM
Yingru LI