Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation


Foundation models often struggle to handle uncertainty in novel situations that arise in online decision-making, which calls for scalable and efficient exploration to resolve that uncertainty. We introduce GPT-HyperAgent, an augmentation of GPT with HyperAgent for uncertainty-aware, scalable exploration in contextual bandits, a fundamental online decision problem that here involves natural-language input. We prove that HyperAgent achieves fast incremental uncertainty estimation with $\tilde{O}(\log T)$ per-step computational complexity over $T$ periods under the linear realizability assumption. Our analysis further shows that HyperAgent's regret order matches that of exact Thompson sampling in linear contextual bandits, closing a significant theoretical gap in scalable exploration. Empirical results on real-world contextual bandit tasks, such as automated content moderation with human feedback, validate the practical effectiveness of GPT-HyperAgent for safety-critical decisions. Our code is open-sourced at \url{}.
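As a point of reference for the comparison with exact Thompson sampling in the abstract, the following is a minimal sketch of exact Thompson sampling in a linear contextual bandit. It assumes a linear-Gaussian reward model with a Gaussian prior and Gaussian random arm features; the names `LinearTS` and `run` are hypothetical and chosen for illustration, and this is not the authors' code. The posterior is updated incrementally with a rank-one (Sherman-Morrison) step, which costs $O(d^2)$ per period for feature dimension $d$, plus the cost of one draw from a $d$-dimensional Gaussian.

```python
import numpy as np


class LinearTS:
    """Exact Thompson sampling for a linear-Gaussian contextual bandit.

    Assumed model (illustrative only): reward = x @ theta + N(0, noise_sd**2),
    with prior theta ~ N(0, I / prior_precision).
    """

    def __init__(self, dim, prior_precision=1.0, noise_sd=1.0, seed=0):
        self.noise_var = noise_sd ** 2
        self.mu = np.zeros(dim)                   # posterior mean
        self.cov = np.eye(dim) / prior_precision  # posterior covariance
        self.rng = np.random.default_rng(seed)

    def act(self, arms):
        """Draw theta from the exact posterior and play the greedy arm."""
        theta = self.rng.multivariate_normal(self.mu, self.cov)
        return int(np.argmax(arms @ theta))

    def update(self, x, reward):
        """Rank-one (Sherman-Morrison / Kalman) posterior update, O(d^2)."""
        cov_x = self.cov @ x
        gain = cov_x / (self.noise_var + x @ cov_x)
        self.mu = self.mu + gain * (reward - x @ self.mu)
        cov = self.cov - np.outer(gain, cov_x)
        self.cov = 0.5 * (cov + cov.T)            # guard against round-off asymmetry


def run(T=2000, dim=5, n_arms=10, seed=0):
    """Simulate T rounds with fresh Gaussian arm features; return cumulative regret."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim)             # hypothetical true parameter
    agent = LinearTS(dim, seed=seed + 1)
    regret = 0.0
    for _ in range(T):
        arms = rng.normal(size=(n_arms, dim))
        a = agent.act(arms)
        agent.update(arms[a], arms[a] @ theta_star + rng.normal())
        regret += np.max(arms @ theta_star) - arms[a] @ theta_star
    return agent, theta_star, regret
```

Calling `run()` returns the agent, the simulated true parameter, and the cumulative regret. Because the update is the exact conjugate one, feeding the same data to `update` reproduces the batch ridge-regression posterior; the per-period cost grows with $d^2$, which is the kind of cost a scalable approximation aims to avoid for large models.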

Preprint. Presented at ICML 2024 Workshops: (1) “Aligning Reinforcement Learning Experimentalists and Theorists”; (2) “Automated Reinforcement Learning: Exploring Meta-Learning, AutoML, and LLMs”

Summary of technical contributions of HyperAgent

  1. New probability tools in high-dimensional probability and statistics
  • The first probability tool for sequential random projection: a non-trivial martingale extension of the Johnson-Lindenstrauss (JL) lemma to adaptively sampled data, as arises from the sequential nature of RL;
  • A unified and simple analysis of JL via a high-dimensional extension of the Hanson-Wright inequality.
  2. Methodology for sequential decision-making
  • Hypermodel: efficient incremental approximation of the posterior (uncertainty quantification) over complex models as more data arrives, without relying on conjugacy;
  • Index sampling: approximate posterior sampling for data-efficient sequential decision-making.
  3. Results for sequential decision-making
  • Practically, HyperAgent demonstrates robust performance on large-scale deep RL benchmarks, with significant efficiency gains in both data and computation;
  • Theoretically, among practically scalable algorithms, HyperAgent is the first method to achieve both logarithmic per-step computation and sublinear regret in the tabular episodic RL and linear contextual bandit settings. At the heart of the analysis is a sequential incremental posterior approximation argument, made possible by the first probability tool for sequential random projection developed in this work.
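The hypermodel and index-sampling ideas listed above can be illustrated in the simplest linear-Gaussian case. In this minimal sketch (our own construction under stated assumptions, not HyperAgent's actual update rule), the hypermodel is $\theta(\xi) = \mu + A\xi$ with index $\xi \sim N(0, I_M)$, and each column of $A$ is updated like the mean but fitted to a random target $\sigma z_j$ with perturbation $z \sim N(0, I_M/M)$ instead of the observed reward. Unrolling the update gives $A_t = \Sigma_t G_t$ with $G_t = \Sigma_0^{-1}A_0 + \sum_{s \le t} x_s z_s^\top/\sigma$, a sequential random projection of the data; when $G_t G_t^\top$ concentrates around $\Sigma_t^{-1}$, we get $A_t A_t^\top \approx \Sigma_t$ even though arms are chosen adaptively through the sampled indices. The class name `LinearHypermodel`, the helper `relative_cov_error`, the perturbation distribution, and keeping the exact covariance to form the gain are illustrative assumptions.

```python
import numpy as np


class LinearHypermodel:
    """Illustrative linear hypermodel theta(xi) = mu + A @ xi with xi ~ N(0, I_M).

    Assumptions (ours): linear-Gaussian rewards with noise std `noise_sd`,
    prior N(0, I / prior_precision), and perturbations z ~ N(0, I_M / M).
    """

    def __init__(self, dim, index_dim, prior_precision=1.0, noise_sd=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.M = index_dim
        self.noise_sd = noise_sd
        self.mu = np.zeros(dim)
        # Random initial hypermodel with E[A A^T] equal to the prior covariance.
        self.A = self.rng.normal(scale=1.0 / np.sqrt(prior_precision * index_dim),
                                 size=(dim, index_dim))
        # Exact covariance, kept only to form the gain and to check A A^T against it.
        self.cov = np.eye(dim) / prior_precision

    def sample_theta(self):
        """Index sampling: draw one index xi and map it through the hypermodel."""
        xi = self.rng.normal(size=self.M)
        return self.mu + self.A @ xi

    def update(self, x, reward):
        cov_x = self.cov @ x
        gain = cov_x / (self.noise_sd ** 2 + x @ cov_x)
        z = self.rng.normal(scale=1.0 / np.sqrt(self.M), size=self.M)
        # The mean is fitted to the observed reward.
        self.mu = self.mu + gain * (reward - x @ self.mu)
        # Each column of A is fitted to the random target noise_sd * z_j instead,
        # so that A A^T approximately tracks the exact posterior covariance.
        self.A = self.A + np.outer(gain, self.noise_sd * z - x @ self.A)
        cov = self.cov - np.outer(gain, cov_x)
        self.cov = 0.5 * (cov + cov.T)


def relative_cov_error(hm):
    """Relative Frobenius-norm gap between A A^T and the exact posterior covariance."""
    return np.linalg.norm(hm.A @ hm.A.T - hm.cov) / np.linalg.norm(hm.cov)
```

Sampling a parameter costs one $M$-dimensional Gaussian draw and a $d \times M$ matrix-vector product. Running index sampling for a few hundred rounds with, say, $d = 3$ and $M = 4000$ should leave `relative_cov_error(hm)` small, and the gap shrinks as $M$ grows. Because the gain here still uses the exact $d \times d$ covariance to keep the check simple, this sketch does not reproduce the $\tilde{O}(\log T)$ per-step cost claimed for HyperAgent.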