Yingru LI

Ph.D. Candidate

The Chinese University of Hong Kong

About me

I am a final-year Ph.D. candidate at The Chinese University of Hong Kong (CUHK), Shenzhen, China. I am advised by Zhi-Quan (Tom) Luo. My doctoral research is generously supported by various prestigious fellowships. Previously, I received a bachelor's degree in Computer Science (ACM Honors Program) from Huazhong University of Science and Technology. I was a visiting research student at Cornell University with John E. Hopcroft.

I initiated and organized the reinforcement learning seminar from 2019 to 2023.

Latest Updates

✈️July 2024: I’ll be at #ICML2024 in Vienna from July 21st to 27th! Also, don’t miss my lightning talk on “Agile Human-AI Collaboration for #RiskOversight” at the #AlignmentWorkshop on 21st and 22nd. Check out the details in tweets and ICML events.

🎉July 2024: Congratulations to the team! We received the Best Student Paper Award at IEEE SAM!

💻July 2024: I will deliver an invited talk at the International Symposium on Mathematical Programming (ISMP), Montréal. The ISMP is the leading triennial conference focusing on mathematical optimization.

💻June 2024: Will present at (RLCN). Find slides here.

💻May 2024: AISTATS, Valencia, Spain. Our paper offers the first prior-dependent analysis of PSRL under function approximation. This helps understand how integrating prior knowledge like historical data or pre-trained models (LLMs) enhances RL agent efficiency.

💻May 2024: Presented HyperAgent remotely at ICLR in Vienna, Austria, during the Workshop on Bridging the Gap Between Practice and Theory in Deep Learning. HyperAgent represents a significant stride towards aligning theoretical foundations with practical deep RL applications.

💻March 2024: Two talks at the INFORMS Optimization Society (IOS) Conference at Rice University: (1) “HyperAgent: A simple, efficient, scalable and provable RL framework for complex environments” and (2) “A Tutorial on Thompson Sampling and Ensemble Sampling”.

🎉Jan 2024: HyperAgent received the Best Paper Award at the third doctoral and postdoctoral Daoyuan academic forum.

✈️December 2023: NeurIPS, New Orleans 🚀 My research addresses efficiency challenges in reinforcement learning (RL). It encompasses both theoretical aspects of high-dimensional probability and practical applications in deep RL [1]. I have developed a novel random projection tool for high-dimensional sequentially dependent data, a non-trivial martingale extension of the Johnson–Lindenstrauss lemma [2]. 🚀

Research Highlights

I work on algorithms and theoretical foundations for interactive agents. My focus is on ensuring these agents operate reliably and safely in complex, uncertain, and human-in-the-loop environments, aligning their decisions with human objectives. This work necessitates advancements in methods for knowledge and uncertainty representation, exploration, adaptation, and decision-making. To achieve these goals, I use and develop fundamental tools in probability, optimization, game theory, and information theory. My methods have been applied to human-AI alignment and reliable & strategic operations. The significance of my work has been recognized through invitations to speak at prestigious forums, including ICML, NeurIPS, ICLR, AISTATS, ISMP and INFORMS Annual Meetings, and through awards, such as the Best Paper Award at the 2024 Daoyuan academic forum and the Best Student Paper Award at the 2024 IEEE SAM. (See full publication list in the resume).

Key Contributions

One notable project is “HyperAgent,” designed to quantify and resolve epistemic uncertainty about the optimal value function $Q^\star$ for scalable real-time sequential decision-making. HyperAgent demonstrates significant gains in data and computational efficiency in large-scale deep reinforcement learning (RL) benchmarks, such as the Atari suite. It has also shown effectiveness in human-AI alignment and collaboration, such as GPT-HyperAgent for content moderation with human feedback. Theoretical analysis of HyperAgent confirms that with logarithmic per-step computational complexity, its performance matches exact Thompson sampling (TS) in linear contextual bandits and Randomized Least-Squares Value Iteration (RLSVI) in tabular RL environments. This analysis is grounded in the first probability tool for sequential random projection that I developed.

  • HyperAgent: Efficient, scalable real-time decision-making.
    • Data and computation efficiency: Significant gains in deep RL benchmarks.
    • Applications: Human-AI alignment, e.g., content moderation with human feedback.
    • Theory: Matches TS and RLSVI with logarithmic computational complexity, proved via fundamental probability tools I developed.
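For reference, the exact Thompson sampling baseline named above can be written out for a linear contextual bandit. This is the standard textbook form, not HyperAgent itself, and it rests on assumptions that are not stated on this page: a Gaussian prior $\theta^\star \sim N(0, \lambda^{-1} I)$ and rewards $r_s = x_s^\top \theta^\star + \varepsilon_s$ with $\varepsilon_s \sim N(0, \sigma^2)$, where $x_s$ is the feature vector of the action chosen at step $s$. The symbols $\lambda$, $\sigma$, $x_{t,a}$, $\mu_t$ and $\Sigma_t$ are illustrative notation.

$$
\Sigma_t^{-1} = \lambda I + \sigma^{-2} \sum_{s<t} x_s x_s^\top, \qquad
\mu_t = \sigma^{-2}\, \Sigma_t \sum_{s<t} x_s r_s, \qquad
\tilde{\theta}_t \sim N(\mu_t, \Sigma_t), \qquad
a_t \in \arg\max_a \; x_{t,a}^\top \tilde{\theta}_t .
$$

Drawing $\tilde{\theta}_t$ exactly requires the full $d \times d$ posterior covariance at every step, which is the kind of per-step cost that approximate posterior sampling tries to avoid.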

Another key area of my research is game-theoretic decision-making, focusing on minimizing adversarial regret in repeated unknown games. This includes real-world applications where strategic agents learn to collaborate in traffic routing and compete in radar sensing. In these applications, the utility function is typically unknown at the start of the repeated games and requires learning from feedback after the strategic moves of multiple agents. Additionally, while the opponent’s strategic behavior is initially unknown, the revealed actions and game outcomes can be observed. Real-world applications usually exhibit special game structures due to domain prior knowledge, such as the structural properties of the utility function that depend on the joint actions of each agent and the opponents’ history-dependent strategic behavior. For example, the utility functions in traffic routing and radar sensing have polynomial and linear structures, respectively. I have developed frameworks that integrate structure-aware modeling with no-regret learning and optimization, resulting in significant sample budget savings.

  • Game-theoretic Decision-making: Minimizing adversarial regret in repeated unknown games.
    • Applications: Collaboration in traffic routing and competition in radar sensing, achieving significant budget savings.
    • Frameworks: Synergy between domain knowledge-enhanced modeling and no-regret learning/optimization.
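As a rough illustration of what "no-regret learning" means here, the sketch below runs Hedge (multiplicative weights) against an arbitrary sequence of loss vectors and measures regret against the best fixed action in hindsight. It is a simplified stand-in, not the structure-aware frameworks described above: it assumes full-information feedback with losses in [0, 1], and the function name `hedge_regret` and the random loss sequence are hypothetical choices made for the example.

```python
import math
import random


def hedge_regret(loss_rounds, n_actions, eta):
    """Run Hedge on a sequence of loss vectors and return the learner's regret.

    Regret is the learner's cumulative expected loss minus the cumulative
    loss of the best single action in hindsight.
    """
    cumulative = [0.0] * n_actions  # cumulative loss of each fixed action
    learner_loss = 0.0
    for losses in loss_rounds:
        # Weights are exp(-eta * cumulative loss); subtracting the minimum
        # keeps the exponentials numerically stable.
        best = min(cumulative)
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected loss of the randomized learner in this round.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        for a, l in enumerate(losses):
            cumulative[a] += l
    return learner_loss - min(cumulative)


# Usage: 3 actions, 2000 rounds of (hypothetical) random losses in [0, 1).
rng = random.Random(0)
T, n = 2000, 3
rounds = [[rng.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(8 * math.log(n) / T)      # standard tuning for horizon T
regret = hedge_regret(rounds, n, eta)
bound = math.sqrt(T * math.log(n) / 2)    # classical worst-case regret bound
```

With this step size the regret stays below `bound` for any loss sequence in [0, 1], so the average regret per round vanishes as the horizon grows. The repeated unknown games described above are harder, because the utility must be learned from feedback rather than observed in full.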

Currently, I am developing reliable and safe solutions for healthcare operations through inference-time algorithms. These algorithms leverage powerful cloud computing services on foundation models while adding the necessary algorithmic modules on end devices. One example is controlling large language model (LLM) decoding towards high outcome feedback and minimal constraint violations via a learned $Q^\star$. This is especially important for goal-conditioned sequential decision tasks that involve LLMs and must remain safe. Specifically, in healthcare inpatient flows, we employ LLMs as multi-turn conversational agents that help doctors with various tasks while requiring the agents to follow established rules.

  • Reliable and Safe Operations: for healthcare, customer and business services.
    • Inference-time algorithms: Leveraging cloud services and augmenting end devices.
    • Control of LLM decoding: Guided by learned $Q^\star$ for high outcome feedback and minimum constraint violations.
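One common way to steer decoding with a value estimate is to tilt the base model's next-token distribution by the estimated values. The sketch below is a minimal toy version under assumptions of my own for this example: per-token estimates `q_reward` and `q_cost` are given, and the guided distribution is proportional to the base probability times `exp((q_reward - lam * q_cost) / beta)`. The names `guided_distribution`, `beta` and `lam` are hypothetical, and this is not necessarily the exact rule used in the work described above.

```python
import math


def guided_distribution(base_logprobs, q_reward, q_cost, beta=1.0, lam=1.0):
    """Tilt base next-token log-probabilities by estimated values.

    beta > 0 controls how strongly the values override the base model;
    lam >= 0 is the penalty weight on the estimated constraint cost.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    scores = [
        lp + (qr - lam * qc) / beta
        for lp, qr, qc in zip(base_logprobs, q_reward, q_cost)
    ]
    # Softmax with the usual max-subtraction for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Usage: three candidate tokens with a uniform base model.
base = [math.log(1 / 3)] * 3
q_r = [1.0, 1.0, 0.0]   # tokens 0 and 1 look equally promising for the outcome
q_c = [0.0, 5.0, 0.0]   # token 1 is expected to violate a constraint
probs = guided_distribution(base, q_r, q_c, beta=1.0, lam=1.0)
```

Here token 0 ends up most likely, and token 1 is suppressed below token 2 despite its high outcome estimate. A very large `beta` recovers the base distribution.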

This research aims to advance the field of interactive agents, contributing to both theoretical understanding and practical applications.

Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation

We prove that HyperAgent closes a theoretical gap in scalable exploration. Further, GPT-HyperAgent addresses risk and efficiency challenges in human-AI interplay for automated content moderation with human feedback.

Learning an Opponent-aware Anti-jamming Strategy via Online Convex Optimization
Radar Anti-jamming Strategy Learning via Domain-knowledge Enhanced Online Convex Optimization
Controlled Decoding via Q-Star on Outcome Feedback for Language Models
Uncertainty-aware Multi-turn Language Agents for Medical Decision-making
Hidden community detection in social networks


szrlee [at] gmail [dot] com