Yingru LI

Ph.D. Candidate

The Chinese University of Hong Kong

About me

I am completing my PhD degree in Computer Science at The Chinese University of Hong Kong (CUHK) under the supervision of Prof. Zhi-Quan (Tom) Luo, with expected graduation in March 2025. Previously, I received my M.S. & B.E. degrees with honors from Huazhong University of Science & Technology, conducted research at Cornell University with Prof. John E. Hopcroft, and gained industry experience at Microsoft Research and Tencent AI & Robotics X.

Research Vision

I develop intelligent agents for complex, uncertain environments with human interaction. Through advances in uncertainty quantification, RL, and LLM reasoning, I bridge foundational theory with scalable algorithms for trustworthy decision-making in critical domains under data scarcity.

Recent Honors

  • Presidential PhD Fellowship
  • SRIBD PhD Fellowship (Gold Class)
  • Tencent AI PhD Fellowship
  • Best Paper Award, 2024 Daoyuan Academic Forum
  • Best Student Paper Award, 2024 IEEE SAM

🎯 I am currently on the job market. View my CV/résumé. 📢 For the latest updates, follow me on Twitter/X.

Research Highlights

My research is dedicated to developing trustworthy AI agents that operate reliably in complex, uncertain, and dynamic environments involving human interaction. Data scarcity is the central challenge. By advancing fundamental theory in uncertainty quantification, exploration strategies, and LLM reasoning and decision-making, I design scalable algorithms that enhance the trustworthiness of AI agents. This work bridges foundational theory with practical applications across reinforcement learning (RL) and large language models (LLMs), contributing to both theoretical advancements and impactful real-world solutions.

My contributions have advanced reinforcement learning, language model reasoning, and human-AI interaction. The work has been recognized at premier venues such as ICML, NeurIPS, ICLR, AISTATS, ISMP, and INFORMS, and has received the Best Paper Award at the 2024 Daoyuan Academic Forum and the Best Student Paper Award at 2024 IEEE SAM.

Key Contributions

1. Advancements in Uncertainty Representation and Exploration

To enhance the reliability and robustness of AI agents, I advanced methods for representing and handling uncertainty:

  • Ensemble Sampling Theory:

    • Developed a rigorous analysis establishing ensemble sampling as a scalable approximation of Thompson sampling.
    • Addressed critical challenges in exploration efficiency and scalability in reinforcement learning.
    • Impact on Trustworthiness: Improves agents’ ability to make reliable decisions under uncertainty, enhancing their robustness.
  • Neural Ensemble++ Architecture:

    • Designed to overcome challenges in ensemble scalability and mitigate coupling in shared-layer architectures.
    • Enhanced uncertainty quantification, facilitating more effective exploration in deep RL models.
    • Impact on Trustworthiness: Provides accurate uncertainty estimates, leading to more trustworthy decision-making.
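As a rough illustration of the ensemble sampling idea mentioned above, the sketch below runs it on a toy Gaussian multi-armed bandit. It is not the analysis or the Ensemble++ architecture from the papers: the function name `ensemble_sampling`, the Gaussian prior and noise model, and all parameter values are assumptions chosen for the example. Each ensemble member starts from its own prior draw and is updated with independently perturbed rewards, so a uniformly chosen member behaves approximately like a posterior sample in Thompson sampling.

```python
import random


def ensemble_sampling(true_means, num_models=10, horizon=2000,
                      prior_std=1.0, noise_std=1.0, seed=0):
    """Toy ensemble sampling on a Gaussian bandit (illustrative assumptions only).

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    num_arms = len(true_means)
    prior_prec = 1.0 / prior_std ** 2   # precision of the N(0, prior_std^2) prior
    obs_prec = 1.0 / noise_std ** 2     # precision of one noisy reward

    # Each model stores a precision-weighted sum per arm, started from its own
    # prior sample so that the models disagree before any data arrives.
    weighted_sum = [
        [prior_prec * rng.gauss(0.0, prior_std) for _ in range(num_arms)]
        for _ in range(num_models)
    ]
    precision = [prior_prec] * num_arms  # shared by all models
    pulls = [0] * num_arms

    for _ in range(horizon):
        # Pick one model uniformly at random and act greedily on its estimates.
        m = rng.randrange(num_models)
        estimates = [weighted_sum[m][a] / precision[a] for a in range(num_arms)]
        arm = max(range(num_arms), key=estimates.__getitem__)

        reward = true_means[arm] + rng.gauss(0.0, noise_std)
        pulls[arm] += 1

        # Every model sees the reward plus its own independent perturbation,
        # which keeps the spread of the ensemble close to the posterior spread.
        precision[arm] += obs_prec
        for k in range(num_models):
            perturbed = reward + rng.gauss(0.0, noise_std)
            weighted_sum[k][arm] += obs_prec * perturbed

    return pulls
```

Calling `ensemble_sampling([0.0, 0.5, 1.5], num_models=30, prior_std=2.0)` returns per-arm pull counts. In this toy setting, a larger ensemble tracks the exact posterior more closely, at a per-step update cost that grows linearly with the number of models.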

2. Data-Efficient Reinforcement Learning and Multi-Agent Strategic Learning

Efficient learning under data scarcity is crucial for trustworthy AI agents operating in real-world environments:

  • HyperAgent Framework:

    • Created HyperAgent to quantify and resolve epistemic uncertainty in optimal value estimation ($Q^\star$) based on Ensemble++.
    • Achieved a 10x improvement in data and computational efficiency on deep RL benchmarks such as Atari.
    • Impact on Trustworthiness: Ensures consistent performance even with limited data, enhancing reliability.
  • Memoire Framework and Divergence-Augmented Policy Optimization:

    • Developed a distributed deep RL system utilizing prioritized sampling, achieving 10x throughput.
    • Introduced methods to stabilize policy optimization with off-policy data reuse.
    • Impact on Trustworthiness: Enhances stability and robustness in learning, crucial for dependable AI agents.
  • Frameworks for Unknown Repeated Games:

    • Integrated structure-aware modeling with no-regret learning.
    • Enabled efficient collaboration and competition in domains such as traffic routing and radar sensing.
    • Impact on Trustworthiness: Facilitates predictable and fair agent behavior in multi-agent settings.
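For readers unfamiliar with no-regret learning, the sketch below shows the classic Hedge (exponential weights) algorithm in the full-information setting. It is a generic textbook example, not the structure-aware method for unknown repeated games described above. The function name `hedge`, the loss format (one list of per-action losses in [0, 1] per round), and the learning rate choice are assumptions made for this illustration.

```python
import math


def hedge(loss_rounds, learning_rate):
    """Run Hedge on a sequence of loss vectors with entries in [0, 1].

    Returns the distribution played in the final round and the regret
    against the best fixed action in hindsight.
    """
    num_actions = len(loss_rounds[0])
    log_weights = [0.0] * num_actions   # log-space weights for numerical stability
    expected_loss = 0.0
    probs = [1.0 / num_actions] * num_actions

    for losses in loss_rounds:
        # Turn the log-weights into a probability distribution over actions.
        shift = max(log_weights)
        weights = [math.exp(lw - shift) for lw in log_weights]
        total = sum(weights)
        probs = [w / total for w in weights]

        # Accumulate the learner's expected loss for this round.
        expected_loss += sum(p * l for p, l in zip(probs, losses))

        # Multiplicative update: actions with higher loss lose weight.
        log_weights = [lw - learning_rate * l
                       for lw, l in zip(log_weights, losses)]

    best_fixed = min(sum(r[a] for r in loss_rounds) for a in range(num_actions))
    return probs, expected_loss - best_fixed
```

With T rounds, N actions, and `learning_rate = math.sqrt(8 * math.log(N) / T)`, the standard analysis bounds the regret by sqrt(T ln(N) / 2) for any loss sequence in [0, 1]. The average regret per round therefore vanishes as T grows, which is the sense in which the learner has "no regret".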

3. Enhancing Language-Based Reasoning, Decision-Making, and Human-AI Interaction

Transparency and explainability are key aspects of trustworthiness, addressed through advancements in language-based reasoning:

  • Uncertainty-Guided Search Strategies in LLMs:

    • Applied Ensemble++ & Thompson sampling to improve complex multi-step reasoning in LLMs.
    • Achieved significant advancements in solving intricate mathematical problems.
    • Impact on Trustworthiness: Improves explainability and confidence in AI reasoning processes.
  • Hospital Referral Agent:

    • Implemented a conversational agent serving 16 hospitals nationwide.
    • Designed a multi-agent role-play framework to generate synthetic data, addressing privacy concerns.
    • Impact on Trustworthiness: Enhances reliability and efficiency in critical medical decision-making.
  • Human-AI Collaboration Frameworks:

    • Explored methods to enhance the interplay between humans and AI systems.
    • Effectively handled uncertainty and prioritized human oversight where necessary.
    • Impact on Trustworthiness: Promotes ethical operation and respect for human judgment, essential for user trust.

This research collectively enhances the trustworthiness of AI agents by addressing fundamental challenges and providing scalable, reliable solutions applicable to real-world scenarios.

Selected Publications

Uncertainty-guided Search for Multi-step Reasoning in LLMs
Scalable Exploration via Ensemble++
Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation

We prove that HyperAgent closes a theoretical gap in scalable exploration. Further, GPT-HyperAgent addresses risk and efficiency challenges in human-AI interplay for automated content moderation with human feedback.

Multi-turn Actor-critic Language Agents for Hospital Outpatient Referral
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning

TL;DR: We design a practical randomized exploration method to address the sample efficiency issue in online reinforcement learning.

Divergence-Augmented Policy Optimization

Stabilizing policy optimization when off-policy data are reused, addressing the data efficiency issue in RL for real-world problems.

Hidden Community Detection in Social Networks

Contact

szrlee [at] gmail [dot] com
