PhD Student, University of Notre Dame

Yue Huang (黄跃)

Scholar / GitHub / X / LinkedIn / Email / CV


About

I am a second-year PhD student at the University of Notre Dame, advised by Prof. Xiangliang Zhang in the MINE Lab.

Previously at Microsoft Research, MIT-IBM Watson AI Lab, IBM Research, and Tsinghua KEG.

Research

I build humane and humanized AI: systems that share our values and mirror our cognition, along with the infrastructure to make them real.

Values

I study what it means for AI to share our values, building landmark benchmarks for trust (TrustLLM, ICML'24), training models toward honesty and helpfulness (HonestLLM, NeurIPS'24), exposing biases in LLM-as-a-judge (Justice or Prejudice?, ICLR'25), and developing alignment that puts safety before helpfulness (SPA, AAAI'26 Oral; Capability-Oriented Risk, ICML'26).

Cognition

I probe how models reason and perceive, deciding when to use tools (MetaTool, ICLR'24), adapting reasoning style to the task (AdaReasoner, NeurIPS'25 Spotlight), exposing flaws in social character simulation (TrustSim, COLM'25), diagnosing failure modes at scale (ProbeLLM, ICML'26), ranking models by question difficulty (RankLLM, ICLR'26), and probing visual aesthetic judgment (VAB, COLM'26). I also explored emotionally adaptive generative AI (EmoNest, NeurIPS'25 Creative).

Infrastructure

I build the toolchain that makes the above possible: frameworks for unified synthetic data (DataGen, ICLR'25; ChemOrch, NeurIPS'25), platforms for dynamic trust evaluation (TrustGen, ICLR'26), foundational guardrails for safer agentic systems (Safiron, ICLR'26), and a new guardian model paradigm (Guardian-as-an-Advisor, ACL'26 Findings).

Latest Preprints

MemoHarness: Agent Harnesses That Learn from Experience

Yue Huang, Wenjie Wang, Han Bao, Yuchen Ma, Xiaonan Luo, et al.

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Zhangchen Xu, Junda Chen, Yue Huang, Dongfu Jiang, Jiefeng Chen, et al.

SkillGen: Verified Inference-Time Agent Skill Synthesis

Yuchen Ma, Yue Huang, Han Bao, Haomin Zhuang, Swadheen Shukla, et al.

NARRA-Gym for Evaluating Interactive Narrative Agents

Yue Huang, Yuchen Ma, Jiayi Ye, Wenjie Wang, Zipeng Ling, et al.

Emergent Social Intelligence Risks in Generative Multi-Agent Systems

Yue Huang, Yu Jiang, Wenjie Wang, Haomin Zhuang, Xiaonan Luo, et al.

Selected Work

ICML 2026

ProbeLLM: Automating Principled Diagnosis of LLM Failures

Yue Huang*, Zhengzhe Jiang*, Yuchen Ma*, et al. — Hierarchical MCTS framework that surfaces structured, recurring failure modes of LLMs.

ICLR 2026

TrustGen: Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models

Yue Huang, Chujie Gao, Siyuan Wu, et al. — Featured by Hoover Institution Technology Policy Accelerator and Notre Dame–IBM Technology Ethics Lab.

ICLR 2026

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Yue Huang, Hang Hua, Yujun Zhou, et al. — Toolkit and model for trustworthy agentic systems.

AAAI 2026

SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization

Yue Huang, Xiangqi Wang, Xiangliang Zhang — Oral. Alignment that puts safety before helpfulness via self-priority optimization.

KDD 2026

Synthetic Interaction Data for Scalable Personalization in Large Language Models

Yuchen Ma, Yue Huang, Wenjie Wang, et al. — PersonaGym synthetic interaction data and PPOpt for scalable personalization.

ICLR 2025

DataGen: Unified Synthetic Dataset Generation via Large Language Models

Yue Huang*, Siyuan Wu*, Chujie Gao, et al. — A unified framework for diverse, accurate, and controllable synthetic dataset generation.

NeurIPS 2025

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking

Xiangqi Wang*, Yue Huang*, Yanbo Wang, et al. — Spotlight. Adaptive reasoning that adjusts thinking style to the task.

ICML 2024

TrustLLM: Trustworthiness in Large Language Models

Yue Huang, Lichao Sun, Haoran Wang, et al. — Featured by U.S. DHS, NIST, FLI AI Safety Index, and the Springer textbook Introduction to Foundation Models.

NeurIPS 2025

ChemOrch: Empowering LLMs with Chemical Intelligence via Groundbreaking Synthetic Instructions

Yue Huang*, Zhengzhe Jiang*, Xiaonan Luo, et al. — Highlighted by NSF Center for Computer-Assisted Synthesis (C-CAS).

NeurIPS 2024

HonestLLM: Toward an Honest and Helpful Large Language Model

Chujie Gao*, Siyuan Wu*, Yue Huang*, et al. — Training LLMs toward honesty and helpfulness.

ICLR 2024

MetaTool Benchmark: Deciding Whether to Use Tools and Which to Use

Yue Huang, Jiawen Shi, Yuan Li, et al.

All publications →

News

May 2026

Three papers accepted to KDD 2026; also organizing two tutorials and one workshop at KDD. See you in Jeju Island!

Apr 2026

Three papers accepted to ICML 2026; two to ACL 2026 Findings; one to ACL 2026 Demo.

Mar 2026

Joining Microsoft Research as a research scientist intern (Seattle). One paper at CHI 2026 received an Honourable Mention Award.

Jan 2026

Four papers accepted to ICLR 2026. See you in Rio de Janeiro.

Nov 2025

Selected as a Jetstream2 NAIRR AI Fellow. Four papers accepted to AAAI 2026 (SPA and SECURE as orals, RefineLab and RMO as posters). Yujun's LabSafetyBench accepted to Nature Machine Intelligence; congrats!

Oct 2025

Presenting a tutorial on Science of Trustworthy Generative Foundation Models at NeurIPS 2025; co-organizing the Responsible FM workshop.

Sep 2025

Four papers accepted to NeurIPS 2025 (huge thanks to Xiangqi, Yanbo, and all co-authors); EmoNest accepted to NeurIPS 2025 Creative AI — try the demo at the conference. See you in San Diego!

Aug 2025

One paper accepted to EMNLP 2025 Findings; two papers accepted to CIKM 2025.

Jul 2025

Preference Leakage won the Best Paper Award at DIG-BUGs@ICML 2025; PsychometricBench won the Best Paper Award at SciSocLLM@KDD 2025. One paper accepted to COLM 2025.

May 2025

Two papers accepted to ACL 2025 (1 Main + 1 Findings).

Mar 2025

TrustEval accepted to NAACL 2025 Demo; UPME accepted to CVPR 2025.

Jan 2025

Four papers accepted to ICLR 2025. Selected as a KAUST Rising Star in AI (24 / 300+).

Dec 2024

Joining IBM Research as a Research Scientist Intern in Summer 2025. See you in Cambridge, MA.

Sep 2024

HonestLLM accepted to NeurIPS 2024 — congrats Chujie! Another paper accepted to EMNLP 2024 Main.

Aug 2024

Attack LLM-as-a-Judge accepted to ACM CCS 2024. Awarded OpenAI Researcher Access Program.

May 2024

TrustLLM accepted to ICML 2024. Another paper accepted to ACL 2024 Main.

Mar 2024

One paper accepted to NAACL 2024; another paper accepted as a short paper to WWW 2024.

Jan 2024

MetaTool accepted to ICLR 2024.

Full CV & timeline →