NeurIPS 2025 Tutorial

Science of Trustworthy Generative Foundation Models

Date: December 2, 2025
Location: Mexico City, Mexico
Venue: Hilton Mexico City Reforma
Room: Don Alberto 1

Tutorial Speaker

Yue Huang
University of Notre Dame

Panel Discussion

Canyu Chen
Northwestern University
Panel Host
Maarten Sap
Carnegie Mellon University
Panelist
Nouha Dziri
Allen Institute for AI
Panelist
Pin-Yu Chen
IBM Research
Panelist
Max Lamparth
Stanford University
Panelist

Abstract

We are living through a moment that once belonged to science fiction: generative foundation models can write, reason, design, diagnose, and, increasingly, decide. They are no longer just predicting the next word — they are shaping knowledge, influencing choices, and becoming collaborators in science, medicine, education, and daily life. But here is the tension: as their capabilities accelerate, our ability to trust them has not kept pace.

Trustworthiness cannot remain a "patch after the failure" or a moral hope layered on top of engineering. It must evolve into a science — a discipline as rigorous as the one that created these models in the first place. In this tutorial, we explore what that science looks like: how we understand model behaviors, measure and stress-test trust, and design systems that earn it. We will build the foundations together, then step into the frontier, where models begin to exhibit human-like cognitive behaviors that inspire wonder but also demand responsibility and new forms of alignment.

This session is an invitation: to move beyond building models that impress us, toward building models we can trust with what matters.


Tutorial Outline

Part 1: Background

Generative foundation models are rapidly evolving from pattern imitators into reasoning, decision-shaping systems that influence science, healthcare, education, and society. Yet alongside remarkable breakthroughs, we have seen hallucinations presented as facts, biased outputs amplified at scale, and models behaving unpredictably when the world shifts even slightly from their training data. This section grounds the audience in how these models work, why failures emerge, and why "trustworthy by design" is no longer optional — it is a prerequisite for real-world deployment.

Part 2: Principles

To build generative models that society can rely on, we must anchor them in clear principles that go beyond performance metrics. Trustworthy models must be reliable, safe, fair, transparent, and aligned with human values—not only in average cases, but especially in ambiguous, high-stakes, and cross-cultural contexts. This section defines the north-star principles that shape how such systems should behave, and what it truly means for a generative model to be worthy of trust.

Part 3: Foundations

Trustworthiness is not a single metric — it is a multi-dimensional, evolving framework. In this section, we introduce key dimensions along which trustworthiness manifests in generative models, such as fairness, safety, robustness, and machine ethics, along with others that shape responsible behavior. Rather than treating these as a fixed checklist, we focus on how to evaluate, stress-test, and strengthen trustworthiness across diverse contexts and stakeholders. The aim is to equip the audience with a flexible mental model and a set of practical strategies for assessing and enhancing trustworthiness throughout the model's lifecycle.

Part 4: Challenges & Future Directions

As models become more capable, the bar for trustworthiness rises. We face unresolved challenges: trustworthiness is context-dependent and hard to define, must be dynamically reinterpreted as models evolve, and has to hold even at the edges and tails of rare, high-impact scenarios. Progress will require deep interdisciplinary collaboration and a new research agenda to address emerging human-like cognitive behaviors — self-reflection, strategic reasoning, persuasion, even deception — that bring advanced and unprecedented AI risks. We close by outlining the frontier questions that will shape the next generation of trustworthy generative AI.

Target Audience

This tutorial is designed for a diverse audience including:

Contact

For questions about this tutorial, please contact the organizers at yhuang37@nd.edu.