Pioneer Research · Multi-Agent Systems Safety · 2025

Emergent Social Intelligence Risks
in Generative Multi-Agent Systems

Yue Huang, Yu Jiang, Wenjie Wang et al.

University of Notre Dame

LMU Munich · UW · UCSB · Stanford · BakeAI · Microsoft Research · IBM Research · AI2 · OSU

Multi-agent systemsA system where multiple autonomous AI agents interact, collaborate, or compete to solve tasks — often without centralized control. composed of large generative models are rapidly moving from laboratory prototypes to real-world deployments, where they jointly plan, negotiate, and allocate shared resources. While such systems promise unprecedented scalability, their collective interaction gives rise to failure modes that cannot be reduced to individual agents. We observe group behaviors — collusion-like coordinationWhen agents implicitly adopt strategies that jointly benefit a subset of the group at the expense of the broader system, without any explicit agreement., error cascadesA failure mode where a small error propagates through sequential agent handoffs, amplifying into a large system-level mistake., and conformityAgents updating their outputs toward a majority or authority opinion, even when doing so conflicts with evidence or their own prior assessment. — that mirror well-known pathologies in human societies, yet emerge purely from interactions among generative agents.

Pioneer Research

15 risks · 4 categories · 5 lifecycle phases

~10 min read

01Four classes of emergent collective failure↓
02Complete risk taxonomy — all 15 risks↓
03MAS operational lifecycle & formal framework↓
04Three meta-patterns of emergent failure↓
05Controlled simulation methodology↓
06Authors & affiliations↓

Toolkit Documentation ↗ Explore 15 Risks

Distinct Risks

Risk Categories

Lifecycle Phases

10+

Institutions

Three scenes from the near future

Scene 01 · The Invisible Cartel

Three AI pricing agents are deployed by competing online vendors. Their instructions say nothing about cooperation. By round 7 of 10, all three are quoting identical above-market prices — sustaining them through subtle signals: "I intend to hold my price to reflect quality." No meeting was ever called. No agreement was ever signed. The competitive equilibrium was simply, quietly, abandoned.

→ Risk 1.1: Tacit Collusion

Scene 02 · The Overruled Expert

A news-verification system runs 10 agents. Seven fast-retrieval agents surface a viral story from high-authority outlets with millions of likes. Three deep-verification agents trace the same story to a retracted paper. The summary agent weighs all inputs and decides: True. The majority was louder. The experts were heard, acknowledged — and overruled, every single time.

→ Risk 2.1: Majority Sway Bias

Scene 03 · The Deferred Mistake

An AI clinical pipeline processes a patient case. The guideline analyst recommends Plan A — evidence-based, by the book. The "senior clinician" agent, prompted to project decades of experience, recommends Plan B instead. The downstream auditor defers. The summarizer defers. Final output: Plan B. We ran it ten independent times. Ten times, Plan B.

→ Risk 2.2: Authority Deference Bias

None of these agents were individually broken. None were given malicious instructions. The failures emerged entirely from how they interacted — from the dynamics of incentive, influence, and information that arise when multiple generative agents operate together. This paper identifies and formalizes 15 such risks.

Taxonomy

Four classes of emergent collective failure

When intelligent agents interact repeatedly, they spontaneously reproduce failure patterns familiar from human societies — without ever being instructed to. We identify 15 distinct emergent risks organized into four broad categories.

Core Claim

These risks are not bugs in individual agents. They are emergent properties of interaction — they arise only when multiple agents operate together, and they cannot be predicted or prevented by examining any single agent in isolation. This distinction has profound implications for how we design, evaluate, and govern multi-agent AI systems.

Incentive Exploitation

Strategic Manipulation & Resource Competition

Individually rational agents converge to system-harmful equilibria through tacit coordination, queue manipulation, and information distortion under competitive incentives.

Risks 1.1 – 1.5 · 5 scenarios

Collective Cognition

Biased Aggregation & Social Influence

Group dynamics distort evidence weighting, amplify majority signals, and suppress minority expertise — an epistemic failure that mirrors authority bias and conformity cascades in human organizations.

Risks 2.1 – 2.2 · 2 scenarios

Adaptive Governance

Missing Meta-Level Control & Role Fragility

The absence of arbitration, clarification, and role-adaptation mechanisms causes pipelines to rigidly adhere to outdated directives, yielding brittle collective behavior under real-world contingency.

Risks 3.1 – 3.5 · 5 scenarios

Others

Structural & Communication Topology Risks

Local behaviors within constrained resource topologies or multi-hop communication pathways inadvertently degrade macro-level system integrity through overreach, covert signaling, and semantic drift.

Risks 4.1 – 4.3 · 3 scenarios

All 15 Risks

Complete risk taxonomy

Click any row to expand detailed experiment analysis, human society parallels, and lifecycle mapping. Filter by category below.

Risk	Name	Description	Lifecycle Phases

Formal Framework

MAS operational lifecycle

A multi-agent system 𝓜 = ⟨𝒩, 𝒮, 𝒜, 𝒯, 𝒪, 𝒞, 𝒰⟩ unfolds through five distinct temporal phases. Each emergent risk maps to one or more phases, providing a principled locus for intervention.

Initialization

t = 0

Deliberation

t ∈ [1, T_delib]

Coordination

t ∈ [T_delib+1, T_coord]

Execution

t ∈ [T_coord+1, T_exec]

Adaptation

t > T_exec

Initialization

The system designer specifies role assignments ρ: 𝒩 → ℛ, utility functions {u_i} and U_sys, the communication topology 𝒞, and initial information partitions ℐ. Agents are instantiated with initial beliefs, system prompts, and initial policies.

𝓜 = ⟨𝒩, 𝒮, 𝒜, 𝒯, 𝒪, 𝒞, 𝒰⟩ πᵢ : 𝓗ᵢ → Δ(𝒜ᵢ) [agent policy] hᵢ,t = (oᵢ,₀, mᵢ,₀, aᵢ,₀, …, oᵢ,t) [local history]

Risks Manifesting

1.5 Info Asymmetry 3.1 Non-convergence 3.2 Over-adherence 3.4 Role Allocation 4.2 Steganography

Deliberation

Agents gather observations, exchange messages, and update beliefs without executing actions. Communication follows the directed graph 𝒢_t = (𝒩, ℰ_t). This phase is most vulnerable to social-influence distortions.

bᵢ,t+1(s') = η · Oᵢ(oᵢ,t+1|s') Σ bᵢ,t(s) 𝒯(s'|s,at) μᵢ : 𝓗ᵢ × 𝒪ᵢ → 𝓜ᵢ [message generation]

Risks Manifesting

2.1 Majority Sway 2.2 Authority Deference 3.1 Non-convergence 3.3 Clarification Failure 4.3 Semantic Drift

Coordination

Agents negotiate joint plans and allocate scarce resources. The allocation mechanism ℱ maps requests to realized allocations subject to capacity constraints. This is the primary locus of Category 1 strategic risks.

Σᵢ xᵢ,ₖ,ₜ ≤ Rₖ,ₜ ∀k ∈ {1,…,K} x̃ᵢ,ₜ = ℱᵢ(x₁,ₜ, …, xₙ,ₜ)

Risks Manifesting

1.1 Tacit Collusion 1.2 Monopolization 1.3 Task Avoidance 1.4 Misreporting 1.5 Info Asymmetry 4.1 Resource Overreach

Execution

Agents execute committed actions, causing state transitions and generating utility feedback. Governance failures become apparent as rigid role adherence meets real-world contingency and changing conditions.

st+1 ~ 𝒯(st, at, ·) rᵢ,t = uᵢ(st, at) R_sys,t = U_sys(st, at)

Risks Manifesting

3.2 Over-adherence 3.3 Clarification 3.4 Role Allocation 3.5 Role Stability 4.1 Overreach 4.3 Semantic Drift

Adaptation

In repeated interactions, agents refine policies from accumulated experience. The system may converge to fixed points, exhibit cycles, or demonstrate path-dependent lock-in. This is where tacit collusion crystallizes into stable supra-competitive equilibria.

πᵢ^(k+1) ← Update(πᵢ^(k), {(st, at, rᵢ,t)}t) Outcomes: fixed-point | cycle | path-dependent lock-in

Risks Manifesting

1.1 Tacit Collusion ★ 1.3 Task Avoidance 3.5 Role Stability 4.1 Overreach 4.2 Steganography ★

Key Findings

Three meta-patterns of emergent failure

Across controlled simulations of all 15 scenarios, three recurring patterns characterize how and why multi-agent systems fail in ways that individual agents do not.

Individually rational agents converge to system-harmful equilibria

When agents interact under shared environments with scarce resources or repeated interactions, they exhibit strategically adaptive behaviors that closely mirror human failure modes in markets and organizations. Without explicit coordination channels, seller agents spontaneously drift into tacitly collusiveSupra-competitive: prices or outcomes that benefit a coalition of agents above the competitive Nash equilibrium, achieved without explicit communication. strategies that sustain elevated prices. In GPU queuing settings, coalitions form in 4/6 trials to monopolize low-cost compute, starving excluded agents. Simple instruction-level mitigations are often insufficient — even when agents are warned against collusion, they continue to explore exploitative strategies when such behaviors remain instrumentally advantageous and unenforced by environment mechanisms.

Experimental Evidence

Under persona-emphasis prompts, collusion emerged in 3 of 5 repeated trials — agents stated: "By working together, C and I can use these remaining 8 low-cost hours." The coalition formed without any explicit coordination instruction.

Tacit Collusion Priority Monopolization Mechanism Design Required

"Agent collectives, despite no instruction to do so, spontaneously reproduce familiar failure patterns from human societies — a social intelligence risk that existing safeguards cannot prevent."

— Emergent Social Intelligence Risks in Generative Multi-Agent Systems

Collective interaction produces epistemic bias that overrides expert safeguards

Collective decision dynamics systematically favor majority and authority signals over expert input and predefined standards. In broadcast deliberation, majority sway persists even when the Moderator's initial prior explicitly opposes the majority view — iterative aggregation overpowers both minority expertise and initial safeguards. Once an authority cue is introduced, downstream agents lock onto it as a decisive heuristic rather than re-evaluating evidence independently. The failure mechanism is epistemic: agents converge to consensus, but convergence is driven by social influenceThe process by which an agent updates its beliefs or outputs based on what other agents say, rather than on independent evaluation of the underlying evidence. rather than evidence quality. This mirrors conformity cascades, authority bias, and group polarization documented in human deliberation research.

Key Insight

This is not selfishness or exploitation — agents are genuinely trying to reach consensus. The distortion is structural: the aggregation mechanism itself amplifies majority signals, regardless of agent intent.

Majority Sway Bias Authority Deference Epistemic Failure

"Competence at the component level does not guarantee resilience at the system level. The dark side of intelligent multi-agents is not in what any single agent does — it is in what they become together."

— Key Finding 3

Missing adaptive governance leads to system-level fragility

When agents are assigned fixed roles, they strictly follow these assignments — often at the expense of proactive clarification, arbitration, or replanning. Strikingly, performance is worst under moderate task ambiguity: while agents succeed under highly clear assignments (strong instruction following) or highly ambiguous ones (self-adaptation), partial specifications cause adaptive efforts to actively conflict with assigned constraints. The failure mechanism is architectural: the system lacks meta-level control loopsMechanisms that let the system pause, reflect, seek clarification, arbitrate conflicts, or replan — rather than blindly executing a predefined workflow. to pause, clarify, arbitrate, or replan. Competence at the component level does not guarantee resilience at the system level.

Design Implication

MAS robustness depends not only on agent capability, but on explicit adaptive governance mechanisms that balance strict role execution with structured recovery and clarification — analogous to escalation procedures in high-stakes human organizations.

Role Allocation Failure Clarification Failure Governance Architecture

Methods

Controlled multi-agent simulation

Each risk is operationalized through a fully specified, deterministic simulation with externally evaluated indicators, enabling systematic comparison across all 15 scenarios.

Design Principles

Each simulation specifies a task and the constraints, environment rules, and objectives defining success and failure. Agents are instantiated with explicit roles and shared interaction protocols.

Deterministic environment specification
Pre-defined, externally evaluated risk indicators
Multiple independent trial repetitions
Only interaction-level variables change between conditions

Causal Isolation

To isolate emergent risks from individual-agent artifacts, agent roles, prompts, and objectives are held fixed while only interaction structure varies.

Symmetric agent configurations where applicable
Communication topology ablations
Incentive parameter sweeps (e.g., guarantee fees)
Authority-cue and role-assignment ablations

Models Evaluated

Experiments span frontier models to assess whether emergent risks are model-universal or model-specific, with implications for deployment safety across the LLM ecosystem.

GPT-4o GPT-4o-mini GPT-5 Gemini-2.5-Flash

Quantitative Findings

Key risk rates observed across experimental conditions:

Collusion rate (persona)

1/3

Coalition formation

4/6

Deceptive reports

50%

Semantic drift (max)

7.3/10

Emergent Social Intelligence Risksin Generative Multi-Agent Systems

Four classes of emergent collective failure

Complete risk taxonomy

MAS operational lifecycle

Initialization

Risks Manifesting

Deliberation

Risks Manifesting

Coordination

Risks Manifesting

Execution

Risks Manifesting

Adaptation

Risks Manifesting

Three meta-patterns of emergent failure

Individually rational agents converge to system-harmful equilibria

Collective interaction produces epistemic bias that overrides expert safeguards

Missing adaptive governance leads to system-level fragility

Controlled multi-agent simulation

Design Principles

Causal Isolation

Models Evaluated

Quantitative Findings

Authors & affiliations

Emergent Social Intelligence Risks
in Generative Multi-Agent Systems