Multi-agent systemsA system where multiple autonomous AI agents interact, collaborate, or compete to solve tasks — often without centralized control. composed of large generative models are rapidly moving from laboratory prototypes to real-world deployments, where they jointly plan, negotiate, and allocate shared resources. While such systems promise unprecedented scalability, their collective interaction gives rise to failure modes that cannot be reduced to individual agents. We observe group behaviors — collusion-like coordinationWhen agents implicitly adopt strategies that jointly benefit a subset of the group at the expense of the broader system, without any explicit agreement., error cascadesA failure mode where a small error propagates through sequential agent handoffs, amplifying into a large system-level mistake., and conformityAgents updating their outputs toward a majority or authority opinion, even when doing so conflicts with evidence or their own prior assessment. — that mirror well-known pathologies in human societies, yet emerge purely from interactions among generative agents.
Three AI pricing agents are deployed by competing online vendors. Their instructions say nothing about cooperation. By round 7 of 10, all three are quoting identical above-market prices — sustaining them through subtle signals: "I intend to hold my price to reflect quality." No meeting was ever called. No agreement was ever signed. The competitive equilibrium was simply, quietly, abandoned.
→ Risk 1.1: Tacit CollusionA news-verification system runs 10 agents. Seven fast-retrieval agents surface a viral story from high-authority outlets with millions of likes. Three deep-verification agents trace the same story to a retracted paper. The summary agent weighs all inputs and decides: True. The majority was louder. The experts were heard, acknowledged — and overruled, every single time.
→ Risk 2.1: Majority Sway BiasAn AI clinical pipeline processes a patient case. The guideline analyst recommends Plan A — evidence-based, by the book. The "senior clinician" agent, prompted to project decades of experience, recommends Plan B instead. The downstream auditor defers. The summarizer defers. Final output: Plan B. We ran it ten independent times. Ten times, Plan B.
→ Risk 2.2: Authority Deference BiasWhen intelligent agents interact repeatedly, they spontaneously reproduce failure patterns familiar from human societies — without ever being instructed to. We identify 15 distinct emergent risks organized into four broad categories.
These risks are not bugs in individual agents. They are emergent properties of interaction — they arise only when multiple agents operate together, and they cannot be predicted or prevented by examining any single agent in isolation. This distinction has profound implications for how we design, evaluate, and govern multi-agent AI systems.
Click any row to expand detailed experiment analysis, human society parallels, and lifecycle mapping. Filter by category below.
| Risk | Name | Description | Lifecycle Phases |
|---|
A multi-agent system 𝓜 = ⟨𝒩, 𝒮, 𝒜, 𝒯, 𝒪, 𝒞, 𝒰⟩ unfolds through five distinct temporal phases. Each emergent risk maps to one or more phases, providing a principled locus for intervention.
The system designer specifies role assignments ρ: 𝒩 → ℛ, utility functions {u_i} and U_sys, the communication topology 𝒞, and initial information partitions ℐ. Agents are instantiated with initial beliefs, system prompts, and initial policies.
Agents gather observations, exchange messages, and update beliefs without executing actions. Communication follows the directed graph 𝒢_t = (𝒩, ℰ_t). This phase is most vulnerable to social-influence distortions.
Agents negotiate joint plans and allocate scarce resources. The allocation mechanism ℱ maps requests to realized allocations subject to capacity constraints. This is the primary locus of Category 1 strategic risks.
Agents execute committed actions, causing state transitions and generating utility feedback. Governance failures become apparent as rigid role adherence meets real-world contingency and changing conditions.
In repeated interactions, agents refine policies from accumulated experience. The system may converge to fixed points, exhibit cycles, or demonstrate path-dependent lock-in. This is where tacit collusion crystallizes into stable supra-competitive equilibria.
Across controlled simulations of all 15 scenarios, three recurring patterns characterize how and why multi-agent systems fail in ways that individual agents do not.
When agents interact under shared environments with scarce resources or repeated interactions, they exhibit strategically adaptive behaviors that closely mirror human failure modes in markets and organizations. Without explicit coordination channels, seller agents spontaneously drift into tacitly collusiveSupra-competitive: prices or outcomes that benefit a coalition of agents above the competitive Nash equilibrium, achieved without explicit communication. strategies that sustain elevated prices. In GPU queuing settings, coalitions form in 4/6 trials to monopolize low-cost compute, starving excluded agents. Simple instruction-level mitigations are often insufficient — even when agents are warned against collusion, they continue to explore exploitative strategies when such behaviors remain instrumentally advantageous and unenforced by environment mechanisms.
Under persona-emphasis prompts, collusion emerged in 3 of 5 repeated trials — agents stated: "By working together, C and I can use these remaining 8 low-cost hours." The coalition formed without any explicit coordination instruction.
"Agent collectives, despite no instruction to do so, spontaneously reproduce familiar failure patterns from human societies — a social intelligence risk that existing safeguards cannot prevent."— Emergent Social Intelligence Risks in Generative Multi-Agent Systems
Collective decision dynamics systematically favor majority and authority signals over expert input and predefined standards. In broadcast deliberation, majority sway persists even when the Moderator's initial prior explicitly opposes the majority view — iterative aggregation overpowers both minority expertise and initial safeguards. Once an authority cue is introduced, downstream agents lock onto it as a decisive heuristic rather than re-evaluating evidence independently. The failure mechanism is epistemic: agents converge to consensus, but convergence is driven by social influenceThe process by which an agent updates its beliefs or outputs based on what other agents say, rather than on independent evaluation of the underlying evidence. rather than evidence quality. This mirrors conformity cascades, authority bias, and group polarization documented in human deliberation research.
This is not selfishness or exploitation — agents are genuinely trying to reach consensus. The distortion is structural: the aggregation mechanism itself amplifies majority signals, regardless of agent intent.
"Competence at the component level does not guarantee resilience at the system level. The dark side of intelligent multi-agents is not in what any single agent does — it is in what they become together."— Key Finding 3
When agents are assigned fixed roles, they strictly follow these assignments — often at the expense of proactive clarification, arbitration, or replanning. Strikingly, performance is worst under moderate task ambiguity: while agents succeed under highly clear assignments (strong instruction following) or highly ambiguous ones (self-adaptation), partial specifications cause adaptive efforts to actively conflict with assigned constraints. The failure mechanism is architectural: the system lacks meta-level control loopsMechanisms that let the system pause, reflect, seek clarification, arbitrate conflicts, or replan — rather than blindly executing a predefined workflow. to pause, clarify, arbitrate, or replan. Competence at the component level does not guarantee resilience at the system level.
MAS robustness depends not only on agent capability, but on explicit adaptive governance mechanisms that balance strict role execution with structured recovery and clarification — analogous to escalation procedures in high-stakes human organizations.
Each risk is operationalized through a fully specified, deterministic simulation with externally evaluated indicators, enabling systematic comparison across all 15 scenarios.
Each simulation specifies a task and the constraints, environment rules, and objectives defining success and failure. Agents are instantiated with explicit roles and shared interaction protocols.
To isolate emergent risks from individual-agent artifacts, agent roles, prompts, and objectives are held fixed while only interaction structure varies.
Experiments span frontier models to assess whether emergent risks are model-universal or model-specific, with implications for deployment safety across the LLM ecosystem.
Key risk rates observed across experimental conditions: