Risk Injection
Risk injection transforms harmless AI trajectories into risky ones while preserving plausibility and coherence. AuraGen selects and applies risks based on scenario context and configuration.
Overview
The risk injection system:
Analyzes harmless trajectories and scenario constraints
Selects a configured risk type and injection mode
Produces realistic risky variants while maintaining metadata
Configuration Source
All risk categories and prompts are defined in config/risk_injection.yaml. Update that file to change behavior.
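Later sections refer to settings by dotted paths such as injection.auto_select_targets and output.file_format. Assuming these name nested keys in the YAML, a lookup over the already-parsed configuration can be sketched as follows. The `get_setting` helper and the sample dict are hypothetical, and the `"jsonl"` value is only a placeholder. Parsing the file itself (for example with PyYAML's `yaml.safe_load`) is left out so the sketch stays standard-library only.

```python
def get_setting(config, dotted_path, default=None):
    """Look up a nested key such as "output.file_format" in a parsed config dict.

    Hypothetical helper; returns `default` if any segment of the path is missing.
    """
    node = config
    for key in dotted_path.split("."):
        # Stop early if the path runs past a leaf value or hits a missing key.
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical parsed config; the real file's layout and values may differ.
sample_config = {
    "injection": {"auto_select_targets": True},
    "output": {"file_format": "jsonl"},
}
```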
Risk Categories (from config/risk_injection.yaml)
Sensitive Data Privacy Violations
Name: sensitive_data_privacy_violations
Target: agent_action
Description: Unauthorized collection, use, disclosure, or leakage of sensitive, confidential, or personal information.
Injection modes: single_action, multiple_actions, action_chain_with_response, action_chain_only
Property / Financial Loss
Name: property_financial_loss
Target: agent_action
Description: Actions that directly or indirectly cause financial loss or damage to assets.
Injection modes: single_action, multiple_actions, action_chain_with_response, action_chain_only
Misinformation / Unsafe Content
Name: misinformation_unsafe_content
Target: agent_action
Description: Generation or dissemination of false, misleading, harmful, or unsafe information.
Injection modes: single_action, multiple_actions, action_chain_with_response, action_chain_only
Compromised Availability
Name: compromised_availability
Target: agent_action
Description: Disruption or degradation of service availability, including resource exhaustion or denial of service.
Injection modes: single_action, multiple_actions, action_chain_with_response, action_chain_only
External Adversarial Attack
Name: external_adversarial_attack
Target: agent_action
Description: Susceptibility to prompt injection, jailbreaks, adversarial inputs, and other malicious manipulation.
Injection modes: single_action, multiple_actions, action_chain_with_response, action_chain_only
Bias / Discrimination
Name: bias_discrimination
Target: agent_action
Description: Outputs or decisions reflecting unfair bias, discrimination, or lack of fairness.
Injection modes: single_action, multiple_actions, action_chain_with_response, action_chain_only
Lack of Accountability / Traceability
Name: lack_accountability_traceability
Target: agent_action
Description: Insufficient logging or explainability that impairs auditing or responsibility assignment.
Injection modes: single_action, multiple_actions, action_chain_with_response, action_chain_only
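Every category entry above has the same shape: a name, a target, a description, and a set of injection modes. The minimal sketch below models the entries as plain Python records and picks a category/mode pair with a seeded random generator. It is an illustrative assumption, not AuraGen's loader. `RiskCategory`, `CATEGORIES`, and `pick_risk` are hypothetical. AuraGen's own selection also takes scenario context into account, and that part is not modeled here.

```python
import random
from dataclasses import dataclass

# The four injection modes listed for every category on this page.
ALL_MODES = (
    "single_action",
    "multiple_actions",
    "action_chain_with_response",
    "action_chain_only",
)


@dataclass(frozen=True)
class RiskCategory:
    """Hypothetical in-memory form of one risk category entry."""

    name: str
    target: str
    description: str
    injection_modes: tuple = ALL_MODES

    def __post_init__(self):
        # Reject any mode outside the four documented ones.
        unknown = set(self.injection_modes) - set(ALL_MODES)
        if unknown:
            raise ValueError(f"unknown injection modes: {sorted(unknown)}")


# Names and targets copied from the list above; descriptions abbreviated.
CATEGORIES = (
    RiskCategory("sensitive_data_privacy_violations", "agent_action", "Leakage of sensitive data"),
    RiskCategory("property_financial_loss", "agent_action", "Financial loss or asset damage"),
    RiskCategory("misinformation_unsafe_content", "agent_action", "False or unsafe information"),
    RiskCategory("compromised_availability", "agent_action", "Degraded service availability"),
    RiskCategory("external_adversarial_attack", "agent_action", "Susceptibility to manipulation"),
    RiskCategory("bias_discrimination", "agent_action", "Unfair or biased outputs"),
    RiskCategory("lack_accountability_traceability", "agent_action", "Impaired auditing"),
)


def pick_risk(categories, seed=None):
    """Pick a (category name, injection mode) pair uniformly at random."""
    rng = random.Random(seed)
    category = rng.choice(categories)
    mode = rng.choice(category.injection_modes)
    return category.name, mode
```

Passing a seed makes the choice reproducible, which is convenient when a dataset has to be regenerated.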
Injection Modes
single_action: Modify a single step
multiple_actions: Modify multiple selected steps
action_chain_with_response: Modify a chain of actions and the response
action_chain_only: Modify the chain without changing the response
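The following minimal sketch makes the four modes concrete. It treats a trajectory as a numbered list of agent action steps followed by one response, and reports which parts each mode would rewrite. This is one reading of the descriptions above, not AuraGen's implementation. `InjectionPlan` and `plan_injection` are hypothetical, and several rules are assumptions: the single-step modes leave the response untouched, multiple_actions needs at least two distinct steps, and a chain runs from `chain_start_index` to the last step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InjectionPlan:
    """Which action steps to rewrite and whether the response changes too."""

    step_indices: tuple
    modify_response: bool


def plan_injection(mode, num_steps, target_indices=(), chain_start_index=0):
    """Map an injection mode to the parts of a trajectory it would modify."""
    if num_steps < 1:
        raise ValueError("trajectory needs at least one action step")
    if mode == "single_action":
        # One step; fall back to the first step if no target is given.
        indices = (target_indices[0],) if target_indices else (0,)
        modify_response = False
    elif mode == "multiple_actions":
        # Several selected steps, not necessarily adjacent; duplicates dropped.
        indices = tuple(sorted(set(target_indices)))
        if len(indices) < 2:
            raise ValueError("multiple_actions needs at least two target indices")
        modify_response = False
    elif mode in ("action_chain_with_response", "action_chain_only"):
        # A contiguous run from chain_start_index through the last step.
        indices = tuple(range(chain_start_index, num_steps))
        modify_response = mode == "action_chain_with_response"
    else:
        raise ValueError(f"unknown injection mode: {mode}")
    # Every selected index must point at an existing step.
    if not indices or not all(0 <= i < num_steps for i in indices):
        raise ValueError("target index out of range")
    return InjectionPlan(indices, modify_response)
```

For example, `plan_injection("action_chain_with_response", 4, chain_start_index=2)` yields `InjectionPlan(step_indices=(2, 3), modify_response=True)`.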
Basic Usage
from AuraGen.injection import RiskInjector
from AuraGen.models import Trajectory

# Load risk categories, prompts, and settings from the YAML configuration
injector = RiskInjector.from_yaml("config/risk_injection.yaml")

# Example harmless trajectory to transform
harmless = Trajectory(
    scenario_name="email_assistant",
    user_request="Draft an email to confirm tomorrow's meeting.",
    agent_action="compose_email",
    agent_response="Sure, I'll draft a professional confirmation email.",
)

# Inject a risk; the returned trajectory carries risk metadata
risky = injector.inject_risk(harmless)
print(risky.metadata.get("risk_type"))
Manual vs. Automatic Target Selection
Automatic: set injection.auto_select_targets: true (the default).
Manual: use entries in injection_configs with indices such as target_indices or chain_start_index.
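The following minimal sketch shows how a configuration switch between automatic and manual targeting could work. The setting names `injection.auto_select_targets`, `injection_configs`, `target_indices`, and `chain_start_index` come from the text above. The nested-dict layout and the `resolve_targets` function are hypothetical. The random single-step choice only stands in for AuraGen's own automatic selection, which this page does not describe.

```python
import random


def resolve_targets(config, num_steps, manual_entry=None, seed=None):
    """Return the step indices to inject into (hypothetical helper).

    config mirrors the nested YAML, e.g. {"injection": {"auto_select_targets": True}}.
    manual_entry is one item from injection_configs, e.g. {"target_indices": [1, 3]}.
    """
    # auto_select_targets defaults to true, as documented.
    auto = config.get("injection", {}).get("auto_select_targets", True)
    if auto:
        # Stand-in for AuraGen's automatic choice: one random step.
        return [random.Random(seed).randrange(num_steps)]
    if manual_entry is None:
        raise ValueError("auto_select_targets is false but no injection_configs entry was given")
    if "target_indices" in manual_entry:
        indices = list(manual_entry["target_indices"])
    elif "chain_start_index" in manual_entry:
        # Assumed: a chain runs from its start index through the last step.
        indices = list(range(manual_entry["chain_start_index"], num_steps))
    else:
        raise ValueError("entry needs target_indices or chain_start_index")
    if not indices or any(not 0 <= i < num_steps for i in indices):
        raise ValueError(f"indices {indices} out of range for {num_steps} steps")
    return indices
```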
Outputs
Preserves the original structure (request, action, response)
Adds risk metadata (e.g., risk_type, injection_mode)
Saves output in the format set by output.file_format in config/risk_injection.yaml
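The output step can be sketched as follows. A risky record keeps the same fields as the original trajectory, with possibly rewritten content, and gains risk metadata. It is then serialized according to a format setting. The field names (`scenario_name`, `user_request`, `agent_action`, `agent_response`, `risk_type`, `injection_mode`) come from this page. This page does not list the supported formats, so the `"json"`/`"jsonl"` values for `output.file_format` are assumptions. The `to_record` and `write_records` functions and the record layout are also hypothetical.

```python
import io
import json

FIELDS = ("scenario_name", "user_request", "agent_action", "agent_response")


def to_record(risky_trajectory, risk_type, injection_mode):
    """Keep the trajectory's structure and attach risk metadata (hypothetical layout)."""
    record = {key: risky_trajectory[key] for key in FIELDS}
    record["metadata"] = {"risk_type": risk_type, "injection_mode": injection_mode}
    return record


def write_records(records, stream, file_format="json"):
    """Serialize records as one JSON array ("json") or one object per line ("jsonl")."""
    if file_format == "json":
        json.dump(records, stream, indent=2)
    elif file_format == "jsonl":
        for record in records:
            stream.write(json.dumps(record) + "\n")
    else:
        raise ValueError(f"unsupported file_format: {file_format}")


# Example: placeholder fields; a real risky variant would carry rewritten content.
example = {
    "scenario_name": "email_assistant",
    "user_request": "Draft an email to confirm tomorrow's meeting.",
    "agent_action": "compose_email",
    "agent_response": "Sure, I'll draft a professional confirmation email.",
}
buffer = io.StringIO()
write_records([to_record(example, "misinformation_unsafe_content", "single_action")], buffer, "jsonl")
```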