R-MP-03 Model & Pipeline Interaction DAMAGE 4.0 / Critical

Agent-Triggered Retraining Contamination

Agent outputs written to production data stores enter model retraining pipelines. The model learns from agent errors. Data quality checks validate format, not provenance.

The Risk

Machine learning models are periodically retrained using updated data. The data for retraining comes from production systems: transaction records, customer records, outcome data (whether a credit decision was good or bad). The assumption is that this production data represents the actual outcomes of decisions made by humans or by validated decision systems.

When agents make decisions and write the results to production systems, the outcomes become indistinguishable from human decisions. A credit decision made by an agent is recorded in the same data store as a decision made by a human loan officer. When the credit model is retrained, it learns from both human decisions and agent decisions, without any distinction.

If the agent is systematically making errors (e.g., the agent is using a drifted model, or the agent has a reasoning bias), the model's retraining data is poisoned with agent errors. The model learns these errors as patterns and incorporates them into the new model version. The new model, when deployed, propagates the agent's errors to downstream decisions.

Data quality checks in retraining pipelines typically validate data format (is the field numeric? is it within expected range?) but not data provenance (did a human make this decision, or an agent? if an agent, was the agent operating correctly when it made the decision?).
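The gap can be made concrete with a minimal sketch. The field names, sources, and health flag below are hypothetical, not a standard schema; the point is that a typical format gate and a provenance gate can disagree on the same record:

```python
# Illustrative contrast between a typical format-only quality gate and the
# missing provenance gate. All field names and values are hypothetical.

def format_check(record: dict) -> bool:
    """Typical retraining-pipeline gate: type and range only."""
    score = record.get("fraud_score")
    return isinstance(score, (int, float)) and 0.0 <= score <= 1.0

def provenance_check(record: dict) -> bool:
    """The missing gate: who produced this outcome, and was the producer healthy?"""
    source = record.get("decision_source")  # e.g. "human", "agent", "agent+human"
    if source is None:
        return False  # unattributed outcomes are rejected outright
    if source.startswith("agent"):
        # Agent-derived outcomes need an attestation that the agent was
        # operating correctly before they are reused as training labels.
        return record.get("agent_health_verified", False)
    return True

record = {"fraud_score": 0.12, "decision_source": "agent"}
assert format_check(record)          # passes the typical gate
assert not provenance_check(record)  # fails the provenance gate
```

A record can therefore sail through every existing quality check and still be unusable as a training label.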

How It Materializes

An insurance company trains a machine learning model to predict claim fraud. The model is trained on historical claims data, including both fraudulent and legitimate claims (labeled based on investigation outcomes). The model achieves an AUC of 0.85 on the validation set.

The company deploys an agentic claims triage system to expedite the initial assessment of claims. The agent is authorized to: (1) review claim details, (2) run the fraud model, (3) assess risk, and (4) recommend disposition (immediate payment, investigation, or denial). The agent is not authorized to make final payments or denials; those still require human sign-off.

However, a data quality issue emerges in the agent's input data: the customer address field in one region is populated with postal codes instead of street addresses. The agent's logic relies on fuzzy matching the customer address to a known address database to identify repeat claimants. When address matching fails (due to the postal code issue), the agent fails to identify repeat claimants and incorrectly assesses claim fraud risk as low.
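The failure mode can be reproduced with standard-library fuzzy matching (`difflib`); the addresses, threshold, and function name below are illustrative assumptions, not the insurer's actual logic:

```python
# Hypothetical reproduction of the failure: fuzzy-matching a street address
# against a known-claimant database silently fails when the field carries a
# postal code instead of an address.
from difflib import SequenceMatcher

KNOWN_CLAIMANT_ADDRESSES = ["12 Harbour Road, Unit 4", "88 Elm Street"]

def is_repeat_claimant(address_field: str, threshold: float = 0.8) -> bool:
    """Flag a claimant whose address closely matches a known repeat claimant."""
    return any(
        SequenceMatcher(None, address_field.lower(), known.lower()).ratio() >= threshold
        for known in KNOWN_CLAIMANT_ADDRESSES
    )

assert is_repeat_claimant("12 Harbour Road, Unit 4")  # street address matches
assert not is_repeat_claimant("049283")               # postal code never matches
```

No exception is raised: the match simply never fires, so the agent scores every claimant in the affected region as a first-time, low-risk claimant.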

For several weeks, the agent makes systematically incorrect fraud risk assessments for claims in the affected region. The claims are still reviewed and approved by human claims handlers, but their review is cursory: they assume the agent's assessment is correct. Fraudulent claims are approved and paid.

After the data quality issue is fixed, the claim outcomes (approved claims later identified as fraudulent) are recorded in the production claims database. The fraud model's retraining team ingests this data without knowing that many of the "fraudulent" outcomes trace to agent errors rather than true fraud patterns.

The model is retrained with contaminated data. The new model learns patterns that correlate with agent errors, not actual fraud patterns. Once deployed, it begins making incorrect fraud predictions, especially in the region where the agent made its errors.

Under insurance regulation, the insurer is responsible for claims processing accuracy and fraud detection. When claim denial rates spike (due to the contaminated model), regulators investigate. The investigation reveals that the model was retrained on data contaminated by agent errors. The regulator cites inadequate data quality controls and inadequate separation between agent data and human-labeled data in retraining pipelines.

DAMAGE Score Breakdown

| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 4 | Agent-triggered contamination is detectable through auditing the source of retraining data, but few organizations systematically distinguish between agent-generated and human-generated outcomes. |
| A - Autonomy Sensitivity | 4 | The risk manifests in agents that make decisions that subsequently become training data. |
| M - Multiplicative Potential | 5 | A single batch of agent errors contaminates the retraining data, which then contaminates all models trained on that data. The contamination can affect multiple models. |
| A - Attack Surface | 4 | Any agent that makes decisions and writes outcomes to production data stores is exposed. |
| G - Governance Gap | 5 | Model governance teams focus on model validation and data quality of input features, not on the provenance of outcome labels. Agent governance teams do not mandate that agent-generated outcomes be segregated. |
| E - Enterprise Impact | 5 | Contaminated models affect all downstream decisions that depend on those models. The contamination can persist for months until the next retraining cycle. |
| **Composite DAMAGE Score** | **4.0** | Critical. Requires data provenance controls and segregated retraining pipelines. |

Agent Impact Profile

How severity changes across the agent architecture spectrum.

| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | Human makes the final decision, not the agent. Agent outcomes are not directly recorded. |
| Digital Apprentice | Medium | Limited autonomy; agent decisions are subject to human review and approval. |
| Autonomous Agent | High | Agent outcomes are recorded directly and become training data. |
| Delegating Agent | High | Agent outcomes are recorded through delegation chains. |
| Agent Crew / Pipeline | Critical | Multiple agents in sequence, each contributing to outcomes that become training data. |
| Agent Mesh / Swarm | Critical | Peer-to-peer decision-making; outcomes are aggregated into training data. |

Regulatory Framework Mapping

| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| SR 11-7 | Partial | Data quality and model validation; training data integrity | Data quality in retraining; validation rigor. | Agent-generated outcomes contaminating training data. |
| MAS AIRG | Partial | Domain 6: Data quality and model development governance | Data quality; model governance. | Agent provenance in training datasets. |
| NIST AI RMF 1.0 | Partial | Govern Function: Data quality and governance | Data governance; traceability. | Agent-data provenance separation. |
| EU AI Act | Partial | Data governance and quality for AI systems | Training data quality. | Agent-generated training data contamination. |
| ISO 42001 | Partial | Section 8.2, Data quality for AI system training | Data quality. | Agent provenance in training data. |
| GDPR Article 5 | Partial | Data quality and accuracy principles | Data accuracy and integrity. | Agent-contaminated training data. |

Why This Matters in Regulated Industries

In regulated industries, model validation is the foundation of model governance. Regulators expect institutions to validate models using uncontaminated, representative data. When an agent's errors contaminate the training data, the model's validation is compromised. The institution is essentially validating a model on data that includes errors, which is not what the institution intended.

Additionally, models trained on contaminated data may exhibit unfair or biased outcomes. If agent errors are concentrated in a particular demographic or region, the contaminated model may learn to discriminate against that group, creating exposure under fair-lending and anti-discrimination laws.

Controls & Mitigations

Design-Time Controls

  • Implement a data provenance system: every outcome in the production data store includes metadata indicating whether it was generated by a human, an agent, or a human acting on an agent recommendation.
  • Before deploying an agent that writes outcomes to production systems, establish a governance policy: agent-generated outcomes are segregated from human outcomes during retraining.
  • Implement a data quality framework that includes agent-specific checks: for data generated by agents, check not only format and range, but also consistency with the agent's reasoning logs.
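A provenance-carrying outcome record, per the first control above, might look like the following sketch. The field names and enum values are hypothetical, not a standard schema:

```python
# Sketch of an outcome record that carries decision provenance.
# All names are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DecisionSource(Enum):
    HUMAN = "human"
    AGENT = "agent"
    HUMAN_ON_AGENT_RECOMMENDATION = "human_on_agent_recommendation"

@dataclass(frozen=True)
class ClaimOutcome:
    claim_id: str
    disposition: str                 # e.g. "paid", "denied", "investigated"
    fraud_label: bool
    decision_source: DecisionSource  # explicit attribution, never inferred
    agent_id: Optional[str] = None   # which agent, if any, was involved
    reasoning_log_ref: Optional[str] = None  # pointer to the agent's reasoning trace

outcome = ClaimOutcome(
    claim_id="CLM-1001",
    disposition="paid",
    fraud_label=False,
    decision_source=DecisionSource.AGENT,
    agent_id="claims-triage-v2",
    reasoning_log_ref="logs/claims-triage-v2/CLM-1001",
)
assert outcome.decision_source is DecisionSource.AGENT
```

Making the record immutable (`frozen=True`) and the attribution mandatory means provenance cannot be dropped or edited after the outcome is written.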

Runtime Controls

  • Deploy a decision provenance tracker: every decision is logged with explicit attribution (human, agent, agent+human). The provenance is attached when outcomes are written to production systems.
  • Implement a segregated retraining pipeline: agent-generated outcomes are not automatically included in model retraining. They are reviewed and validated before inclusion.
  • Monitor for evidence of agent contamination in retraining data. If model retraining results in changes that correlate with agent demographics or time periods, investigate whether contamination occurred.
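The segregation rule in the second control above reduces to a simple partition: human outcomes flow to retraining automatically, while agent-derived outcomes are quarantined until explicitly validated. A minimal sketch, with hypothetical field names:

```python
# Segregated-pipeline sketch: only human outcomes and explicitly validated
# agent outcomes reach the retraining set; the rest are quarantined.
def partition_for_retraining(outcomes: list[dict]) -> tuple[list[dict], list[dict]]:
    included, quarantined = [], []
    for o in outcomes:
        if o["decision_source"] == "human":
            included.append(o)
        elif o.get("validated_for_training", False):
            included.append(o)   # agent outcome, but reviewed and approved
        else:
            quarantined.append(o)  # agent outcome awaiting validation
    return included, quarantined

outcomes = [
    {"id": 1, "decision_source": "human"},
    {"id": 2, "decision_source": "agent"},
    {"id": 3, "decision_source": "agent", "validated_for_training": True},
]
train, held = partition_for_retraining(outcomes)
assert [o["id"] for o in train] == [1, 3]
assert [o["id"] for o in held] == [2]
```

The key property is the default: an agent outcome that nobody has reviewed lands in quarantine, not in the training set.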

Detection & Response

  • Implement a retraining data audit: before retraining, analyze the composition of training data. What percentage of outcomes came from agents? Escalate if agent outcomes exceed a threshold (recommend: >10%).
  • After retraining, conduct model validation with data that excludes all agent outcomes. A large gap between clean and full validation indicates contamination.
  • When model performance degradation is detected, investigate whether agent contamination is the root cause. If so, revert the model version and reconstruct the training dataset with agent outcomes removed.
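The first two checks above can be sketched as a pre-retraining audit: measure the share of agent-generated outcomes against the suggested 10% threshold, and compare validation performance on the full set versus a clean (human-only) set. All numbers, field names, and the AUC-gap threshold are illustrative assumptions:

```python
# Pre-retraining audit sketch: agent-outcome share plus clean-vs-full
# validation gap. Thresholds and field names are hypothetical.
def agent_share(records: list[dict]) -> float:
    agent = sum(1 for r in records if r["decision_source"].startswith("agent"))
    return agent / len(records)

def audit(records: list[dict], auc_full: float, auc_clean: float,
          share_threshold: float = 0.10, auc_gap_threshold: float = 0.02) -> list[str]:
    findings = []
    if agent_share(records) > share_threshold:
        findings.append("agent outcome share exceeds threshold; review provenance")
    if auc_clean - auc_full > auc_gap_threshold:
        findings.append("model validates better without agent outcomes; suspect contamination")
    return findings

records = [{"decision_source": "human"}] * 80 + [{"decision_source": "agent"}] * 20
findings = audit(records, auc_full=0.78, auc_clean=0.85)
assert len(findings) == 2  # both checks fire: 20% agent share, 0.07 AUC gap
```

A material gap in the second check is the telltale sign from the detection control above: the model performs better when the agent's outcomes are excluded, which is exactly what contamination predicts.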

Address This Risk in Your Institution

Agent-Triggered Retraining Contamination requires data provenance controls and segregated retraining pipelines. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.
