Agent outputs written to production data stores enter model retraining pipelines. The model learns from agent errors. Data quality checks validate format, not provenance.
Machine learning models are periodically retrained using updated data. The data for retraining comes from production systems: transaction records, customer records, outcome data (whether a credit decision was good or bad). The assumption is that this production data represents the actual outcomes of decisions made by humans or by validated decision systems.
When agents make decisions and write the results to production systems, the outcomes become indistinguishable from human decisions. A credit decision made by an agent is recorded in the same data store as a decision made by a human loan officer. When the credit model is retrained, it learns from both human decisions and agent decisions, without any distinction.
If the agent is systematically making errors (e.g., the agent is using a drifted model, or the agent has a reasoning bias), the model's retraining data is poisoned with agent errors. The model learns these errors as patterns and incorporates them into the new model version. The new model, when deployed, propagates the agent's errors to downstream decisions.
Data quality checks in retraining pipelines typically validate data format (is the field numeric? is it within expected range?) but not data provenance (did a human make this decision, or an agent? if an agent, was the agent operating correctly when it made the decision?).
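The gap can be made concrete with a minimal sketch. The record schema, field names, and check functions below are hypothetical illustrations, not any specific pipeline's API; the point is that a format check passes records that a provenance check would reject:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DecisionRecord:
    claim_id: str
    risk_score: float
    decision_source: str            # "human", "agent", or "unknown"
    agent_version: Optional[str] = None

def format_check(rec: DecisionRecord) -> bool:
    # Typical pipeline check: field type and range only.
    return 0.0 <= rec.risk_score <= 1.0

def provenance_check(rec: DecisionRecord) -> bool:
    # Additional check: require a known decision source. Agent records
    # must carry the version that produced them, so they can be
    # quarantined later if that version is found to be defective.
    if rec.decision_source == "human":
        return True
    if rec.decision_source == "agent":
        return rec.agent_version is not None
    return False
```

An agent-written record with a valid score but no version tag passes `format_check` yet fails `provenance_check`, which is exactly the class of record that slips into retraining sets today.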
An insurance company retrains a machine learning model to predict claim fraud. The model is trained on historical claims data, including both fraudulent and legitimate claims (labeled based on investigation outcomes). The model achieves 85% AUC on the validation set.
The company deploys an agentic claims triage system to expedite the initial assessment of claims. The agent is authorized to: (1) review claim details, (2) run the fraud model, (3) assess risk, and (4) recommend disposition (immediate payment, investigation, or denial). The agent is not authorized to make final payments or denials; those still require human sign-off.
However, a data quality issue emerges in the agent's input data: the customer address field in one region is populated with postal codes instead of street addresses. The agent's logic relies on fuzzy matching the customer address to a known address database to identify repeat claimants. When address matching fails (due to the postal code issue), the agent fails to identify repeat claimants and incorrectly assesses claim fraud risk as low.
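The failure mode is a silent one: fuzzy matching does not error out on bad input, it simply finds no match. A minimal sketch of the idea, using Python's standard-library `SequenceMatcher` and invented addresses (the real system's matcher and threshold are not specified in this account):

```python
from difflib import SequenceMatcher

# Hypothetical known-claimant address database.
KNOWN_ADDRESSES = [
    "12 Harbour Road, Springfield",
    "4 Mill Lane, Riverton",
]

def match_address(raw: str, threshold: float = 0.8):
    # Fuzzy-match the incoming address field against known claimants;
    # return the best match above the threshold, else None.
    best, best_ratio = None, 0.0
    for known in KNOWN_ADDRESSES:
        ratio = SequenceMatcher(None, raw.lower(), known.lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = known, ratio
    return best if best_ratio >= threshold else None
```

A full street address matches; a bare postal code such as `"SP1 4QT"` returns `None` with no exception raised, so the repeat-claimant signal is lost and the agent scores fraud risk as low.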
For several weeks, the agent makes systematically incorrect fraud risk assessments for claims in the affected region. The claims are still reviewed and approved by human claims handlers, but the review is cursory: handlers assume the agent's assessment is correct. Fraudulent claims are approved and paid.
After the data quality issue is fixed, the claims outcomes (approved claims that were fraudulent) are recorded in the production claims database. The fraud model's retraining team ingests this data without knowing that many of the "fraudulent" outcomes are actually agent errors, not true fraud patterns.
The model is retrained with contaminated data. The new model learns patterns that correlate with agent errors rather than actual fraud. Once deployed, it begins making incorrect fraud predictions, especially in the region and demographic where the agent's errors were concentrated.
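The mitigation implied here is a quarantine filter at ingestion time: outcome labels written by an agent version during a window later found to be defective are excluded from the retraining set. A sketch under assumed field names and an invented defect window (all identifiers below are illustrative):

```python
from datetime import date

# Hypothetical registry of agent versions and the date windows in
# which they are known to have produced bad decisions.
DEFECTIVE_WINDOWS = {
    ("triage-agent", "1.4"): (date(2024, 3, 1), date(2024, 4, 15)),
}

def eligible_for_retraining(record: dict) -> bool:
    # Human-labeled outcomes are always eligible.
    if record["decision_source"] == "human":
        return True
    # Agent-labeled outcomes are excluded if they fall inside a
    # known-defective window for the version that produced them.
    key = (record["agent_name"], record["agent_version"])
    window = DEFECTIVE_WINDOWS.get(key)
    if window is None:
        return True
    start, end = window
    return not (start <= record["decided_on"] <= end)
```

This only works if provenance (source, agent name, version, timestamp) was captured at write time, which is why the earlier provenance checks are a prerequisite rather than an optional hardening step.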
Under insurance regulation, the insurer is responsible for claims processing accuracy and fraud detection. When claim denial rates spike (due to the contaminated model), regulators investigate. The investigation reveals that the model was retrained on data contaminated by agent errors. The regulator cites inadequate data quality controls and inadequate separation between agent data and human-labeled data in retraining pipelines.
| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 4 | Agent-triggered contamination is detectable through auditing the source of retraining data, but few organizations systematically distinguish between agent-generated and human-generated outcomes. |
| A - Autonomy Sensitivity | 4 | The risk grows with autonomy: it manifests wherever agents make decisions whose outcomes subsequently become training data. |
| M - Multiplicative Potential | 5 | A single batch of agent errors contaminates the retraining data, which then contaminates all models trained on that data. The contamination can affect multiple models. |
| A - Attack Surface | 4 | Any agent that makes decisions and writes outcomes to production data stores is exposed. |
| G - Governance Gap | 5 | Model governance teams focus on model validation and data quality of input features, not on the provenance of outcome labels. Agent governance teams do not mandate that agent-generated outcomes be segregated. |
| E - Enterprise Impact | 5 | Contaminated models affect all downstream decisions that depend on those models. The contamination can persist for months until the next retraining cycle. |
| Composite DAMAGE Score | 4.0 | Critical. Requires data provenance controls and segregated retraining pipelines. |
How severity changes across the agent architecture spectrum:
| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | Human makes the final decision, not the agent. Agent outcomes are not directly recorded. |
| Digital Apprentice | Medium | Limited autonomy; agent decisions are subject to human review and approval. |
| Autonomous Agent | High | Agent outcomes are recorded directly and become training data. |
| Delegating Agent | High | Agent outcomes are recorded through delegation chains. |
| Agent Crew / Pipeline | Critical | Multiple agents in sequence, each contributing to outcomes that become training data. |
| Agent Mesh / Swarm | Critical | Peer-to-peer decision-making, outcomes are aggregated into training data. |
| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| SR 11-7 | Partial | Data quality and model validation; training data integrity | Data quality in retraining; validation rigor. | Agent-generated outcomes contaminating training data. |
| MAS AIRG | Partial | Domain 6: Data quality and model development governance | Data quality; model governance. | Agent provenance in training datasets. |
| NIST AI RMF 1.0 | Partial | Govern Function: Data quality and governance | Data governance; traceability. | Agent-data provenance separation. |
| EU AI Act | Partial | Data governance and quality for AI systems | Training data quality. | Agent-generated training data contamination. |
| ISO 42001 | Partial | Section 8.2, Data quality for AI system training | Data quality. | Agent provenance in training data. |
| GDPR Article 5 | Partial | Data quality and accuracy principles | Data accuracy and integrity. | Agent-contaminated training data. |
In regulated industries, model validation is the foundation of model governance. Regulators expect institutions to validate models using uncontaminated, representative data. When an agent's errors contaminate the training data, the model's validation is compromised. The institution is essentially validating a model on data that includes errors, which is not what the institution intended.
Additionally, models trained on contaminated data may exhibit unfair or biased outcomes. If agent errors are concentrated in a particular demographic or region, the contaminated model may learn to discriminate against that demographic. This creates exposure under Fair Lending and other anti-discrimination laws.
Agent-Triggered Retraining Contamination requires data provenance controls and segregated retraining pipelines. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.
Schedule a Briefing