R-DG-04 Data Governance & Integrity DAMAGE 3.5 / High

Data Quality Amplification

Agents launder data quality defects behind professional-looking output. Downstream consumers trust the output more than the source because the output appears higher quality.

The Risk

Data quality issues (missing values, inconsistent formats, precision loss, conflicting values across sources) are normally visible to downstream users. A data analyst sees incomplete records and knows to be cautious. A modeler sees quality metrics and designs models accordingly. Agents introduce a hidden translation layer. An agent receives low-quality or incomplete data, performs reasoning (which involves making inferences and filling gaps), and outputs a clean-looking summary. The output looks authoritative because it is complete, coherent, and professionally formatted. Downstream users trust the output more than the source because the output appears higher quality.

This is data quality laundering: low-quality input becomes professional-looking output that hides the underlying quality problems. The downstream recipient is unaware that the output depends on inferences the agent made. When those inferences are wrong, or when the original data quality issues distort them, the error propagates, and the downstream user has no cue to be skeptical: the output's polish is read as a signal of reliability.

This risk is amplified when agents are used in supply-chain contexts: an agent producing output that will be consumed by another agent further down the pipeline. Each agent makes inferences to compensate for quality issues. Each downstream agent builds on those inferences. The quality defect compounds, hidden at each step because each intermediate output looks clean.

How It Materializes

A commercial bank's anti-money laundering operations use agents to summarize customer transaction history for investigators. Customer data is sourced from multiple legacy systems with different data standards: some transactions carry complete originating-country information, others carry deprecated country codes, and others have no country data at all. Some transactions have purpose descriptions; others leave those fields blank. The raw data is inherently dirty.

An agent receives the transaction history and is instructed to "prepare a summary of the customer's geographic transaction patterns." The agent reasons about the missing data, makes inferences (assuming common destinations for undocumented routes, inferring purpose from transaction amounts and timing), and produces a clean summary: "Customer primarily transacts with counterparties in UK and Hong Kong; most transactions are commercial in nature." The summary looks authoritative. An investigator receives the summary and acts on the geographic patterns presented.

The problem: the geographic patterns are inferred from incomplete data. Of 240 transactions, 80 had missing country information. The agent inferred destinations for those 80 based on statistical patterns and other transaction characteristics. Some inferences are reasonable; some are wrong. The investigator does not know that 33% of the geographic pattern is inferred rather than observed. The investigator places high confidence in the summary and escalates the customer as a higher-risk profile than the actual data supports.
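The gap between observed and inferred patterns in this scenario is easy to quantify once provenance is tracked. A minimal sketch, using hypothetical record fields (`country`, `observed`) to reconstruct the 240-transaction example:

```python
# Illustrative reconstruction of the scenario above: 240 transactions,
# 80 of which had no observed originating country and were filled in by
# the agent. All record fields here are hypothetical.
transactions = (
    [{"country": "GB", "observed": True}] * 100
    + [{"country": "HK", "observed": True}] * 60
    + [{"country": "GB", "observed": False}] * 80   # agent-inferred
)

inferred = sum(1 for t in transactions if not t["observed"])
share = inferred / len(transactions)
print(f"{inferred} of {len(transactions)} destinations inferred ({share:.0%})")
# 80 of 240 destinations inferred (33%)
```

Without a provenance flag like `observed`, this 33% figure is unrecoverable from the summary alone, which is precisely the laundering problem.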

DAMAGE Score Breakdown

Dimension Score Rationale
D - Detectability 3 Quality laundering is difficult to detect because output appears clean. Discovery typically occurs through downstream failure or audit deep-dive.
A - Autonomy Sensitivity 3 Occurs regardless of autonomy; less autonomous agents may have more oversight but agent reasoning remains opaque.
M - Multiplicative Potential 4 Each downstream agent can amplify the laundered quality. Quality defects compound through multi-stage pipelines.
A - Attack Surface 2 Primarily a structural issue; not easily weaponized by external attack. Occurs naturally through normal agent reasoning.
G - Governance Gap 4 Quality governance frameworks assume output quality mirrors input quality. Agents decouple input quality from output appearance.
E - Enterprise Impact 4 Incorrect decision-making based on amplified but false confidence. Risk scoring, investigation prioritization, resource allocation all affected.
Composite DAMAGE Score 3.5 High. Requires priority attention with dedicated controls and monitoring.

Agent Impact Profile

How severity changes across the agent architecture spectrum.

Agent Type Impact How This Risk Manifests
Digital Assistant Moderate Human reviews output; may catch quality issues if human is trained to scrutinize inferences.
Digital Apprentice Moderate-High Progressive autonomy means less frequent review. Laundered quality passes to downstream systems without human verification.
Autonomous Agent High Agent output is served directly to downstream systems with no intermediate human review of quality assumptions.
Delegating Agent High Agent determines which data sources to use and how to handle quality issues. May hide quality defects in tool output aggregation.
Agent Crew / Pipeline Critical Multiple agents cascade: each agent launders quality differently. Downstream agents cannot distinguish inference from fact.
Agent Mesh / Swarm Critical Agents bidirectionally share data and outputs. Quality assumptions propagate through mesh without visibility.

Regulatory Framework Mapping

Framework Coverage Citation What It Addresses What It Misses
BCBS 239 Partial Principle 8 (Data Quality) Requires data quality and accuracy standards. Does not address inference-based quality inflation in agent systems.
EU AI Act Partial Article 24 (Documentation) Requires documentation of AI system performance and limitations. Does not specify documentation of quality assumptions in agent reasoning.
NIST AI RMF 1.0 Moderate MAP 2.2 (Data Quality) Recommends data quality assessment and monitoring. Does not address quality laundering through inference layers.
MAS AIRG Partial Section 6.1 (Governance) Requires data governance and quality standards. Does not anticipate quality assumptions in agent reasoning.
ISO 42001 Partial Section 6.1.2 Addresses information quality requirements. Does not address inference-based quality amplification.
GDPR Minimal Article 35 (DPIA) Requires assessment of data processing accuracy. Does not address agent-based quality assumptions.
OWASP LLM Top 10 Partial LLM05 (Improper Output Handling) Addresses validation of outputs. Does not address quality assumptions embedded in outputs.
NIST CSF 2.0 Partial GOVERN (GV.RO-02) Requires appropriate data quality practices. Does not address inference-based quality distortion.

Why This Matters in Regulated Industries

In AML/CFT operations, data quality directly affects detection effectiveness. If investigators trust laundered quality and prioritize cases based on inferred rather than observed patterns, the system misallocates detection resources. In credit risk, data quality laundering can lead to misrated loans. In trading surveillance, inferred transaction patterns can lead to false-positive suspicious activity reports. The immediate impact is inefficiency; the systemic impact is loss of confidence in the institution's data-driven processes.

Regulators expect institutions to maintain and monitor data quality metrics. When agents launder quality by producing professional-looking output from dirty input, the quality monitoring becomes misleading. The institution reports quality metrics that reflect agent-smoothed output rather than underlying source quality. Regulators cannot assess actual data quality. An enforcement action may follow if regulators discover that quality metrics were inflated because they were computed from agent-refined rather than source data.

Controls & Mitigations

Design-Time Controls

  • Prohibit agents from performing inference on missing or low-quality data fields without explicit flags. Require agents to output a "quality confidence score" for every field, noting which fields are inferred versus observed.
  • Implement a "quality-transparent" agent design pattern: agents output not just answers but also metadata documenting source quality, missing data percentages, inferences made, and uncertainty ranges.
  • Establish data quality baselines for input sources. Require agents to validate input quality against baselines before reasoning. If input quality falls below threshold, escalate to human review.
  • Use Component 1 (Agent Registry) to document which agents are permitted to infer missing data and under what conditions. Prohibit agents from inferring data in high-impact use cases without explicit governance approval.
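The "quality-transparent" output pattern above can be sketched as a data structure. This is an assumed shape, not a standard interface; the class and field names (`FieldProvenance`, `QualityTransparentOutput`, `missing_input_rate`) are illustrative:

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class FieldProvenance:
    """Provenance record for one output field."""
    name: str
    value: Any
    observed: bool       # True if taken directly from source data
    confidence: float    # agent's confidence in an inferred value (1.0 if observed)

@dataclass
class QualityTransparentOutput:
    """Agent answer plus the quality metadata the pattern requires."""
    summary: str
    missing_input_rate: float                      # fraction of source fields missing
    fields: List[FieldProvenance] = field(default_factory=list)

    @property
    def inferred_rate(self) -> float:
        """Fraction of output fields that were inferred rather than observed."""
        if not self.fields:
            return 0.0
        return sum(1 for f in self.fields if not f.observed) / len(self.fields)

out = QualityTransparentOutput(
    summary="Customer primarily transacts with UK and Hong Kong counterparties.",
    missing_input_rate=80 / 240,
    fields=[
        FieldProvenance("primary_corridor", "UK", observed=True, confidence=1.0),
        FieldProvenance("secondary_corridor", "HK", observed=False, confidence=0.7),
    ],
)
```

Downstream consumers can then gate decisions on `inferred_rate` and `missing_input_rate` instead of trusting the summary string alone.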

Runtime Controls

  • Instrument agents to log every inference made: field, original value/missing, inferred value, confidence score, reasoning. Store logs separately from output.
  • Implement quality decay monitoring: track the quality of data flowing through agents over time. Detect scenarios where agent output quality appears to improve while source quality degrades.
  • Attach quality metadata to all agent outputs: document source data quality, fields inferred, confidence ranges, and metadata about inferences. Propagate metadata to downstream systems.
  • Use Component 10 (Kill Switch) to automatically halt agents whose outputs show quality divergence from inputs (e.g., missing data rates in input exceed 30% but output shows no missing data).
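The divergence check in the last bullet can be sketched as follows. The 30% threshold comes from the example above; the function names and the kill-switch hook are illustrative, not a production interface:

```python
# Halt condition: input missing-data rate exceeds the threshold while the
# output reports no missing data at all. Threshold value is illustrative.
MISSING_INPUT_THRESHOLD = 0.30

def quality_divergence(input_missing_rate: float,
                       output_missing_rate: float) -> bool:
    """True when output reports cleaner data than the input can justify."""
    return (input_missing_rate > MISSING_INPUT_THRESHOLD
            and output_missing_rate == 0.0)

def check_and_halt(agent_id: str, input_missing: float,
                   output_missing: float) -> bool:
    """Return True (and flag for halt) when divergence is detected."""
    if quality_divergence(input_missing, output_missing):
        # Production code would invoke the registered kill switch here.
        print(f"HALT {agent_id}: input {input_missing:.0%} missing, "
              f"output reports none")
        return True
    return False
```

A real deployment would compare richer quality profiles (per-field missing rates, precision loss, conflicting values), but the same input-versus-output asymmetry is the trigger.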

Detection & Response

  • Conduct quarterly audits comparing agent outputs to source data. Sample outputs, inspect inference logs, verify that inferences are reasonable and that confidence scores are accurate.
  • Monitor downstream decision-making based on agent outputs. Track decisions made with high confidence based on laundered-quality data. Identify decisions that would have been different if source quality metrics had been transparent.
  • Implement incident response for quality laundering discovery: immediately audit all prior outputs from affected agent, notify leadership and relevant business owners, re-evaluate decisions made based on laundered outputs.
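The quarterly audit in the first bullet reduces to sampling the inference log and comparing inferred values to source truth where it later becomes available. A minimal sketch; the log-entry keys (`record_id`, `field`, `inferred_value`) are assumed, not prescribed:

```python
import random

def audit_inference_sample(inference_log, source_truth,
                           sample_size=50, seed=0):
    """Estimate the error rate of agent inferences from a random sample.

    inference_log: list of dicts with record_id, field, inferred_value.
    source_truth:  dict of record_id -> {field: confirmed_value}.
    Returns the fraction of checkable sampled inferences that were wrong,
    or None if no sampled inference could be checked against truth.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible audit sample
    sample = rng.sample(inference_log, min(sample_size, len(inference_log)))
    checked = [e for e in sample
               if e["field"] in source_truth.get(e["record_id"], {})]
    wrong = [e for e in checked
             if source_truth[e["record_id"]][e["field"]] != e["inferred_value"]]
    return len(wrong) / len(checked) if checked else None
```

An audit error rate materially above the agent's reported confidence is the signal that confidence scores are miscalibrated and the laundering controls need tightening.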

Address This Risk in Your Institution

Data Quality Amplification requires architectural controls that go beyond what existing frameworks provide. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.