R-QM-01 Quality & Measurement DAMAGE 3.4 / High

Data Sigma Ceiling

Quality of input data constrains maximum achievable agent quality. Raw enterprise data at 3.5 sigma caps everything built on it.

The Risk

Six Sigma methodology defines quality in terms of a sigma level, a measure of process variation expressed as defects per million opportunities (DPMO). A process operating at 6 sigma produces 3.4 defects per million opportunities; a process at 4 sigma produces 6,210. Most business processes operate at 3 to 4 sigma.

Data quality can be measured in the same framework. Data at 3.5 sigma means that for every million data points, approximately 22,750 are incorrect or out of specification. In practice that might mean 22,750 wrong customer addresses, 22,750 mis-recorded transaction amounts, or 22,750 incorrect customer ages.

When agents operate on data, their output quality is constrained by the input data quality. Even if the agent's reasoning is highly reliable (say, a 4 sigma process), if it operates on 3.5 sigma data, the agent's output cannot be better than 3.5 sigma. The agent propagates input data errors through its reasoning and decision-making.
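
One way to see the ceiling is to multiply yields: an agent's output is correct only when both the input record and the reasoning step are correct, so the combined defect rate is at least the data's defect rate. A minimal worked example (the yield figures are the standard DPMO values quoted above; treating the two steps as independent is a simplifying assumption):

```python
# Probability that a single record / reasoning step is defect-free.
data_yield = 1 - 22_750 / 1_000_000      # 3.5 sigma input data    -> 0.97725
reasoning_yield = 1 - 6_210 / 1_000_000  # 4 sigma agent reasoning -> 0.99379

# Rolled throughput yield: the record AND the reasoning must both be correct.
combined_yield = data_yield * reasoning_yield
combined_dpmo = (1 - combined_yield) * 1_000_000

print(f"combined yield: {combined_yield:.5f}")   # ~0.97118
print(f"combined DPMO:  {combined_dpmo:,.0f}")   # ~28,819 -- worse than the input data alone
```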

The governance challenge is that data quality is often not measured using sigma methodology. Organizations might say "90% of our data is clean" (which sounds good) when 90% accuracy is 100,000 defects per million, roughly 2.8 sigma. Agents deployed on this "90% clean" data operate under that ceiling, regardless of how sophisticated the agent's reasoning is.
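
The conversion from an accuracy percentage to a sigma level is mechanical: the defect fraction scaled to a million opportunities gives DPMO, and the normal quantile of the yield plus the conventional 1.5 sigma shift gives the sigma level. A minimal sketch using only the Python standard library:

```python
from statistics import NormalDist

def accuracy_to_sigma(accuracy: float) -> tuple[float, float]:
    """Convert an accuracy fraction (e.g. 0.90) to (DPMO, sigma level).

    Applies the conventional 1.5-sigma shift used in Six Sigma tables.
    """
    dpmo = (1 - accuracy) * 1_000_000
    sigma = NormalDist().inv_cdf(accuracy) + 1.5
    return dpmo, sigma

for acc in (0.90, 0.92, 0.97725, 0.9999966):
    dpmo, sigma = accuracy_to_sigma(acc)
    print(f"{acc:.5%} clean -> {dpmo:9,.0f} DPMO -> {sigma:.2f} sigma")
```

On these numbers, "92% clean" works out to roughly 2.9 sigma (80,000 defective records per million), which is the figure used in the scenario below.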

How It Materializes

A regional bank has customer demographic data (names, addresses, phone numbers, employment details) stored in a legacy customer database. The data has been accumulated over 20 years, with multiple data entry systems, multiple data migrations, and inconsistent data validation rules. The bank's data quality assessment concludes that 92% of customer records are "clean" (match a validation check). The bank considers this acceptable and does not invest in data remediation.

The bank deploys an agentic customer service system that uses the customer database to process inquiries and make decisions. The agent retrieves customer information, validates it, and answers questions like "What is my current balance?" or "When is my next payment due?" The agent also has authority to make account updates (change address, update phone number) based on customer requests.

The agent's reasoning is sophisticated: it checks for consistency between provided information and database records, asks clarifying questions when there is ambiguity, and escalates to human review when confidence is low. But the agent operates on 92% clean data, which is approximately 2.9 sigma (80,000 defective customer records per million).

When a customer calls to update their address, the agent retrieves the customer's existing address from the database. The agent is supposed to confirm the address before updating. But the existing address is already wrong (due to prior data errors). The agent, unaware that the baseline is corrupt, treats the bad record as the trusted reference for its "confirmed" update, and the customer's record ends up further from correct rather than closer.

Additionally, when the agent needs to identify the correct customer (to avoid updating the wrong customer's record), it uses a fuzzy match on name and phone number. But the phone numbers in the database are corrupted for 8% of records. The agent occasionally updates the wrong customer's account.

The bank's complaint rate from customers increases. Customers report receiving bills at old addresses, missing payment notifications due to wrong phone numbers, and account confusion due to mixed-up records. Under consumer protection regulations (ECOA, Fair Lending laws), banks must maintain accurate customer information and provide accurate account servicing. Sending a bill to the wrong address is a violation. The bank's defense ("Our agent's reasoning was sound") is insufficient if the agent operated on corrupt data.

DAMAGE Score Breakdown

Dimension | Score | Rationale
D - Detectability | 3 | Data quality issues are detectable through data audits and analysis, but many organizations do not measure data quality in sigma terms and may not recognize that 92% accuracy is a low sigma ceiling.
A - Autonomy Sensitivity | 2 | Both autonomous and supervised agents are constrained by input data quality. Humans are also constrained, but may apply heuristics or skip steps to compensate.
M - Multiplicative Potential | 2 | Data quality issues affect individual decisions, not cascades. But the impact is systematic across all decisions.
A - Attack Surface | 5 | Any agent operating on enterprise data is exposed. Most enterprise data is at 3-4 sigma.
G - Governance Gap | 4 | Data quality governance exists (data stewardship, data governance committees) but is often disconnected from agent governance. Agent deployment decisions are not tied to data quality assessments.
E - Enterprise Impact | 4 | Operating on low-sigma data results in systematic customer impact (wrong information, wrong decisions) and regulatory compliance violations.
Composite DAMAGE Score | 3.4 | High. Requires proactive governance controls and data quality assessment before agent deployment.

Agent Impact Profile

How severity changes across the agent architecture spectrum.

Agent Type | Impact | How This Risk Manifests
Digital Assistant | Low | Humans compensate for data quality through skepticism and verification.
Digital Apprentice | Medium | Limited autonomy; data quality issues affect a narrow scope of decisions.
Autonomous Agent | High | Autonomous decisions on low-sigma data without human verification.
Delegating Agent | High | Data quality issues are propagated through delegation chains.
Agent Crew / Pipeline | Critical | Data quality issues compound at each agent handoff.
Agent Mesh / Swarm | Critical | Low-sigma data is consumed across the mesh.

Regulatory Framework Mapping

Framework | Coverage | Citation | What It Addresses | What It Misses
ECOA | Addressed | Accurate information for credit decisions and account servicing | Information accuracy | Agent operation on low-sigma data
GLBA | Addressed | Section 501, accuracy of customer information maintained by financial institutions | Customer information accuracy | Agent-driven inaccuracy in customer data
GDPR | Addressed | Article 5, data accuracy and integrity | Data accuracy | Agent propagation of inaccurate data
Dodd-Frank | Addressed | Section 1681, accuracy of information used in regulated decisions | Information accuracy in credit decisions | Agent operation on inaccurate data
ISO 42001 | Partial | Section 8.2, Data quality and governance | Data quality | Agent operation on low-sigma data

Why This Matters in Regulated Industries

Regulators expect institutions to maintain accurate data. When an agent operates on inaccurate data and makes decisions or updates based on that inaccurate data, the institution has failed to meet this expectation. The agent's sophistication and governance are irrelevant if the underlying data is corrupt.

The regulatory response is to mandate data quality remediation as a prerequisite for agent deployment. Regulators will ask: "What is the sigma level of your data? Have you measured it? Can you demonstrate that your data is fit for use in automated decision-making?" If the answer is "no," or if data quality is low, regulators will prohibit or severely constrain agent autonomy until data quality is improved.

Controls & Mitigations

Design-Time Controls

  • Before deploying an agent that will operate on production data or make decisions based on data, conduct a comprehensive data quality assessment using sigma methodology. Measure the DPMO of critical data fields (a minimal sketch of such an assessment follows this list).
  • Establish a minimum sigma threshold for agent deployment: for example, 4 sigma for autonomous decision-making and 3.5 sigma for human-supervised decisions.
  • If data quality is below the threshold, implement data remediation as a prerequisite for agent deployment. Do not deploy agents as a substitute for fixing data quality.
  • Document the data quality constraints in the Agent Registry (Component 1): agents are authorized only for decisions where the source data meets the sigma threshold.
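
As referenced in the first bullet, a per-field assessment could look like the following sketch: run validators over a sample of records, compute DPMO and sigma per field, and compare against the deployment threshold. The field names, validation rules, and 4 sigma default are illustrative assumptions, not a prescribed schema.

```python
import re
from statistics import NormalDist

# Illustrative validators; a real assessment would use the institution's own
# data quality rules (format, range, and referential checks).
VALIDATORS = {
    "zip_code": lambda v: bool(re.fullmatch(r"\d{5}(-\d{4})?", v or "")),
    "phone":    lambda v: bool(re.fullmatch(r"1?\d{10}", re.sub(r"\D", "", v or ""))),
    "email":    lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v or "")),
}

def field_sigma(records: list[dict], field: str) -> tuple[float, float]:
    """Return (DPMO, sigma level) for one field over a sample of records."""
    defects = sum(not VALIDATORS[field](r.get(field)) for r in records)
    dpmo = defects / len(records) * 1_000_000
    yield_ = min(max(1 - defects / len(records), 1e-9), 1 - 1e-9)  # keep inv_cdf in range
    return dpmo, NormalDist().inv_cdf(yield_) + 1.5

def fit_for_autonomy(records: list[dict], threshold_sigma: float = 4.0) -> dict[str, bool]:
    """Pass/fail each critical field against the deployment sigma threshold."""
    return {f: field_sigma(records, f)[1] >= threshold_sigma for f in VALIDATORS}
```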

Runtime Controls

  • Implement data quality monitoring at the point where agents consume data. When an agent retrieves customer data, check a sample of that data for quality issues.
  • Deploy anomaly detection on data consumed by agents: if an agent's input data distribution suddenly changes, flag this as potential data quality degradation and investigate.
  • Implement data validation gates: agents are not authorized to make decisions on data fields that fail basic validation checks. If a customer address is missing a ZIP code, the agent cannot process an address change without human confirmation (see the sketch below).
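
A gate of that kind might look like the following sketch; the required fields and the escalation path are hypothetical and would come from the institution's own data standards.

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    allowed: bool
    reason: str

REQUIRED_ADDRESS_FIELDS = ("street", "city", "state", "zip_code")

def address_change_gate(existing: dict, proposed: dict) -> GateResult:
    """Block autonomous address updates when either record fails basic checks."""
    missing_existing = [f for f in REQUIRED_ADDRESS_FIELDS if not existing.get(f)]
    missing_proposed = [f for f in REQUIRED_ADDRESS_FIELDS if not proposed.get(f)]

    if missing_existing:
        # The baseline record itself is suspect: do not "confirm" against it.
        return GateResult(False, f"existing record incomplete: {missing_existing}")
    if missing_proposed:
        return GateResult(False, f"proposed address incomplete: {missing_proposed}")
    return GateResult(True, "basic validation passed")

# Example: an existing record with no ZIP code forces human confirmation.
existing = {"street": "12 Main St", "city": "Springfield", "state": "IL", "zip_code": ""}
proposed = {"street": "98 Oak Ave", "city": "Springfield", "state": "IL", "zip_code": "62704"}

result = address_change_gate(existing, proposed)
if not result.allowed:
    print("escalate to human review:", result.reason)
```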

Detection & Response

  • Maintain a data quality dashboard that tracks sigma levels for critical data fields. Escalate if any field drops below the established threshold (a sketch of this escalation logic follows this list).
  • When agent decision quality degrades (detected through monitoring or complaints), investigate whether the root cause is data quality degradation.
  • Establish a quarterly data quality review: jointly, the data governance and agent governance teams assess whether data quality remains sufficient for the current deployment of agents.
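
A minimal sketch of that escalation check, comparing each field's current sigma level against the deployment threshold and against the previous review period; the threshold, the 0.2 sigma degradation tolerance, and the field names are illustrative assumptions.

```python
THRESHOLD_SIGMA = 3.5   # minimum sigma for the current agent deployment
MAX_PERIOD_DROP = 0.2   # flag material degradation even while above threshold

def review(current: dict[str, float], previous: dict[str, float]) -> list[str]:
    """Return escalation messages for fields that breach the threshold or degrade."""
    alerts = []
    for field, sigma in current.items():
        if sigma < THRESHOLD_SIGMA:
            alerts.append(f"{field}: {sigma:.2f} sigma is below the {THRESHOLD_SIGMA} threshold")
        elif previous.get(field, sigma) - sigma > MAX_PERIOD_DROP:
            alerts.append(f"{field}: dropped {previous[field] - sigma:.2f} sigma since the last review")
    return alerts

print(review(current={"address": 3.3, "phone": 3.9, "email": 4.2},
             previous={"address": 3.6, "phone": 4.2, "email": 4.2}))
```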

Related Risks

Address This Risk in Your Institution

Data Sigma Ceiling requires structured data quality governance integrated with agent deployment decisions. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.
