The quality of input data constrains the maximum achievable quality of any agent. Raw enterprise data at 3.5 sigma caps the quality of everything built on it.
Six Sigma methodology defines quality in terms of sigma, a measure of variation and defects per million opportunities (DPMO). A process operating at 6 sigma produces 3.4 defects per million opportunities. A process at 4 sigma produces 6,210 defects per million opportunities. Most business processes operate at 3 to 4 sigma.
Data quality is measured in the same framework. Data at 3.5 sigma means that for every million data points, approximately 22,750 are incorrect or out of specification. This might mean 22,750 wrong customer addresses, 22,750 misrecorded transaction amounts, or 22,750 incorrect customer ages.
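The sigma-to-DPMO figures above follow the conventional Six Sigma model: long-term yield is the standard normal tail beyond the sigma level minus a 1.5 sigma shift. A minimal sketch of that conversion (the function name `dpmo` is illustrative):

```python
from math import erf, sqrt

def dpmo(sigma_level: float, shift: float = 1.5) -> float:
    """Defects per million opportunities at a given sigma level,
    applying the conventional 1.5-sigma long-term shift."""
    z = sigma_level - shift                          # shifted z-score
    yield_fraction = 0.5 * (1 + erf(z / sqrt(2)))    # standard normal CDF
    return (1 - yield_fraction) * 1_000_000

print(round(dpmo(6.0), 1))  # 3.4
print(round(dpmo(4.0)))     # 6210
print(round(dpmo(3.5)))     # 22750
```

The outputs reproduce the figures quoted in the text: 3.4 DPMO at 6 sigma, 6,210 at 4 sigma, and 22,750 at 3.5 sigma.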
When agents operate on data, the agents' output quality is constrained by the input data quality. Even if the agent's reasoning were flawless, an agent operating on 3.5 sigma data cannot produce output better than 3.5 sigma. The agent propagates input data errors through its reasoning and decision-making.
The governance challenge is that data quality is often not measured using sigma methodology. Organizations might say "90% of our data is clean" (which sounds good) when 90% accuracy is actually only about 2.8 sigma (100,000 defects per million). Agents deployed on this "90% clean" data operate at a 2.8 sigma ceiling, regardless of how sophisticated the agent's reasoning is.
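Inverting the same shifted-normal yield model gives the sigma level implied by a stated accuracy percentage. A sketch using bisection (the function name `sigma_from_accuracy` is illustrative):

```python
from math import erf, sqrt

def sigma_from_accuracy(accuracy: float, shift: float = 1.5) -> float:
    """Find the sigma level whose long-term yield equals `accuracy`,
    by bisection over the shifted-normal yield model."""
    def yield_at(sigma: float) -> float:
        return 0.5 * (1 + erf((sigma - shift) / sqrt(2)))
    lo, hi = 0.0, 10.0
    for _ in range(60):           # 60 halvings: far more precision than needed
        mid = (lo + hi) / 2
        if yield_at(mid) < accuracy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(sigma_from_accuracy(0.90), 2))  # 2.78
print(round(sigma_from_accuracy(0.92), 2))  # 2.91
```

"90% clean" lands at roughly 2.8 sigma and "92% clean" at roughly 2.9 sigma, both well below the 3-to-4 sigma range of typical business processes.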
A regional bank has customer demographic data (names, addresses, phone numbers, employment details) stored in a legacy customer database. The data has been accumulated over 20 years, with multiple data entry systems, multiple data migrations, and inconsistent data validation rules. The bank's data quality assessment concludes that 92% of customer records are "clean" (match a validation check). The bank considers this acceptable and does not invest in data remediation.
The bank deploys an agentic customer service system that uses the customer database to process inquiries and make decisions. The agent retrieves customer information, validates it, and answers questions like "What is my current balance?" or "When is my next payment due?" The agent also has authority to make account updates (change address, update phone number) based on customer requests.
The agent's reasoning is sophisticated: it checks for consistency between provided information and database records, asks clarifying questions when there is ambiguity, and escalates to human review when confidence is low. But the agent operates on 92% clean data, which is approximately 2.9 sigma (80,000 defects per million customer records).
When a customer calls to update their address, the agent retrieves the customer's existing address from the database. The agent is supposed to confirm the address before updating. But the existing address is already wrong (due to prior data errors). The agent, unaware that the baseline is corrupt, updates the customer's "confirmed" address. The customer is now even further from correct.
Additionally, when the agent needs to identify the correct customer (to avoid updating the wrong customer's record), it uses a fuzzy match on name and phone number. But the phone numbers in the database are corrupted for 8% of records. The agent occasionally updates the wrong customer's account.
The bank's complaint rate from customers increases. Customers report receiving bills at old addresses, missing payment notifications due to wrong phone numbers, and account confusion due to mixed-up records. Under consumer protection regulations (ECOA, Fair Lending laws), banks must maintain accurate customer information and provide accurate account servicing. Sending a bill to the wrong address is a violation. The bank's defense ("Our agent's reasoning was sound") is insufficient if the agent operated on corrupt data.
| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 3 | Data quality issues are detectable through data audits and analysis, but many organizations do not measure data quality in sigma terms and may not recognize that 92% accuracy is a low sigma ceiling. |
| A - Autonomy Sensitivity | 2 | Both autonomous and supervised agents are constrained by input data quality. Humans are also constrained, but may have heuristics or skip steps to compensate. |
| M - Multiplicative Potential | 2 | Data quality issues affect individual decisions, not cascades. But the impact is systematic across all decisions. |
| A - Attack Surface | 5 | Any agent operating on enterprise data is exposed. Most enterprise data is at 3-4 sigma. |
| G - Governance Gap | 4 | Data quality governance exists (data stewardship, data governance committees) but is often disconnected from agent governance. Agent deployment decisions are not tied to data quality assessments. |
| E - Enterprise Impact | 4 | Operating on low-sigma data results in systematic customer impact (wrong information, wrong decisions) and regulatory compliance violations. |
| Composite DAMAGE Score | 3.3 | High. Requires proactive governance controls and data quality assessment before agent deployment. |
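The composite can be checked as the arithmetic mean of the six dimension scores above (a sketch; an unweighted mean and one-decimal rounding are assumptions about the scoring convention):

```python
# DAMAGE dimension scores from the table above
scores = {
    "Detectability": 3,
    "Autonomy Sensitivity": 2,
    "Multiplicative Potential": 2,
    "Attack Surface": 5,
    "Governance Gap": 4,
    "Enterprise Impact": 4,
}

composite = sum(scores.values()) / len(scores)  # unweighted mean
print(round(composite, 1))  # 3.3
```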
How severity changes across the agent architecture spectrum.
| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | Humans compensate for data quality through skepticism and verification. |
| Digital Apprentice | Medium | Limited autonomy; data quality issues affect a narrow scope of decisions. |
| Autonomous Agent | High | Autonomous decisions on low-sigma data without human verification. |
| Delegating Agent | High | Data quality issues are propagated through delegation chains. |
| Agent Crew / Pipeline | Critical | Data quality issues compound at each agent handoff. |
| Agent Mesh / Swarm | Critical | Low-sigma data is consumed across the mesh. |
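The compounding in multi-agent rows above is multiplicative: if each handoff independently preserves a record's correctness with probability equal to the data's yield, end-to-end yield decays as yield raised to the number of stages. A sketch (the four-stage pipeline and independence assumption are illustrative):

```python
def end_to_end_yield(stage_yield: float, stages: int) -> float:
    """Probability a record survives every handoff uncorrupted,
    assuming independent per-stage error rates."""
    return stage_yield ** stages

# The bank's 92%-clean data flowing through a four-agent pipeline
print(round(end_to_end_yield(0.92, 4), 3))  # 0.716
```

Four handoffs turn 92% clean data into roughly 72% clean output, which is why the pipeline and mesh rows are rated Critical.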
| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| ECOA | Addressed | Accurate information for credit decisions and account servicing | Information accuracy. | Agent operation on low-sigma data. |
| GLBA Section 501 | Addressed | Accuracy of customer information maintained by financial institutions | Customer information accuracy. | Agent-driven inaccuracy in customer data. |
| GDPR Article 5 | Addressed | Data accuracy and integrity | Data accuracy. | Agent propagation of inaccurate data. |
| FCRA (15 U.S.C. § 1681) | Addressed | Accuracy of information used in consumer reports and regulated credit decisions | Information accuracy in credit decisions. | Agent operation on inaccurate data. |
| ISO 42001 | Partial | Section 8.2, Data quality and governance | Data quality. | Agent operation on low-sigma data. |
Regulators expect institutions to maintain accurate data. When an agent operates on inaccurate data and makes decisions or updates based on that inaccurate data, the institution has failed to meet this expectation. The agent's sophistication and governance are irrelevant if the underlying data is corrupt.
The regulatory response is to mandate data quality remediation as a prerequisite for agent deployment. Regulators will ask: "What is the sigma level of your data? Have you measured it? Can you demonstrate that your data is fit for use in automated decision-making?" If the answer is "no," or if data quality is low, regulators will prohibit or severely constrain agent autonomy until data quality is improved.
Data Sigma Ceiling requires structured data quality governance integrated with agent deployment decisions. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.
Schedule a Briefing