No sigma-level quality measurement exists for the agentic process. The organization cannot quantify whether the agent is operating at 2 sigma or 4 sigma.
If you cannot measure quality, you cannot manage it. Traditional business processes are measured using Six Sigma metrics: first-pass yield, defects per million opportunities, cycle time, and more. These metrics allow organizations to understand the current state of the process, set targets, and track improvement.
Many organizations deploying agents do not measure agent quality using sigma methodology. They might measure "accuracy" (percentage of correct outputs) or "customer satisfaction," but these are not the same as sigma. An agent with 95% accuracy produces 50,000 defects per million opportunities (DPMO), which is approximately 3.1 sigma under the conventional 1.5-sigma shift. An organization that reports "our agent is 95% accurate" may not realize that 95% accuracy is a mediocre performance level by Six Sigma standards, where 6 sigma corresponds to 3.4 DPMO.
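The conversion from an accuracy figure to DPMO and a sigma level can be sketched in a few lines. This is a minimal illustration, not a prescribed method: it assumes one defect opportunity per output and the conventional 1.5-sigma long-term shift used in most Six Sigma tables, and the function names `dpmo` and `sigma_level` are hypothetical.

```python
from statistics import NormalDist

# Assumption: the conventional 1.5-sigma long-term shift used in Six Sigma tables.
SHIFT = 1.5


def dpmo(defects: int, units: int, opportunities_per_unit: int = 1) -> float:
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000


def sigma_level(dpmo_value: float, shift: float = SHIFT) -> float:
    """Sigma level for a DPMO value (must be strictly between 0 and 1,000,000)."""
    yield_fraction = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(yield_fraction) + shift


# Illustrative accuracy figures, each evaluated on 10,000 outputs.
for accuracy in (0.88, 0.95, 0.99):
    defects = round((1 - accuracy) * 10_000)
    d = dpmo(defects=defects, units=10_000)
    print(f"{accuracy:.0%} accurate -> {d:,.0f} DPMO -> {sigma_level(d):.2f} sigma")
```

If an output can fail in more than one way, pass the number of defect opportunities per output as `opportunities_per_unit`; the choice of what counts as an opportunity changes the resulting sigma level and should be documented alongside it.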
The governance gap is: "We do not have a structured, measurable quality target for the agent. We assume the agent is good enough because humans reviewed it and thought it was fine. But we cannot quantify how good is 'good enough.'"
A payments company deploys an agentic dispute resolution system to process customer disputes about transactions. The agent is designed to: (1) receive a dispute, (2) gather evidence from transaction logs and chargeback networks, (3) assess the likelihood that the customer's claim is valid, and (4) recommend a resolution (credit the customer's account, deny the dispute, or escalate for manual review).
The company's quality team tests the agent on a sample of 100 disputes with known outcomes (disputes that were previously resolved by human specialists). The agent's recommendations match the specialist's resolutions 88 times out of 100. The quality team concludes: "The agent is 88% accurate. This is acceptable; we will deploy it."
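A 100-dispute sample also leaves considerable uncertainty around the 88% figure. The sketch below computes a Wilson score interval for the observed match rate; the choice of the Wilson method and the 95% confidence level are assumptions made for illustration, not something the quality team in the scenario is described as using.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 88 matches out of 100 test disputes, as in the scenario.
low, high = wilson_interval(88, 100)
print(f"observed 88.0%, 95% interval: {low:.1%} to {high:.1%}")
```

With only 100 cases the interval spans roughly 80% to 93%, so the pre-deployment test cannot distinguish between quite different quality levels. A larger sample, or continued sampling in production, narrows it.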
The company deploys the agent to production, where it processes thousands of disputes per month. But the company has no ongoing quality measurement. The quality team does not track how often the agent's recommendations match what a specialist would have decided, or how often customers challenge the agent's resolutions.
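One way to close this gap is to send a sample of production disputes to specialists and track the match rate over a rolling window. The following is a minimal sketch under stated assumptions: the class `QualityMonitor`, its window size, and the target yield are all hypothetical, and a real deployment would also need a sampling plan and a defect definition agreed with the quality team.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QualityMonitor:
    """Rolling match rate between agent and specialist decisions (hypothetical)."""
    target_yield: float              # assumed quality target, e.g. 0.99
    window: int = 500                # number of most recent reviews to keep
    results: deque = field(default_factory=deque)

    def record(self, agent_decision: str, specialist_decision: str) -> None:
        """Store whether the agent matched the specialist on one sampled dispute."""
        self.results.append(agent_decision == specialist_decision)
        if len(self.results) > self.window:
            self.results.popleft()   # drop the oldest review

    def current_yield(self) -> float:
        if not self.results:
            raise ValueError("no reviewed disputes recorded yet")
        return sum(self.results) / len(self.results)

    def below_target(self) -> bool:
        return self.current_yield() < self.target_yield


# Illustrative data only: four sampled disputes reviewed by a specialist.
monitor = QualityMonitor(target_yield=0.95, window=4)
reviews = [("credit", "credit"), ("deny", "credit"),
           ("escalate", "escalate"), ("deny", "deny")]
for agent, specialist in reviews:
    monitor.record(agent, specialist)

print(f"rolling yield: {monitor.current_yield():.0%}, "
      f"below target: {monitor.below_target()}")
```

The rolling window is a design choice: it makes recent degradation visible instead of averaging it into the agent's full history.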
After 6 months, a regulatory audit occurs. The regulator asks: "What is the sigma level of your dispute resolution process?" The company does not know. The regulator asks: "How many defects per million opportunities does your agent produce?" The company does not know. The regulator asks: "How do you know the agent is operating at an acceptable quality level?" The company responds: "We tested it and got 88% accuracy."
The regulator is not satisfied. Under consumer protection regulations, disputes must be resolved fairly and accurately. An 88% accuracy rate (120,000 DPMO, or about 2.7 sigma under the conventional 1.5-sigma shift) means that roughly 1 in 8 disputes may be resolved incorrectly. The regulator cites the company for inadequate quality measurement and inadequate controls.
The company is required to implement sigma-level quality measurement, establish a quality target, and measure against that target. Retroactive review of the agent's prior resolutions reveals that approximately 12% were potentially incorrect. The company must contact thousands of customers to inform them of potential errors and offer remediation.
| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 5 | If measurement does not exist, the gap is not detectable until a regulator or audit uncovers it. |
| A - Autonomy Sensitivity | 4 | Measurement absence is particularly acute for autonomous agents that operate without human oversight. |
| M - Multiplicative Potential | 4 | Without measurement, quality degradation can accumulate undetected. |
| A - Attack Surface | 5 | Any agent without explicit sigma measurement is exposed. Most agents fall into this category. |
| G - Governance Gap | 5 | This is the core governance gap: agent quality is not measured using the same rigor as business processes. |
| E - Enterprise Impact | 4 | Operating without measurement means the organization does not know whether the agent is operating acceptably, leading to regulatory violations and customer harm. |
| Composite DAMAGE Score | 3.6 | High. Requires immediate implementation of sigma-level quality measurement for all agent deployments. |
How severity changes across the agent architecture spectrum.
| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | Humans review outputs and notice quality issues. |
| Digital Apprentice | Medium | Limited autonomy; quality issues are bounded. |
| Autonomous Agent | Critical | Autonomous decisions with no sigma measurement. |
| Delegating Agent | Critical | Dynamic invocation with no end-to-end sigma measurement. |
| Agent Crew / Pipeline | Critical | No measurement of pipeline sigma. |
| Agent Mesh / Swarm | Critical | Distributed operation with no coordinated quality measurement. |
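For crews and pipelines, end-to-end quality is lower than the quality of any single stage. A common Six Sigma measure for this is rolled throughput yield, the product of the stage yields. The sketch below is illustrative only: the stage names loosely follow the four steps of the dispute agent described earlier, the yields are made-up numbers, and the calculation assumes stage errors are independent and are not corrected downstream.

```python
from math import prod
from statistics import NormalDist


def rolled_throughput_yield(stage_yields) -> float:
    """Probability that a unit passes every stage defect-free (independence assumed)."""
    return prod(stage_yields)


def sigma_from_yield(yield_fraction: float, shift: float = 1.5) -> float:
    """Sigma level for a yield, using the conventional 1.5-sigma shift (assumption)."""
    return NormalDist().inv_cdf(yield_fraction) + shift


# Hypothetical per-stage yields; not measurements from any real system.
stages = {
    "intake": 0.99,
    "evidence_gathering": 0.97,
    "assessment": 0.95,
    "recommendation": 0.98,
}

rty = rolled_throughput_yield(stages.values())
print(f"pipeline yield: {rty:.1%}, sigma: {sigma_from_yield(rty):.2f}")
```

With these example numbers, four stages that each look acceptable in isolation combine to an end-to-end yield of about 89%, which is why stage-level metrics alone do not establish the sigma level of the pipeline.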
| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| NIST AI RMF 1.0 | Addressed | Performance monitoring and measurement of AI systems | AI system measurement and monitoring. | Specific requirement for sigma-level quality measurement. |
| ISO 42001 | Partial | Section 8.5, Performance monitoring and measurement | Performance measurement. | Sigma methodology and DPMO measurement. |
| Dodd-Frank Section 165 | Addressed | Effective risk management and controls | Risk management effectiveness. | Measurement of AI system quality. |
| GLBA Section 501 | Addressed | Safeguards and security of customer information and operations | Operational safeguards. | Measurement of automated decision quality. |
| GDPR Article 22 | Addressed | Right to explanation and oversight of automated decisions | Oversight and explanation. | Measurement of automated decision quality. |
Regulators increasingly expect institutions to apply the same governance rigor to AI systems as they do to other critical processes. If an institution measures its transaction processing against a 6 sigma target, it should also measure its AI systems in sigma terms. If it does not, regulators interpret this as inadequate governance.
The regulatory expectation is clear: any system that makes decisions affecting customers must have a defined quality target and measured performance against that target. Measurement absence means the institution is flying blind.
Measurement Absence is a foundational governance gap that undermines all other quality controls.