Six Sigma for Agentic AI

Quantifying AI governance quality with the proven methodology that banking COOs and CROs already trust.

Every regulatory framework demands that AI systems be accurate, robust, and reliable. But none provides a quantitative methodology for measuring the actual quality level of an agentic process in a way that is comparable across systems, trackable over time, and meaningful to risk and operations executives. Institutions are left with qualitative assessments — “high,” “medium,” “low” — that do not support rigorous risk management.

Lean Six Sigma has been the standard methodology for measuring and improving process quality in manufacturing and financial services for decades. Its core metric, the sigma level, quantifies how many defects a process produces per million opportunities. A 6-sigma process produces 3.4 defects per million. Banking operations typically target 5–6 sigma for critical processes. The methodology is well understood by COOs, CROs, and boards.
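The sigma arithmetic above follows directly from the shifted normal distribution. A minimal sketch (function names illustrative, Python standard library only) that reproduces the 3.4-defects-per-million figure under the conventional 1.5-sigma long-term shift:

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def sigma_to_dpmo(sigma: float, shift: float = 1.5) -> float:
    """Defects per million opportunities implied by a sigma level,
    using the conventional 1.5-sigma long-term shift."""
    return (1 - _N.cdf(sigma - shift)) * 1_000_000

def dpmo_to_sigma(dpmo: float, shift: float = 1.5) -> float:
    """Inverse: sigma level implied by an observed defect rate."""
    return _N.inv_cdf(1 - dpmo / 1_000_000) + shift

print(round(sigma_to_dpmo(6.0), 1))   # 3.4 defects per million
print(round(sigma_to_dpmo(3.5)))      # 22750
print(round(dpmo_to_sigma(233), 1))   # 5.0
```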

Corvair applies this proven methodology to a domain where it has not yet been systematically used: agentic AI.

Three Sigma Levels

Data Sigma

The quality level of the data inputs that feed agent decisions. We define curated rules (completeness, accuracy, timeliness, consistency, validity) and measure each data source against those rules.

In practice, most raw enterprise data sets score below 3.5 sigma — more than 22,000 defects per million data points. Master data with dedicated stewardship typically achieves 4–5 sigma. Unstructured data and signals are often lower still.
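A data sigma measurement of this kind can be sketched as counting rule violations per record, with one defect opportunity per record per rule. The rules below are toy stand-ins for the curated completeness, validity, and timeliness checks described above, not a production rule set:

```python
from statistics import NormalDist

# Toy stand-ins for curated quality rules; each returns True when a record passes.
RULES = {
    "completeness": lambda r: all(v is not None for v in r.values()),
    "validity":     lambda r: (r.get("amount") or 0) >= 0,
    "timeliness":   lambda r: r.get("age_days", 0) <= 1,
}

def data_sigma(records: list[dict]) -> float:
    """Sigma level of a data set: one defect opportunity per record per
    rule, converted via the shifted-normal convention (1.5-sigma shift)."""
    opportunities = len(records) * len(RULES)
    defects = sum(not check(r) for r in records for check in RULES.values())
    dpmo = defects / opportunities * 1_000_000
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + 1.5

records = [
    {"amount": 120.0, "age_days": 0},
    {"amount": -5.0,  "age_days": 0},   # fails validity
    {"amount": None,  "age_days": 3},   # fails completeness and timeliness
]
print(round(data_sigma(records), 2))    # deliberately bad toy data scores low
```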

Process Sigma

The repeatability and reliability of the agentic process itself. Given the same inputs, does the agent produce the same outputs? For deterministic software, always yes. For agentic AI, often no.

LLM-based agents introduce randomness through temperature settings, context window variability, and non-deterministic retrieval. Current agentic processes typically operate at 1–1.5 sigma — correct output only 31–50% of the time in complex workflows.
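Process sigma for a non-deterministic agent can be estimated empirically by replaying the same input many times and counting divergent outputs. The `flaky_agent` below is a simulated stand-in for an LLM-based agent, not a real one; the measurement harness is the point:

```python
import random
from statistics import NormalDist

def flaky_agent(task: str, rng: random.Random) -> str:
    """Stand-in for an LLM-based agent: returns the right answer most
    of the time, a variant otherwise (simulated non-determinism)."""
    return "approve" if rng.random() < 0.62 else "refer"

def process_sigma(run_agent, task, expected, trials=1000, seed=7):
    """Repeatability: same input, how often the same (correct) output?"""
    rng = random.Random(seed)
    defects = sum(run_agent(task, rng) != expected for _ in range(trials))
    dpmo = defects / trials * 1_000_000
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + 1.5

s = process_sigma(flaky_agent, "loan #123", "approve")
print(round(s, 2))  # around 1.8: correct roughly 62% of the time
```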

Agent Sigma

When agents operate in sequence or coordinate with other agents, each introduces measurement error and execution uncertainty. The compounding effect is multiplicative, not additive.

A portfolio risk agent relying on a scoring agent relying on a market data agent inherits the error characteristics of each upstream component. Agent sigma can never exceed process sigma.
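Under the simplifying assumption that each stage succeeds or fails independently, the multiplicative compounding can be made concrete: chain yield is the product of stage yields, converted back to a sigma level. A sketch:

```python
from math import prod
from statistics import NormalDist

_N = NormalDist()

def sigma_to_yield(sigma: float) -> float:
    """First-pass yield implied by a sigma level (1.5-sigma shift)."""
    return _N.cdf(sigma - 1.5)

def chain_sigma(sigmas: list[float]) -> float:
    """Effective sigma of agents in sequence: yields multiply, so the
    chain is always worse than its weakest link (assumes independence)."""
    chained_yield = prod(sigma_to_yield(s) for s in sigmas)
    return _N.inv_cdf(chained_yield) + 1.5

# market data agent -> scoring agent -> portfolio risk agent
print(round(chain_sigma([3.5, 2.0, 2.0]), 2))  # ~1.42, below the weakest 2.0-sigma link
```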

The Sigma Constraint Chain

These three sigma ratings are not independent. They form a constraint chain: the data sigma constrains the process sigma, and the process sigma constrains the effective agent sigma. Each layer inherits the quality ceiling of the layer below it and adds its own variability on top.

Consider a typical enterprise scenario: an institution runs its core operations at near-six-sigma reliability, as banking regulators expect. But the data feeding its agentic AI systems sits at 3.5 sigma because it mixes curated warehouse data with raw lake data and real-time API feeds. The agentic processes built on that data operate at perhaps 2 sigma because multi-step reasoning introduces additional variability at each step. And the agents themselves, with their inherent non-determinism, add another layer of degradation, producing effective reliability closer to 1–1.5 sigma for complex workflows.

You cannot run a six-sigma business on three-and-a-half-sigma data, two-sigma processes, and one-and-a-half-sigma agents. The weakest link in the chain sets the ceiling for everything above it.

Sigma Levels in Context

Sigma Level | Defects/Million | Yield | Typical Application
6σ | 3.4 | 99.9997% | Target for critical banking operations
5σ | 233 | 99.977% | Well-governed master data, mature automated processes
4σ | 6,210 | 99.38% | Typical structured enterprise data after quality processes
3.5σ | 22,750 | 97.73% | Most raw enterprise data without curation
2σ | 308,537 | 69.15% | Simple agentic processes with moderate agency
1–1.5σ | 500,000–690,000 | 31–50% | Complex multi-step agentic processes (current state)

DMAIC Applied to Agentic AI

We apply the DMAIC cycle from Lean Six Sigma to systematically measure and improve agentic AI governance quality:

Define

Establish critical-to-quality characteristics for each agentic process: what constitutes a correct output, what are the acceptable tolerances, and what are the defect categories (wrong answer, hallucination, scope creep, missed constraint, stale premise). Stakeholders declare the agent's mission and intent, authorised use cases, operational boundaries, approved capabilities, and policy thresholds.
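A Define-stage output might be captured as a machine-readable CTQ specification. The field names and threshold values below are illustrative assumptions, not a Corvair schema:

```python
from dataclasses import dataclass
from enum import Enum

class Defect(Enum):
    """Defect categories from the Define step."""
    WRONG_ANSWER = "wrong_answer"
    HALLUCINATION = "hallucination"
    SCOPE_CREEP = "scope_creep"
    MISSED_CONSTRAINT = "missed_constraint"
    STALE_PREMISE = "stale_premise"

@dataclass
class CTQSpec:
    """Critical-to-quality definition for one agentic process."""
    mission: str
    authorised_use_cases: list[str]
    approved_capabilities: list[str]
    min_process_sigma: float  # policy threshold below which to escalate
    min_data_sigma: float     # minimum acceptable input quality

credit_ctq = CTQSpec(
    mission="Recommend credit decisions within delegated limits",
    authorised_use_cases=["retail_credit_review"],
    approved_capabilities=["read_bureau_data", "draft_decision"],
    min_process_sigma=4.0,
    min_data_sigma=4.5,
)
```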

Measure

Calculate current data sigma (input quality) and process sigma (output repeatability) for each agentic workflow. Establish baselines. Identify which data tiers contribute most to output variability. Measure blast radius and risk vectors.

Analyse

Evaluate root causes of defects. Is the agent reasoning on stale data (epistemic drift)? Is it consuming low-sigma signals from other models (compounding uncertainty)? Is the process design itself introducing variability? Are the SCAR scores correlated with defect rates?

Improve

Apply targeted controls: upgrade data sources from unstructured to structured where possible, add validation checkpoints at high-variability steps, implement epistemic drift detection at critical reasoning junctions, calibrate controls to restrict agent authority when data sigma falls below threshold.
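One of the controls above, restricting agent authority when data sigma falls below threshold, might be calibrated as a simple tiered policy. The tiers and threshold values here are hypothetical:

```python
def allowed_authority(data_sigma: float) -> str:
    """Hypothetical calibration: the lower the input data sigma,
    the less authority the agent is granted."""
    if data_sigma >= 4.5:
        return "autonomous"      # act within delegated limits
    if data_sigma >= 3.5:
        return "propose_only"    # a human approves each action
    return "suspended"           # data too unreliable to act on

print(allowed_authority(4.8))  # autonomous
print(allowed_authority(3.9))  # propose_only
```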

Control

Continuously monitor both data sigma and process sigma with tiered alerting thresholds. When data sigma degrades, dependent agentic processes are automatically flagged for review. When process sigma falls below minimum threshold, the process is escalated to human oversight or suspended.
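The tiered response described here can be sketched as a decision rule over the two monitored sigma levels. Threshold values are illustrative, not recommendations:

```python
def control_action(data_sigma: float, process_sigma: float,
                   min_data: float = 4.0, min_process: float = 3.0) -> str:
    """Tiered response: process degradation escalates to a human,
    data degradation flags dependent processes for review."""
    if process_sigma < min_process:
        return "escalate_to_human"   # or suspend the process outright
    if data_sigma < min_data:
        return "flag_for_review"     # dependent agentic processes reviewed
    return "continue"

print(control_action(3.5, 4.0))  # flag_for_review: data sigma degraded
```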

Lean Principles: Operational Waste & Mistake-Proofing

Operational Waste (Muda)

A novel aspect of our methodology is the characterisation of agent risk in terms of operational waste: quantifiable metrics that reframe abstract risk as measurable waste that governance can systematically eliminate.

Mistake-Proofing (Poka-Yoke)

Pre-deployment controls that prevent defects before they occur. Governance gating in CI/CD pipelines blocks non-conformant builds before they reach production. Runtime controls prevent unsafe actions at the point of execution. The shift from reactive incident response to preventative mistake-proofing — from detecting violations to making violations structurally unlikely.
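A governance gate of this kind might look like a pre-deployment check that returns a list of violations, with the CI/CD pipeline failing the build when the list is non-empty. The policy fields and manifest shape below are hypothetical:

```python
# Hypothetical governance policy enforced before deployment (poka-yoke).
POLICY = {
    "min_data_sigma": 4.0,
    "required_fields": {"mission", "approved_capabilities", "owner"},
}

def gate(manifest: dict) -> list[str]:
    """Return governance violations; an empty list means the build may ship."""
    violations = []
    missing = POLICY["required_fields"] - manifest.keys()
    if missing:
        violations.append(f"missing fields: {sorted(missing)}")
    if manifest.get("data_sigma", 0.0) < POLICY["min_data_sigma"]:
        violations.append("data sigma below policy threshold")
    return violations

build = {"mission": "credit review", "owner": "risk-ops", "data_sigma": 3.2}
print(gate(build))  # non-conformant: missing capabilities, low data sigma
```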

The Agency-Reliability Tradeoff

The industry's current response to low process sigma is to remove agency: constrain autonomy, limit steps, pre-define tool sequences, hard-code decision boundaries. This works for simple tasks but defeats the purpose of agentic AI. The institutions that solve this tradeoff — achieving acceptable process sigma while preserving meaningful agency — will capture the value of agentic AI. Those that cannot will be limited to sophisticated chatbots.

Corvair's DMAIC approach works from the bottom up: starting with data sigma improvement, which raises the ceiling for process sigma, which in turn raises the ceiling for agent sigma.

When Corvair walks into a bank and says “your agentic credit decisioning process is operating at 1.5 sigma — it produces the wrong output roughly half the time in complex cases — and here's the data: your data quality is at 3.5 sigma, your process repeatability is at 2 sigma, and your agents are adding another half-sigma of degradation,” that is a fundamentally different conversation from “you need better AI governance.” It is quantified, layered, and actionable.

Measure Your AI Governance Quality

Our Readiness Assessment establishes data sigma and process sigma baselines for your agentic workflows and identifies where to invest first.
