Six Sigma for Agentic AI

Quantifying AI governance quality with the proven methodology that banking COOs and CROs already trust.

Every regulatory framework demands that AI systems be accurate, robust, and reliable. But none provides a quantitative methodology for measuring the actual quality level of an agentic process in a way that is comparable across systems, trackable over time, and meaningful to risk and operations executives. Institutions are left with qualitative assessments — “high,” “medium,” “low” — that do not support rigorous risk management.

Lean Six Sigma has been the standard methodology for measuring and improving process quality in manufacturing and financial services for decades. Its core metric, the sigma level, quantifies how many defects a process produces per million opportunities. A 6-sigma process produces 3.4 defects per million. Banking operations typically target 5–6 sigma for critical processes. The methodology is well understood by COOs, CROs, and boards.
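The sigma arithmetic above follows directly from the shifted normal distribution. A minimal sketch (function names illustrative, Python standard library only) that reproduces the 3.4-defects-per-million figure under the conventional 1.5-sigma long-term shift:

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def sigma_to_dpmo(sigma: float, shift: float = 1.5) -> float:
    """Defects per million opportunities implied by a sigma level,
    using the conventional 1.5-sigma long-term shift."""
    return (1 - _N.cdf(sigma - shift)) * 1_000_000

def dpmo_to_sigma(dpmo: float, shift: float = 1.5) -> float:
    """Inverse: sigma level implied by an observed defect rate."""
    return _N.inv_cdf(1 - dpmo / 1_000_000) + shift

print(round(sigma_to_dpmo(6.0), 1))   # 3.4 defects per million
print(round(sigma_to_dpmo(3.5)))      # 22750
print(round(dpmo_to_sigma(233), 1))   # 5.0
```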

Corvair applies this proven methodology to a domain where it has not yet been systematically used: agentic AI.

Three Sigma Levels

Data Sigma

The quality level of the data inputs that feed agent decisions. We define curated rules (completeness, accuracy, timeliness, consistency, validity) and measure each data source against those rules.

In practice, most raw enterprise data sets score below 3.5 sigma — more than 22,000 defects per million data points. Master data with dedicated stewardship typically achieves 4–5 sigma. Unstructured data and signals are often lower still.
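A data sigma measurement of this kind can be sketched as counting rule violations per record, with one defect opportunity per record per rule. The rules below are toy stand-ins for the curated completeness, validity, and timeliness checks described above, not a production rule set:

```python
from statistics import NormalDist

# Toy stand-ins for curated quality rules; each returns True when a record passes.
RULES = {
    "completeness": lambda r: all(v is not None for v in r.values()),
    "validity":     lambda r: (r.get("amount") or 0) >= 0,
    "timeliness":   lambda r: r.get("age_days", 0) <= 1,
}

def data_sigma(records: list[dict]) -> float:
    """Sigma level of a data set: one defect opportunity per record per
    rule, converted via the shifted-normal convention (1.5-sigma shift)."""
    opportunities = len(records) * len(RULES)
    defects = sum(not check(r) for r in records for check in RULES.values())
    dpmo = defects / opportunities * 1_000_000
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + 1.5

records = [
    {"amount": 120.0, "age_days": 0},
    {"amount": -5.0,  "age_days": 0},   # fails validity
    {"amount": None,  "age_days": 3},   # fails completeness and timeliness
]
print(round(data_sigma(records), 2))    # deliberately bad toy data scores low
```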

Process Sigma

The repeatability and reliability of the agentic process itself. Given the same inputs, does the agent produce the same outputs? For deterministic software, always yes. For agentic AI, often no.

LLM-based agents introduce randomness through temperature settings, context window variability, and non-deterministic retrieval. Current agentic processes typically operate at 1–1.5 sigma — correct output only 31–50% of the time in complex workflows.
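Process sigma for a non-deterministic agent can be estimated empirically by replaying the same input many times and counting divergent outputs. The `flaky_agent` below is a simulated stand-in for an LLM-based agent, not a real one; the measurement harness is the point:

```python
import random
from statistics import NormalDist

def flaky_agent(task: str, rng: random.Random) -> str:
    """Stand-in for an LLM-based agent: returns the right answer most
    of the time, a variant otherwise (simulated non-determinism)."""
    return "approve" if rng.random() < 0.62 else "refer"

def process_sigma(run_agent, task, expected, trials=1000, seed=7):
    """Repeatability: same input, how often the same (correct) output?"""
    rng = random.Random(seed)
    defects = sum(run_agent(task, rng) != expected for _ in range(trials))
    dpmo = defects / trials * 1_000_000
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + 1.5

s = process_sigma(flaky_agent, "loan #123", "approve")
print(round(s, 2))  # around 1.8: correct roughly 62% of the time
```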

Agent Sigma

When agents operate in sequence or coordinate with other agents, each introduces measurement error and execution uncertainty. The compounding effect is multiplicative, not additive.

A portfolio risk agent relying on a scoring agent relying on a market data agent inherits the error characteristics of each upstream component. Agent sigma can never exceed process sigma.
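Under the simplifying assumption that each stage succeeds or fails independently, the multiplicative compounding can be made concrete: chain yield is the product of stage yields, converted back to a sigma level. A sketch:

```python
from math import prod
from statistics import NormalDist

_N = NormalDist()

def sigma_to_yield(sigma: float) -> float:
    """First-pass yield implied by a sigma level (1.5-sigma shift)."""
    return _N.cdf(sigma - 1.5)

def chain_sigma(sigmas: list[float]) -> float:
    """Effective sigma of agents in sequence: yields multiply, so the
    chain is always worse than its weakest link (assumes independence)."""
    chained_yield = prod(sigma_to_yield(s) for s in sigmas)
    return _N.inv_cdf(chained_yield) + 1.5

# market data agent -> scoring agent -> portfolio risk agent
print(round(chain_sigma([3.5, 2.0, 2.0]), 2))  # ~1.42, below the weakest 2.0-sigma link
```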

The Sigma Constraint Chain

These three sigma ratings are not independent. They form a constraint chain: the data sigma constrains the process sigma, and the process sigma constrains the effective agent sigma. Each layer inherits the quality ceiling of the layer below it and adds its own variability on top.

Consider a typical enterprise scenario: an institution runs its core operations at near-six-sigma reliability, as banking regulators expect. But the data feeding its agentic AI systems sits at 3.5 sigma because it mixes curated warehouse data with raw lake data and real-time API feeds. The agentic processes built on that data operate at perhaps 2 sigma because multi-step reasoning introduces additional variability at each step. And the agents themselves, with their inherent non-determinism, add another layer of degradation, producing effective reliability closer to 1–1.5 sigma for complex workflows.

You cannot run a six-sigma business on three-and-a-half-sigma data, two-sigma processes, and one-and-a-half-sigma agents. The weakest link in the chain sets the ceiling for everything above it.

Sigma Levels in Context

Sigma Level | Defects/Million | Yield | Typical Application
6σ | 3.4 | 99.9997% | Target for critical banking operations
5σ | 233 | 99.977% | Well-governed master data, mature automated processes
4σ | 6,210 | 99.38% | Typical structured enterprise data after quality processes
3.5σ | 22,750 | 97.73% | Most raw enterprise data without curation
2σ | 308,537 | 69.15% | Simple agentic processes with moderate agency
1–1.5σ | 500,000–690,000 | 31–50% | Complex multi-step agentic processes (current state)

DMAIC Applied to Agentic AI

We apply the DMAIC cycle from Lean Six Sigma to systematically measure and improve agentic AI governance quality:

Define

Establish critical-to-quality characteristics for each agentic process: what constitutes a correct output, what are the acceptable tolerances, and what are the defect categories (wrong answer, hallucination, scope creep, missed constraint, stale premise). Stakeholders declare the agent's mission and intent, authorised use cases, operational boundaries, approved capabilities, and policy thresholds.
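A Define-stage output might be captured as a machine-readable CTQ specification. The field names and threshold values below are illustrative assumptions, not a Corvair schema:

```python
from dataclasses import dataclass
from enum import Enum

class Defect(Enum):
    """Defect categories from the Define step."""
    WRONG_ANSWER = "wrong_answer"
    HALLUCINATION = "hallucination"
    SCOPE_CREEP = "scope_creep"
    MISSED_CONSTRAINT = "missed_constraint"
    STALE_PREMISE = "stale_premise"

@dataclass
class CTQSpec:
    """Critical-to-quality definition for one agentic process."""
    mission: str
    authorised_use_cases: list[str]
    approved_capabilities: list[str]
    min_process_sigma: float  # policy threshold below which to escalate
    min_data_sigma: float     # minimum acceptable input quality

credit_ctq = CTQSpec(
    mission="Recommend credit decisions within delegated limits",
    authorised_use_cases=["retail_credit_review"],
    approved_capabilities=["read_bureau_data", "draft_decision"],
    min_process_sigma=4.0,
    min_data_sigma=4.5,
)
```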

Measure

Calculate current data sigma (input quality) and process sigma (output repeatability) for each agentic workflow. Establish baselines. Identify which data tiers contribute most to output variability. Measure blast radius and risk vectors.

Analyse

Evaluate root causes of defects. Is the agent reasoning on stale data (epistemic drift)? Is it consuming low-sigma signals from other models (compounding uncertainty)? Is the process design itself introducing variability? Are the SCAR scores correlated with defect rates?

Improve

Apply targeted controls: upgrade data sources from unstructured to structured where possible, add validation checkpoints at high-variability steps, implement epistemic drift detection at critical reasoning junctions, calibrate controls to restrict agent authority when data sigma falls below threshold.
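One of the controls above, restricting agent authority when data sigma falls below threshold, might be calibrated as a simple tiered policy. The tiers and threshold values here are hypothetical:

```python
def allowed_authority(data_sigma: float) -> str:
    """Hypothetical calibration: the lower the input data sigma,
    the less authority the agent is granted."""
    if data_sigma >= 4.5:
        return "autonomous"      # act within delegated limits
    if data_sigma >= 3.5:
        return "propose_only"    # a human approves each action
    return "suspended"           # data too unreliable to act on

print(allowed_authority(4.8))  # autonomous
print(allowed_authority(3.9))  # propose_only
```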

Control

Continuously monitor both data sigma and process sigma with tiered alerting thresholds. When data sigma degrades, dependent agentic processes are automatically flagged for review. When process sigma falls below minimum threshold, the process is escalated to human oversight or suspended.
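The tiered response described here can be sketched as a decision rule over the two monitored sigma levels. Threshold values are illustrative, not recommendations:

```python
def control_action(data_sigma: float, process_sigma: float,
                   min_data: float = 4.0, min_process: float = 3.0) -> str:
    """Tiered response: process degradation escalates to a human,
    data degradation flags dependent processes for review."""
    if process_sigma < min_process:
        return "escalate_to_human"   # or suspend the process outright
    if data_sigma < min_data:
        return "flag_for_review"     # dependent agentic processes reviewed
    return "continue"

print(control_action(3.5, 4.0))  # flag_for_review: data sigma degraded
```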

Lean Principles: Operational Waste & Mistake-Proofing

Operational Waste (Muda)

A novel aspect of our methodology is the characterisation of agent risk in terms of operational waste: quantifiable metrics that reframe abstract risk as measurable waste that governance can systematically eliminate.

Mistake-Proofing (Poka-Yoke)

Pre-deployment controls that prevent defects before they occur. Governance gating in CI/CD pipelines blocks non-conformant builds before they reach production. Runtime controls prevent unsafe actions at the point of execution. The shift from reactive incident response to preventative mistake-proofing — from detecting violations to making violations structurally unlikely.
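A governance gate of this kind might look like a pre-deployment check that returns a list of violations, with the CI/CD pipeline failing the build when the list is non-empty. The policy fields and manifest shape below are hypothetical:

```python
# Hypothetical governance policy enforced before deployment (poka-yoke).
POLICY = {
    "min_data_sigma": 4.0,
    "required_fields": {"mission", "approved_capabilities", "owner"},
}

def gate(manifest: dict) -> list[str]:
    """Return governance violations; an empty list means the build may ship."""
    violations = []
    missing = POLICY["required_fields"] - manifest.keys()
    if missing:
        violations.append(f"missing fields: {sorted(missing)}")
    if manifest.get("data_sigma", 0.0) < POLICY["min_data_sigma"]:
        violations.append("data sigma below policy threshold")
    return violations

build = {"mission": "credit review", "owner": "risk-ops", "data_sigma": 3.2}
print(gate(build))  # non-conformant: missing capabilities, low data sigma
```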

The Agency-Reliability Tradeoff

The industry's current response to low process sigma is to remove agency: constrain autonomy, limit steps, pre-define tool sequences, hard-code decision boundaries. This works for simple tasks but defeats the purpose of agentic AI. The institutions that solve this tradeoff — achieving acceptable process sigma while preserving meaningful agency — will capture the value of agentic AI. Those that cannot will be limited to sophisticated chatbots.

Corvair's DMAIC approach works from the bottom up: starting with data sigma improvement, which raises the ceiling for process sigma, which in turn raises the ceiling for agent sigma.

When Corvair walks into a bank and says “your agentic credit decisioning process is operating at 1.5 sigma — it produces the wrong output roughly half the time in complex cases — and here's the data: your data quality is at 3.5 sigma, your process repeatability is at 2 sigma, and your agents are adding another half-sigma of degradation,” that is a fundamentally different conversation from “you need better AI governance.” It is quantified, layered, and actionable.

Measure Your AI Governance Quality

Our Readiness Assessment establishes data sigma and process sigma baselines for your agentic workflows and identifies where to invest first.
