Why 99% accuracy per step does not mean 99% workflow success.
In a multi-step agent workflow, errors compound exponentially. Each step introduces a small probability of failure, and those probabilities multiply across the entire chain. The formula is straightforward:
Success Rate = (per-step accuracy)^(number of steps)
The implications are not intuitive. A system that is 99% accurate at each individual step — a level most teams would consider excellent — fails more often than it succeeds across a 100-step workflow.
| Workflow Steps | Accuracy per Step | Success Rate |
|---|---|---|
| 10 | 99% | 90.4% |
| 50 | 99% | 60.5% |
| 100 | 99% | 36.6% |
| 100 | 95% | 0.59% |
| 100 | 99.5% | 60.6% |
The key insight: reducing per-step error from 1% to 0.5% improves 100-step workflow success from 36.6% to 60.6%. Small improvements in per-step accuracy produce outsized gains in end-to-end reliability.
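The table values fall directly out of the formula. A minimal sketch (function and variable names are my own, not from any particular library):

```python
def success_rate(per_step_accuracy: float, steps: int) -> float:
    """End-to-end success rate when per-step errors compound multiplicatively."""
    return per_step_accuracy ** steps

# Reproduce the table above
for steps, acc in [(10, 0.99), (50, 0.99), (100, 0.99), (100, 0.95), (100, 0.995)]:
    print(f"{steps:>3} steps @ {acc:.1%} per step -> {success_rate(acc, steps):.2%}")
```

Note how asymmetric the curve is: halving the per-step error rate (1% to 0.5%) nearly doubles 100-step success, while a seemingly modest drop to 95% per-step accuracy collapses it below 1%.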
These failures arise from several distinct sources:

**Model errors.** The model generates plausible but incorrect outputs, fabricates facts, or follows flawed logical chains. These errors propagate downstream as subsequent steps treat them as ground truth.

**Data quality.** Missing fields, stale records, inconsistent formats, and incomplete context. The agent reasons correctly from incorrect or partial data, producing confident but wrong conclusions.

**Handoff context loss.** When one agent delegates to another, critical context is lost in translation. The receiving agent operates with an incomplete picture, making decisions it would not make with full information.

**Concurrency conflicts.** Race conditions, stale reads, and ordering dependencies. Parallel agents may act on data that another agent has already modified, creating inconsistent state.

**State drift.** The agent's internal model of reality gradually diverges from actual reality. Assumptions valid at the start of a workflow may no longer hold by the time later steps execute.

**Permission creep.** Agents exceeding their granted authority, acting outside approved boundaries, or accumulating permissions across steps that were never intended to be combined.
The most effective mitigation for compound error is not building a single, more accurate agent. It is deploying multiple independent agents on the same input and using majority voting to determine the output.
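The voting mechanism itself is simple. A minimal sketch, assuming agents can be modeled as interchangeable callables that return comparable answers (both assumptions are mine, not a prescribed API):

```python
from collections import Counter

def consensus(agents, task):
    """Run independent agents on the same input and return the majority answer.

    `agents` is any sequence of callables; `task` is whatever input they share.
    Ties resolve to the first answer seen, so use an odd number of agents.
    """
    votes = Counter(agent(task) for agent in agents)
    answer, _count = votes.most_common(1)[0]
    return answer
```

The important property is independence: if the agents share a failure mode (same model, same prompt, same stale data), their errors correlate and the consensus math below no longer holds.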
The mathematics of consensus voting are powerful. When three independent agents each operate at 95% individual accuracy, the probability that a majority produces the correct answer rises to 99.28%. This is because two or more agents must fail simultaneously for the consensus to be wrong.
Scale this further: thirteen agents at 95% individual accuracy drive the consensus error rate below 3.4 DPMO (defects per million opportunities), the threshold for Six Sigma quality. Applied to every step of a 100-step workflow, an error rate at that threshold still yields roughly a 99.97% end-to-end success rate.
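Both figures follow from the binomial distribution: the consensus is wrong only when a majority of agents fail simultaneously. A sketch of the calculation (the function name is mine):

```python
from math import comb

def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent agents,
    each correct with probability p, returns the right answer."""
    k_min = n // 2 + 1  # smallest majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(f"{majority_accuracy(0.95, 3):.4%}")   # three agents -> 99.2750%

# Thirteen agents: consensus defects per million opportunities
dpmo = (1 - majority_accuracy(0.95, 13)) * 1_000_000
print(dpmo < 3.4)  # comfortably inside the Six Sigma threshold
```

This also makes the independence assumption explicit: the `p**k * (1 - p)**(n - k)` terms are only valid when agent errors are uncorrelated.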
> "The compound error problem is not solved by making one agent smarter. It is solved by making the system more redundant."
Consensus voting is most effective when combined with techniques that improve individual agent accuracy first, such as RAG grounding and FMOps practices.
The best approach is layered: optimize individual accuracy via RAG and FMOps first, then layer consensus voting for critical decision points. This produces reliable systems without the cost of running consensus voting on every step.
See how this connects to our Six Sigma measurement framework and Center of Excellence advisory service.