Sequential agent steps degrade end-to-end accuracy exponentially, pushing composite reliability below regulatory thresholds even when every individual agent exceeds its SLA target.
In multi-step agent workflows, errors compound multiplicatively, so end-to-end reliability decays exponentially with workflow length rather than linearly. A workflow with 50 steps, each executing at 99% accuracy, achieves only 60.5% end-to-end success. At 100 steps, accuracy drops to 36.6%. No individual agent is "wrong" (each meets its SLA), yet the composite system fails to deliver compliant outcomes. This is not equivalent to a single agent with 60% accuracy; it is a reliability collapse across the entire workflow path, because every downstream process depends on all upstream agents succeeding.
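The decay described above can be reproduced in a few lines. This is a minimal sketch; the assumption that steps succeed independently is the same one behind the figures in the text:

```python
# Sketch: end-to-end success of an n-step workflow where each step
# succeeds independently with probability p (values from the text).
def end_to_end_success(p: float, n: int) -> float:
    return p ** n

for n in (1, 12, 50, 100):
    # 50 steps -> 60.5%, 100 steps -> 36.6%
    print(f"{n:3d} steps at 99% per step -> {end_to_end_success(0.99, n):.1%}")
```

The exponent, not the per-step rate, dominates: improving each agent from 99% to 99.9% at 100 steps lifts end-to-end success from roughly 37% to roughly 90%.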
In regulated industries, this creates an asymmetry: compliance frameworks assess individual decision points but do not adequately measure compound degradation. A loan approval workflow may require 12 sequential agent steps (data validation, identity verification, risk scoring, fraud detection, regulatory screening, credit assessment, pricing calculation, documentation assembly, regulatory check, disclosure generation, audit trail logging, archival routing). If each agent operates at 99% accuracy, the probability that all 12 succeed is 88.6%. The remaining 11.4% of loan applications contain at least one step-level error that no individual SLA flags. At scale, this translates to thousands of regulatory violations annually in a large financial institution.
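A short sketch of the loan-workflow arithmetic; the annual application volume below is an illustrative assumption, not a figure from the text:

```python
# Sketch: scale of compound error for the 12-step loan workflow.
p_step, n_steps = 0.99, 12
p_clean = p_step ** n_steps              # ~0.886: all 12 steps succeed
annual_volume = 100_000                  # assumed applications per year
flawed = annual_volume * (1 - p_clean)   # applications with >= 1 step error
print(f"{p_clean:.1%} clean; ~{flawed:,.0f} flawed applications/year")
```

At the assumed volume, an 11.4% compound error rate produces on the order of eleven thousand flawed applications a year, which is the "thousands of regulatory violations" the paragraph refers to.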
The challenge is not malfunction; it is the statistical certainty that sequential processes accumulate error. In systems where human review catches individual mistakes, agent-to-agent handoffs lose this safety mechanism: agents do not stop to verify context before delegating, as a human reviewer would.
A mid-market investment bank deploys agentic workflows for commercial mortgage-backed securities (CMBS) issuance. The workflow requires 15 sequential steps: property data ingestion, document verification, title validation, appraisal review, environmental screening, borrower verification, financial statement analysis, ratio calculation, stress testing, pricing derivation, prospectus generation, regulatory disclosure compilation, SEC filing assembly, investor notification, and post-issuance reporting. Each step is owned by a specialized agent trained on historical patterns in that bank's CMBS operations.
Over three months, the bank executes 500 CMBS transactions through the workflow. Each individual agent reports greater than 99% accuracy on test datasets. But regulators conduct a post-issuance compliance audit and find that 47 of 500 prospectuses contain material misstatements: misstated loan-to-value ratios, incorrect borrower credit ratings, or missing borrower certifications. No single agent "failed" (each operated within acceptable tolerances) but the compounding of small errors created systematic compliance risk.
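The audit figures are consistent with the compounding model. Assuming equal, independent step accuracies (a simplifying assumption), the observed 47-of-500 failure rate implies a per-step accuracy just above 99%, matching each agent's reported test performance:

```python
# Sketch: working backwards from the audit finding. Given 47 of 500
# prospectuses failed a 15-step workflow, what per-step accuracy is
# implied, assuming equal, independent steps?
observed_clean = 1 - 47 / 500                     # 0.906 composite success
implied_per_step = observed_clean ** (1 / 15)     # ~99.34% per step
expected_failures_at_99 = 500 * (1 - 0.99 ** 15)  # ~70 if exactly 99%
print(f"implied per-step accuracy: {implied_per_step:.2%}")
print(f"failures expected at flat 99%: {expected_failures_at_99:.0f}")
```

Every agent can truthfully report "greater than 99% accuracy" while the workflow still fails nearly one transaction in ten; a flat 99% would have produced roughly 70 deficient prospectuses rather than 47.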
The bank now faces SEC enforcement action for filing deficient prospectuses (prospectus liability under Securities Act Sections 11 and 12(a)(2)), potential Dodd-Frank penalties for inadequate securitization representations (Regulation AB), and investor lawsuits under securities laws. The bank cannot defend itself by pointing to individual agent accuracy because the fiduciary duty attaches to aggregate prospectus quality, not to component steps.
Additionally, because agents executed the workflow without human checkpoints between steps, the error pattern propagated undetected. A human reviewer checking appraisals would have flagged methodology differences; a human reconciling stress tests would have caught pricing inconsistencies. The agents operated in isolation, each assuming prior agents were correct.
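One way to quantify the value of the missing human checkpoints is a simple segment model; the checkpoint interval and detection rate below are illustrative assumptions, not parameters from the text:

```python
# Sketch: effect of a human checkpoint every k steps. Model: a segment
# of k steps succeeds with p**k; if it fails, the checkpoint catches and
# corrects the error with probability d before the next segment runs.
def with_checkpoints(p: float, n: int, k: int, d: float) -> float:
    segment_ok = p ** k + (1 - p ** k) * d   # clean pass, or caught + fixed
    return segment_ok ** (n // k)            # assumes k divides n evenly

p, n = 0.99, 15
print(f"no checkpoints:     {p ** n:.1%}")                        # ~86.0%
print(f"checkpoint every 5: {with_checkpoints(p, n, 5, 0.9):.1%}")  # ~98.5%
```

Even an imperfect reviewer (90% detection) at two interior checkpoints moves the 15-step workflow from roughly 86% to roughly 98.5% end-to-end reliability, because each checkpoint resets the compounding exponent.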
| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 2 | Errors emerge at workflow completion or post-hoc audit. Early detection requires systematic sampling and cross-step verification. Difficult to attribute to specific agent. |
| A - Autonomy Sensitivity | 4 | Exponential with workflow length and agent autonomy. Fully autonomous multi-step processes guarantee compounding. Human-in-the-loop at step N reduces exponent. |
| M - Multiplicative Potential | 4 | Affects every transaction processed through the workflow. 500 transactions at 10% error rate equals 50 failed outputs. Scales with deployment volume. |
| A - Attack Surface | 2 | Not directly exploitable, but operator error or inadequate monitoring creates opportunity. Not an attack vector; a failure mode. |
| G - Governance Gap | 4 | Compliance frameworks assess individual agents, not composite workflows. SLA requirements are per-agent, not per-transaction. No aggregate accuracy requirement in regulation. |
| E - Enterprise Impact | 3 | Affects transaction quality, regulatory compliance, and customer trust. Does not directly impact confidentiality or availability. |
| Composite DAMAGE Score | 4.2 | Critical. Requires immediate architectural controls. Cannot be accepted. |
Severity varies across the agent architecture spectrum:
| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | Human reviews each step before delegation. Compounding is arrested by human gatekeeping. |
| Digital Apprentice | Low | Developmental governance includes step-wise validation. Error budget is tight; agents defer to human on uncertainty. |
| Autonomous Agent | Medium | Operates within boundaries but executes multi-step workflows independently. Compounding emerges within agent autonomy sphere. |
| Delegating Agent | High | Invokes external tools/APIs for each step. Each invocation introduces latency variability and error potential. Compounding amplified by external dependency uncertainty. |
| Agent Crew / Pipeline | Critical | Multiple agents in sequential workflow by design. Compounding is inherent architectural risk. Each new agent in pipeline increases exponent. |
| Agent Mesh / Swarm | Critical | Dynamic peer-to-peer delegation means unpredictable workflow length and step order. Exponent becomes variable; worst-case compounding is severe. |
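The variable-length compounding in a mesh or swarm can be sketched with a small Monte Carlo; the path-length range is an assumed illustration of "unpredictable workflow length," not a measured distribution:

```python
import random

# Sketch: in a mesh/swarm, delegation depth is not fixed. Sample random
# path lengths (uniform 5..40 hops, an assumed range) at 99% per hop.
random.seed(0)
lengths = [random.randint(5, 40) for _ in range(10_000)]
success = [0.99 ** n for n in lengths]
print(f"mean reliability: {sum(success) / len(success):.1%}")
print(f"worst case (40 hops): {0.99 ** 40:.1%}")   # ~66.9%
```

Because the exponent itself is a random variable, a mesh cannot be characterized by a single reliability number; the worst-case tail, not the mean, drives regulatory exposure.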
| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| NIST AI RMF 1.0 | Partial | MAP 5.1, MEASURE 5.2 | Risk mapping and measurement of AI performance. | Composite system reliability; compounding error models. |
| EU AI Act | Minimal | Article 8, 9, 15 | Risk assessment and conformity. | Sequential error propagation; multi-agent reliability. |
| MAS AIRG | Partial | Risk Governance, Model Risk | Model-level risk controls and governance framework. | System-level accuracy requirements for workflows. |
| NIST GenAI Profile | Minimal | GEN-06 | AI system performance characteristics. | Reliability testing for multi-step agentic processes. |
| Dodd-Frank / Securities | Partial | Rule 10b-5; Securities Act Sections 11, 12(a)(2) | Issuer obligations for disclosure accuracy and completeness. | Agent-generated document accuracy requirements. |
| OWASP Agentic Top 10 | Not Directly | N/A | Focus on security and injection attacks. | Reliability compounding and accuracy cascades. |
| NIST CSF 2.0 | Minimal | PR.ST-1, DE.CM-1 | Supply chain risk and monitoring. | Multi-step workflow reliability and compounding error. |
In banking and capital markets, transaction accuracy is non-negotiable. A prospectus with incorrect loan documentation is not a "near miss": it is a violation. A loan application with compound errors in fraud scoring creates unaccounted risk on the balance sheet. Insurance companies underwriting policies through agentic workflows that compound errors are knowingly underwriting risk they did not measure.
The regulatory consequence is not proportional to individual agent failure; it is tied to the composite output quality. SEC enforcement, OCC guidance, and insurance regulators (NAIC) hold institutions liable for systemic accuracy failures, regardless of whether those failures stem from a single agent or from error compounding. An institution that deploys agents without modeling compound error is essentially deploying an unvalidated underwriting process.
Additionally, compound error creates a fairness and discrimination risk. If agents making sequential decisions about applicants (identity verification, income assessment, creditworthiness evaluation) each perform with small bias, the composite bias on credit approval decisions can be significantly larger than the individual biases. This amplified bias is not captured by examining individual agents in isolation.
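The amplification can be illustrated with assumed per-stage pass rates (hypothetical numbers, not measured figures):

```python
# Sketch: how small per-stage bias compounds across sequential decisions.
# Pass rates below are illustrative assumptions.
stages_group_a = [0.95, 0.95, 0.95]   # per-stage pass rate, group A
stages_group_b = [0.93, 0.93, 0.93]   # 2 points lower at each stage

def composite(rates):
    out = 1.0
    for r in rates:
        out *= r
    return out

ratio_per_stage = 0.93 / 0.95                                    # ~97.9%
ratio_composite = composite(stages_group_b) / composite(stages_group_a)
print(f"per-stage approval ratio: {ratio_per_stage:.1%}")
print(f"composite approval ratio: {ratio_composite:.1%}")        # ~93.8%
```

A 2-point gap that looks negligible at any single stage widens to a 6-point relative gap at the composite decision, which is exactly the disparity that per-agent fairness audits miss.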
Compound Error Propagation requires architectural controls that go beyond what existing frameworks provide. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.
Schedule a Briefing