R-MC-01 · Multi-Agent & Coordination · DAMAGE 4.2 / Critical

Compound Error Propagation

Sequential agent steps degrade end-to-end accuracy exponentially, pushing system reliability below regulatory thresholds even when every individual agent exceeds its SLA targets.

The Risk

In multi-step agent workflows, errors compound multiplicatively: end-to-end accuracy is the product of per-step accuracies, so it decays exponentially with workflow length. A workflow with 50 steps, each executing at 99% accuracy, achieves only 60.5% end-to-end success. At 100 steps, accuracy drops to 36.6%. No individual agent is "wrong" (each meets its SLA), yet the composite system fails to deliver compliant outcomes. This is not equivalent to a single agent with 60% accuracy; it is a reliability collapse distributed across the workflow path, where every downstream process depends on all upstream agents succeeding and failure cannot be attributed to any one component.

In regulated industries, this creates an asymmetry: compliance frameworks assess individual decision points but do not adequately measure compound degradation. A loan approval workflow may require 12 sequential agent steps (data validation, identity verification, risk scoring, fraud detection, regulatory screening, credit assessment, pricing calculation, documentation assembly, regulatory check, disclosure generation, audit trail logging, archival routing). If each agent operates at 99% accuracy, the probability that all 12 succeed is 88.6%. The remaining 11.4% of loan applications contain undetected errors. At scale, this translates to thousands of regulatory violations annually in a large financial institution.
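The arithmetic behind these figures is simply the product of per-step success probabilities. A minimal sketch:

```python
def end_to_end_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a sequential workflow succeeds."""
    return per_step_accuracy ** steps

# 50 steps at 99% per-step accuracy -> ~60.5% end-to-end success
print(f"{end_to_end_success(0.99, 50):.1%}")   # 60.5%
# 100 steps -> ~36.6%
print(f"{end_to_end_success(0.99, 100):.1%}")  # 36.6%
# The 12-step loan workflow -> ~88.6% success, i.e. ~11.4% of
# applications carry at least one undetected error
print(f"{end_to_end_success(0.99, 12):.1%}")   # 88.6%
```

Note the asymmetry this exposes: every per-agent metric is green at 99%, while the only number a regulator cares about (the transaction-level success rate) is far below it.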

The challenge is not malfunction; it is the statistical certainty that sequential processes accumulate error. In systems where human review can catch individual mistakes, agent-to-agent handoffs lose this safety mechanism: agents rarely stop to verify upstream context before acting on it or delegating further.

How It Materializes

A mid-market investment bank deploys agentic workflows for commercial mortgage-backed securities (CMBS) issuance. The workflow requires 15 sequential steps: property data ingestion, document verification, title validation, appraisal review, environmental screening, borrower verification, financial statement analysis, ratio calculation, stress testing, pricing derivation, prospectus generation, regulatory disclosure compilation, SEC filing assembly, investor notification, and post-issuance reporting. Each step is owned by a specialized agent trained on historical patterns in that bank's CMBS operations.

Over three months, the bank executes 500 CMBS transactions through the workflow. Each individual agent reports greater than 99% accuracy on test datasets. But regulators conduct a post-issuance compliance audit and find that 47 of 500 prospectuses contain material misstatements: misstated loan-to-value ratios, incorrect borrower credit ratings, or missing borrower certifications. No single agent "failed" (each operated within acceptable tolerances) but the compounding of small errors created systematic compliance risk.

The bank now faces SEC enforcement action for filing deficient prospectuses (liability under Securities Act Sections 11 and 12(a)(2)), potential penalties under Dodd-Frank-era securitization disclosure rules (Regulation AB), and investor lawsuits under the securities laws. The bank cannot defend itself by pointing to individual agent accuracy because the obligation attaches to aggregate prospectus quality, not to component steps.

Additionally, because agents executed the workflow without human checkpoints between steps, the error pattern propagated undetected. A human reviewer checking appraisals would have flagged methodology differences; a human reconciling stress tests would have caught pricing inconsistencies. The agents operated in isolation, each assuming prior agents were correct.

DAMAGE Score Breakdown

| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 2 | Errors emerge at workflow completion or post-hoc audit. Early detection requires systematic sampling and cross-step verification. Difficult to attribute to a specific agent. |
| A - Autonomy Sensitivity | 4 | Exponential with workflow length and agent autonomy. Fully autonomous multi-step processes guarantee compounding. Human-in-the-loop at step N reduces the exponent. |
| M - Multiplicative Potential | 4 | Affects every transaction processed through the workflow. 500 transactions at a 10% error rate equals 50 failed outputs. Scales with deployment volume. |
| A - Attack Surface | 2 | Not directly exploitable, but operator error or inadequate monitoring creates opportunity. Not an attack vector; a failure mode. |
| G - Governance Gap | 4 | Compliance frameworks assess individual agents, not composite workflows. SLA requirements are per-agent, not per-transaction. No aggregate accuracy requirement in regulation. |
| E - Enterprise Impact | 3 | Affects transaction quality, regulatory compliance, and customer trust. Does not directly impact confidentiality or availability. |
| **Composite DAMAGE Score** | **4.2** | Critical. Requires immediate architectural controls. Cannot be accepted. |

Agent Impact Profile

How severity changes across the agent architecture spectrum.

| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | Human reviews each step before delegation. Compounding is arrested by human gatekeeping. |
| Digital Apprentice | Low | Developmental governance includes step-wise validation. Error budget is tight; agents defer to humans on uncertainty. |
| Autonomous Agent | Medium | Operates within boundaries but executes multi-step workflows independently. Compounding emerges within the agent's autonomy sphere. |
| Delegating Agent | High | Invokes external tools/APIs for each step. Each invocation introduces latency variability and error potential. Compounding amplified by external dependency uncertainty. |
| Agent Crew / Pipeline | Critical | Multiple agents in a sequential workflow by design. Compounding is an inherent architectural risk. Each new agent in the pipeline increases the exponent. |
| Agent Mesh / Swarm | Critical | Dynamic peer-to-peer delegation means unpredictable workflow length and step order. The exponent becomes variable; worst-case compounding is severe. |

Regulatory Framework Mapping

| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| NIST AI RMF 1.0 | Partial | MAP 5.1, MEASURE 5.2 | Risk mapping and measurement of AI performance. | Composite system reliability; compounding error models. |
| EU AI Act | Minimal | Articles 8, 9, 15 | Risk assessment and conformity. | Sequential error propagation; multi-agent reliability. |
| MAS AIRG | Partial | Risk Governance, Model Risk | Model-level risk controls and governance framework. | System-level accuracy requirements for workflows. |
| NIST GenAI Profile | Minimal | GEN-06 | AI system performance characteristics. | Reliability testing for multi-step agentic processes. |
| Dodd-Frank / Securities | Partial | Exchange Act Rule 10b-5; Securities Act §§ 11, 12(a)(2) | Issuer obligations for disclosure accuracy and completeness. | Agent-generated document accuracy requirements. |
| OWASP Agentic Top 10 | Not Directly | — | Focus on security and injection attacks. | Reliability compounding and accuracy cascades. |
| NIST CSF 2.0 | Minimal | PR.ST-1, DE.CM-1 | Supply chain risk and monitoring. | Multi-step workflow reliability and compounding error. |

Why This Matters in Regulated Industries

In banking and capital markets, transaction accuracy is non-negotiable. A prospectus with incorrect loan documentation is not a "near miss": it is a violation. A loan application with compound errors in fraud scoring creates unaccounted risk on the balance sheet. Insurance companies underwriting policies through agentic workflows that compound errors are knowingly underwriting risk they did not measure.

The regulatory consequence is not proportional to individual agent failure; it is tied to the composite output quality. SEC enforcement, OCC guidance, and insurance regulators (NAIC) hold institutions liable for systemic accuracy failures, regardless of whether those failures stem from a single agent or from error compounding. An institution that deploys agents without modeling compound error is essentially deploying an unvalidated underwriting process.

Additionally, compound error creates a fairness and discrimination risk. If agents making sequential decisions about applicants (identity verification, income assessment, creditworthiness evaluation) each perform with small bias, the composite bias on credit approval decisions can be significantly larger than the individual biases. This amplified bias is not captured by examining individual agents in isolation.
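The amplification effect can be illustrated with hypothetical per-step pass rates (the rates below are assumptions for illustration, not measured values):

```python
# Hypothetical relative pass rates for two applicant groups at each of
# three sequential decision steps (identity, income, creditworthiness).
group_a_rates = [0.98, 0.98, 0.98]
group_b_rates = [0.95, 0.95, 0.95]

def composite_rate(rates):
    """Probability an applicant passes every sequential step."""
    p = 1.0
    for r in rates:
        p *= r
    return p

per_step_ratio = group_b_rates[0] / group_a_rates[0]  # ~0.969: looks nearly fair
composite_ratio = composite_rate(group_b_rates) / composite_rate(group_a_rates)
# The composite disparity ratio (~0.911) is worse than any single step:
# small per-step biases multiply across the workflow.
print(f"per step: {per_step_ratio:.3f}, composite: {composite_ratio:.3f}")
```

Auditing each agent in isolation would show a roughly 3-point gap at every step; only the transaction-level view reveals the compounded 9-point gap in approval outcomes.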

Controls & Mitigations

Design-Time Controls

  • Build and validate aggregate workflow accuracy models before deployment. Do not assume that 99% per-step agents yield acceptable composite reliability. Use probabilistic models (e.g., Bayesian networks) to predict end-to-end failure rates for each workflow configuration.
  • Implement step-wise verification gates where high-consequence outcomes require human review or independent re-verification before proceeding to the next step.
  • Establish maximum workflow length policies and require shorter workflows for high-risk decision types. For example, cap CMBS issuance workflows at a maximum of 10 sequential agent steps; re-architect longer workflows to reduce step count through parallel processing where feasible.
  • Leverage Component 7 (Composable Reasoning) to enable agents to operate on composite reasoning outputs from prior agents, including uncertainty bounds and confidence scores, rather than point estimates.
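A pre-deployment aggregate accuracy model, per the first bullet, can be sketched as a Monte Carlo simulation over uncertain per-step accuracies. This is a minimal sketch, not a calibrated model; the Beta parameters and function names are illustrative assumptions:

```python
import random

def simulate_workflow_failure_rate(step_betas, trials=20_000, seed=42):
    """Estimate end-to-end failure rates when each step's accuracy is
    uncertain, modeled here as a Beta distribution per step."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        # Draw a plausible accuracy for each step, then take the product.
        p_success = 1.0
        for alpha, beta in step_betas:
            p_success *= rng.betavariate(alpha, beta)
        failures.append(1.0 - p_success)
    failures.sort()
    return {
        "median": failures[trials // 2],
        "p95": failures[int(trials * 0.95)],  # pessimistic tail for policy
    }

# 12-step loan workflow, each step "about 99% accurate" with uncertainty.
est = simulate_workflow_failure_rate([(990, 10)] * 12)
print(f"median failure rate: {est['median']:.1%}, 95th pct: {est['p95']:.1%}")
```

The median predicted failure rate lands near 11%, far above any per-step error budget; that gap between per-agent SLA and transaction-level failure is exactly what a deployment review must surface before go-live.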

Runtime Controls

  • Monitor cumulative accuracy in real time by tracking error rates per transaction path, not per agent. Aggregate transaction-level outcomes and compare to baseline.
  • Implement rollback and retry mechanisms at step boundaries. If a downstream agent detects inconsistency with upstream output, cascade a rollback request and retry the upstream step with expanded context or human involvement.
  • Use the Blast Radius Calculator to model the propagation impact of errors detected at downstream steps.
  • Maintain audit trails with explicit handoff records at each step, including agent outputs, confidence scores, and any contextual assumptions.
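Transaction-path monitoring, as opposed to per-agent monitoring, might look like the following sketch (class name, thresholds, and workflow labels are illustrative assumptions):

```python
from collections import defaultdict

class WorkflowAccuracyMonitor:
    """Track cumulative error rates per transaction path, not per agent."""

    def __init__(self, error_rate_threshold=0.05, min_samples=100):
        self.threshold = error_rate_threshold
        self.min_samples = min_samples
        self.outcomes = defaultdict(lambda: {"total": 0, "errors": 0})

    def record(self, workflow_type: str, had_error: bool) -> None:
        stats = self.outcomes[workflow_type]
        stats["total"] += 1
        stats["errors"] += int(had_error)

    def breached(self, workflow_type: str) -> bool:
        """True when the transaction-level error rate exceeds policy,
        warranting escalation to human review or a workflow halt."""
        stats = self.outcomes[workflow_type]
        if stats["total"] < self.min_samples:
            return False  # not enough evidence to alarm yet
        return stats["errors"] / stats["total"] > self.threshold

monitor = WorkflowAccuracyMonitor()
for i in range(500):
    monitor.record("cmbs_issuance", had_error=(i % 10 == 0))  # ~10% errors
print(monitor.breached("cmbs_issuance"))  # True: 10% exceeds the 5% policy
```

The key design choice is the unit of aggregation: every individual agent in this scenario could still be reporting 99% accuracy while `breached()` fires on the composite path.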

Detection & Response

  • Conduct systematic sampling of completed transactions post-hoc. Sample 5-10% of transactions across each workflow and perform independent verification against source data.
  • Implement anomaly detection on workflow completion metrics. Track the distribution of error-free transaction rates by workflow type.
  • Establish post-audit feedback loops that feed discovered errors back into the workflow model.
  • Use the Kill Switch to halt workflow execution if cumulative error rate at any step exceeds policy thresholds, with automatic escalation to human review.
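For the anomaly-detection bullet, one simple statistical check is to compare the observed error-free transaction rate against a z-sigma binomial band around a baseline. The sketch below uses the CMBS figures from this scenario; the function and thresholds are illustrative assumptions:

```python
import math

def error_free_rate_anomalous(observed_ok: int, total: int,
                              expected_rate: float, z: float = 3.0) -> bool:
    """Flag a batch whose error-free rate falls below a z-sigma
    binomial band around the expected success rate."""
    observed_rate = observed_ok / total
    sigma = math.sqrt(expected_rate * (1 - expected_rate) / total)
    return observed_rate < expected_rate - z * sigma

naive = 0.99          # per-agent SLA, wrongly applied per transaction
compounded = 0.99 ** 15  # compounding-aware baseline: ~86% end-to-end

# 453 of 500 prospectuses clean (the audit found 47 deficient).
print(error_free_rate_anomalous(453, 500, naive))       # True
print(error_free_rate_anomalous(453, 500, compounded))  # False
```

The contrast is the point: against the naive per-agent baseline the batch is a glaring anomaly, while against a compounding-aware baseline it is entirely expected. An institution that never built the compounded model would have no defensible baseline to alarm against.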

Address This Risk in Your Institution

Compound Error Propagation requires architectural controls that go beyond what existing frameworks provide. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.

Schedule a Briefing