When every agent depends on a single model provider, one provider outage takes all of them down at once. DORA's concentration risk requirements apply to critical AI service providers.
Most institutions using agentic AI rely on a small number of large model providers (OpenAI, Anthropic, Google, Meta, and others). Many institutions have a strong preference for a single provider for cost, performance, integration, or contractual reasons. This creates concentration risk: if that provider experiences an outage or failure, every agent relying on it fails simultaneously.
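Concentration of this kind can be measured from an agent inventory. The sketch below is a minimal illustration, assuming the institution keeps a mapping of agent names to provider names; the function name, the inventory contents, and the use of a Herfindahl-Hirschman index over agent counts are illustrative assumptions, not a prescribed regulatory metric.

```python
from collections import Counter


def provider_concentration(agent_providers: dict[str, str]) -> dict:
    """Summarize how agents are distributed across model providers.

    agent_providers maps a (hypothetical) agent name to its provider name.
    """
    if not agent_providers:
        raise ValueError("inventory is empty")

    counts = Counter(agent_providers.values())
    total = sum(counts.values())
    shares = {provider: n / total for provider, n in counts.items()}

    # Herfindahl-Hirschman index over provider shares:
    # 1.0 means every agent depends on the same provider.
    hhi = sum(share ** 2 for share in shares.values())

    top_provider, top_count = counts.most_common(1)[0]
    return {
        "shares": shares,
        "hhi": hhi,
        "top_provider": top_provider,
        "top_share": top_count / total,
    }


# Hypothetical inventory: four agents, all on one provider.
inventory = {
    "credit_underwriting": "provider_a",
    "fraud_detection": "provider_a",
    "aml_monitoring": "provider_a",
    "customer_service": "provider_a",
}
report = provider_concentration(inventory)
print(report["top_provider"], report["top_share"], report["hhi"])
# prints: provider_a 1.0 1.0
```

A top share of 1.0 is the fully concentrated case described above. Counting agents equally is a simplification; weighting by business criticality would likely be more informative in practice.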
This differs from traditional software concentration risk. Traditional infrastructure usually has a fallback path: if a database server fails, traffic routes to a replica; if an API fails, traffic routes to a backup API. A large language model API used without a tested alternative is a single point of failure with no automatic fallback. If the OpenAI API is unavailable, every agent using it fails, and no replica is instantly available. The institution cannot migrate to a different provider on the spot because switching models requires prompt retuning, changes to inference patterns, and testing.
Outage severity amplifies the concentration risk. Large model providers have experienced multi-hour outages affecting thousands of organizations, and during such outages every dependent agent fails. An institution relying on a single provider for critical functions (fraud detection, credit decision support, AML analysis) experiences a critical system failure whenever that provider has an outage.
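A rough availability calculation shows why a second provider matters. The sketch below is illustrative only: the 99.9% availability figure is an assumed input rather than a published number for any provider, and the combined figure assumes provider failures are statistically independent, which shared cloud infrastructure can undermine.

```python
HOURS_PER_YEAR = 24 * 365


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of unavailability per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(availabilities: list[float]) -> float:
    """Availability when any one provider being up is sufficient.

    Assumes independent failures and an already tested failover path.
    """
    p_all_down = 1.0
    for availability in availabilities:
        p_all_down *= 1.0 - availability
    return 1.0 - p_all_down


single = 0.999  # assumed availability of one provider (illustrative)
dual = combined_availability([single, single])

print(f"single provider: {annual_downtime_hours(single):.2f} h/year")
print(f"two providers:   {annual_downtime_hours(dual):.4f} h/year")
```

Under these assumptions a single provider implies roughly 8.76 hours of expected downtime per year, while two independent providers imply well under a minute. The second figure only holds if the failover path exists and has been tested before the outage.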
A bank deploys agents for multiple critical functions: credit underwriting, fraud detection, AML transaction monitoring, and customer service. All agents use OpenAI's GPT-4 API. The bank standardized on OpenAI because of cost efficiency, performance, and internal familiarity. The bank does not maintain alternative models or providers.
OpenAI experiences a severe outage (e.g., a region goes down, causing cascading failures). All APIs are unavailable for 4 hours. All bank agents fail simultaneously. The bank's credit underwriting system cannot process new applications. Fraud detection agents are offline; suspicious transactions are not detected during the outage. AML monitoring agents are offline; SAR generation is paused. Customer service agents cannot respond to customer inquiries.
The bank's risk management team discovers the outage is a provider issue, not a local system issue. Recovery depends on OpenAI restoring service. The bank can do nothing but wait. After 4 hours, service is restored. The bank's agents come back online. But during the 4-hour period, the institution was unable to perform critical functions.
A regulator later reviews the bank's operational resilience controls and discovers the single-provider dependency. The regulator issues a finding that the bank's concentration risk management is inadequate and requires the bank to implement a multi-provider strategy or maintain fallback models. The bank must now invest in integrating and testing alternative models, which is expensive.
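A multi-provider strategy of the kind the regulator asks for can be sketched as an ordered failover across provider adapters. Everything named here is hypothetical: `ProviderRoute`, `ProviderUnavailable`, and the stub adapters stand in for real SDK calls, which this sketch does not attempt to reproduce. Each route carries its own prompt template because switching models requires prompt retuning and testing.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderUnavailable(Exception):
    """Raised by a provider adapter when its API cannot be reached."""


@dataclass
class ProviderRoute:
    name: str
    call: Callable[[str], str]  # hypothetical adapter wrapping a real SDK call
    prompt_template: str        # prompt tuned and tested for this provider


def complete_with_failover(
    routes: list[ProviderRoute], task_input: str
) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, response)."""
    errors = []
    for route in routes:
        prompt = route.prompt_template.format(task=task_input)
        try:
            return route.name, route.call(prompt)
        except ProviderUnavailable as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{route.name}: {exc}")
    raise RuntimeError("all providers unavailable: " + "; ".join(errors))


# Stub adapters that simulate an outage at the primary provider.
def primary_stub(prompt: str) -> str:
    raise ProviderUnavailable("simulated outage")


def secondary_stub(prompt: str) -> str:
    return f"[secondary] {prompt}"


routes = [
    ProviderRoute("primary", primary_stub, "Assess this case: {task}"),
    ProviderRoute("secondary", secondary_stub, "Task: {task}\nAnswer briefly."),
]
provider, response = complete_with_failover(routes, "transaction 123")
print(provider)
# prints: secondary
```

The routing logic is the inexpensive part. The cost the bank faces lies in validating that the secondary route produces acceptable output for each critical function before it is ever needed.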
| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 1 | Concentration risk is detectable through design review. Not hidden; clearly visible in architecture. |
| A - Autonomy Sensitivity | 1 | Not related to autonomy; structural to model provider choice. |
| M - Multiplicative Potential | 5 | Single provider outage affects all agents simultaneously. Scope is maximum. |
| A - Attack Surface | 2 | Not weaponizable by external actors directly; provider outages are not typically caused by attacks (though possible). |
| G - Governance Gap | 5 | DORA and operational resilience frameworks explicitly require concentration risk management. Single-provider deployments with no tested fallback do not meet these requirements. |
| E - Enterprise Impact | 5 | Critical system outage affecting multiple business functions simultaneously. Operational impact is severe. |
| Composite DAMAGE Score | 4.1 | Critical. Requires immediate architectural controls. Cannot be accepted. |
How severity changes across the agent architecture spectrum.
| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | High | User cannot use assistant during provider outage. |
| Digital Apprentice | High | Agent is unavailable during provider outage. |
| Autonomous Agent | Critical | Fully autonomous agent fails, causing systemic impact on dependent processes. |
| Delegating Agent | Critical | Agent cannot delegate to provider model during outage. Entire delegation pipeline fails. |
| Agent Crew / Pipeline | Critical | All agents in crew fail simultaneously. Entire pipeline unavailable. |
| Agent Mesh / Swarm | Critical | Entire mesh fails simultaneously. Systemic outage. |
| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| DORA | Addressed | Article 28, Article 29 | Explicitly requires assessment and management of ICT concentration risk with third-party service providers, including exit strategies for services supporting critical functions. | Does not specifically mention AI/LLM providers yet. |
| Basel III | Partial | Third-Party Risk Principle | Addresses third-party concentration. | Does not specifically address LLM provider concentration. |
| MAS AIRG | Partial | Section 4 (Third-Party Risk) | Requires management of AI vendor concentration. | Does not specify technical requirements for multi-provider strategies. |
| NIST AI RMF 1.0 | Partial | GOVERN 6.1, GOVERN 6.2 | Recommends policies for third-party AI risk and contingency processes for third-party failures. | Does not specifically address provider concentration or multi-provider strategies. |
| SOX 404 | Partial | IT Controls | Addresses critical system controls. | Does not address provider concentration. |
Operational resilience is a fundamental requirement in financial services. Regulators expect institutions to remain operational even when third-party providers fail. An institution that cannot process credit applications, detect fraud, or monitor AML during a provider outage is not operationally resilient. Regulators will issue findings and require remediation.
Additionally, concentration risk is a prudential concern. If all major banks rely on the same AI provider, and that provider fails, the entire financial system could be disrupted. Regulators increasingly view provider concentration as a systemic risk and are mandating multi-provider or fallback strategies.
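Where no alternative provider is available, one way to avoid silently skipping controls during an outage is a fail-closed degraded mode: work that cannot be screened is held and replayed once service returns. The sketch below illustrates the idea for transaction screening. The class name, the `screen` callable, and the 10,000 threshold in the stub are hypothetical, and a real deployment would also need durable storage and a manual review path.

```python
from collections import deque
from typing import Callable


class ProviderUnavailable(Exception):
    """Raised when the model provider cannot be reached."""


class DegradedModeScreener:
    """Hold transactions during an outage instead of passing them unscreened."""

    def __init__(self, screen: Callable[[dict], bool]):
        self._screen = screen   # returns True when a transaction looks suspicious
        self.backlog = deque()  # transactions awaiting screening, oldest first

    def handle(self, txn: dict) -> str:
        try:
            return "flagged" if self._screen(txn) else "cleared"
        except ProviderUnavailable:
            self.backlog.append(txn)  # fail closed: hold rather than clear
            return "held"

    def replay(self) -> list[tuple[dict, str]]:
        """Screen held transactions in order; stop if the provider is still down."""
        results = []
        while self.backlog:
            txn = self.backlog[0]
            try:
                suspicious = self._screen(txn)
            except ProviderUnavailable:
                break
            self.backlog.popleft()
            results.append((txn, "flagged" if suspicious else "cleared"))
        return results


# Stub screening call with a switch that simulates the provider outage.
state = {"up": False}


def stub_screen(txn: dict) -> bool:
    if not state["up"]:
        raise ProviderUnavailable("simulated outage")
    return txn["amount"] > 10_000


screener = DegradedModeScreener(stub_screen)
print(screener.handle({"id": 1, "amount": 50_000}))
# prints: held
state["up"] = True
print([status for _, status in screener.replay()])
# prints: ['flagged']
```

Holding work delays processing, so this pattern suits controls such as fraud and AML screening where a missed detection is worse than a late one. It does not restore functions like customer service that need a live response.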