LLMs perform differently across languages. An agent accurate in English may produce inferior analysis in other languages. Performance variation creates a compliance gap for fair treatment obligations.
Large language models are trained on far more English-language data than non-English data, so model performance is substantially better in English than in other languages. For some non-English languages, performance can degrade to 50-70% of English performance or worse. When an institution deploys models in multilingual environments without language-specific testing, customers in non-English-speaking jurisdictions receive worse service, less accurate outputs, and potentially discriminatory treatment compared with English-speaking customers.
This creates fairness and regulatory issues. Fair lending, fair treatment, and equal protection laws apply to all customers regardless of language. An institution providing accurate credit decisions in English but less accurate decisions in Spanish violates fair lending principles. Insurance underwriting that is accurate in English but biased in German violates fair treatment principles.
The risk is amplified by invisibility: most institutions' testing and validation is conducted in English. Non-English language performance is discovered only after deployment, often through customer complaints or regulatory investigations.
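One way to make this gap visible before deployment is to stratify validation results by language. The following is a minimal sketch, assuming labeled validation records are available as `(language, is_fraud, flagged)` tuples. The function names, the `"en"` baseline, and the `max_ratio` tolerance of 1.25 are illustrative assumptions, not values drawn from any regulation or framework.

```python
from collections import defaultdict


def false_negative_rate_by_language(records):
    """Compute the missed-fraud rate per language.

    records: iterable of (language, is_fraud, flagged) tuples, where
    is_fraud is the ground-truth label and flagged is the agent's output.
    Languages with no confirmed fraud cases are omitted from the result.
    """
    fraud_cases = defaultdict(int)
    missed_cases = defaultdict(int)
    for language, is_fraud, flagged in records:
        if is_fraud:
            fraud_cases[language] += 1
            if not flagged:
                missed_cases[language] += 1
    return {lang: missed_cases[lang] / fraud_cases[lang] for lang in fraud_cases}


def languages_exceeding_gap(fnr, baseline="en", max_ratio=1.25):
    """Return languages whose false negative rate exceeds the baseline
    language's rate by more than max_ratio (a hypothetical tolerance)."""
    base = fnr[baseline]
    exceeding = []
    for lang, rate in fnr.items():
        if lang == baseline:
            continue
        if base == 0:
            # Any missed fraud is a gap when the baseline misses none.
            over = rate > 0
        else:
            over = rate / base > max_ratio
        if over:
            exceeding.append(lang)
    return sorted(exceeding)
```

A validation pipeline could call `false_negative_rate_by_language` on a held-out set for every language the institution serves and block deployment to any language returned by `languages_exceeding_gap`. The same stratification applies to other metrics, such as false positive rate or calibration error.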
A bank with significant operations in Mexico, Spain, and Brazil uses an LLM-based agent for fraud detection across all jurisdictions. The agent is trained and tested in English with English examples and English documentation. The bank deploys the agent to Spanish and Portuguese interfaces without conducting language-specific testing.
The agent scores Spanish-speaking customers' transactions, but its Spanish performance is lower than its English performance because of training data imbalance. The agent misses suspicious patterns in Spanish-language transaction descriptions. The false negative rate (missed fraud) in Spanish is twice the rate in English.
After three months, the bank's fraud loss statistics show higher fraud rates in Spanish-language accounts than in English-language accounts. Investigation reveals the cause: the agent's fraud detection is less effective in Spanish. Spanish-speaking customers are receiving weaker fraud protection than English-speaking customers.
The bank's Hispanic customer advocacy group raises a complaint with banking regulators. Regulators investigate. They determine that the bank deployed a less-effective AI system to non-English customers without conducting equivalent validation. The regulator issues a finding that the bank provided unequal service based on language.
| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 3 | Language-specific degradation is not detected unless explicit language-stratified testing is conducted. |
| A - Autonomy Sensitivity | 2 | Occurs at all autonomy levels; structural to model training data imbalance. |
| M - Multiplicative Potential | 2 | Affects non-English customers, but impact is limited to those customers. Not systemic across all users. |
| A - Attack Surface | 1 | Not weaponizable externally; structural to training data distribution. |
| G - Governance Gap | 4 | Fair treatment frameworks assume equal performance across languages. Training data imbalance breaks this assumption. |
| E - Enterprise Impact | 2 | Regulatory findings, fairness concerns, customer complaints, but impact is localized to non-English-speaking jurisdictions. |
| Composite DAMAGE Score | 3.3 | High. Requires priority attention and dedicated controls. |
Severity changes across the agent architecture spectrum as follows.
| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Moderate | Humans may notice lower quality responses in non-English languages. |
| Digital Apprentice | Moderate | Agent's performance degrades in non-English languages. |
| Autonomous Agent | High | Autonomous agent produces lower-quality decisions in non-English languages without human verification. |
| Delegating Agent | Moderate | Agent delegates in non-English languages; delegated model performs worse. |
| Agent Crew / Pipeline | Moderate | Language-specific degradation compounds across multiple agents in the pipeline. |
| Agent Mesh / Swarm | Moderate | Peer-to-peer agents each exhibit language-specific performance variation. |
| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| ECOA | Partial | 15 U.S.C. 1691 | Requires equal treatment in credit decisions. | Does not explicitly address language-based performance variation. |
| Civil Rights Act | Partial | 42 U.S.C. 2000a et seq. | Prohibits discrimination. | Does not specifically address AI language performance. |
| EU AI Act | Partial | Article 10, Article 70 | Addresses data quality and non-discrimination. | Does not specifically address multilingual performance variation. |
| MAS AIRG | Partial | Section 3 (Fairness) | Requires fair and inclusive AI. | Does not address multilingual performance. |
| GDPR Article 21 | Partial | Non-Discrimination | Prohibits discrimination. | Does not address language-based AI performance. |
Financial institutions serve diverse populations with different primary languages. Fair treatment regulations apply equally to all language communities. An institution that provides superior service to English speakers while providing inferior service to non-English speakers violates fair treatment principles. Regulators increasingly expect institutions to validate AI systems across all language communities they serve.
Additionally, language-based performance variation can amount to discrimination if it produces a disparate impact on a protected class. For example, if lower Spanish performance causes Latino customers to be disadvantaged in credit decisions, the language-based performance gap may become a civil rights violation.
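A disparate impact check of this kind can be sketched as a ratio of favorable-outcome rates between groups. The example below is a minimal illustration, assuming decisions are available as `(group, approved)` pairs. The function names are hypothetical, and the 0.8 threshold mentioned afterward is borrowed from the four-fifths rule of thumb used in US employment selection guidelines. It is a screening heuristic here, not a legal test for credit decisions.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Compute the approval rate per group.

    decisions: iterable of (group, approved) pairs, where approved is a bool.
    """
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        if approved:
            approvals[group] += 1
    return {group: approvals[group] / totals[group] for group in totals}


def adverse_impact_ratios(rates, reference_group):
    """Divide each group's approval rate by the reference group's rate.

    A ratio well below 1.0 means the group receives favorable outcomes
    less often than the reference group.
    """
    reference_rate = rates[reference_group]
    if reference_rate == 0:
        raise ValueError("reference group has no approvals; ratio is undefined")
    return {
        group: rate / reference_rate
        for group, rate in rates.items()
        if group != reference_group
    }
```

Under these assumptions, a group whose ratio falls below 0.8 would be escalated for review. The review should then test whether the gap traces back to language-specific model performance rather than to legitimate differences in the underlying applications.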
Multilingual and Cross-Cultural Inconsistency requires architectural controls that go beyond what existing frameworks provide. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.