R-FM-05 | Foundation Model & LLM | DAMAGE 3.4 / High

Prompt Sensitivity and Brittleness

Small prompt changes cause disproportionate output changes. The input space is too large for exhaustive testing.

The Risk

Language models are extremely sensitive to prompt formulation. Small changes to prompts produce disproportionate changes in outputs. A prompt asking "What is the risk level?" might produce conservative estimates. The same question phrased "What are the upside opportunities?" might produce optimistic estimates. A prompt with detailed instructions produces different outputs than the same prompt with abbreviated instructions.

This prompt sensitivity is a fundamental property of LLMs. The input space (all possible prompts) is effectively infinite, so an institution cannot exhaustively test all prompts to understand model behavior. Instead, institutions test a limited set of prompts and assume behavior generalizes to untested prompts. That assumption frequently fails: untested prompts can produce materially different outputs.

This sensitivity makes the agent's behavior brittle. Small operational changes (different wording from operators, different examples included, different instruction formatting) cause output changes, and the institution cannot predict in advance which prompt changes will cause significant output shifts and which will not.
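One way to make this concrete is a small sensitivity harness: run the same customer set through several prompt variants and compare the resulting score distributions. The sketch below is illustrative only; `score_customer` is a deterministic stub standing in for a real model call, with biases wired in to mimic the behavior described above.

```python
import statistics

def score_customer(prompt: str, customer_id: int) -> float:
    # Hypothetical stand-in for the institution's model API. A real
    # harness would call the LLM; the stub makes the harness testable.
    base = 4.5
    if "conservative" in prompt:
        base += 1.7   # "avoid missing risk" style instructions bias scores up
    if len(prompt) < 80:
        base -= 0.7   # terse prompts elicit shallower analysis
    return base + (customer_id % 3) * 0.1  # per-customer variation

VARIANTS = {
    "baseline": "Using a scale of 1-10, assess the financial crime risk of this "
                "customer. Consider transaction patterns, counterparty "
                "relationships, and jurisdictional factors.",
    "operator_a": "Using a scale of 1-10, assess the financial crime risk of this "
                  "customer. The score should be conservative to avoid missing risk.",
    "operator_b": "Quickly assess financial crime risk 1-10.",
}

def sensitivity_report(customers: range) -> dict[str, float]:
    """Mean score per prompt variant over the same customer set."""
    return {
        name: statistics.mean(score_customer(p, c) for c in customers)
        for name, p in VARIANTS.items()
    }

report = sensitivity_report(range(100))
drift = max(report.values()) - min(report.values())
print(report, f"max drift={drift:.2f}")
```

Even with identical customers, the variants diverge; the "drift" figure is the kind of metric a sensitivity test would track release over release.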

How It Materializes

A bank trains teams to use an agent for compliance risk assessment. The original training prompt is: "Using a scale of 1-10, assess the financial crime risk of this customer. Consider transaction patterns, counterparty relationships, and jurisdictional factors. Provide a score and brief explanation."

Operators use this prompt and get consistent, reasonable risk scores. Over time, different operators make small modifications. Operator A adds "The score should be conservative to avoid missing risk." Operator B abbreviates the prompt to "Quickly assess financial crime risk 1-10. Transaction patterns, counterparty, jurisdiction. Score and why."

The original prompt produces average scores of 4.5 across customers. Operator A's version produces average scores of 6.2 (more conservative, due to the "avoid missing risk" instruction). Operator B's version produces average scores of 3.8 (less detailed, prompts less thorough analysis).

The bank's risk committee notices that some operators' customer risk assessments are systematically higher than others. Investigation reveals the actual cause: different operators are using different prompts. The agent's outputs vary significantly based on prompt formulation. The bank must standardize prompts and retrain operators. But the incident reveals the system is brittle: output quality depends critically on prompt formulation, and this formulation is difficult to control across many operators.
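The committee's finding could have been surfaced automatically by grouping logged scores by operator and flagging anyone whose mean deviates from the population. A minimal sketch with made-up audit-log records (the threshold is illustrative):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical audit log: (operator, risk_score) pairs.
LOG = (
    [("baseline_op", s) for s in [4.3, 4.5, 4.7, 4.5]]
    + [("operator_a", s) for s in [6.0, 6.2, 6.4, 6.2]]
    + [("operator_b", s) for s in [3.6, 3.8, 4.0, 3.8]]
)

def flag_divergent_operators(log, threshold=1.0):
    """Return operators whose mean score deviates from the overall
    mean by more than `threshold` points."""
    by_op = defaultdict(list)
    for op, score in log:
        by_op[op].append(score)
    overall = mean(score for _, score in log)
    return sorted(op for op, scores in by_op.items()
                  if abs(mean(scores) - overall) > threshold)

print(flag_divergent_operators(LOG))  # ['operator_a', 'operator_b']
```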

DAMAGE Score Breakdown

| Dimension | Score | Rationale |
| --- | --- | --- |
| D - Detectability | 3 | Prompt sensitivity is difficult to detect unless outputs are explicitly monitored for prompt-change-driven variance. |
| A - Autonomy Sensitivity | 2 | Occurs at all autonomy levels; structural to LLM properties. |
| M - Multiplicative Potential | 4 | Each operator or application change that modifies prompts can cause output changes. |
| A - Attack Surface | 3 | Operators or insiders could intentionally modify prompts to skew outputs. Adversary could engineer prompts to produce desired outputs. |
| G - Governance Gap | 4 | Risk governance assumes agent outputs are stable. Prompt sensitivity breaks this assumption. |
| E - Enterprise Impact | 2 | Output quality degradation, consistency issues, but impact is typically addressable through prompt standardization. |
| Composite DAMAGE Score | 3.4 | High. Requires priority attention and dedicated controls. |

Agent Impact Profile

How severity changes across the agent architecture spectrum.

| Agent Type | Impact | How This Risk Manifests |
| --- | --- | --- |
| Digital Assistant | Moderate | Human user may use different phrasings with the assistant, getting different outputs. |
| Digital Apprentice | High | As agent autonomy increases, human phrasings matter less, but the agent's internal prompts may vary. |
| Autonomous Agent | High | Agent determines how to formulate internal prompts for reasoning. Prompt variations compound across reasoning steps. |
| Delegating Agent | High | Agent formulates requests to delegated models. Different formulations produce different recommendations. |
| Agent Crew / Pipeline | Critical | Multiple agents formulate prompts differently. Inconsistencies compound through the pipeline. |
| Agent Mesh / Swarm | Critical | Peer-to-peer agents with different prompt formulation strategies. Systemic inconsistency. |

Regulatory Framework Mapping

| Framework | Coverage | Citation | What It Addresses | What It Misses |
| --- | --- | --- | --- | --- |
| NIST AI RMF 1.0 | Partial | MAP 1.1, MAP 2.1 | Recommends model testing and validation. | Does not address prompt sensitivity testing. |
| EU AI Act | Partial | Article 24, Article 29 | Requires testing of high-risk AI systems. | Does not specifically address prompt sensitivity. |
| MAS AIRG | Partial | Section 6.1 (Governance) | General governance requirements. | Does not address prompt sensitivity. |
| OWASP LLM Top 10 | Partial | LLM02 (Data and Model Poisoning) | Addresses input poisoning. | Does not address legitimate prompt variation. |
| BCBS 239 | Minimal | Data governance principles | General data governance. | Does not address prompt sensitivity. |

Why This Matters in Regulated Industries

In risk assessment and compliance, consistency is critical. If risk scores vary depending on operator phrasing, the risk assessment framework is not reliable. Regulators expect risk assessments to be consistent and defensible. An institution where risk outputs vary based on prompt formulation cannot justify its risk assessments to regulators.

In customer-facing contexts, inconsistent outputs damage customer trust. If one customer receives one recommendation and another customer receives a different recommendation (due to prompt variation rather than objective differences), the institution is not providing equitable treatment.

Controls & Mitigations

Design-Time Controls

  • Standardize and freeze prompts: define the standard prompt for each agent use case in writing. Prohibit informal prompt variations.
  • Test prompt sensitivity: for each standard prompt, create variations and test whether outputs change significantly. Document sensitivity findings.
  • Design prompts to be robust: write prompts that produce similar outputs even with minor variations. Use clear, structured prompt formatting.
  • Implement prompt versioning: assign version numbers to prompts. Track which agent versions use which prompt versions. Enable rollback.
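Prompt versioning can be as simple as a registry keyed by use case and version, with a content hash so any drift from the approved wording is detectable. A minimal sketch; the use-case name and prompt text are illustrative:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    use_case: str
    version: str
    text: str

    @property
    def digest(self) -> str:
        # Content hash ties the version label to the exact approved wording.
        return hashlib.sha256(self.text.encode()).hexdigest()

REGISTRY: dict[tuple[str, str], PromptVersion] = {}

def register(p: PromptVersion) -> None:
    REGISTRY[(p.use_case, p.version)] = p

def verify(use_case: str, version: str, text: str) -> bool:
    """True only if `text` matches the registered prompt byte-for-byte."""
    approved = REGISTRY.get((use_case, version))
    return approved is not None and (
        hashlib.sha256(text.encode()).hexdigest() == approved.digest
    )

register(PromptVersion("fincrime_risk", "1.0",
                       "Assess financial crime risk on a 1-10 scale."))
print(verify("fincrime_risk", "1.0",
             "Assess financial crime risk on a 1-10 scale."))  # True
print(verify("fincrime_risk", "1.0", "Quickly assess risk 1-10."))  # False
```

Keeping old versions in the registry is what enables rollback: an agent build pins a `(use_case, version)` pair rather than a raw string.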

Runtime Controls

  • Enforce prompt standardization: agents should only accept standard prompts from authorized sources. Provide dropdown or template-based interfaces.
  • Monitor prompt use: log which prompts are used in each agent interaction. Alert if non-standard prompts are detected.
  • Use Component 2 (Cryptographic Identity) to sign standard prompts: create unforgeable signatures on approved prompts. Verify signature before accepting.
  • Use Component 10 (Kill Switch) to halt agents receiving non-standard prompts.
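The signing control above can be sketched with an HMAC over the prompt text: the governance function holds the key, tags approved prompts, and the agent runtime refuses anything whose tag does not verify. This is an assumption-laden simplification; a real Component 2 deployment would presumably use asymmetric signatures and a managed key store, but HMAC keeps the sketch dependency-free.

```python
import hmac, hashlib

SIGNING_KEY = b"governance-held-secret"  # illustrative; in practice from an HSM/KMS

def sign_prompt(prompt: str) -> str:
    # Governance side: produce an unforgeable tag for an approved prompt.
    return hmac.new(SIGNING_KEY, prompt.encode(), hashlib.sha256).hexdigest()

def accept_prompt(prompt: str, tag: str) -> bool:
    """Runtime gate: constant-time check that the prompt carries a valid tag."""
    return hmac.compare_digest(sign_prompt(prompt), tag)

approved = "Assess financial crime risk on a 1-10 scale."
tag = sign_prompt(approved)
print(accept_prompt(approved, tag))                     # True
print(accept_prompt("Quickly assess risk 1-10.", tag))  # False
```

A failed `accept_prompt` check is the natural trigger point for the kill-switch control: the runtime halts or quarantines the interaction instead of executing the unapproved prompt.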

Detection & Response

  • Conduct quarterly prompt sensitivity testing: test standard prompts against minor variations, verify outputs remain consistent.
  • Monitor output distributions: detect changes in output characteristics that might indicate prompt variation.
  • Audit prompt usage: sample agent interactions, verify standard prompts are being used.
  • Establish incident response for detected prompt sensitivity issues: assess impact of uncontrolled prompt variations, re-standardize prompts, retrain operators.
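Output-distribution monitoring can start with something as simple as comparing a current window's mean against a reference window. A sketch using a z-score-style threshold on the mean; the scores and threshold are illustrative:

```python
from statistics import mean, stdev

def mean_shift_alert(reference: list[float], current: list[float],
                     z_threshold: float = 3.0) -> bool:
    """Alert when the current window's mean drifts from the reference
    mean by more than `z_threshold` standard errors."""
    ref_mean, ref_sd = mean(reference), stdev(reference)
    se = ref_sd / (len(current) ** 0.5) or 1e-9  # guard zero-variance reference
    return abs(mean(current) - ref_mean) / se > z_threshold

baseline_scores = [4.4, 4.5, 4.6, 4.5, 4.4, 4.6, 4.5, 4.5]
drifted_scores  = [6.1, 6.2, 6.3, 6.2, 6.1, 6.3, 6.2, 6.2]
print(mean_shift_alert(baseline_scores, baseline_scores))  # False
print(mean_shift_alert(baseline_scores, drifted_scores))   # True
```

An alert of this kind does not identify the cause, but it narrows the investigation to recent changes in prompts, models, or operator behavior.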
