Baseline LLM characteristics that create a floor of risk underneath every agentic system: non-determinism, training bias, context limitations, provider dependency, and prompt sensitivity. These risks are about the engine itself, distinct from agent-level reasoning or institutional model pipeline concerns.
Every risk in the catalog sits on top of the foundational properties of large language models. These risks document the baseline LLM characteristics that create a floor of risk underneath everything else: non-determinism, training bias, context limitations, provider dependency, and prompt sensitivity. They are distinct from Reasoning and Epistemic risks (which cover agent-level decision logic built on top of the LLM) and Model and Pipeline Interaction risks (which cover the agent's relationship with the institution's ML model estate). These risks are about the engine itself.
What makes these risks foundational is that they cannot be eliminated through better agent design. An agent built with perfect governance architecture still inherits non-deterministic outputs, training data biases it cannot inspect, context window limitations that silently drop critical constraints, and dependency on providers who update models without notice. Every other category in this catalog compounds on top of these baseline properties. Understanding foundation model risks is a prerequisite to governing everything built on top of them.
Chief Technology Officers, ML engineering leads, model risk management teams, vendor management, procurement, and any risk owner responsible for third-party model governance or AI infrastructure resilience. If your institution deploys agents powered by external foundation models, these risks define the baseline that all other governance must account for.
Severity distribution of the ten risks in this category:

| Critical | High | Moderate | Low |
|---|---|---|---|
| 4 | 6 | 0 | 0 |
Model providers update production models without advance notice. The agent's behavior changes without any change to the agent, its prompts, or its tools.
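One pragmatic control is a canary suite: a fixed set of prompts replayed on a schedule, with responses compared against recorded baselines so a silent provider-side update surfaces as measurable drift. A minimal sketch, assuming a hypothetical `call_model` wrapper around the provider client and a baseline file recorded at deployment:

```python
import hashlib
import json

# Hypothetical wrapper around the provider API; wire this to the real client.
def call_model(prompt: str) -> str:
    raise NotImplementedError

CANARY_PROMPTS = [
    "Summarize the overdraft policy for a retail checking account in one sentence.",
    "List the fields required to open a standard support ticket.",
]

def fingerprint(text: str) -> str:
    """Cheap, stable digest of a canary response for drift comparison."""
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()

def drifted_canaries(baseline_path: str = "canary_baseline.json") -> list[str]:
    """Replay the canary prompts; return those whose fingerprint no longer matches."""
    with open(baseline_path) as f:
        baseline = json.load(f)  # {prompt: fingerprint} recorded at deployment
    return [
        p for p in CANARY_PROMPTS
        if fingerprint(call_model(p)) != baseline.get(p)
    ]
```

Exact-match fingerprints only make sense under deterministic decoding settings; with sampling enabled, compare responses against the baseline by semantic similarity instead.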
Most agentic deployments depend on a single model provider. If the provider experiences an outage or discontinues the model, all agents fail simultaneously.
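A common mitigation is a routing layer that retries and then fails over to a second provider, so a single outage degrades rather than halts the fleet. A sketch under the assumption of two interchangeable, hypothetical completion wrappers:

```python
import time

class ProviderError(Exception):
    pass

# Hypothetical, interchangeable provider wrappers: prompt in, text out.
def primary_complete(prompt: str) -> str: ...
def secondary_complete(prompt: str) -> str: ...

def complete_with_failover(prompt: str, retries: int = 2, backoff_s: float = 1.0) -> str:
    """Try the primary provider with retries, then fail over to the secondary."""
    last_error: Exception | None = None
    for provider in (primary_complete, secondary_complete):
        for attempt in range(retries):
            try:
                return provider(prompt)
            except ProviderError as e:
                last_error = e
                time.sleep(backoff_s * (attempt + 1))  # linear backoff between attempts
    raise ProviderError("all providers unavailable") from last_error
```

Failover only helps if prompts and output validation are provider-agnostic; the secondary model needs the same contract tests as the primary, or the fallback path quietly changes agent behavior.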
Models inherit biases from training data. The institution cannot access, audit, or remediate biases in a model it does not own.
When inputs exceed the context window, content is truncated silently. Critical constraints defined early may be pushed out by subsequent content.
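Because the truncation happens inside the provider stack, a defensive client has to count tokens itself and drop content explicitly rather than let early constraints fall off the front. A minimal sketch, with `count_tokens` as a hypothetical stand-in for the provider's tokenizer and an assumed 8,192-token window:

```python
MAX_CONTEXT_TOKENS = 8_192      # assumption: the deployed model's window
RESERVED_FOR_OUTPUT = 1_024     # head-room kept free for the model's response

def count_tokens(text: str) -> int:
    # Hypothetical stand-in; use the provider's real tokenizer in practice.
    return len(text.split())

def build_prompt(constraints: str, history: list[str], query: str) -> str:
    """Assemble a prompt that drops the oldest history first and never the constraints."""
    budget = MAX_CONTEXT_TOKENS - RESERVED_FOR_OUTPUT
    used = count_tokens(constraints) + count_tokens(query)
    if used > budget:
        raise ValueError("constraints and query alone exceed the context budget")
    kept: list[str] = []
    for turn in reversed(history):            # walk newest-to-oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break                             # drop older turns explicitly, not silently
        kept.append(turn)
        used += cost
    return "\n".join([constraints, *reversed(kept), query])
```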
Small changes in prompt wording cause disproportionately large changes in model output. The input space is so large that exhaustive testing is impossible.
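Exhaustive testing is impossible, but sampled paraphrase testing gives a tractable signal: run semantically equivalent rewordings of the same request and measure how often the agent's decision changes. A sketch, with `agent_decision` as a hypothetical end-to-end call that returns a discrete outcome:

```python
from collections import Counter

# Hypothetical: runs the agent end-to-end and returns a decision label.
def agent_decision(prompt: str) -> str: ...

def stability_score(paraphrases: list[str]) -> float:
    """Fraction of semantically equivalent rewordings that agree with the
    majority decision; 1.0 means the wording did not change the outcome."""
    decisions = [agent_decision(p) for p in paraphrases]
    _, majority_count = Counter(decisions).most_common(1)[0]
    return majority_count / len(decisions)
```

Scores below a release threshold (0.9 is illustrative, not a standard) flag prompts that need hardening before deployment.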
The same customer query processed by the same agent can produce different recommendations on different runs. Non-determinism undermines both fair treatment obligations and the ability to reproduce a decision for audit.
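Since the variance cannot be removed, it has to be made auditable: log every parameter that influenced sampling alongside the output, so a defensible record exists even when the exact response cannot be replayed. A sketch of such a record (the field set is an assumption, not a regulatory schema):

```python
import hashlib
import json
import time
import uuid

def audit_record(prompt: str, output: str, model: str,
                 temperature: float, seed: int | None = None) -> str:
    """Serialize everything needed to explain, if not exactly replay, a response."""
    return json.dumps({
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model": model,                # record the exact version string if exposed
        "temperature": temperature,
        "seed": seed,                  # None when the provider offers no seeding
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output": output,
    })
```

Setting temperature to 0 typically narrows the variance but does not guarantee identical outputs across calls, which is why the record matters more than the hope of replay.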
LLMs perform unevenly across languages. An agent that is accurate in English may produce inferior analysis in the other languages it serves, and that performance gap becomes a compliance gap wherever the institution owes customers equivalent treatment regardless of language.
A single agent interaction that triggers a reasoning loop can consume thousands of dollars in API costs in minutes. Cost monitoring operates on billing cycles; cost runaway operates on seconds.
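Billing-cycle monitoring is too slow for a runaway loop, so the guard has to sit in the request path itself. A per-session budget sketch, with illustrative token prices that stand in for real provider rates:

```python
class BudgetExceeded(Exception):
    pass

class CostGuard:
    """Accumulates estimated spend per session and trips before the cap is passed."""

    def __init__(self, cap_usd: float,
                 usd_per_1k_input: float = 0.01,    # illustrative rates, not
                 usd_per_1k_output: float = 0.03):  # any provider's real pricing
        self.cap = cap_usd
        self.spent = 0.0
        self.in_rate = usd_per_1k_input / 1000
        self.out_rate = usd_per_1k_output / 1000

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Call with the usage figures returned by every model response."""
        self.spent += input_tokens * self.in_rate + output_tokens * self.out_rate
        if self.spent >= self.cap:
            raise BudgetExceeded(
                f"session spend ${self.spent:.2f} reached cap ${self.cap:.2f}")

# In the agent loop: guard = CostGuard(cap_usd=5.00); guard.charge(...) after each
# call. The exception halts a runaway loop within one step of crossing the cap.
```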
Agent memory stores grow through normal operation without expiration, validation, or reconciliation against sources of record. The store keeps growing while its accuracy decays.
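A minimal control is to attach an expiry and a source reference to every memory entry, so stale entries get revalidated or evicted instead of trusted indefinitely. A sketch, with the one-day TTL as an arbitrary assumption:

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    value: str
    source: str                    # system of record to reconcile against
    written_at: float = field(default_factory=time.time)
    ttl_s: float = 86_400.0        # assumption: one-day freshness window

    def is_fresh(self) -> bool:
        return time.time() - self.written_at < self.ttl_s

def recall(store: dict[str, MemoryEntry], key: str) -> str | None:
    """Serve a memory only while fresh; expired entries are evicted, forcing
    a re-fetch from the source of record rather than trust in stale state."""
    entry = store.get(key)
    if entry is None or not entry.is_fresh():
        store.pop(key, None)
        return None
    return entry.value
```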
Multi-step agent reasoning compounds a mild model-level bias into a severe output-level bias. Each reasoning step reinforces the pattern until the outcome is materially worse.
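The compounding effect is easy to understate, so a worked example helps: when each step skews a score by a small multiplicative factor in the same direction, the skew grows geometrically rather than averaging out (illustrative numbers, not measurements):

```python
# A 5% per-step skew looks negligible, but it multiplies across the chain.
per_step_skew = 1.05
steps = 10
print(f"{per_step_skew ** steps:.2f}x")  # ~1.63x: a mild tilt becomes a 63% skew
```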
Foundation model properties create a floor of risk that better agent design cannot eliminate. Our advisory engagements help regulated institutions implement model governance frameworks, provider concentration controls, and output validation architectures that account for the baseline characteristics of the LLMs powering their agents.
Schedule a Briefing