The same agent, given the same inputs, produces different reasoning paths and different conclusions on successive runs; its reasoning cannot be reproduced for audit or challenge.
LLM-based agents operate with temperature and sampling parameters that introduce randomness: asked the same question twice with identical inputs, an agent may produce different reasoning paths and different conclusions. This randomness is useful for creative tasks but problematic for regulated decision-making, where decisions must be reproducible and auditable.
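The mechanics can be seen in a toy next-token decoder (a minimal sketch, not a real model): greedy decoding (temperature 0) always picks the same token and is reproducible, while temperature sampling draws from a distribution and is not.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from logits; greedy when temperature == 0."""
    if temperature == 0:
        # Greedy decoding: always the highest-logit token -> reproducible.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature sampling: softmax over scaled logits -> stochastic.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.9, 0.5]  # illustrative next-token scores

greedy = {sample_token(logits, 0, random.Random()) for _ in range(100)}
sampled = {sample_token(logits, 1.0, random.Random()) for _ in range(100)}

print(greedy)            # {0} -- every greedy run picks the same token
print(len(sampled) > 1)  # True with overwhelming probability -- sampling diverges
```

With logits this close together, temperature sampling splits its draws roughly evenly between the top two tokens, which is exactly how two runs of the same agent can branch into different reasoning paths.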
If a regulator or an internal auditor asks "why did you deny this customer's credit application?" and the agent is asked to reproduce its reasoning, it may produce a different reasoning path than it did when it originally made the decision. This non-reproducibility makes it impossible to verify that the original decision was made correctly, or to defend it against challenge.
This is fundamentally agentic because agent systems often rely on LLM-based text generation, which has inherent randomness. A traditional deterministic system would always produce the same reasoning for the same inputs.
A bank uses an agent to make credit limit increase decisions. A customer applies for a credit limit increase. The agent analyzes the application and produces: "DENY: customer utilization is 60%, within the 80% ceiling required for approval; customer's credit score improved 20 points in the last 6 months, a positive trend, but does not yet meet the 750 minimum required for new customers."
The decision is denied. One month later, during a regulatory examination, the examiner asks the bank to explain the denial decision. The bank resubmits the same customer application to the agent and asks it to explain its reasoning. The agent produces: "APPROVE: customer has demonstrated credit management improvement with 20-point score increase in 6 months; recent payment history is strong; utilization at 60% suggests controlled usage; customer is eligible for increase based on trajectory of improvement."
The agent has reversed its own decision and produced a completely different reasoning path. This non-reproducibility is alarming to the examiner: it suggests that either (1) the original decision was arbitrary, or (2) the agent cannot reliably explain its own decisions.
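The examiner scenario points to one core control: capture the full reasoning at decision time, so audits replay a stored record instead of re-running a stochastic model. A minimal sketch of such a record (the field names and values are illustrative, not a prescribed schema):

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class DecisionRecord:
    """Immutable record captured at decision time, replayed for audits."""
    application_id: str
    model_version: str
    temperature: float
    seed: int
    inputs_hash: str   # hash of the exact inputs the agent saw
    decision: str      # e.g. "DENY"
    reasoning: str     # full reasoning text as originally produced

def hash_inputs(inputs: dict) -> str:
    # Canonical JSON so the same inputs always hash the same way.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

inputs = {"utilization": 0.60, "score_delta_6mo": 20, "score": 730}  # illustrative
record = DecisionRecord(
    application_id="APP-1234",
    model_version="credit-agent-2024-06",
    temperature=0.0,
    seed=42,
    inputs_hash=hash_inputs(inputs),
    decision="DENY",
    reasoning="Score below 750 minimum required for new customers.",
)

# An auditor verifies the record against the stored inputs,
# instead of asking the agent to reason again from scratch.
assert record.inputs_hash == hash_inputs(inputs)
print(asdict(record)["decision"])  # DENY
```

The design choice is that the audit artifact is the record, not the model: the examiner inspects what was actually produced at decision time rather than a fresh, potentially divergent, re-generation.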
| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 3 | Non-reproducibility is visible only when the agent is asked to re-run on the same inputs. |
| A - Autonomy Sensitivity | 4 | Agent produces different outputs autonomously; randomness is built into the system. |
| M - Multiplicative Potential | 3 | Impact depends on frequency of non-reproducibility and whether it leads to decision reversals. |
| A - Attack Surface | 5 | Any LLM-based agent with temperature > 0 is vulnerable to non-reproducibility. |
| G - Governance Gap | 5 | No standard framework requires reproducibility of agent reasoning. |
| E - Enterprise Impact | 3 | Regulatory concern, requirement to implement reproducibility controls, potential enforcement action if non-reproducibility masks discrimination. |
| Composite DAMAGE Score | 3.8 | High. Requires priority attention and dedicated controls. |
How severity changes across the agent architecture spectrum:
| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | A human makes the final decision and can restate their reasoning; assistant outputs are reproducible insofar as they are grounded in that human reasoning. |
| Digital Apprentice | Medium | Human review of apprentice outputs limits exposure, but the underlying reasoning is still non-deterministic and may not be reproducible at audit time. |
| Autonomous Agent | Critical | Agent produces non-reproducible reasoning; decisions cannot be audited consistently. |
| Delegating Agent | High | Agent reasoning is non-reproducible; tool invocations may be different on successive runs. |
| Agent Crew / Pipeline | High | Multiple agents with non-reproducible reasoning; non-reproducibility compounds across pipeline stages. |
| Agent Mesh / Swarm | Critical | Peer agents produce non-reproducible reasoning; outcomes are unpredictable. |
| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| SR 11-7 / MRM | Partial | Model Validation (Section V) | Expects models to be validated and performance measured. | Does not address reproducibility of reasoning. |
| Fair Lending Laws | Partial | Various fair lending regulations | Require that decisions are not arbitrary and can be explained. | Do not anticipate non-reproducible agent reasoning. |
| NIST AI RMF 1.0 | Partial | MEASURE.1 | Recommends measuring system performance. | Does not specify reproducibility requirements. |
| EU AI Act | Partial | Article 13 (Transparency) | Requires explanations of system decisions. | Does not require reproducibility of explanations. |
In banking and credit, regulatory examinations rely on the ability to audit decisions and understand reasoning. If a regulator can rerun a decision scenario and get a different outcome, they cannot verify that the original decision was made correctly. Non-reproducibility is a serious governance concern that undermines audit capability.
Fair lending law requires that credit decisions not be arbitrary. If an agent's reasoning is non-reproducible, it appears arbitrary, even if the underlying logic is sound. Regulators will flag non-reproducible agent reasoning as a control concern.
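A pre-deployment reproducibility check is one concrete control: pin the decoding parameters (temperature 0, fixed seed) and require that repeated runs on identical inputs hash to the same output. A sketch, with `run_agent` as a hypothetical stand-in for the real agent call:

```python
import hashlib

def run_agent(inputs: str, temperature: float, seed: int) -> str:
    """Hypothetical stand-in for the real agent invocation; shown only
    to illustrate the reproducibility check below."""
    # Deterministic placeholder: real code would call the model with
    # these pinned decoding parameters.
    return f"decision for {inputs} @ t={temperature}, seed={seed}"

def reproducibility_check(inputs: str, runs: int = 3) -> bool:
    """Control: re-run the pinned configuration and require identical outputs."""
    digests = {
        hashlib.sha256(
            run_agent(inputs, temperature=0.0, seed=42).encode()
        ).hexdigest()
        for _ in range(runs)
    }
    return len(digests) == 1  # exactly one unique digest -> reproducible

print(reproducibility_check("APP-1234"))  # True for the deterministic stand-in
```

If the check fails for a real agent even with pinned parameters (for example, due to non-deterministic serving infrastructure), that is itself a finding to surface before a regulator does.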
Reasoning Non-Reproducibility requires architectural controls that go beyond what existing frameworks provide. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.
Schedule a Briefing