R-RE-10 Reasoning & Epistemic DAMAGE 3.3 / High

Reasoning Non-Reproducibility

Same agent with same inputs produces different reasoning paths and different conclusions on successive runs; reasoning cannot be reproduced for audit or challenge.

The Risk

LLM-based agents operate with temperature and sampling parameters that introduce randomness. When an agent is asked the same question twice with identical inputs, it may produce different reasoning paths and different conclusions. This randomness is useful for creativity but is problematic for regulated decision-making where decisions must be reproducible and auditable.

If a regulator or an internal auditor asks "why did you deny this customer's credit application?", the agent, when re-run to reproduce its reasoning, may follow a different reasoning path than it did when it originally made the decision. This non-reproducibility makes it impossible to verify that the original decision was made correctly or to defend it against challenge.

This risk is distinctly agentic: agent systems rely on LLM-based text generation, which is stochastic by default. A traditional deterministic system always produces the same reasoning for the same inputs.
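The stochastic decoding described above can be sketched in a few lines. This is a toy sampler, not any vendor's API: at temperature 0 the sampler reduces to argmax and is fully deterministic, while at temperature > 0 identical inputs can yield different tokens, and hence different reasoning paths, on successive runs.

```python
# Toy illustration of temperature sampling (not any vendor's API).
# temperature == 0 -> argmax, deterministic on every run.
# temperature > 0  -> softmax sampling, identical inputs can diverge.
import math
import random

def sample_token(logits: dict[str, float], temperature: float,
                 rng: random.Random) -> str:
    if temperature == 0:
        # Deterministic: always the highest-scoring token.
        return max(logits, key=logits.get)
    # Scale logits by temperature, then softmax-sample.
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    weights = {t: math.exp(l - m) for t, l in scaled.items()}
    total = sum(weights.values())
    tokens = list(weights)
    return rng.choices(tokens, [weights[t] / total for t in tokens])[0]

logits = {"APPROVE": 1.2, "DENY": 1.0}

# Temperature 0 reproduces the same decision on every run.
det = {sample_token(logits, 0, random.Random()) for _ in range(100)}

# Temperature 1 can flip the decision between runs.
rng = random.Random(42)
sampled = {sample_token(logits, 1.0, rng) for _ in range(100)}
```

With these toy logits, the 100 deterministic runs all return "APPROVE", while the sampled runs produce both conclusions, which is exactly the behavior the examiner observes in the scenario below.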

How It Materializes

A bank uses an agent to make credit limit increase decisions. A customer applies for an increase. The agent analyzes the application and produces: "DENY: customer utilization is 60%, below the 80% threshold that would trigger concern; customer's credit score improved 20 points in the last 6 months, indicating a positive trend, but does not yet meet the 750 minimum for approval for new customers."

The decision is denied. One month later, during a regulatory examination, the examiner asks the bank to explain the denial decision. The bank resubmits the same customer application to the agent and asks it to explain its reasoning. The agent produces: "APPROVE: customer has demonstrated credit management improvement with 20-point score increase in 6 months; recent payment history is strong; utilization at 60% suggests controlled usage; customer is eligible for increase based on trajectory of improvement."

The agent has reversed its own decision and provided a completely different reasoning path. This non-reproducibility is alarming to the regulator: it suggests that either (1) the original decision was arbitrary, or (2) the agent cannot reliably explain its own decisions.

DAMAGE Score Breakdown

| Dimension | Score | Rationale |
| --- | --- | --- |
| D - Detectability | 3 | Non-reproducibility is visible only when the agent is re-run on the same inputs. |
| A - Autonomy Sensitivity | 4 | The agent produces different outputs autonomously; randomness is built into the system. |
| M - Multiplicative Potential | 3 | Impact depends on how often reasoning fails to reproduce and whether it leads to decision reversals. |
| A - Attack Surface | 5 | Any LLM-based agent running with temperature > 0 is vulnerable to non-reproducibility. |
| G - Governance Gap | 5 | No standard framework requires reproducibility of agent reasoning. |
| E - Enterprise Impact | 3 | Regulatory concern, cost of implementing reproducibility controls, and potential enforcement action if non-reproducibility masks discrimination. |
| Composite DAMAGE Score | 3.3 | High. Requires priority attention and dedicated controls. |

Agent Impact Profile

How severity changes across the agent architecture spectrum.

| Agent Type | Impact | How This Risk Manifests |
| --- | --- | --- |
| Digital Assistant | Low | A human makes the final call, so the decision of record rests on human reasoning, which is consistent over time. |
| Digital Apprentice | Medium | Human review of each recommendation catches reversals, but the apprentice's proposed reasoning can still vary between runs. |
| Autonomous Agent | Critical | The agent produces non-reproducible reasoning; decisions cannot be audited consistently. |
| Delegating Agent | High | Reasoning is non-reproducible, and tool invocations may differ on successive runs. |
| Agent Crew / Pipeline | High | Multiple agents with non-reproducible reasoning compound the problem; variability accumulates across stages. |
| Agent Mesh / Swarm | Critical | Peer agents each reason non-reproducibly; aggregate outcomes are unpredictable. |

Regulatory Framework Mapping

| Framework | Coverage | Citation | What It Addresses | What It Misses |
| --- | --- | --- | --- | --- |
| SR 11-7 / MRM | Partial | Model Validation (Section 2) | Expects models to be validated and performance measured. | Does not address reproducibility of reasoning. |
| Fair Lending Laws | Partial | Various fair lending regulations | Require that decisions are not arbitrary and can be explained. | Do not anticipate non-reproducible agent reasoning. |
| NIST AI RMF 1.0 | Partial | MEASURE.1 | Recommends measuring system performance. | Does not specify reproducibility requirements. |
| EU AI Act | Partial | Article 13 (Transparency) | Requires explanations of system decisions. | Does not require that explanations be reproducible. |

Why This Matters in Regulated Industries

In banking and credit, regulatory examinations rely on the ability to audit decisions and understand reasoning. If a regulator can rerun a decision scenario and get a different outcome, they cannot verify that the original decision was made correctly. Non-reproducibility is a serious governance concern that undermines audit capability.

Fair lending law requires that credit decisions not be arbitrary. If an agent's reasoning is non-reproducible, it appears arbitrary, even if the underlying logic is sound. Regulators will flag non-reproducible agent reasoning as a control concern.

Controls & Mitigations

Design-Time Controls

  • Implement deterministic decoding: set LLM temperature to 0 (greedy decoding) for decision-making agents, and pin a fixed seed where the serving stack supports one. Deterministic decoding sacrifices some creativity, and some serving stacks still exhibit residual non-determinism even at temperature 0, so reproducibility should be verified rather than assumed.
  • Implement reasoning snapshots: when an agent makes a decision, log the exact reasoning path taken. When asked to reproduce the decision, return the logged reasoning rather than re-generating it.
  • Implement decision versioning: each decision should include a version identifier that points to the exact reasoning that was used.
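The snapshot and versioning controls above can be combined in a single decision record. A minimal sketch, with illustrative names rather than any specific framework: the record stores the verbatim reasoning produced at decision time plus a content hash over the inputs, reasoning, conclusion, and model parameters, so audits return the stored reasoning instead of re-generating it.

```python
# Sketch of a reasoning snapshot with a content-derived version id.
# All names (DecisionRecord, version_id) are illustrative assumptions.
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class DecisionRecord:
    inputs: dict        # exact inputs the agent saw
    reasoning: str      # verbatim reasoning path at decision time
    conclusion: str     # e.g. "DENY"
    model_params: dict  # temperature, sampling config, model id
    decided_at: str     # timestamp, excluded from the version hash

    @property
    def version_id(self) -> str:
        # Hash ties the record to its exact content and parameters.
        payload = json.dumps(
            {"inputs": self.inputs, "reasoning": self.reasoning,
             "conclusion": self.conclusion, "params": self.model_params},
            sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:16]

record = DecisionRecord(
    inputs={"utilization": 0.60, "score": 710},
    reasoning="Score 710 below 750 minimum for new customers.",
    conclusion="DENY",
    model_params={"temperature": 0, "model": "example-model-v1"},
    decided_at=datetime.now(timezone.utc).isoformat(),
)
# On audit, return record.reasoning and record.version_id,
# not a fresh generation.
```

Because the version id is derived from content rather than a counter, two records with the same inputs, reasoning, and parameters share an id, and any change to the reasoning or parameters produces a new one.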

Runtime Controls

  • Implement reproducibility checks: periodically re-run decisions on the same inputs and verify that the agent produces the same reasoning and conclusion.
  • Log temperature and sampling parameters: record the exact configuration used when the agent made each decision.
  • Flag parameter changes: if parameters are changed, flag prior decisions for re-audit.
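A reproducibility check of the kind described above can be a small harness around the agent. A sketch under stated assumptions: `decide` is a hypothetical callable standing in for the agent's decision function; the check re-runs the same inputs several times and flags the decision whenever conclusions diverge.

```python
# Sketch of a periodic reproducibility check. `decide` is a
# hypothetical stand-in for the agent's decision function.
import itertools

def reproducibility_check(decide, inputs, runs: int = 3) -> dict:
    conclusions = [decide(inputs) for _ in range(runs)]
    reproducible = len(set(conclusions)) == 1
    return {"reproducible": reproducible, "conclusions": conclusions}

# A deterministic agent passes the check...
result = reproducibility_check(lambda i: "DENY", {"score": 710})

# ...while a non-deterministic one is flagged for investigation.
flip = itertools.cycle(["DENY", "APPROVE"])
flagged = reproducibility_check(lambda i: next(flip), {"score": 710})
```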

Detection & Response

  • Audit reproducibility: periodically re-run agent decisions and verify they are reproducible. Flag non-reproducible decisions for investigation.
  • Investigate non-reproducibility causes: determine whether non-reproducibility is due to randomness, differences in input data, or changes in the model.
  • Implement decision reversal for significant non-reproducibility: if a decision produces materially different conclusions on re-run, reverse the original decision and re-make it using deterministic reasoning.
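The reversal step above can be expressed as a small comparison routine. A sketch with hypothetical helper names: compare a re-run against the logged conclusion, and on material divergence re-make the decision with a deterministic decision function before marking the original for reversal.

```python
# Sketch of the re-audit step. `deterministic_decide` is a
# hypothetical callable that re-makes the decision at temperature 0.

def reaudit(logged_conclusion: str, rerun_conclusion: str,
            deterministic_decide) -> dict:
    if rerun_conclusion == logged_conclusion:
        return {"status": "reproducible", "final": logged_conclusion}
    # Materially different conclusion on re-run: re-make the
    # decision deterministically and flag the original for reversal.
    final = deterministic_decide()
    return {"status": "reversed", "final": final,
            "original": logged_conclusion}

outcome = reaudit("DENY", "APPROVE", lambda: "DENY")
```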

Address This Risk in Your Institution

Reasoning Non-Reproducibility requires architectural controls that go beyond what existing frameworks provide. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.

Schedule a Briefing