R-DG-05 | Data Governance & Integrity | DAMAGE 3.4 / High

Uncontrolled Data Replication

Agents replicate data to vector databases, context caches, and tool workspaces outside the data management perimeter, where retention controls cannot enforce deletion.

The Risk

Data retention and deletion policies are premised on centralized control: sensitive data is stored in a single data warehouse with access controls, retention schedules, and deletion procedures. When data must be purged (due to customer request, regulatory requirement, or retention policy expiration), the data management team executes a delete query. The deletion is complete because there is one place where the data lives.

Agents break this assumption by distributing data across multiple systems outside the original data management perimeter. An agent receives data from a central store, embeds it in vector databases for semantic search, caches it in a context store for performance, and serializes it to tool workspaces for processing. The original data governance policy applies only to the central store. The replicated copies in vector stores, caches, and workspaces are outside the policy's scope. When deletion is ordered, the central store is cleaned but the replicated copies remain, now orphaned from governance oversight.

This is particularly dangerous for personal data subject to GDPR right-to-be-forgotten requests, or for sensitive data subject to regulatory hold policies. A customer requests deletion under GDPR. The institution deletes the customer's record from its central database. The agent's vector store still contains embeddings derived from the customer's data. The agent continues to retrieve and reason about those embeddings in future interactions. The customer has been technically deleted from the system-of-record but is still processed by agents. The institution has failed to honor the deletion request.

How It Materializes

A bank's compliance team uses agents to conduct sanctions screening on incoming payments. The agent receives a customer name and beneficiary information, embeds it in a vector store for semantic similarity search, and searches against sanctions lists, internal exclusion lists, and a vector-indexed store of historical suspicious accounts (built from 10 years of prior investigations). The agent retrieves the most similar historical accounts, reasons about risk patterns, and scores the payment. The vector store contains embeddings of 50,000 historical customer records.

A customer requests deletion under GDPR. The bank's legal team issues a delete order. The compliance team executes a delete query against the central customer database, and the customer record is deleted. However, the customer's data has already been embedded in the agent's vector store during prior screening interactions. The attempted vector store deletion fails because the store, as deployed, cannot delete by source record ID: it deletes only by embedding ID, and no mapping between customer ID and embedding ID was maintained. The embeddings remain in the vector store, orphaned.
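
The failure above turns on the missing mapping between customer ID and embedding ID. The following is a minimal sketch of keeping that mapping at write time, using an in-memory stand-in rather than any real vector database. The class `TrackedVectorStore` and its methods are hypothetical names chosen for illustration. Some vector databases also offer metadata-filtered deletes, which could serve the same purpose if source IDs are stored as metadata.

```python
from collections import defaultdict
from uuid import uuid4


class TrackedVectorStore:
    """In-memory stand-in for a vector store that records source lineage."""

    def __init__(self):
        self._vectors = {}                  # embedding_id -> vector
        self._by_source = defaultdict(set)  # source_record_id -> embedding_ids

    def upsert(self, source_record_id, vector):
        """Store a vector and remember which source record it came from."""
        embedding_id = str(uuid4())
        self._vectors[embedding_id] = list(vector)
        self._by_source[source_record_id].add(embedding_id)
        return embedding_id

    def delete_by_source(self, source_record_id):
        """Delete every embedding derived from one source record."""
        embedding_ids = self._by_source.pop(source_record_id, set())
        for embedding_id in embedding_ids:
            del self._vectors[embedding_id]
        return len(embedding_ids)

    def __len__(self):
        return len(self._vectors)


# Example: two embeddings derived from one customer, one from another.
store = TrackedVectorStore()
store.upsert("customer-123", [0.1, 0.2])
store.upsert("customer-123", [0.3, 0.4])
store.upsert("customer-456", [0.5, 0.6])
removed = store.delete_by_source("customer-123")
```

With the index in place, a delete order keyed on the customer ID reaches every derived embedding, and a repeated delete for the same customer is a harmless no-op.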

Six weeks later, an agent screens a new customer who shares a surname and address with the prior customer (an unrelated individual; the similarity is coincidental). The agent's semantic search retrieves embeddings from the deleted customer's historical account. The embeddings trigger risk flags, and the new customer is flagged as high-risk based on a deleted customer's historical data. The deleted customer's information has not been truly forgotten; it has been distributed outside the governance perimeter and continues to affect decisions through semantic retrieval.

DAMAGE Score Breakdown

Dimension | Score | Rationale
D - Detectability | 4 | Replication is often invisible to governance teams. Discovery typically occurs through a GDPR deletion failure, an audit finding, or an incident investigation.
A - Autonomy Sensitivity | 4 | More autonomous agents replicate data to more systems. Less oversight means more untracked replication.
M - Multiplicative Potential | 4 | Each agent replication point makes complete deletion harder. With multiple agents, the number of copies compounds.
A - Attack Surface | 3 | Not easily weaponized externally; primarily a structural issue. An adversary could exploit replication to access data outside the governance perimeter.
G - Governance Gap | 5 | Data retention and deletion policies assume centralized storage. A distributed agent architecture breaks that model.
E - Enterprise Impact | 4 | GDPR violations, regulatory enforcement, customer complaints, inability to honor deletion requests. Systemic governance failure.
Composite DAMAGE Score | 3.4 | High. Requires priority attention with dedicated controls and monitoring.

Agent Impact Profile

How severity changes across the agent architecture spectrum.

Agent Type | Impact | How This Risk Manifests
Digital Assistant | Moderate | Human oversight may limit replication, but the assistant still creates vector embeddings and caches.
Digital Apprentice | Moderate-High | Progressive autonomy means the agent replicates data to more systems with less oversight.
Autonomous Agent | High | Fully autonomous replication to vector stores, caches, and workspaces without human awareness.
Delegating Agent | High | The agent determines which tools to invoke and what data to pass. Each tool may replicate data to its own storage systems.
Agent Crew / Pipeline | Critical | Multiple agents each replicate data. Data passes between agents, creating replications at each step.
Agent Mesh / Swarm | Critical | Peer-to-peer sharing means data is replicated across the entire mesh. Complete cleanup becomes impossible.

Regulatory Framework Mapping

Framework | Coverage | Citation | What It Addresses | What It Misses
GDPR | Moderate | Article 17 (Right to be Forgotten) | Requires erasure of personal data upon request. | Does not explicitly address erasure in distributed AI systems or vector stores.
PDPA | Moderate | Section 21 (Access and Correction) | Addresses data subject rights to access and correction. | Does not address distributed agent storage systems.
CCPA/CPRA | Moderate | Section 1798.105 (Deletion Right) | Requires deletion of consumer personal data on request. | Does not address deletion from agent vector stores or embedded representations.
BCBS 239 | Partial | Principle 9 (Data Retention) | Requires data retention and deletion standards. | Does not address agent-based data replication.
EU AI Act | Partial | Article 24 (Documentation) | Requires documentation of data handling. | Does not address vector store or cache replication.
MAS AIRG | Partial | Section 6.1 (Governance) | Requires data governance and retention standards. | Does not anticipate agent vector store replication.
ISO 42001 | Partial | Section 6.1.2 | Addresses information lifecycle management. | Does not address AI vector store replication.
HIPAA | Moderate | 45 CFR 164.412 | Requires control over health information systems. | Does not address agent-based health data replication.

Why This Matters in Regulated Industries

GDPR deletion requests are not optional compliance tasks; they are regulatory requirements, and violations can draw fines of up to 20 million euros or 4% of worldwide annual turnover, whichever is higher. An institution that cannot honor deletion requests because agent vector stores sit outside the governance perimeter is in violation. Insurance, banking, and healthcare regulators all expect institutions to be able to delete customer data on demand. If agents have replicated the data to untracked systems, complete deletion can be neither carried out nor demonstrated.

In payments, sanctions screening agents replicate customer data to vector stores. If deletion fails, the customer's data remains in the screening system indefinitely, continuing to affect transaction decisions. In insurance, underwriting agents replicate customer data to context caches. If deletion fails after a customer terminates their policy, the data persists. The replication pattern recurs across agent deployments, and the governance failure recurs with it. Regulators will increasingly expect institutions to demonstrate that they can delete customer data from agent-connected systems.

Controls & Mitigations

Design-Time Controls

  • Prohibit agents from persisting data to vector databases, context caches, or tool workspaces without explicit governance approval. Require agents to retrieve-process-discard rather than retrieve-persist-reuse.
  • Implement a "centralized retrieval" architecture: agents do not directly call vector stores or data services. Instead, agents submit queries to a governance-controlled retrieval service that handles caching, replication, and deletion enforcement.
  • Design retention-aware data structures: require any system storing agent-retrieved data to maintain metadata linking each data element to its source customer/entity and retention expiration date.
  • Use Component 1 (Agent Registry) to document all data stores that each agent accesses or replicates to. Conduct quarterly audits comparing registry declarations to actual replication patterns.
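
Two of the design-time controls above lend themselves to a short sketch: retention-aware metadata on every replicated element, and a registry audit that compares declared stores with observed ones. The names `RetainedItem`, `purge_expired`, and `registry_drift` are hypothetical, as is the assumption that an item is purged on its expiration date itself. How observed replication targets are collected in practice is out of scope here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetainedItem:
    """One replicated data element with the metadata the control calls for."""
    payload: str
    source_entity_id: str    # customer/entity the data came from
    retention_expires: date  # date on which the copy must be purged


def purge_expired(items, today):
    """Return only the items whose retention period has not yet expired."""
    return [item for item in items if item.retention_expires > today]


def registry_drift(declared, observed):
    """Compare registry declarations with observed replication targets.

    Both arguments map agent name -> set of data store names.
    Returns agent name -> stores written to but never declared.
    """
    drift = {}
    for agent, stores in observed.items():
        undeclared = stores - declared.get(agent, set())
        if undeclared:
            drift[agent] = undeclared
    return drift


# Example: one expired copy, and one agent writing to an undeclared cache.
items = [
    RetainedItem("embedding-a", "customer-123", date(2024, 1, 31)),
    RetainedItem("embedding-b", "customer-456", date(2030, 1, 31)),
]
kept = purge_expired(items, today=date(2025, 6, 1))

declared = {"screening-agent": {"vector-db"}}
observed = {"screening-agent": {"vector-db", "scratch-cache"}}
drift = registry_drift(declared, observed)
```

A non-empty `drift` result is the kind of finding the quarterly registry audit is meant to surface.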

Runtime Controls

  • Implement immutable audit logging for all agent data replication: log timestamp, source system, source record ID, destination system, replication method, and reason.
  • Require agents to validate deletion status before reasoning on customer data: query the data governance system to check if the customer has a pending or completed deletion request.
  • Enforce vector store cleanup: implement a deletion synchronization service that propagates deletion requests from central systems to vector databases, caches, and tool workspaces.
  • Use Component 10 (Kill Switch) to automatically halt any agent that attempts to persist data to an unregistered data store or to a system not under data governance control.
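
The runtime controls above can be combined into a single checkpoint that every replication passes through. The sketch below is illustrative only: `ReplicationGuard` and `AgentHalted` are hypothetical names, the kill switch is modeled as a raised exception, and the audit log is a plain Python list. True immutability would need append-only or write-once storage, which this sketch does not provide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class AgentHalted(Exception):
    """Stands in for a kill-switch halt of the calling agent."""


@dataclass
class ReplicationGuard:
    registered_stores: set                              # stores under governance
    deleted_entities: set = field(default_factory=set)  # pending/completed deletions
    audit_log: list = field(default_factory=list)       # append-only by convention

    def _log(self, outcome, **details):
        """Record one replication attempt, whether allowed or halted."""
        entry = {"timestamp": datetime.now(timezone.utc).isoformat(),
                 "outcome": outcome, **details}
        self.audit_log.append(entry)
        return entry

    def replicate(self, source_system, source_record_id, destination, method, reason):
        """Approve and log one replication, or log it and halt the agent."""
        details = dict(source_system=source_system, source_record_id=source_record_id,
                       destination=destination, method=method, reason=reason)
        if destination not in self.registered_stores:
            self._log("halted: unregistered destination", **details)
            raise AgentHalted(destination)
        if source_record_id in self.deleted_entities:
            self._log("halted: deletion requested", **details)
            raise AgentHalted(source_record_id)
        return self._log("allowed", **details)


# Example: one permitted replication, then two attempts that trip the guard.
guard = ReplicationGuard(registered_stores={"vector-db"},
                         deleted_entities={"customer-123"})
allowed = guard.replicate("crm", "customer-456", "vector-db",
                          "embedding", "sanctions screening")
outcomes = []
for record_id, destination in [("customer-456", "scratch-cache"),
                               ("customer-123", "vector-db")]:
    try:
        guard.replicate("crm", record_id, destination, "embedding", "sanctions screening")
        outcomes.append("allowed")
    except AgentHalted:
        outcomes.append("halted")
```

Logging halted attempts as well as permitted ones keeps the audit trail useful for the incident investigations described under Detection & Response.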

Detection & Response

  • Conduct quarterly data replication audits: query all vector stores, context caches, and tool workspaces to identify customer data. Cross-reference against central deletion requests.
  • Implement GDPR deletion testing: periodically issue deletion requests for test accounts and verify that all replicated copies are deleted within the SLA (typically 30 days).
  • Monitor deletion requests and track completion time across all systems. Alert if any replicated copy is not cleaned up within the policy timeframe.
  • Establish deletion incident response: if a deletion failure is discovered, immediately halt the affected agent, audit all data replicated to untracked systems, and complete the outstanding deletion across all systems.
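
The audit and monitoring steps above reduce to two checks: which stores still hold data for entities that should be gone, and which deletion requests are open past the SLA. A minimal sketch follows, assuming each store's contents can be reduced to a set of entity IDs. The function names and the shape of the `requests` mapping are hypothetical.

```python
from datetime import date, timedelta


def find_orphans(store_contents, deleted_entities):
    """Cross-reference replicated stores against deletion requests.

    store_contents maps store name -> set of entity IDs found there.
    Returns store name -> entity IDs that should already be gone.
    """
    return {
        store: found & deleted_entities
        for store, found in store_contents.items()
        if found & deleted_entities
    }


def overdue_deletions(requests, today, sla_days=30):
    """List deletion requests still open past the SLA.

    requests maps entity ID -> (requested_on, completed_on or None).
    """
    return sorted(
        entity
        for entity, (requested_on, completed_on) in requests.items()
        if completed_on is None and today - requested_on > timedelta(days=sla_days)
    )


# Example: one orphaned copy in the vector store, one request past the SLA.
store_contents = {
    "vector-db": {"customer-123", "customer-456"},
    "context-cache": {"customer-456"},
}
orphans = find_orphans(store_contents, deleted_entities={"customer-123"})

requests = {
    "customer-123": (date(2025, 1, 1), None),
    "customer-456": (date(2025, 1, 1), date(2025, 1, 10)),
    "customer-789": (date(2025, 2, 20), None),
}
late = overdue_deletions(requests, today=date(2025, 3, 1))
```

Either result being non-empty would be a trigger for the incident response step.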
