Security & compliance | Corvair Knowledge Substrate

Trust boundaries

The design names four trust boundaries and keeps the most sensitive assets inside the innermost one.

Customer boundary

In dedicated, on-premise, hybrid, and marketplace topologies the platform, data, and audit trail stay inside the customer's boundary. Nothing identifying or sensitive leaves it except a privacy-scrubbed usage export the customer can inspect.

Tenant boundary

In shared SaaS, tenants are isolated by an owner-tenant scope on every store access; cross-tenant access happens only by explicit grant.

Service boundary

Each runtime has its own least-privilege service account. The warrant signing key is reachable only through one internal service.

External boundary

The research-and-monitor side reaches the open internet; the governed core ingests only through the security scan and veracity gate. In the hybrid split only a recipe goes out and only staged files come back.

Assets are ranked by sensitivity: customer knowledge content and PII; the warrant signing key; secrets; the audit trail and warrants; recipes and configuration; usage data; and platform availability.

Threat model

The top threats and their mitigations, as listed in the threat table:

Threat	Mitigation
Cross-tenant or cross-base access	Tenant-then-kb scoping, relationship-based access, persona gating on every route
Credential or key compromise	Secrets in Secret Manager, signing key in KMS or HSM never exported, Workload Identity (no static keys), least privilege
Prompt injection	Injection scanning and neutralisation at ingestion; agents act only through governed, persona-checked tools with budgets and breakers
PII leakage through answers	PII detection with per-role redact, block, or allow; the answer gate; output scrubbing before logs and exports
Tampering with the audit record	Signed, timestamped, chained warrants; a transparency log; offline verification
Agent privilege escalation	Autonomy levels, an approval queue, circuit breakers, every action audited
Supply-chain compromise	Permissive-licence-only policy, an SBOM and licence gate in CI, pinned and scanned dependencies
Insider or operator overreach	The administrator runs the platform but cannot read content; steward actions are preview-then-commit and warranted
Denial of service or runaway cost	Quotas per tenant and persona, agent budgets, circuit breakers
Man-in-the-middle	TLS everywhere, private connectivity to the data store, no public store endpoints in production
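
The "signed, timestamped, chained warrants" mitigation in the table can be illustrated with a small hash chain. This is a minimal sketch under stated assumptions, not the platform's warrant format: the field names, the `append_warrant` and `verify_chain` helpers, and the use of an HMAC with a demo key in place of a KMS- or HSM-held signing key are all hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: an HMAC with a throwaway key stands in for the real signature,
# which the document says is produced by a key held in KMS or an HSM.
DEMO_KEY = b"demo-only-key"
GENESIS = "0" * 64  # hypothetical "previous hash" for the first record


def _digest(body):
    # Canonical JSON so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def _sign(record_hash):
    return hmac.new(DEMO_KEY, record_hash.encode(), hashlib.sha256).hexdigest()


def append_warrant(chain, action, timestamp):
    """Append a record that commits to the hash of the previous record."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"action": action, "timestamp": timestamp, "prev": prev}
    record_hash = _digest(body)
    record = {**body, "hash": record_hash, "sig": _sign(record_hash)}
    chain.append(record)
    return record


def verify_chain(chain):
    """Recompute every hash and signature; any edit breaks the chain."""
    prev = GENESIS
    for record in chain:
        body = {key: record[key] for key in ("action", "timestamp", "prev")}
        expected_hash = _digest(body)
        if record["prev"] != prev or record["hash"] != expected_hash:
            return False
        if not hmac.compare_digest(record["sig"], _sign(expected_hash)):
            return False
        prev = record["hash"]
    return True
```

Because each record carries the hash of its predecessor, changing any earlier record invalidates every later link, which is what makes verification possible without contacting the live system.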

Encryption & keys

In transit

TLS 1.2 or higher on every hop. Production reaches the graph store over Private Service Connect, not the public internet.

At rest

The operational store, object storage, and the graph store all encrypt at rest. Where a customer requires control of the keys, customer-managed encryption keys (CMEK) back the relational store and the buckets, and graph-store customer-managed keys are used where the tier supports them.

Warrant key

The warrant signing key is kept separate, in KMS or an HSM. It is generated there and never leaves, it is owned by the governance authority, and it can be used for signing only through one internal service.

Rotation

The signing key rotates on schedule with old public keys retained, so old warrants still verify. Application secrets rotate on a defined cadence; rotation is zero-downtime because services read secrets from Secret Manager at start-up and refresh them while running.

Personal data

PII is detected and, where the guardrail requires, redacted before storage in derived records, logs, and exports. The warrant stores salted commitments, never raw personal data.
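
A salted commitment of the kind mentioned above can be sketched as a hash over a random salt plus the value. This is an illustrative assumption about the construction, not the platform's documented scheme; the `commit` and `matches` names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def commit(value: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Return (salt, commitment). A fresh random salt is used by default."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
    return salt, digest


def matches(value: str, salt: bytes, commitment: str) -> bool:
    """Check a candidate value against a stored commitment."""
    _, digest = commit(value, salt)
    return hmac.compare_digest(digest, commitment)
```

The salt stops an attacker from confirming a guessed value against the commitment by hashing common inputs. Once the salt is destroyed, the commitment still anchors the chain but can no longer be tested against candidate values.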

Identity, secrets & network

People authenticate by OIDC single sign-on; agents by a scoped, revocable JWT or API key. No shared accounts. IAM is least-privilege per workload, and the administrator role excludes content access: the administrator runs the platform but cannot read knowledge content, enforced both in IAM and at the API. Workload Identity removes static keys. The network uses a custom VPC, default-deny ingress, private connectivity to the data stores, stable egress for allowlisting, and private DNS; the warrant service is internal-ingress only, and there are no public store endpoints in production. All secrets live in Secret Manager, referenced by name, never in code, logs, or infrastructure state.

AI-specific controls

Prompt injection

Scanned and neutralised at ingestion before extraction, so untrusted content cannot carry instructions into the extractor or downstream agents.

PII & exfiltration

PII detection with per-role redact, block, or allow; an answer gate; output scrubbing before logs and exports.
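
The per-role redact, block, or allow decision might look like the sketch below. The role names, the policy table, and the email-only detector are assumptions for illustration; a real detector would cover far more than email addresses.

```python
import re
from typing import Optional

# Assumption: a single email pattern stands in for the real PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Hypothetical mapping from role to the action taken when PII is found.
POLICY = {"steward": "allow", "analyst": "redact", "guest": "block"}


def apply_pii_policy(text: str, role: str) -> Optional[str]:
    """Return the text to release, or None when the answer is blocked."""
    if not EMAIL.search(text):
        return text  # nothing detected, nothing to do
    action = POLICY.get(role, "block")  # unknown roles fail closed
    if action == "allow":
        return text
    if action == "redact":
        return EMAIL.sub("[REDACTED]", text)
    return None
```

Defaulting unknown roles to `block` is a design choice made here so that a missing policy entry cannot leak data.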

Grounding

Answers are grounded only in retrieved context and carry a premise chain and citations; the veracity gate refuses high-stakes grounding on a single low-tier source.
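
The veracity gate's refusal rule can be expressed as a small predicate. The tier numbering (1 is most trusted), the `LOW_TIER_THRESHOLD` value, and the `Source` and `may_ground` names are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Source:
    source_id: str
    tier: int  # assumption: 1 = most trusted, larger numbers = lower tier


LOW_TIER_THRESHOLD = 3  # hypothetical: tier 3 and above counts as low-tier


def may_ground(sources: List[Source], high_stakes: bool) -> bool:
    """Refuse high-stakes grounding that rests on one low-tier source."""
    if not sources:
        return False  # no retrieved context, so nothing to ground on
    if not high_stakes:
        return True
    distinct = {s.source_id for s in sources}
    only_low = all(s.tier >= LOW_TIER_THRESHOLD for s in sources)
    return not (len(distinct) == 1 and only_low)
```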

Agent containment

Agents act only through governed tools within autonomy levels, budgets, and circuit breakers; every action is audited and reversible, with an approval queue for escalation.
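
As a rough illustration of budgets, circuit breakers, and auditing working together, the sketch below wraps a tool function. The `GovernedTool` class, its exception types, and the call-count budget are hypothetical; autonomy levels and the approval queue are left out for brevity.

```python
class BudgetExceeded(Exception):
    """Raised when the agent has used up its allotted calls."""


class BreakerOpen(Exception):
    """Raised when consecutive failures have tripped the breaker."""


class GovernedTool:
    def __init__(self, func, budget, max_failures):
        self.func = func
        self.remaining = budget          # calls the agent may still make
        self.max_failures = max_failures
        self.failures = 0                # consecutive failures so far
        self.audit = []                  # every attempt is recorded

    def call(self, *args):
        if self.failures >= self.max_failures:
            self.audit.append(("refused", "breaker open"))
            raise BreakerOpen("too many consecutive failures")
        if self.remaining <= 0:
            self.audit.append(("refused", "budget exhausted"))
            raise BudgetExceeded("no calls left in budget")
        self.remaining -= 1
        try:
            result = self.func(*args)
        except Exception as exc:
            self.failures += 1
            self.audit.append(("failed", repr(exc)))
            raise
        self.failures = 0  # a success resets the breaker count
        self.audit.append(("ok", args))
        return result
```

Refusals are written to the audit list as well as successes, so the record shows what the agent attempted and not only what it achieved.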

Data lifecycle & erasure

Each data class has its own retention, and retention windows are versioned configuration so tuning is auditable.

Class	Default retention	Erasable
Raw sources	Per recipe and residency window	Yes, with cascade
Derived knowledge (chunks, items, embeddings)	Life of the base	Yes, via retraction or erasure
Snapshots	Per snapshot policy	Yes, oldest first
Agentic memory	Per policy (decisions kept longer)	Yes, with care for warranted decisions
Warrants	Long, for the evidence window	Hashes and commitments kept; raw content erasable
Usage data	Local window; central is anonymous	Local erasable; central carries no identity

Right-to-erasure

A subject-data-erasure request resolves to items, sources, memories, and spans, then erases across every store in one governed, provenanced operation.

Locate

Find the personal data: PII spans, grounded items, source documents, memory items, exports.

Preview

Show the cascade as an impact diff, exactly like a retraction, so the effect is visible before commit.

Erase per store

Delete all versions in the raw bucket; delete or redact in the graph with a retraction cascade; delete or redact rows in the operational store; erase memory payloads.

Preserve the proof

The warrant chain holds because erased premises and sources remain as salted commitments and content hashes, never raw personal data, so the signature still verifies but the content cannot be revealed.

Record

The erasure writes provenance and, where decision-bearing, a warrant of the erasure itself.

Backups cannot resurrect erased data. Because a backup can predate an erasure, erasure events are recorded durably and re-applied on any restore before the restored data is served. The erasure log is replayed; backups are never edited.
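
The replay-on-restore rule can be sketched in a few lines. The in-memory dictionary standing in for a backup and the `restore` function name are assumptions; the point is only that the backup is copied, not edited, and that logged erasures are re-applied before anything is served.

```python
from typing import Dict, List


def restore(backup: Dict[str, str], erasure_log: List[str]) -> Dict[str, str]:
    """Return servable data: the backup with every logged erasure re-applied."""
    restored = dict(backup)  # work on a copy; the backup itself is never edited
    for record_id in erasure_log:
        # Ids absent from this backup are skipped: the record may have been
        # created after the backup was taken.
        restored.pop(record_id, None)
    return restored
```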

Tenancy & residency

Tenancy is a first-class, always-present concept: every knowledge base and principal belongs to a tenant, and single-tenant is simply one tenant. A small control plane, scoped to the platform operator and excluded from MCP, provisions and configures tenants but never reads knowledge content; the data plane serves and governs knowledge, resolving the tenant before the knowledge base. Shared deployments use row-level isolation by owner tenant, enforced in the data-access layer rather than left to callers; dedicated tenants get their own stores and optionally their own transparency log and governance key, with an identical schema so code does not branch. Data residency is pinned per tenant: all of a tenant's data and AI calls stay in its region, and a residency change is an explicit migrate-and-repoint, not a flag flip. Per-tenant quotas cover knowledge bases, storage, daily tokens, and seats, and exceeding one throttles or refuses with a clear error rather than degrading a neighbour.
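
Enforcing isolation in the data-access layer, as opposed to trusting callers to filter, might look like the following. The `TenantScopedStore` class, the row shape, and the grant representation are hypothetical; the sketch resolves the tenant first and the knowledge base second, as the paragraph above describes.

```python
class TenantScopedStore:
    def __init__(self, rows, grants=None):
        self._rows = rows               # dicts with "owner_tenant" and "kb" keys
        self._grants = grants or set()  # explicit (grantee, owner) pairs

    def query(self, tenant_id, kb_id):
        """Return rows for kb_id that tenant_id owns or was explicitly granted."""
        visible = {tenant_id}
        visible |= {owner for grantee, owner in self._grants if grantee == tenant_id}
        # The tenant filter is applied here, inside the store, on every call.
        return [
            row for row in self._rows
            if row["owner_tenant"] in visible and row["kb"] == kb_id
        ]
```

Since there is no unscoped read method, a caller cannot obtain another tenant's rows by forgetting a filter.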

Resilience & DR

Recovery targets are set per environment and per tenant plan, and confirmed against measured restore times.

Class	RPO (data loss)	RTO (downtime)
Operational store	Minutes (point-in-time recovery)	Low single-digit hours
Knowledge store	Hours (scheduled, tighter with continuous backup)	Low single-digit hours
Object storage	Near zero (versioned, durable)	Minutes
Warrants & chain	Zero tolerated loss	Restored with the operational store

A restore brings back a consistent point across stores, replays the erasure log before serving, and restores into the tenant's residency region so residency holds through DR. Upgrades are zero-downtime: Cloud Run revisions shift traffic gradually, and a rollback is simply a traffic shift back to the previous revision. PostgreSQL schema changes are additive and backward-compatible (expand-then-contract), so old and new revisions can run concurrently. Each release records its schema version and refuses to start against an incompatible one; a pack declares the substrate range it targets, and that range is checked before the pack is applied. Because the OKF bundle store is the system of record, DR can also restore the bundles and replay the loader to rebuild the graph and indexes.
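
A pack's substrate-range check could be as simple as the comparison below. The dotted numeric version format, the half-open range, and the function names are assumptions; the source does not specify how versions or ranges are written.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Parse '2.10' or '2.10.1' into a comparable tuple, padded to three parts."""
    parts = [int(piece) for piece in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def pack_is_compatible(substrate: str, minimum: str, below: str) -> bool:
    """True when minimum <= substrate < below (assumed half-open range)."""
    return parse(minimum) <= parse(substrate) < parse(below)
```

Comparing integer tuples avoids the classic string-comparison mistake of ranking "2.10" below "2.9".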

Supply chain & testing

A permissive-licence-only policy fails the build on a GPL or AGPL transitive dependency; dependencies are pinned and scanned in CI, and images are built reproducibly and tagged by commit. Because the MCP cover and the API are generated from one contract, the agent surface cannot silently widen. Verification runs in three modes: static (IAM and policy checks, secret-reference linting, the SBOM and licence gate), dynamic (dependency and container scans, a periodic third-party penetration test against a deployed environment, tamper tests on the warrant), and continuous (a governance dashboard and alerts on breaker trips, warrant-verification failures, quarantine spikes, and auth failures).
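
A licence gate of this kind is often easiest to express as an allowlist, so that anything not known to be permissive fails the build. The sketch below assumes dependencies arrive as a name-to-SPDX-identifier mapping (for example, extracted from an SBOM); the allowlist contents and the function name are illustrative, not the project's actual policy.

```python
from typing import Dict, List

# Assumption: an example allowlist of permissive SPDX licence identifiers.
ALLOWED = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}


def licence_violations(dependencies: Dict[str, str]) -> List[str]:
    """Return the names of dependencies whose licence is not on the allowlist."""
    return sorted(
        name for name, licence in dependencies.items() if licence not in ALLOWED
    )
```

A CI step would run this over direct and transitive dependencies and fail when the returned list is non-empty.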

Compliance posture

The substrate is built to support a customer's compliance obligations rather than to substitute for them. Concretely, it provides a tamper-evident, signed, timestamped, and chained audit record that verifies offline; a right-to-erasure workflow that operates across every store and survives backups while keeping the evidence chain intact; provenance on every governed action; data residency pinned per tenant and region; and a privacy-scrubbed usage export that carries no customer identity or content. The design supports a per-deployment data-protection impact assessment, which remains a customer-side activity.

Specific certification or framework attestations (for example SOC 2 or ISO 27001) are a function of a given deployment and operating organisation and are confirmed per engagement; this page describes the controls the platform provides, not a certification claim.