Corvair CorvairKnowledge Substrate
Home / Docs / Knowledge format (OKF)
Knowledge format

Built on an open format, not a black box.

For every source it ingests, the substrate produces an Open Knowledge Format (OKF) bundle: portable, human and agent readable, and verifiable. In CKS the bundle is more than an export. It is the canonical ingestion representation and the durable system of record, the graph is a rebuildable projection of it, and it carries trust, validity, provenance, and a Validity Warrant that base OKF does not mandate. Your knowledge stays portable, inspectable, and yours.

What OKF is

OKF is an open specification published by Google Cloud (v0.1 draft, June 2026) in the public GoogleCloudPlatform/knowledge-catalog repository, licensed Apache-2.0. It formalises the "LLM-wiki" pattern as plain files, designed so a knowledge base is something you can read, diff, and carry between systems.

Bundle
A directory tree of markdown files. The unit of distribution; subdirectories are allowed and nestable.
Concept
One markdown file. Its identity (the Concept ID) is the file path minus .md; there is no separate id field.
Frontmatter
A YAML block at the top of each file. The only required field is type. Recommended optional fields: title, description, resource, tags, timestamp. Producers may add any other keys; consumers must preserve unknown keys.
Links
Ordinary markdown links, preferably bundle-root absolute. Untyped; broken links are legal (they may be not-yet-written knowledge).
Citations
Listed under a # Citations heading, pointing at URLs or a references/ subdirectory.
Reserved files
index.md (a directory listing) and log.md (a change history). The bundle-root index.md may carry okf_version.
Conformance
A bundle is conformant if every non-reserved file has parseable frontmatter with a non-empty type. Consumption is deliberately permissive.
key
That permissiveness is the enabler. CKS can add governance keys and keep its own typed relationships alongside the plain links, and remain fully conformant. OKF v0.1 is explicitly a starting point, not a finished standard.

Our position

Most systems would treat an open format as an export feature, a way to get data out. We take the opposite view. The substrate already does the work an OKF producer is assumed to have done: ingestion parses a source, extracts concepts, claims, and entities, scores trust and validity, and emits a Validity Warrant with provenance. An OKF bundle is a faithful, portable projection of exactly that. So we made three decisions that put OKF at the centre rather than the edge.

database

System of record

The OKF bundle store is the durable record of ingested knowledge. The graph and the search indexes are rebuildable projections, caches that can be regenerated from the bundles.

conveyor_belt

Canonical ingestion IR

Every source of every kind is normalised into OKF, and one loader populates the graph from OKF. Not many per-format extractors writing straight to the graph, but one schema, one loader, one set of golden tests.

workspace_premium

High-trust producer

CKS bundles carry trust tier, validity with decay, provenance hashes, and a warrant in the frontmatter, governance that base OKF does not require, while staying conformant.

push_pin
The consequence: because the bundles are authoritative, OKF is the ingestion representation from the start, not a later add-on. A graph cannot be the source of truth if it is a rebuildable projection of the bundles. The companion-first alternative was rejected; it would build the write path twice and leave the record secondary.

Our extensions to the standard

Standard OKF fields carry the portable basics. CKS governance rides in a namespaced x-cks extension block, so a base consumer simply ignores it and the bundle stays conformant. The standard timestamp is the last meaningful change, which for CKS is the last re-validation.

# a CKS concept, conformant OKF with an x-cks governance block
type: Web Page                       # REQUIRED by OKF
title: Fair Lending Protocol (2026 revision)
resource: https://intranet.example.com/policy/fair-lending?v=2026-05
timestamp: 2026-06-10T14:30:00Z       # last re-validation
x-cks:
  kb: risk-and-compliance
  tenant: { owner: acme, home: acme }
  trust_tier: T1                      # T1..T4
  validity: { score: 0.96, as_of: 2026-06-10, half_life_days: 365, decays_at: 2027-06-10 }
  provenance: { content_hash: "sha384:9f2c...a17b", connector: sharepoint }
  warrant: { id: "wrt_...", alg: "ecdsa-p384-sha384", status: verified }
  classification: { sensitivity: internal, pii: none, language: en }
  entitlement: { licensed: false }
  rels:                               # typed edges, mirrored as plain links in the body
    - { type: grounded_in, target: "/policy/fair-lending/source.md" }
    - { type: depends_on,  target: "/standards/mrm-std-02/source.md" }

Two design points matter here. The plain markdown links in the body remain the OKF relationships, untyped and conformant, so base consumers see the structure. The x-cks.rels block records the CKS edge type, and because the graph is rebuilt from the bundles, it is required on CKS-produced concepts so the typed graph can be reconstructed losslessly. Strip x-cks entirely and the bundle is still valid and portable; it simply loses the governance and the typed edges on the way out.

x-cks fieldCarries
trust_tierThe source's tier, T1 to T4.
validityScore, as-of date, half-life, and decay date.
provenanceSHA-384 content hash, source URI, and the connector.
warrantThe Validity Warrant id, signing algorithm, and status.
classificationSensitivity, PII presence, and language.
entitlementWhether the concept is licensed, and its pack and release.
relsThe CKS-typed relationships (grounded-in, depends-on, and so on).

The source boundary

"Bundle per source" only works if "source" is bounded so a bundle is never too big or too abstract. CKS uses a three-level model, which is itself an extension of how a base OKF producer might think about granularity.

1

Collection

A repository, site, drive, SharePoint library, Confluence space, mailbox, or crawl. Never a single bundle; a parent directory with its own index.md linking to its member source bundles.

2

Source

The unit that gets a bundle: the smallest independently addressable, citable, versioned artefact (one file, page, item, or thread). Its source key is canonical URI plus content hash plus version, which is also the bundle identity and the idempotency key for re-ingestion.

3

Concept

Within a bundle, one markdown file per extracted claim or entity, plus the source itself as a concept. Each concept cites the source and carries its own trust and validity.

Guardrails keep a bundle right-sized: configurable caps on concept count and bytes (a default soft maximum of about 200 concepts and a few megabytes); very large documents split into section-level sub-sources along their natural headings; per-connector source-unit rules set in the recipe; and deterministic, pinned concept paths so a path survives re-ingestion and inbound cross-links stay resolvable. Removed concepts are tombstoned, not silently dropped.

Pipeline & round-trip

Governance fields such as trust, validity, and the warrant are only known after scoring, yet the graph must be populated before scoring can run on it. CKS resolves this with two phases over the same files.

1

Draft OKF at extraction

After collect, parse, and classify, the source is normalised into a draft bundle: concepts, bodies, citations, and provenance, but no trust, validity, or warrant yet. The draft is the standardised hand-off.

2

Load the graph from OKF

One loader reads the draft and writes nodes and edges into the retrieval graph and the Causal Substrate, recording each node's okf_concept_id so the graph and the bundle stay reconcilable.

3

Embed, score, warrant

The usual scoring and warranting run on the loaded graph.

4

Finalise the companion

The bundle is enriched in place with the x-cks governance block, log.md is appended, and the finalised bundle is published as the durable record and export. Re-validation later updates only the affected frontmatter and the log.

Round-trip: OKF as input

When the input is already an OKF bundle, CKS skips parse and extract entirely. It validates conformance, honours the declared okf_version (a newer minor version loads best-effort with a warning; only structural failure is hard-rejected), maps the bundle through the same loader (honouring x-cks if present, tolerating its absence), then scores, warrants, and finalises without discarding the producer's keys. A producer who already ships OKF (a data catalog, a partner, another CKS) is onboarded without a bespoke extractor, and evaluation fixtures can be authored as OKF for deterministic graph-build tests.

account_tree
This round-trip is what makes catalog integration possible in both directions. Because CKS is a native OKF producer and consumer, it publishes its governed knowledge into an enterprise knowledge catalog as a first-class partner, and ingests from any catalog through a vendor-agnostic adapter, with OKF as the common format on the way in.

Licensed content

Licensed partner content is projected to OKF too, but its companions are entitlement-gated, mirroring the packaging and catalog subscription rules. Each licensed concept carries x-cks.entitlement = { licensed: true, pack, release }. Export releases a licensed bundle only to an entitled subscriber; a non-entitled caller receives the source metadata at most, never the licensed body. Licensed companion access is metered, and the same privacy-scrubbing discipline used for usage exports applies, so licensed bodies never leak into a non-entitled export.

Why we built CKS on OKF

swap_horiz

Portability and interoperability

A vendor-neutral, plain-text record that outlives any single datastore and moves cleanly between environments, clouds, and tools. No lock-in to the graph engine.

replay

Reproducibility and DR

The graph and indexes are regenerated deterministically by replaying the loader over the bundles. Disaster recovery is "restore the bundles, rebuild the projection", with no separate graph backup to keep authoritative.

bolt

Standardised, accelerated ingestion

One schema, one loader, one set of golden tests. A producer who already ships OKF is imported directly, so onboarding a new repository or partner is faster and less bespoke.

visibility

Inspectable and governable

Stewards and auditors get a readable, diffable, plain-text view of what the base knows, with trust, validity, and the warrant visible in the frontmatter.

verified_user
Building on an open standard is itself a trust position. Your knowledge is not trapped in a proprietary representation. It is a verifiable record you could read, audit, or move without us.

Conformance & spec posture

CKS targets OKF v0.1 as published at GoogleCloudPlatform/knowledge-catalog, vendored read-only at a pinned commit SHA, not just the 0.1 string. The projector, loader, and conformance validator are written against the vendored copy; a standing task watches upstream, and any version bump is gated behind a diff review. Because the spec author states the repository is not an official Google product, the entire CKS-to-OKF mapping is isolated in one module, and both the version and the SHA are recorded in the bundle provenance.

Acceptance includes a base-conformance golden test: a CKS bundle with x-cks stripped must validate and load in a vanilla OKF reader. The conformance endpoint reports the declared and targeted versions and whether consumption is strict (matches the pinned target) or best-effort (a newer minor version).

api
Ready to use it? See the OKF API endpoints for export, round-trip import, and conformance, and the architecture page for where the bundle store sits relative to the graph.