What OKF is
OKF is an open specification published by Google Cloud (v0.1 draft, June 2026) in the public GoogleCloudPlatform/knowledge-catalog repository, licensed Apache-2.0. It formalises the "LLM-wiki" pattern as plain files, designed so a knowledge base is something you can read, diff, and carry between systems.
.md; there is no separate id field.type. Recommended optional fields: title, description, resource, tags, timestamp. Producers may add any other keys; consumers must preserve unknown keys.# Citations heading, pointing at URLs or a references/ subdirectory.index.md (a directory listing) and log.md (a change history). The bundle-root index.md may carry okf_version.type. Consumption is deliberately permissive.Our position
Most systems would treat an open format as an export feature, a way to get data out. We take the opposite view. The substrate already does the work an OKF producer is assumed to have done: ingestion parses a source, extracts concepts, claims, and entities, scores trust and validity, and emits a Validity Warrant with provenance. An OKF bundle is a faithful, portable projection of exactly that. So we made three decisions that put OKF at the centre rather than the edge.
System of record
The OKF bundle store is the durable record of ingested knowledge. The graph and the search indexes are rebuildable projections, caches that can be regenerated from the bundles.
Canonical ingestion IR
Every source of every kind is normalised into OKF, and one loader populates the graph from OKF. Not many per-format extractors writing straight to the graph, but one schema, one loader, one set of golden tests.
High-trust producer
CKS bundles carry trust tier, validity with decay, provenance hashes, and a warrant in the frontmatter, governance that base OKF does not require, while staying conformant.
Our extensions to the standard
Standard OKF fields carry the portable basics. CKS governance rides in a namespaced x-cks extension block, so a base consumer simply ignores it and the bundle stays conformant. The standard timestamp is the last meaningful change, which for CKS is the last re-validation.
# a CKS concept, conformant OKF with an x-cks governance block type: Web Page # REQUIRED by OKF title: Fair Lending Protocol (2026 revision) resource: https://intranet.example.com/policy/fair-lending?v=2026-05 timestamp: 2026-06-10T14:30:00Z # last re-validation x-cks: kb: risk-and-compliance tenant: { owner: acme, home: acme } trust_tier: T1 # T1..T4 validity: { score: 0.96, as_of: 2026-06-10, half_life_days: 365, decays_at: 2027-06-10 } provenance: { content_hash: "sha384:9f2c...a17b", connector: sharepoint } warrant: { id: "wrt_...", alg: "ecdsa-p384-sha384", status: verified } classification: { sensitivity: internal, pii: none, language: en } entitlement: { licensed: false } rels: # typed edges, mirrored as plain links in the body - { type: grounded_in, target: "/policy/fair-lending/source.md" } - { type: depends_on, target: "/standards/mrm-std-02/source.md" }
Two design points matter here. The plain markdown links in the body remain the OKF relationships, untyped and conformant, so base consumers see the structure. The x-cks.rels block records the CKS edge type, and because the graph is rebuilt from the bundles, it is required on CKS-produced concepts so the typed graph can be reconstructed losslessly. Strip x-cks entirely and the bundle is still valid and portable; it simply loses the governance and the typed edges on the way out.
| x-cks field | Carries |
|---|---|
trust_tier | The source's tier, T1 to T4. |
validity | Score, as-of date, half-life, and decay date. |
provenance | SHA-384 content hash, source URI, and the connector. |
warrant | The Validity Warrant id, signing algorithm, and status. |
classification | Sensitivity, PII presence, and language. |
entitlement | Whether the concept is licensed, and its pack and release. |
rels | The CKS-typed relationships (grounded-in, depends-on, and so on). |
The source boundary
"Bundle per source" only works if "source" is bounded so a bundle is never too big or too abstract. CKS uses a three-level model, which is itself an extension of how a base OKF producer might think about granularity.
Collection
A repository, site, drive, SharePoint library, Confluence space, mailbox, or crawl. Never a single bundle; a parent directory with its own index.md linking to its member source bundles.
Source
The unit that gets a bundle: the smallest independently addressable, citable, versioned artefact (one file, page, item, or thread). Its source key is canonical URI plus content hash plus version, which is also the bundle identity and the idempotency key for re-ingestion.
Concept
Within a bundle, one markdown file per extracted claim or entity, plus the source itself as a concept. Each concept cites the source and carries its own trust and validity.
Guardrails keep a bundle right-sized: configurable caps on concept count and bytes (a default soft maximum of about 200 concepts and a few megabytes); very large documents split into section-level sub-sources along their natural headings; per-connector source-unit rules set in the recipe; and deterministic, pinned concept paths so a path survives re-ingestion and inbound cross-links stay resolvable. Removed concepts are tombstoned, not silently dropped.
Pipeline & round-trip
Governance fields such as trust, validity, and the warrant are only known after scoring, yet the graph must be populated before scoring can run on it. CKS resolves this with two phases over the same files.
Draft OKF at extraction
After collect, parse, and classify, the source is normalised into a draft bundle: concepts, bodies, citations, and provenance, but no trust, validity, or warrant yet. The draft is the standardised hand-off.
Load the graph from OKF
One loader reads the draft and writes nodes and edges into the retrieval graph and the Causal Substrate, recording each node's okf_concept_id so the graph and the bundle stay reconcilable.
Embed, score, warrant
The usual scoring and warranting run on the loaded graph.
Finalise the companion
The bundle is enriched in place with the x-cks governance block, log.md is appended, and the finalised bundle is published as the durable record and export. Re-validation later updates only the affected frontmatter and the log.
Round-trip: OKF as input
When the input is already an OKF bundle, CKS skips parse and extract entirely. It validates conformance, honours the declared okf_version (a newer minor version loads best-effort with a warning; only structural failure is hard-rejected), maps the bundle through the same loader (honouring x-cks if present, tolerating its absence), then scores, warrants, and finalises without discarding the producer's keys. A producer who already ships OKF (a data catalog, a partner, another CKS) is onboarded without a bespoke extractor, and evaluation fixtures can be authored as OKF for deterministic graph-build tests.
Licensed content
Licensed partner content is projected to OKF too, but its companions are entitlement-gated, mirroring the packaging and catalog subscription rules. Each licensed concept carries x-cks.entitlement = { licensed: true, pack, release }. Export releases a licensed bundle only to an entitled subscriber; a non-entitled caller receives the source metadata at most, never the licensed body. Licensed companion access is metered, and the same privacy-scrubbing discipline used for usage exports applies, so licensed bodies never leak into a non-entitled export.
Why we built CKS on OKF
Portability and interoperability
A vendor-neutral, plain-text record that outlives any single datastore and moves cleanly between environments, clouds, and tools. No lock-in to the graph engine.
Reproducibility and DR
The graph and indexes are regenerated deterministically by replaying the loader over the bundles. Disaster recovery is "restore the bundles, rebuild the projection", with no separate graph backup to keep authoritative.
Standardised, accelerated ingestion
One schema, one loader, one set of golden tests. A producer who already ships OKF is imported directly, so onboarding a new repository or partner is faster and less bespoke.
Inspectable and governable
Stewards and auditors get a readable, diffable, plain-text view of what the base knows, with trust, validity, and the warrant visible in the frontmatter.
Conformance & spec posture
CKS targets OKF v0.1 as published at GoogleCloudPlatform/knowledge-catalog, vendored read-only at a pinned commit SHA, not just the 0.1 string. The projector, loader, and conformance validator are written against the vendored copy; a standing task watches upstream, and any version bump is gated behind a diff review. Because the spec author states the repository is not an official Google product, the entire CKS-to-OKF mapping is isolated in one module, and both the version and the SHA are recorded in the bundle provenance.
Acceptance includes a base-conformance golden test: a CKS bundle with x-cks stripped must validate and load in a vanilla OKF reader. The conformance endpoint reports the declared and targeted versions and whether consumption is strict (matches the pinned target) or best-effort (a newer minor version).