Machine-readable schemas
Declarative JSON Schema (draft 2020-12) for the TOML
documents defined by SCHEMA.md, so external tools can validate them mechanically. These
complement the prose spec and the conformance fixtures (../tests/) — they do not replace
them: the prose remains normative, and several rules (resolution ladders, hash
reproduction, round-trip preservation) are behavioural and cannot be expressed in JSON
Schema.
TOML, validated as JSON
JSON Schema validates a JSON value. TOML maps cleanly onto the JSON data model, so the workflow is: parse the TOML to JSON, then validate the result against the relevant schema. Some validators parse TOML for you; with the others, convert the file first, e.g.:
```bash
# check-jsonschema (pip install check-jsonschema) reads TOML directly
check-jsonschema --schemafile schemas/manifest.v2.1.json datasets.toml

# or parse then validate with any JSON Schema tool
# (ajv-cli defaults to draft-07, so select draft 2020-12 explicitly)
python -c 'import tomllib,json,sys; json.dump(tomllib.load(open(sys.argv[1],"rb")),sys.stdout)' datasets.toml \
  | ajv validate --spec=draft2020 -s schemas/manifest.v2.1.json -d /dev/stdin
```
Files
| File | Validates | Capability |
|---|---|---|
| manifest.v3.json | the hand-authored manifest (datasets.toml) | core |
| state.v4.json | the local state file (.datamanifest-state.toml) | inspect / cache-produce |
| config-sidecar.v3.json | a produced artifact's config.toml (re-hashable key table) | cache-produce |
| metadata-sidecar.v3.json | a produced artifact's metadata.toml (provenance) | cache-produce |
The *.v2.1.json files are kept alongside for tools pinned to the earlier spec.

spec-v4 simplifies storage to two folder fields, [_STORAGE].datasets_dir and [_STORAGE].datacache_dir (relative ⇒ repo-relative, local by default), plus optional read-pool lists (datasets_pools / datacache_pools), reusable $-symbols ($user_data_dir, $user_cache_dir, $repo, and user-defined ones) and _HOST host-overrides. A dataset's storage_path replaces the former store / local_path. There is no scope, prefix, or appname.

state.v4.json validates the state file (_META.schema = 5): a git-ignored, regenerable per-machine inventory of where each object actually landed. Fetched datasets are recorded under datasets (key ⇒ resolved storage_path + actual sha256), and produced artifacts under datacache (cachetype[@version] ⇒ instances mapping a parameter hash to its artifact directory). The state file supersedes the produced-only cached.toml index; the earlier cached.v3.json (nested schema-2 cached.toml) is kept for tools that still read the legacy shapes (_META.schema 1–4), which conforming readers migrate forward.
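To make the storage fields concrete, here is a minimal sketch of how a tool might turn a datasets_dir or datacache_dir value into a path, assuming $-symbols are expanded first and a relative result is then anchored at the repository root. The function name `resolve_dir`, the symbol values and the `scratch` symbol are all hypothetical; the authoritative resolution rules live in the prose spec, and the schemas do not encode them.

```python
from pathlib import PurePosixPath
from string import Template


def resolve_dir(value: str, symbols: dict[str, str], repo_root: str) -> str:
    """Expand $-symbols, then anchor a relative result at the repo root (assumed order)."""
    # Template.substitute raises KeyError when a $-symbol is not defined.
    expanded = Template(value).substitute(symbols)
    path = PurePosixPath(expanded)
    if not path.is_absolute():
        path = PurePosixPath(repo_root) / path
    return str(path)


# Hypothetical symbol table: three built-in names from the spec plus one user-defined symbol.
SYMBOLS = {
    "repo": "/work/project",
    "user_data_dir": "/home/me/.local/share",
    "user_cache_dir": "/home/me/.cache",
    "scratch": "/mnt/scratch",
}

print(resolve_dir("data/raw", SYMBOLS, "/work/project"))
print(resolve_dir("$user_cache_dir/dm", SYMBOLS, "/work/project"))
print(resolve_dir("$scratch/pool", SYMBOLS, "/work/project"))
```

Passing the symbol table in as an argument keeps the sketch independent of how a real implementation discovers per-user directories or applies _HOST overrides.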
Versioning
Two version axes govern the format (see SCHEMA.md §Versioning), and they map onto these
files as follows:
- Filename carries the spec-document tag (*.v2.1.json). The JSON Schema encodes prose-level structural rules, e.g. the shape of [_STORAGE] and the dataset fields, which is a spec-document concern, not a _META.schema change. So the right axis to version a schema file by is the spec tag, and older versions stay alongside new ones (a tool pinned to an earlier spec keeps using its file). spec-v2.1 is structurally identical to spec-v2 (the v2.1 change was prose only); these files apply to both.
- _META.schema is asserted inside each schema as const: 1 (the data-model version). A file carrying a different _META.schema will (correctly) fail to validate against a v2.x schema.
When a future spec tag changes structure, add new files (*.v4.json) next to these rather
than editing them in place. Earlier-version files (*.v1.1.json) may be backfilled.
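The two axes can be illustrated with a short sketch: one helper builds a schema filename from a document kind and a spec tag, and another performs the same comparison that a `const: 1` assertion on _META.schema makes. Both helper names are hypothetical, the `schemas` directory is taken from the command-line example earlier in this document, and treating a missing _META table as a mismatch is an assumption rather than something this section states.

```python
from pathlib import PurePosixPath


def schema_path(kind: str, spec_tag: str, root: str = "schemas") -> str:
    """Build '<root>/<kind>.<spec_tag>.json'; the spec tag is the filename axis."""
    return str(PurePosixPath(root) / f"{kind}.{spec_tag}.json")


def meta_schema_matches(document: dict, expected: int = 1) -> bool:
    """Mimic a JSON Schema `const` check on _META.schema (the data-model axis)."""
    meta = document.get("_META")
    value = meta.get("schema") if isinstance(meta, dict) else None
    # JSON booleans are not numbers, so exclude Python bools (True == 1 in Python).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value == expected


print(schema_path("manifest", "v2.1"))
print(meta_schema_matches({"_META": {"schema": 1}}))
print(meta_schema_matches({"_META": {"schema": 5}}))
```

A pre-check like this only gives an early, readable error; the schema itself remains the authority on whether a document validates.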
Strictness notes
- Underscore-prefixed keys are preserved, not rejected. The spec requires readers to preserve unknown _* structural keys verbatim, so the schemas allow unknown _-prefixed keys (at the top level and inside a dataset table).
- Unknown plain dataset fields are rejected. Within a dataset table, a key without the _ prefix that is not a known contract field is flagged; it is almost always a typo (e.g. shar256 for sha256). This is safe precisely because the file is version-pinned.
- Behavioural rules are out of scope. Checksum verification, the fetch/load ladders, hash reproduction from config.toml, and lossless round-trip are verified by the prose spec and the fixture suite, not here.
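The first two notes correspond to a common JSON Schema pattern: `patternProperties` admits any key starting with an underscore, while `additionalProperties: false` rejects every other unknown key. The fragment below is an illustrative sketch, not a copy of the shipped schemas. It lists only two dataset fields (storage_path and sha256), whereas the real contract has more, and the helper `unknown_plain_keys` is a hypothetical hand-rolled mirror of what a validator would report.

```python
import re

# Illustrative fragment only; the shipped schemas define the full field list.
DATASET_SCHEMA = {
    "type": "object",
    "properties": {
        "storage_path": {"type": "string"},
        "sha256": {"type": "string"},
    },
    "patternProperties": {"^_": {}},  # any underscore-prefixed key is accepted as-is
    "additionalProperties": False,     # every other unknown key is rejected
}


def unknown_plain_keys(table: dict, schema: dict = DATASET_SCHEMA) -> list[str]:
    """Return the keys that additionalProperties: false would reject."""
    known = set(schema["properties"])
    patterns = [re.compile(p) for p in schema["patternProperties"]]
    return sorted(
        key
        for key in table
        if key not in known and not any(p.search(key) for p in patterns)
    )


print(unknown_plain_keys({"sha256": "abc", "_note": "kept", "shar256": "typo"}))
```

In this sketch the misspelled shar256 is the only key reported, while _note passes through untouched.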