Per-language bindings (_LANG)
Language-specific bindings live in a dedicated _LANG namespace, so a single
manifest can serve multiple language implementations without conflicts. The
use cases page shows the short
version; this page documents the full behaviour: the fetch/load ladders, the bare
(language-implicit) forms, parameterized bindings, cross-language fetch, and
the legacy fields still accepted on read.
[mydata._LANG.python]
fetcher = "mypkg.fetch:download_mydata" # entry-point ref; resolved via importlib
loader = "mypkg.load:load_mydata"
[_LANG.python.loaders] # project-wide format → loader defaults
csv = "pandas.io.parsers:read_csv" # string form (a bare module:function ref)
nc = { ref = "myclimate.loaders:load_nc", kwargs = { decode_times = false } } # table form
[mydata._LANG.julia]
fetcher = "MyPkg.fetch_mydata" # preserved verbatim; Python never touches it
Foreign _LANG.<other> subtrees (e.g. _LANG.julia) are preserved verbatim on
every read→write cycle; Python never modifies them. Unknown structural tables
(any _* key Python does not recognise) are similarly passed through.
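As a rough illustration of what "resolved via importlib" can look like for a "module:function" reference, here is a minimal sketch. The helper name resolve_ref is hypothetical and is not taken from the tool; the actual resolver may differ in error handling and in what it accepts after the colon.

```python
import importlib


def resolve_ref(ref: str):
    """Resolve a "module:function" entry-point reference to a callable."""
    module_name, sep, attr_path = ref.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {ref!r}")
    obj = importlib.import_module(module_name)
    # Assumption: dotted attribute paths ("pkg.mod:Class.method") are allowed.
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    if not callable(obj):
        raise TypeError(f"{ref!r} does not resolve to a callable")
    return obj


# Usage with a standard-library target, so the sketch runs anywhere.
dumps = resolve_ref("json:dumps")
```

A reference that cannot be imported raises here rather than returning None, which matches the fail-loud behaviour this page describes for bindings that are present.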
The ladders
Fetch ladder (per dataset, in order):
- Own Python fetcher: explicit _LANG.python.fetcher, else the bare fetcher, else legacy python=
- Bare shell command template (else legacy _LANG.shell.fetcher)
- Cross-language fetch: run a fetcher defined in another language
- Plain uri download
- Error: no source available
Load ladder (per dataset, in order):
- Own Python loader: explicit _LANG.python.loader, else the bare loader
- Manifest format default: [_LANG.python.loaders][format], else the bare [_LOADERS][format] map
- Built-in format default (csv, parquet, nc, …)
- Error
At every own-language rung the explicit _LANG.python binding wins over the
bare one. A binding that is present for the running language, whether bare or
explicit _LANG.python, is fail-loud: failing to resolve is an error, and if it
resolves and then raises, that error propagates. There is never a silent
fall-through to a different loader or fetcher. The ladder advances only past
rungs that are absent (for example, when the only binding is another
language's _LANG.<other>, or the dataset has no own loader). Because a bare
binding is read by every tool as its own language, a manifest meant for more
than one language should use explicit [<ds>._LANG.<lang>] bindings: each tool
treats the other languages' bindings as absent and correctly skips them.
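The load ladder can be sketched as a first-present lookup over an already-parsed manifest (a plain dict, as a TOML parser would return). This is an illustration under assumptions, not the tool's implementation: pick_loader and BUILTIN_LOADERS are hypothetical names, and the built-in table here is a one-entry stand-in.

```python
# Hypothetical stand-in for the tool's built-in format defaults.
BUILTIN_LOADERS = {"csv": "pandas:read_csv"}


def pick_loader(manifest: dict, name: str):
    """Return the first present loader binding for dataset `name`, in ladder order."""
    ds = manifest[name]
    fmt = ds.get("format")
    explicit_defaults = manifest.get("_LANG", {}).get("python", {}).get("loaders", {})
    bare_defaults = manifest.get("_LOADERS", {})
    rungs = [
        ds.get("_LANG", {}).get("python", {}).get("loader"),  # explicit per-dataset
        ds.get("loader"),                                     # bare per-dataset
        explicit_defaults.get(fmt),                           # explicit format default
        bare_defaults.get(fmt),                               # bare format default
        BUILTIN_LOADERS.get(fmt),                             # built-in format default
    ]
    for binding in rungs:
        if binding is not None:
            # Present means committed: the caller resolves and calls this
            # binding and lets any error propagate, with no retry on a lower rung.
            return binding
    raise LookupError(f"no loader available for dataset {name!r}")
```

Note that _LANG.julia (or any other foreign subtree) is never consulted: for the Python tool it is simply an absent rung.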
Language-implicit (bare) bindings
For a single-language project the [<ds>._LANG.<lang>] wrapper is needless
ceremony. A dataset may instead carry a bare fetcher/loader directly,
and a top-level [_LOADERS] table may carry a bare format → binding map — all
read as bindings in the running tool's own language (here, Python):
[_LOADERS] # language-implicit format → loader defaults
csv = "myproject.io:read_csv"
nc = "myproject.io:read_nc"
[temperature]
uri = "https://example.com/temperature.csv"
format = "csv"
loader = "myproject.loaders:load_temperature" # bare per-dataset loader
[derived]
format = "nc"
fetcher = "myproject.build:derived" # bare per-dataset fetcher (no uri)
[model_output] # bare, language-agnostic shell fetcher
format = "nc"
shell = "make model_output OUTPUT=$download_path" # same command for every tool
The bare shell field is the canonical, language-agnostic shell fetcher
(the same command for every tool — not a _LANG tag); the legacy
[<ds>._LANG.shell].fetcher is still read and preserved as the fallback. Bare
bindings are kept bare on write (never promoted into _LANG.python), so a
hand-authored single-language manifest round-trips unchanged.
A full, runnable example manifest — bare loaders/fetchers, a parameterized
loader, the bare shell fetcher — lives in the spec's
examples.
Parameterized bindings
A binding (a fetcher, a loader, or an entry in the [_LANG.python.loaders]
map) may be a { ref, args, kwargs } table instead of a plain string, so one
entry-point can be reused across datasets that differ only in arguments:
[esm_5x5._LANG.python.loader]
ref = "myclimate.loaders:load_esm"
args = ["$path"] # positional, in order
kwargs = { grid = "5x5", skip_models = ["CESM.*"] } # keyword
[esm_10x10._LANG.python.loader]
ref = "myclimate.loaders:load_esm"
args = ["$path"]
kwargs = { grid = "10x10" }
String values in args and kwargs undergo $var substitution before the
call. Available variables: $download_path (fetcher), $path (loader),
$key, $version, $doi, $format, $branch, $uri, $project_root.
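One way to picture the $var substitution is a recursive walk over args and kwargs that rewrites only string values. The sketch below uses the standard library's string.Template; the function name substitute is hypothetical, and leaving unknown $names untouched (rather than raising) is an assumption, since this page does not say how the tool treats them.

```python
from string import Template


def substitute(value, variables: dict):
    """Recursively apply $var substitution to strings inside args/kwargs."""
    if isinstance(value, str):
        # Assumption: unknown $names are left as-is (safe_substitute) instead of raising.
        return Template(value).safe_substitute(variables)
    if isinstance(value, list):
        return [substitute(v, variables) for v in value]
    if isinstance(value, dict):
        return {k: substitute(v, variables) for k, v in value.items()}
    return value  # numbers, booleans, etc. pass through unchanged


variables = {"path": "/data/esm.nc", "key": "esm_5x5"}
args = substitute(["$path"], variables)
kwargs = substitute({"grid": "5x5", "skip_models": ["CESM.*"]}, variables)
```

Non-string values such as decode_times = false are returned unchanged, so only string leaves are ever rewritten.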
The two forms are interchangeable at every binding site — explicit
[<ds>._LANG.python] fetcher/loader, the language-implicit bare
fetcher/loader, and the project-wide [_LANG.python.loaders] / bare
[_LOADERS] defaults. (The shell field is a separate command-template
string, not a module:function binding, so it is always a string, never a
table.) A bare string "module:function" is an alias for
{ ref = "module:function" } and makes the conventional call (a loader receives
the dataset path; a fetcher receives the standard context). On write, the
canonical form is the string for a binding with no args/kwargs and the table
for one that carries them.
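The string/table equivalence and the canonical write form can be sketched as a pair of small helpers. Both names (normalize_binding, canonical_form) are hypothetical, and treating empty args/kwargs as "make the conventional call" is an assumption about how the tool interprets them.

```python
def normalize_binding(binding):
    """Expand either binding form into a {ref, args, kwargs} dict."""
    if isinstance(binding, str):
        # Assumption: empty args/kwargs stand for the conventional call.
        return {"ref": binding, "args": [], "kwargs": {}}
    return {
        "ref": binding["ref"],
        "args": list(binding.get("args", [])),
        "kwargs": dict(binding.get("kwargs", {})),
    }


def canonical_form(binding):
    """Write a binding as a string when it carries no arguments, else as a table."""
    norm = normalize_binding(binding)
    if not norm["args"] and not norm["kwargs"]:
        return norm["ref"]
    table = {"ref": norm["ref"]}
    if norm["args"]:
        table["args"] = norm["args"]
    if norm["kwargs"]:
        table["kwargs"] = norm["kwargs"]
    return table
```

With these helpers, { ref = "m:f" } and "m:f" both normalise to the same dict and both write back as the string "m:f".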
Cross-language fetch
The rare case: a dataset whose only fetcher is defined in another language
(e.g. [<ds>._LANG.julia].fetcher), with no native Python fetcher, no shell
fetcher, and no uri. Python materializes it by invoking the local Julia
DataManifest environment directly —
julia --project=<env> -e 'using DataManifest; download_dataset(Database("<datasets.toml>"), "<name>")'
— which writes the bytes into the shared store; Python then reads them from
disk (load never crosses languages, only bytes do).
The Julia env is discovered by walking up from the manifest directory (or
$JULIA_PROJECT) for a Project.toml whose [deps] lists DataManifest, and
the rung is gated on julia being on PATH. When the toolchain is absent the
rung logs a warning and skips, and the ladder advances to the uri
download. Cross-language fetch applies to fetched datasets only (never
@cached produced datasets); it is on by default and probe-gated (a no-op
unless a foreign fetcher and a usable Julia env are both present). Toggle it
per file with delegate = false, or per run with the --delegate /
--no-delegate flags on datamanifest download.
Legacy fields
Still accepted on read; only these are deprecated:
- python= (or callable=): entry-point reference ("pkg.mod:func") resolved via importlib. The callable receives keyword arguments (download_path, project_root, entry, uri, key, version, doi, format, branch, requires_paths). No inline code execution (exec/eval) anywhere. Equivalent to [<ds>._LANG.python].fetcher.
- [<ds>._LANG.shell].fetcher: the legacy shell fetcher; read as the fallback for the canonical bare shell.
- python_includes=: list of directory paths prepended to sys.path during ref resolution (obsolete; the project root is auto-added).
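For orientation, a fetcher compatible with the keyword arguments listed above might look like the sketch below. The function is hypothetical: it assumes download_path names the file to create (this page does not say whether it is a file or a directory), and it writes a small placeholder file instead of downloading anything.

```python
from pathlib import Path


def download_mydata(*, download_path, key, version, **_context):
    """Hypothetical fetcher: materialize the dataset at download_path.

    The remaining keyword arguments (project_root, entry, uri, doi, format,
    branch, requires_paths) are accepted via **_context and ignored here.
    """
    target = Path(download_path)  # assumption: download_path is a file path
    target.parent.mkdir(parents=True, exist_ok=True)
    # Placeholder content; a real fetcher would download or build the data.
    target.write_text(f"key,version\n{key},{version}\n")
    return target
```

Accepting the unused arguments through **_context keeps the function callable even if it only needs a few of them.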
A single manifest can be consumed by several tools: each reads the common fields and ignores the others' extension keys. See conformance.md for the shared manifest format and what this implementation supports.