datamanifest.toml

A small, normative specification for a TOML file that declares the data dependencies of a scientific project — read by tools in different languages.

  • One manifest, many languages. A single datasets.toml declares each dataset's source, checksum, format, and how to fetch and load it — and the same file is read unchanged by tools in Python and Julia.
  • Fetch, verify, extract, load. A tool downloads the dataset, verifies its checksum, unpacks the archive, and hands your code the local path — re-fetching only when it's missing. Add a format and it loads the data into a native object too.
  • Portable, local-by-default storage. Fetched datasets and produced artifacts live in repo-relative folders by default, and can be centralized per host via [_STORAGE._HOST] glob rules without touching the rest of the manifest.
  • Produce-or-load caching. An optional companion layer keys produced artifacts by a hash of their parameters, so derived data is rebuilt only when its inputs change.
  • Normative and conformance-tested. The prose spec is the source of truth, backed by machine-readable JSON Schemas and a shared fixture suite both implementations run.
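
The produce-or-load idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the companion layer's actual API: the names `params_key` and `produce_or_load`, the canonical-JSON hashing, and the pickle storage format are all hypothetical choices made for the example.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def params_key(params: dict) -> str:
    """Hypothetical cache key: hash of the parameters as canonical JSON."""
    # Sorting keys makes the hash independent of dict insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def produce_or_load(cache_dir: Path, params: dict, produce):
    """Load the artifact cached for `params`, or produce and store it."""
    path = cache_dir / f"{params_key(params)}.pkl"
    if path.exists():
        # Cache hit: the same parameters were already produced.
        return pickle.loads(path.read_bytes())
    result = produce(**params)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(result))
    return result


calls = []  # records every real computation, to show when the cache is used


def make_series(n, scale):
    calls.append((n, scale))
    return [i * scale for i in range(n)]


cache = Path(tempfile.mkdtemp())
first = produce_or_load(cache, {"n": 3, "scale": 2}, make_series)   # computed
second = produce_or_load(cache, {"scale": 2, "n": 3}, make_series)  # same key, loaded
third = produce_or_load(cache, {"n": 4, "scale": 2}, make_series)   # new key, computed
```

Hashing a canonical serialization rather than the raw dict means that reordering parameters does not invalidate the cache, while changing any value yields a new key and triggers a rebuild.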

Get started

# datasets.toml
["jesstierney/lgmDA"]
uri     = "https://github.com/jesstierney/lgmDA/archive/refs/tags/v2.1.zip"
sha256  = "da5f85235baf7f858f1b52ed73405f5d4ed28a8f6da92e16070f86b724d8bb25"
extract = true
  • Quickstart — the manifest in one minute, declaring datasets.
  • Language bindings — fetcher/loader references, per language.
  • Storage — where fetched datasets and the produced cache live.
  • Schema spec — the full normative reference.

From the same author

A few other open-source tools I maintain.

Scientific writing & data

  • texmark — write scientific articles in Markdown and convert them to journal-ready LaTeX/PDF.
  • papers — command-line BibTeX bibliography and PDF library manager.
  • datamanifest — declarative, reproducible dataset management. (See also the DataManifest.jl Julia port.)

Speech to Text (dictate) and Text to Speech (read-aloud) tools

  • scribe — speech-to-text dictation.
  • bard — text-to-speech reader.