Importing from other tools

add takes a reference to data; import ingests another tool's catalog. Both produce standard manifest entries, and files you have already downloaded are adopted in place after checksum verification, with no re-download:

datamanifest import pooch registry.txt --base-url URL --cache-dir DIR   # adopts pooch's cache
datamanifest import csv files.csv                     # a name,url,sha256 table
datamanifest import urls list.txt --base-url URL      # a plain list of URLs
datamanifest import intake catalog.yml                # an intake catalog ([yaml] extra)
datamanifest import dvc path-or-dir                   # *.dvc / dvc.lock (+ .dvc/cache)
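
If you do not yet have a table for the csv importer, one can be generated from a local directory. The following is a minimal sketch, not part of datamanifest: the helper names (sha256_of, write_import_table), the header row, and the example base URL are assumptions made for illustration. Only the name,url,sha256 column order comes from the command above.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_import_table(data_dir, base_url, out_csv):
    """Write one name,url,sha256 row per file under data_dir.

    Hypothetical helper: assumes every file is served at
    base_url + its path relative to data_dir. Returns the row count.
    """
    data_dir = Path(data_dir)
    rows = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        name = path.relative_to(data_dir).as_posix()
        url = base_url.rstrip("/") + "/" + name
        rows.append((name, url, sha256_of(path)))
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        # Assumption: the importer accepts a header row naming the columns.
        writer.writerow(["name", "url", "sha256"])
        writer.writerows(rows)
    return len(rows)
```

Write the output file outside data_dir so that the table does not list itself on a second run, then pass it to datamanifest import csv.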

Per-source details, along with the add-side sources (direct URLs, Zenodo/figshare/OSF DOIs, Git LFS pointers), are on the adding datasets page.

Migrating from Pooch

Already using Pooch? Convert the registry and adopt your downloaded files in place:

datamanifest import pooch registry.txt \
  --cache-dir "$(python -c 'import pooch; print(pooch.os_cache("yourpkg"))')"
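
Before importing, it can be useful to see which cached files would pass a checksum check. The sketch below illustrates the idea only and is not how datamanifest performs adoption: parse_registry and adoption_status are hypothetical helpers, and the parser assumes the common Pooch registry layout of one "name hash [url]" entry per line, with an optional "alg:" prefix on the hash and no spaces in file names.

```python
import hashlib
from pathlib import Path


def parse_registry(text):
    """Parse registry lines of the form '<name> <hash> [url]'.

    Returns {name: (algorithm, hex_digest)}. A bare hash is treated as
    sha256; an 'alg:digest' value selects another hashlib algorithm.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        name, known = parts[0], parts[1]
        alg, _, digest = known.rpartition(":")
        entries[name] = (alg or "sha256", digest.lower())
    return entries


def adoption_status(registry_text, cache_dir):
    """Classify each registry entry as 'ok', 'mismatch', or 'missing'."""
    cache_dir = Path(cache_dir)
    status = {}
    for name, (alg, digest) in parse_registry(registry_text).items():
        path = cache_dir / name
        if not path.is_file():
            status[name] = "missing"
            continue
        # Reads the whole file into memory; fine for a quick check.
        actual = hashlib.new(alg, path.read_bytes()).hexdigest()
        status[name] = "ok" if actual == digest else "mismatch"
    return status
```

Entries reported as "ok" are the ones a checksum-verified adoption could reuse; "missing" and "mismatch" entries would need a fresh download.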

datamanifest covers the same fetch-and-verify ground and adds an explicit, cross-language manifest file, a full dataset-lifecycle CLI, and the @cached cache for your own computed results. See related projects for how it compares to Pooch, intake and pystow.