Related projects

A few other open-source tools I maintain.

Scientific writing & data

  • texmark — write scientific articles in Markdown and convert them to journal-ready LaTeX/PDF.
  • papers — command-line BibTeX bibliography and PDF library manager.

Speech-to-text (dictation) and text-to-speech (read-aloud) tools

  • scribe — speech-to-text dictation.
  • bard — text-to-speech reader.

Python alternatives

  • fatiando/pooch — the established tool for fetching and verifying data from Python code (it backs SciPy, scikit-image, and many others). datamanifest covers the same ground and adds three things Pooch doesn't aim for: an explicit, cross-language manifest file as the single source of truth; a CLI that manages the whole dataset lifecycle (add, verify, repair, sync) without touching code; and the @cached cache for your own computed results, which is orthogonal to fetching but shares the same storage and bookkeeping. Already using Pooch? Running datamanifest import pooch registry.txt --cache-dir "$(python -c 'import pooch; print(pooch.os_cache("yourpkg"))')" converts the registry and adopts your downloaded files in place (see "importing").
  • intake — catalog of data sources with drivers that load into pandas/xarray/dask; overlaps with the loader half of datamanifest.
  • cthoyt/pystow — lightweight reproducible download + cached storage with an OS-appropriate data dir; code-driven rather than manifest-driven.
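
If you have not seen a Pooch registry before: the registry.txt passed to the import command above is, as Pooch documents it, a plain-text file with one file name and checksum per line, optionally followed by a download URL. The sketch below is neither Pooch's nor datamanifest's code. parse_registry and verify are hypothetical helper names, the parsing is simplified (no file names containing spaces), and only the standard library is used. It illustrates what "adopting files in place" has to check: that each cached file still matches its registered hash.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_registry(text):
    """Parse Pooch-style registry text into {name: (algorithm, digest, url)}.

    Simplified assumption: fields are whitespace-separated and file names
    contain no spaces. Blank lines and lines starting with '#' are skipped.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        name, checksum = parts[0], parts[1]
        url = parts[2] if len(parts) > 2 else None
        # A checksum may carry an "alg:" prefix (e.g. "md5:..."); a bare
        # hex digest is treated as SHA-256 here.
        algorithm, _, digest = checksum.rpartition(":")
        entries[name] = (algorithm or "sha256", digest.lower(), url)
    return entries


def verify(path, algorithm, expected):
    """Return True if the file at `path` hashes to `expected` (hex digest)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        # Stand-in for a file that was already downloaded into the cache.
        data = Path(cache_dir) / "example.csv"
        data.write_bytes(b"a,b\n1,2\n")
        digest = hashlib.sha256(data.read_bytes()).hexdigest()

        registry = parse_registry(f"# demo registry\nexample.csv {digest}\n")
        for name, (algorithm, expected, url) in registry.items():
            ok = verify(Path(cache_dir) / name, algorithm, expected)
            print(name, "ok" if ok else "MISMATCH")
```

A file that fails this kind of check is what a repair step would need to download again; how datamanifest actually implements verify and repair is not shown here.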

Acknowledgments

datamanifest is a Python port of awi-esc/DataManifest.jl, written by the same author (Mahé Perrette). The Python port was implemented with assistance from Anthropic's Claude.