{"id":50759396,"url":"https://github.com/perrette/datamanifest.toml","last_synced_at":"2026-06-11T08:31:00.323Z","repository":{"id":362068176,"uuid":"1256544464","full_name":"perrette/datamanifest.toml","owner":"perrette","description":"Shared TOML manifest schema for DataManifest.jl (Julia) and datamanifest (Python)","archived":false,"fork":false,"pushed_at":"2026-06-10T17:35:25.000Z","size":1338,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-10T19:15:04.290Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/perrette.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-01T22:01:22.000Z","updated_at":"2026-06-10T17:35:14.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/perrette/datamanifest.toml","commit_stats":null,"previous_names":["perrette/datamanifest.toml"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/perrette/datamanifest.toml","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fdatamanifest.toml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fdatamanifest.toml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fdatamanifest.toml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fdatamanifest.toml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/perrette","download_url":"https://codeload.github.com/perrette/datamanifest.toml/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fdatamanifest.toml/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34190582,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-11T08:30:57.088Z","updated_at":"2026-06-11T08:31:00.317Z","avatar_url":"https://github.com/perrette.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"design/logo/lockup-dark.svg\"\u003e\n    \u003cimg src=\"design/logo/lockup.svg\" alt=\"datamanifest.toml\" height=\"76\"\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n[![docs](https://img.shields.io/badge/docs-perrette.github.io%2Fdatamanifest.toml-blue)](https://perrette.github.io/datamanifest.toml/)\n[![spec](https://img.shields.io/badge/spec-spec--v5-informational)](https://perrette.github.io/datamanifest.toml/schema/)\n[![license](https://img.shields.io/badge/license-MIT-green)](LICENSE)\n\nA small, normative specification for the **`datamanifest.toml`** manifest format — a\nTOML file that declares the data dependencies of a scientific project (each dataset's\nsource URI, checksum, version, format, and how to fetch and load it).\n\nOne `datasets.toml` is read by tools in different languages — today\n[Python](https://github.com/perrette/datamanifest) and\n[Julia](https://github.com/awi-esc/DataManifest.jl) — and covers fetching (download,\nchecksum, extract, load), portable storage, per-language bindings, and an optional\nproduce-or-load cache layer. The data model is `_META.schema = 1`; behavioural revisions\nare tracked by spec tags (currently `spec-v5`).\n\n\u003c!-- intro-start --\u003e\n- **One manifest, many languages.** A single `datasets.toml` declares each dataset's\n  source, checksum, format, and how to fetch and load it — and the same file is read\n  unchanged by tools in [Python](https://github.com/perrette/datamanifest) and\n  [Julia](https://github.com/awi-esc/DataManifest.jl).\n- **Fetch, verify, extract, load.** A tool downloads the dataset, verifies its checksum,\n  unpacks the archive, and hands your code the local path — re-fetching only when it's\n  missing. Add a `format` and it loads the data into a native object too.\n- **Portable, shared-by-default storage.** Fetched datasets live in one machine-global\n  keyed store (deduplicated across projects), the produced cache is per-project, and\n  per-machine layouts go in git-ignored config files or `[_STORAGE._HOST]` glob rules —\n  the repo itself stays data-free (repo-local folders are one edit away).\n- **Produce-or-load caching.** An optional companion layer keys produced artifacts by a\n  hash of their parameters, so derived data is rebuilt only when its inputs change.\n- **Normative and conformance-tested.** The prose spec is the source of truth, backed by\n  machine-readable JSON Schemas and a shared fixture suite both implementations run.\n\u003c!-- intro-end --\u003e\n\n## 📖 Documentation\n\nFull documentation lives at **\u003chttps://perrette.github.io/datamanifest.toml/\u003e**:\n\n- [Quickstart](https://perrette.github.io/datamanifest.toml/quickstart/)\n- Guide: [the manifest in one minute](https://perrette.github.io/datamanifest.toml/guide/manifest/),\n  [declaring datasets](https://perrette.github.io/datamanifest.toml/guide/datasets/),\n  [language bindings](https://perrette.github.io/datamanifest.toml/guide/bindings/),\n  [resolution](https://perrette.github.io/datamanifest.toml/guide/resolution/),\n  [storage](https://perrette.github.io/datamanifest.toml/guide/storage/),\n  [caching](https://perrette.github.io/datamanifest.toml/guide/caching/),\n  [maintenance](https://perrette.github.io/datamanifest.toml/guide/maintenance/),\n  [sync](https://perrette.github.io/datamanifest.toml/guide/sync/),\n  [conformance](https://perrette.github.io/datamanifest.toml/guide/conformance/),\n  [migration](https://perrette.github.io/datamanifest.toml/guide/migration/)\n- [Schema specification](https://perrette.github.io/datamanifest.toml/schema/) (the normative `SCHEMA.md`)\n- [JSON Schemas](https://perrette.github.io/datamanifest.toml/schemas/) ·\n  [Examples](https://perrette.github.io/datamanifest.toml/examples/) ·\n  [Conformance fixtures](https://perrette.github.io/datamanifest.toml/fixtures/)\n- [Roadmap](https://perrette.github.io/datamanifest.toml/roadmap/) ·\n  [Changelog](https://perrette.github.io/datamanifest.toml/changelog/)\n\n## Quick look\n\nDeclare a dataset — its source and checksum — in `datasets.toml`:\n\n```toml\n[\"jesstierney/lgmDA\"]\nuri      = \"https://github.com/jesstierney/lgmDA/archive/refs/tags/v2.1.zip\"\nchecksum = \"sha256:da5f85235baf7f858f1b52ed73405f5d4ed28a8f6da92e16070f86b724d8bb25\"\nextract  = true\n```\n\nA tool downloads it, verifies the checksum, unpacks the archive, and hands your code the\nlocal path — re-fetching only when it's missing. Add a `format` and it loads the data into a\nnative object too; the same file is read unchanged by tools in different languages. The full,\nrunnable manifest is at\n[`examples/datasets.toml`](https://github.com/perrette/datamanifest.toml/blob/main/examples/datasets.toml),\nand the [quickstart](https://perrette.github.io/datamanifest.toml/quickstart/) walks through a\nfuller example.\n\n## Implementations\n\nThe Python package [`perrette/datamanifest`](https://github.com/perrette/datamanifest) is\nthe **reference implementation** and ships the `datamanifest` command-line tool. A Julia\nport, [`DataManifest.jl`](https://github.com/awi-esc/DataManifest.jl), tracks the same spec\nand shares the conformance fixtures\n([`tests/fixtures/`](https://github.com/perrette/datamanifest.toml/tree/main/tests/fixtures)),\nso both read the same `datamanifest.toml`.\n\n| Language | Repository | Description |\n|---|---|---|\n| Python *(reference)* | [perrette/datamanifest](https://github.com/perrette/datamanifest) | **The reference implementation.** Download, verify, extract, and load datasets declared in a manifest; uses entry-point loader references instead of inline code execution. Provides the **`datamanifest` command-line tool**. |\n| Julia | [awi-esc/DataManifest.jl](https://github.com/awi-esc/DataManifest.jl) | Download, verify, extract, and load datasets declared in a manifest, with a Julia-native API. |\n\n## From the same author\n\nA few other open-source tools I maintain.\n\n**Scientific writing \u0026 data**\n\n- [**texmark**](https://perrette.github.io/texmark/) — write scientific articles in Markdown and convert them to journal-ready LaTeX/PDF.\n- [**papers**](https://perrette.github.io/papers/) — command-line BibTeX bibliography and PDF library manager.\n- [**datamanifest**](https://perrette.github.io/datamanifest/) — declarative, reproducible dataset management. *(See also the [DataManifest.jl](https://awi-esc.github.io/DataManifest.jl/) Julia port.)*\n\n**Speech to Text (dictate) and Text to Speech (read-aloud) tools**\n\n- [**scribe**](https://perrette.github.io/scribe/) — speech-to-text dictation.\n- [**bard**](https://perrette.github.io/bard/) — text-to-speech reader.\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperrette%2Fdatamanifest.toml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fperrette%2Fdatamanifest.toml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperrette%2Fdatamanifest.toml/lists"}