{"id":18084985,"url":"https://github.com/jwodder/zarr-checksum-gallery","last_synced_at":"2025-10-27T08:51:21.100Z","repository":{"id":54913544,"uuid":"522724498","full_name":"jwodder/zarr-checksum-gallery","owner":"jwodder","description":"Various implementations of Dandi Zarr checksumming","archived":false,"fork":false,"pushed_at":"2025-07-28T07:11:02.000Z","size":481,"stargazers_count":1,"open_issues_count":10,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-28T09:11:23.781Z","etag":null,"topics":["benchmarking","dandi-zarr-checksum","implementation-comparison","rust","zarr"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jwodder.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-08-08T22:23:08.000Z","updated_at":"2025-07-28T07:11:07.000Z","dependencies_parsed_at":"2023-12-18T18:49:50.037Z","dependency_job_id":"f200237d-078c-4a3a-91ef-041e0660f6e1","html_url":"https://github.com/jwodder/zarr-checksum-gallery","commit_stats":{"total_commits":439,"total_committers":2,"mean_commits":219.5,"dds":"0.44419134396355353","last_synced_commit":"0fee6661af03e2b8b8c56ca25dc8527513000d24"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jwodder/zarr-checksum-gallery","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwodder%2Fzarr-checksum-gallery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwodder%2Fzarr-checksum-gallery/tags"
,"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwodder%2Fzarr-checksum-gallery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwodder%2Fzarr-checksum-gallery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jwodder","download_url":"https://codeload.github.com/jwodder/zarr-checksum-gallery/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwodder%2Fzarr-checksum-gallery/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281241875,"owners_count":26467373,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-27T02:00:05.855Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmarking","dandi-zarr-checksum","implementation-comparison","rust","zarr"],"created_at":"2024-10-31T15:08:55.458Z","updated_at":"2025-10-27T08:51:21.085Z","avatar_url":"https://github.com/jwodder.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Project Status: Concept – Minimal or no implementation has been done yet, or the repository is only intended to be a limited example, demo, or proof-of-concept.](https://www.repostatus.org/badges/latest/concept.svg)](https://www.repostatus.org/#concept)\n[![CI 
Status](https://github.com/jwodder/zarr-checksum-gallery/actions/workflows/test.yml/badge.svg)](https://github.com/jwodder/zarr-checksum-gallery/actions/workflows/test.yml)\n[![codecov.io](https://codecov.io/gh/jwodder/zarr-checksum-gallery/branch/master/graph/badge.svg)](https://codecov.io/gh/jwodder/zarr-checksum-gallery)\n[![MIT License](https://img.shields.io/github/license/jwodder/zarr-checksum-gallery.svg)](https://opensource.org/licenses/MIT)\n\nThis is a Rust library \u0026 binary featuring a collection of various different\nways to implement a Merkle tree hash for a directory tree in the [format][1]\nused by [the DANDI project](https://github.com/dandi) for Zarr assets.  It was\nwritten partly in search of the most efficient implementation but mostly as\njust an exercise in Rust.\n\n[1]: https://github.com/dandi/dandi-archive/blob/master/doc/design/zarr-support-3.md#zarr-entry-checksum-format\n\nInstallation\n============\n\nRegardless of which installation method you choose, you need to first [install\nRust and Cargo](https://www.rust-lang.org/tools/install).\n\nTo install the `zarr-checksum-gallery` binary in `~/.cargo/bin`, run:\n\n    cargo install --git https://github.com/jwodder/zarr-checksum-gallery\n\nAlternatively, a binary localized to a clone of this repository can be built\nwith:\n\n    git clone https://github.com/jwodder/zarr-checksum-gallery\n    cd zarr-checksum-gallery\n    cargo build  # or `cargo build --release` to enable optimizations\n    # You can now run the binary with `cargo run -- \u003cargs\u003e` while in this\n    # repository.\n\n\nUsage\n=====\n\n    zarr-checksum-gallery [\u003cglobal options\u003e] \u003cimplementation\u003e [\u003coptions\u003e] \u003cdirpath\u003e\n\nor, if running a localized binary:\n\n    cargo run [--release] -- [\u003cglobal options\u003e] \u003cimplementation\u003e [\u003coptions\u003e] \u003cdirpath\u003e\n\n`zarr-checksum-gallery` computes the Zarr checksum for the directory 
at\n`\u003cdirpath\u003e` using the given `\u003cimplementation\u003e` (see list below).  Regardless of\nthe implementation chosen, the checksum should always be the same for the same\ndirectory contents \u0026 layout; if it is not, it is a bug.\n\nGlobal Options\n--------------\n\n- `--debug` — Show DEBUG log messages listing the checksum for each file \u0026\n  directory as it's computed.\n\n- `-E`/`--exclude-dotfiles` — Exclude the dotfiles \u0026 dot-directories `.dandi`,\n  `.datalad`, `.git`, `.gitattributes`, and `.gitmodules` from checksumming\n\n- `--trace` — Show TRACE log messages in addition to DEBUG messages.  Not all\n  implementations emit TRACE logs.\n\nImplementations\n---------------\n\n- `breadth-first` — Walk the directory tree iteratively \u0026 breadth-first,\n  building a tree of file checksums in memory\n\n- `collapsio-arc` — Walk the directory tree using multiple threads, computing\n  the checksum for each directory as soon as possible, with intermediate\n  results reported using shared memory\n\n  **Options:**\n\n    - `-t \u003cNUM\u003e`/`--threads \u003cNUM\u003e` — Set the number of threads to use.  
The\n      default value is the number of logical CPU cores on the machine.\n\n- `depth-first` — Walk the directory tree iteratively \u0026 depth-first, computing\n  the checksum for each directory as soon as possible\n\n- `fastasync` — Walk the directory tree using multiple asynchronous worker\n  tasks, building a tree of file checksums in memory\n\n  **Options:**\n\n    - `-t \u003cNUM\u003e`/`--threads \u003cNUM\u003e` — Set the number of threads for the async\n      runtime to use.  A value of 1 means to run all tasks in the main thread.\n      The default value is the number of logical CPU cores on the machine.\n\n    - `-w \u003cNUM\u003e`/`--workers \u003cNUM\u003e` — Set the number of worker tasks to use.\n      The default value is the number of logical CPU cores on the machine.\n\n- `fastio` — Walk the directory tree using multiple threads, building a tree of\n  file checksums in memory\n\n  **Options:**\n\n    - `-t \u003cNUM\u003e`/`--threads \u003cNUM\u003e` — Set the number of threads to use.  The\n      default value is the number of logical CPU cores on the machine.\n\n- `recursive` — Walk the directory tree recursively and depth-first, computing\n  the checksum for each directory as soon as possible\n\n- `tree` — Like `fastio`, but instead of displaying only the final checksum,\n  shows a textual tree of the files \u0026 directories within the directory tree and\n  their corresponding checksums\n\n  **Options:**\n\n    - `-t \u003cNUM\u003e`/`--threads \u003cNUM\u003e` — Set the number of threads to use.  
The\n      default value is the number of logical CPU cores on the machine.\n\n\nComparative Performance\n=======================\n\nTypical final output from a run of `time-all.sh` on a 1.59 GiB directory of\n7084 files:\n\n    collapsio-arc ran\n      1.06 ± 0.06 times faster than fastio\n      1.31 ± 0.14 times faster than collapsio-mpsc\n      6.18 ± 0.07 times faster than depth-first\n      6.28 ± 0.10 times faster than breadth-first\n      6.35 ± 0.22 times faster than recursive\n      6.41 ± 0.24 times faster than fastasync\n\nNote that the collapsio implementations should have some of the smallest memory\nfootprints, but this has not yet been tested.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjwodder%2Fzarr-checksum-gallery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjwodder%2Fzarr-checksum-gallery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjwodder%2Fzarr-checksum-gallery/lists"}