{"id":29227151,"url":"https://github.com/royerlab/czpeedy","last_synced_at":"2025-07-03T09:10:05.776Z","repository":{"id":249138442,"uuid":"822809325","full_name":"royerlab/czpeedy","owner":"royerlab","description":"A tool for experimentally determining the best tensorstore spec for a given machine and dataset.","archived":false,"fork":false,"pushed_at":"2024-12-11T19:29:05.000Z","size":516,"stargazers_count":4,"open_issues_count":3,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-24T11:40:30.153Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/royerlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-01T21:50:42.000Z","updated_at":"2025-03-27T04:51:43.000Z","dependencies_parsed_at":"2024-12-11T20:24:49.702Z","dependency_job_id":"e129f387-dde6-41e8-9443-f178d9217902","html_url":"https://github.com/royerlab/czpeedy","commit_stats":null,"previous_names":["royerlab/czpeedy"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/royerlab/czpeedy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royerlab%2Fczpeedy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royerlab%2Fczpeedy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royerlab%2Fczpeedy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royerlab%2Fczpeedy/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/royerlab","download_url":"https://codeload.github.com/royerlab/czpeedy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royerlab%2Fczpeedy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262365999,"owners_count":23299751,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-03T09:10:04.418Z","updated_at":"2025-07-03T09:10:05.736Z","avatar_url":"https://github.com/royerlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/royerlab/czpeedy/raw/main/images/logo.png\" width=\"150\" alt=\"Czpeedy logo\"\u003e\n\u003c/p\u003e\n\n\n# czpeedy - `tensorstore` Profiling Tool\n`czpeedy` (pronounced 'speedy') is a command-line tool used to determine the [`tensorstore`](https://github.com/google/tensorstore/) settings\n(called a 'spec') which yield the fastest write speed on a given machine. For example, on some systems, it is faster\nto compress data on the cpu before writing it to the comparatively slow drives. On some systems, this is not the case - to\nknow which is best for you, you need to perform a benchmark using the real system and the real data.\n\n`czpeedy` can be configured with each value that you might sensibly try - compression level, codec, chunk size (important\nfor sequential write performance), endianness, and other properties. 
Then, it loads real data from your machine, and writes it\nto disk using each possible combination of those parameters (which can easily be in the thousands). At the end, the average speed,\nstandard deviation, and fastest settings will be reported.\n\n## Screenshots\n![A screenshot of the terminal output created by `czpeedy`.](images/term_screenshot.png)\n(Full log omitted for brevity)\n![A screenshot of the result summary created by `czpeedy`.](images/term_screenshot_2.png)\n\n## Installation\n`czpeedy` can be installed via pip for end use, or managed with [`rye`](https://rye.astral.sh/) if you are developing for it. For most users,\nwe recommend running\n`pip install czpeedy`\nand then following the usage instructions below. To use rye, [install it](https://rye.astral.sh/) and then use `rye run czpeedy` instead of `czpeedy`\nin your shell.\n\n## Usage\nThe most basic use of `czpeedy` is a write test over the entire default test space:\n`czpeedy /path/to/input/file.raw --dest /path/to/output/directory --shape 1920x1080x512`\nIf you're willing to wait a long time (roughly a day, depending on the input size and drive speeds), this command will work fine and try a\nwide range of reasonable parameters. This is not an exhaustive search of all possible parameters - for example, the compression levels\ntested by default max out at 5. This is because write speed usually gets bottlenecked by the cpu when high compression levels are used,\nso there is rarely a point in testing above 5.\n\nIf you want to specify other parameters, more cli arguments can be passed to restrict (or expand) the test\nspace. With the exception of `shape` and `dtype`, all parameter space adjustments can specify multiple values\nby separating them with commas.\n\nFor example, if your drives are extremely slow and your cpu is extremely fast, you might want to try higher compression levels than\nthe defaults. 
To try just clevel 9, you could run:\n`czpeedy /path/to/input/file.raw --dest /path/to/output/directory --shape 1920x1080x512 --clevel 9`\n\nIf you found that 9 was too high, you could specify a few more options and find the best performance as such:\n`czpeedy /path/to/input/file.raw --dest /path/to/output/directory --shape 1920x1080x512 --clevel 6,7,8`\n\nTo see what other parameters you can set, invoke `czpeedy --help`. As of July 2024, it outputs the following:\n```text\nusage: main.py [-h] [--dest DEST] [--savecsv SAVECSV] [--repetitions REPETITIONS] [--dtype DTYPE] [--shape SHAPE] [--clevel CLEVEL] [--compressor COMPRESSOR] [--shuffle SHUFFLE] [--chunk-size CHUNK_SIZE] [--endianness ENDIANNESS] source\n\npositional arguments:\n  source                The input dataset used in benchmarking. If write benchmarking, this is the data that will be written to disk.\n\noptions:\n  -h, --help            show this help message and exit\n  --dest DEST           The destination where write testing will occur. A directory will be created inside, called 'czpeedy'. Each write test will delete and recreate the `czpeedy` folder.\n  --savecsv SAVECSV     The destination to save test results to in csv format. Will overwrite the named file if it exists already.\n  --repetitions REPETITIONS\n                        The number of times to test each configuration. This increases confidence that speeds are repeatable, but takes a while. (default: 3)\n  --dtype DTYPE         If your data source is a raw numpy array dump, you must provide its dtype (i.e. --dtype uint32)\n  --shape SHAPE         If your data source is a raw numpy array dump, you must provide the shape (i.e. --shape 1920x1080x1024). Ignored if the data source has a shape in its metadata.\n  --clevel CLEVEL       The endianness you want to write your data as (can be big, little, or none). 
\"none\" is only an acceptable endianness if the dtype is 1 byte.\n  --compressor COMPRESSOR\n                        The compressor id you want to use with blosc. Valid compressors: blosclz, lz4, lz4hc, snappy, zlib, zstd.\n  --shuffle SHUFFLE     The shuffle mode you want to use with blosc compression. Valid shuffle types: auto, none, byte, bit\n  --chunk-size CHUNK_SIZE\n                        The chunk size that tensorstore should use when writing data. i.e. --chunk-size 100x100x100. Must have the same number of dimensions as the source data.\n  --endianness ENDIANNESS\n                        The endianness you want to write your data as (can be big, little, or none). \"none\" is only an acceptable endianness if the dtype is 1 byte.\n  --zarr-version ZARR_VERSION\n                        The version of zarr to use. (Supported: 2, 3.)\n```\n\n### Shape \u0026 Chunk Sizes\nIf your input source does not have included metadata about its shape (i.e. a raw numpy byte dump - as of July 2024\nthis is the only input type supported), you must specify the input shape using the `--shape` flag. It accepts\nan `x`-delimited list of integers that match the ndarray shape. For example, if your input should have shape\n`[100, 200, 300]`, then pass the argument `--shape 100x200x300`.\n\nBecause many `tensorstore` drivers use chunking, chunk shapes must be available to `czpeedy`. By default, `czpeedy` will\nautomatically compute a few reasonable chunk sizes with different volumes - higher chunk volumes are usually desirable\nto take advantage of your disk's sequential write speed. The suggested chunk sizes will be printed out at the beginning\nof program execution.\n\nIf you want to specify your own chunk sizes, you can do so using the x-delimited format described above. Additionally,\nas you may want to try multiple shapes to find the best balance between compression and write speed, you can provide a\ncomma delimited list of chunk sizes. 
For example, to benchmark the chunk shape `[100, 200, 300]` against the chunk shape\n`[400, 500, 600]`, you would specify `--chunk-size 100x200x300,400x500x600`. This is a bit hard to read, but is a simple\nformat to work with on the command line.\n\n### Saving the Benchmark\nBy default, `czpeedy` just prints the results as they arrive, then prints details of the fastest 3 configurations\nat the end of the benchmark. Because benchmarks can take a long time and may need to be interrupted, it is\nhighly recommended to provide an output filepath where `czpeedy` can save its results as a CSV. That way,\nyou can easily analyze all the test data later, and press `ctrl-c` without fear of losing hours of benchmark data.\nTo specify the output file, use `--savecsv /path/to/output.csv`.\n\n### Test Repetitions\nDisk benchmarks usually vary quite a bit between subsequent runs. To make your results more certain, `czpeedy` runs\neach trial 3 times by default, computing the mean time and standard deviation as it goes. To run\nmore repetitions (for better data) or fewer repetitions (for faster results), use the `--repetitions` flag.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froyerlab%2Fczpeedy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froyerlab%2Fczpeedy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froyerlab%2Fczpeedy/lists"}