{"id":13636721,"url":"https://github.com/sstadick/gzp","last_synced_at":"2025-05-15T11:03:32.307Z","repository":{"id":37923361,"uuid":"389688240","full_name":"sstadick/gzp","owner":"sstadick","description":"Multi-threaded Compression","archived":false,"fork":false,"pushed_at":"2025-02-28T19:40:55.000Z","size":2755,"stargazers_count":158,"open_issues_count":18,"forks_count":15,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-04T19:57:18.873Z","etag":null,"topics":["compression","gzip","parallel","snappy"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sstadick.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE-MIT","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-26T15:51:56.000Z","updated_at":"2025-02-28T19:40:58.000Z","dependencies_parsed_at":"2024-06-12T17:30:29.493Z","dependency_job_id":"ab1ba030-f663-433b-91c2-be614b169af5","html_url":"https://github.com/sstadick/gzp","commit_stats":{"total_commits":95,"total_committers":5,"mean_commits":19.0,"dds":"0.052631578947368474","last_synced_commit":"204205d3fd061cc453af65c3bbbf7e3dcd2a0c38"},"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sstadick%2Fgzp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sstadick%2Fgzp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sstadick%2Fgzp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sstadick%2
Fgzp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sstadick","download_url":"https://codeload.github.com/sstadick/gzp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247242650,"owners_count":20907130,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","gzip","parallel","snappy"],"created_at":"2024-08-02T00:01:04.433Z","updated_at":"2025-04-07T14:01:34.435Z","avatar_url":"https://github.com/sstadick.png","language":"Rust","readme":"# ⛓️gzp\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/sstadick/gzp/actions?query=workflow%3Aci\"\u003e\u003cimg src=\"https://github.com/sstadick/gzp/workflows/ci/badge.svg\" alt=\"Build Status\"\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/crates/l/gzp.svg\" alt=\"license\"\u003e\n  \u003ca href=\"https://crates.io/crates/gzp\"\u003e\u003cimg src=\"https://img.shields.io/crates/v/gzp.svg?colorB=319e8c\" alt=\"Version info\"\u003e\u003c/a\u003e\u003cbr\u003e\n\u003c/p\u003e\n\nMulti-threaded encoding and decoding.\n\n## Why?\n\nThis crate provides a near drop-in replacement for `Write` that will compress chunks of data in parallel and write\nto an underlying writer in the same order that the bytes were handed to the writer. 
This allows for much faster\ncompression of data.\n\nAdditionally, this provides multi-threaded decompressors for Mgzip and BGZF formats.\n\n### Supported Encodings:\n\n- Gzip via [flate2](https://docs.rs/flate2/)\n- Zlib via [flate2](https://docs.rs/flate2/)\n- Raw Deflate via [flate2](https://docs.rs/flate2/)\n- Snappy via [rust-snappy](https://docs.rs/snap)\n- [BGZF](https://samtools.github.io/hts-specs/SAMv1.pdf) block compression format limited to 64 KB blocks\n- [Mgzip](https://pypi.org/project/mgzip/) block compression format with no block size limit\n\n## Usage / Features\n\nBy default `gzp` has the `deflate_default` and `libdeflate` features enabled, which bring in the best-performing `zlib`\nimplementation as the backend for `flate2` as well as `libdeflater` for the block gzip formats.\n\n### Examples\n\n- Deflate default\n\n```toml\n[dependencies]\ngzp = { version = \"*\" }\n```\n\n- Rust backend; this means that the `Zlib` format will not be available.\n\n```toml\n[dependencies]\ngzp = { version = \"*\", default-features = false, features = [\"deflate_rust\"] }\n```\n\n- Snap only\n\n```toml\n[dependencies]\ngzp = { version = \"*\", default-features = false, features = [\"snap_default\"] }\n```\n\n**Note**: if you are running into compilation issues with libdeflate and the `i686-pc-windows-msvc` target, please see [this](https://github.com/sstadick/gzp/issues/18) issue for workarounds.\n\n## Examples\n\nA simple example:\n\n```rust\nuse std::io::Write;\n\nuse gzp::{deflate::Gzip, ZBuilder, ZWriter};\n\nfn main() {\n    let mut writer = vec![];\n    // ZBuilder will return a trait object that is transparent over `ParZ` or `SyncZ`\n    let mut parz = ZBuilder::\u003cGzip, _\u003e::new()\n        .num_threads(0)\n        .from_writer(writer);\n    parz.write_all(b\"This is a first test line\\n\").unwrap();\n    parz.write_all(b\"This is a second test line\\n\").unwrap();\n    parz.finish().unwrap();\n}\n```\n\nAn updated version of 
[pgz](https://github.com/vorner/pgz).\n\n```rust\nuse gzp::{\n    ZWriter,\n    deflate::Mgzip,\n    par::{compress::{ParCompress, ParCompressBuilder}}\n};\nuse std::io::{Read, Write};\n\nfn main() {\n    let chunksize = 64 * (1 \u003c\u003c 10) * 2;\n\n    let stdout = std::io::stdout();\n    let mut writer: ParCompress\u003cMgzip\u003e = ParCompressBuilder::new().from_writer(stdout);\n\n    let stdin = std::io::stdin();\n    let mut stdin = stdin.lock();\n\n    let mut buffer = Vec::with_capacity(chunksize);\n    loop {\n        let mut limit = (\u0026mut stdin).take(chunksize as u64);\n        limit.read_to_end(\u0026mut buffer).unwrap();\n        if buffer.is_empty() {\n            break;\n        }\n        writer.write_all(\u0026buffer).unwrap();\n        buffer.clear();\n    }\n    writer.finish().unwrap();\n}\n```\n\nSame thing but using Snappy instead.\n\n```rust\nuse gzp::{parz::{ParZ, ParZBuilder}, snap::Snap};\nuse std::io::{Read, Write};\n\nfn main() {\n    let chunksize = 64 * (1 \u003c\u003c 10) * 2;\n\n    let stdout = std::io::stdout();\n    let mut writer: ParZ\u003cSnap\u003e = ParZBuilder::new().from_writer(stdout);\n\n    let stdin = std::io::stdin();\n    let mut stdin = stdin.lock();\n\n    let mut buffer = Vec::with_capacity(chunksize);\n    loop {\n        let mut limit = (\u0026mut stdin).take(chunksize as u64);\n        limit.read_to_end(\u0026mut buffer).unwrap();\n        if buffer.is_empty() {\n            break;\n        }\n        writer.write_all(\u0026buffer).unwrap();\n        buffer.clear();\n    }\n    writer.finish().unwrap();\n}\n```\n\n## Acknowledgements\n\n- Many of the ideas for this crate were directly inspired by [`pigz`](https://github.com/madler/pigz), including\n  implementation details for some functions.\n\n## Contributing\n\nPRs are very welcome! Please run tests locally and ensure they are passing. 
Many tests are ignored in CI because the CI\ninstances don't have enough threads to test them / are too slow.\n\n```bash\ncargo test --all-features \u0026\u0026 cargo test --all-features -- --ignored\n```\n\nNote that tests will take 30-60s.\n\n## Future todos\n\n- Pull in an adler crate to replace zlib impl (need one that can combine values, probably implement COMB from pigz).\n- Add more metadata to the headers\n- Return an auto-generated index for BGZF / Mgzip formats\n- Try with https://docs.rs/lzzzz/0.8.0/lzzzz/lz4_hc/fn.compress.html\n\n## Benchmarks\n\nAll benchmarks were run on the file in `./bench-data/shakespeare.txt` catted together 100 times, which creates a roughly\n550 MB file.\n\nThe primary benchmark takeaway is that compression time decreases proportionately to the number of threads used.\n\n![benchmarks](./violin.svg)\n","funding_links":[],"categories":["Libraries"],"sub_categories":["Compression"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsstadick%2Fgzp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsstadick%2Fgzp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsstadick%2Fgzp/lists"}