{"id":15662473,"url":"https://github.com/remram44/cdchunking-rs","last_synced_at":"2025-03-17T14:16:27.238Z","repository":{"id":57544348,"uuid":"99239075","full_name":"remram44/cdchunking-rs","owner":"remram44","description":"Content-Defined Chunking for Rust","archived":false,"fork":false,"pushed_at":"2024-12-17T19:46:28.000Z","size":45,"stargazers_count":18,"open_issues_count":4,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-10T20:16:15.143Z","etag":null,"topics":["chunk","chunking","rolling-hash-functions","rust"],"latest_commit_sha":null,"homepage":"https://remram44.github.io/cdchunking-rs/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/remram44.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-03T14:04:42.000Z","updated_at":"2024-12-17T19:46:37.000Z","dependencies_parsed_at":"2024-10-23T08:24:57.966Z","dependency_job_id":"cb24055c-2ec2-4969-ad91-ee19504657e2","html_url":"https://github.com/remram44/cdchunking-rs","commit_stats":{"total_commits":43,"total_committers":4,"mean_commits":10.75,"dds":"0.37209302325581395","last_synced_commit":"1224bc3f020d2e2b9ab5a78e650896ebcbabe67e"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/remram44%2Fcdchunking-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/remram44%2Fcdchunking-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/remram44%2Fcdchunking-rs/releases
","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/remram44%2Fcdchunking-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/remram44","download_url":"https://codeload.github.com/remram44/cdchunking-rs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244047645,"owners_count":20389206,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chunk","chunking","rolling-hash-functions","rust"],"created_at":"2024-10-03T13:32:48.603Z","updated_at":"2025-03-17T14:16:27.214Z","avatar_url":"https://github.com/remram44.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://github.com/remram44/cdchunking-rs/workflows/Test/badge.svg)](https://github.com/remram44/cdchunking-rs/actions)\n[![Crates.io](https://img.shields.io/crates/v/cdchunking.svg)](https://crates.io/crates/cdchunking)\n[![Documentation](https://docs.rs/cdchunking/badge.svg)](https://docs.rs/cdchunking)\n[![Say Thanks!](https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg)](https://saythanks.io/to/remram44)\n\nContent-Defined Chunking\n========================\n\nThis crates provides a way to device a stream of bytes into chunks, using methods that choose the splitting point from the content itself. This means that adding or removing a few bytes in the stream would only change the chunks directly modified. 
This is different from just splitting every n bytes, because in that case every chunk after the change is different unless the number of bytes added or removed is a multiple of n.\n\nContent-defined chunking is useful for data de-duplication. It is used by many backup tools, and by the rsync data synchronization tool.\n\nThis crate exposes both easy-to-use methods, implementing the standard `Iterator` trait to iterate on chunks in an input stream, and efficient zero-allocation methods that reuse an internal buffer.\n\nUsing this crate\n----------------\n\nFirst, add a dependency on this crate by adding the following to your `Cargo.toml`:\n\n```\ncdchunking = \"1.0\"\n```\n\nAnd add this to your `lib.rs`:\n\n```rust\nextern crate cdchunking;\n```\n\nThen create a `Chunker` object using a specific method, for example the ZPAQ algorithm:\n\n```rust\nuse cdchunking::{Chunker, ZPAQ};\n\nlet chunker = Chunker::new(ZPAQ::new(13)); // 13 bits = 8 KiB block average\n```\n\nThere are multiple ways to get chunks out of some input data.\n\n### From an in-memory buffer: iterate on slices\n\nIf your whole input data is in memory at once, you can use the `slices()` method. It will return an iterator on slices of this buffer, allowing you to handle those chunks with no additional allocation.\n\n```rust\nfor slice in chunker.slices(data) {\n    println!(\"{:?}\", slice);\n}\n```\n\n### From a file object: read chunks into memory\n\nIf you are reading from a file, or any object that implements `Read`, you can use `Chunker` to read whole chunks directly. Use the `whole_chunks()` method to get an iterator on chunks, read as new `Vec\u003cu8\u003e` objects.\n\n```rust\nfor chunk in chunker.whole_chunks(reader) {\n    let chunk = chunk.expect(\"Error reading from file\");\n    println!(\"{:?}\", chunk);\n}\n```\n\nYou can also read all the chunks from the file and collect them in a `Vec` (of `Vec`s) using the `all_chunks()` method. 
It will take care of the IO errors for you, returning an error if any of the chunks failed to read.\n\n```rust\nlet chunks: Vec\u003cVec\u003cu8\u003e\u003e = chunker.all_chunks(reader)\n    .expect(\"Error reading from file\");\nfor chunk in chunks {\n    println!(\"{:?}\", chunk);\n}\n```\n\n### From a file object: streaming chunks with zero allocation\n\nIf you are reading from a file to write to another, you might deem the allocation of intermediate `Vec` objects unnecessary. If you want, you can have `Chunker` provide you with chunk data from the internal read buffer, without allocating anything else. In that case, note that a chunk might be split between multiple read operations. This method will work fine with any chunk size.\n\nUse the `stream()` method to do this. Note that because an internal buffer is reused, we cannot implement the `Iterator` trait, so you will have to use a while loop:\n\n```rust\nuse cdchunking::ChunkInput;\n\nlet mut chunk_iterator = chunker.stream(reader);\nwhile let Some(chunk) = chunk_iterator.read() {\n    let chunk = chunk.unwrap();\n    match chunk {\n        ChunkInput::Data(d) =\u003e {\n            print!(\"{:?}, \", d);\n        }\n        ChunkInput::End =\u003e println!(\" end of chunk\"),\n    }\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fremram44%2Fcdchunking-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fremram44%2Fcdchunking-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fremram44%2Fcdchunking-rs/lists"}