{"id":16014704,"url":"https://github.com/green-coder/cdc","last_synced_at":"2025-03-16T07:32:13.809Z","repository":{"id":11112092,"uuid":"68399229","full_name":"green-coder/cdc","owner":"green-coder","description":"A library for performing Content-Defined Chunking (CDC) on data streams.","archived":false,"fork":false,"pushed_at":"2023-03-08T06:01:57.000Z","size":29,"stargazers_count":24,"open_issues_count":5,"forks_count":5,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-02-27T05:50:56.614Z","etag":null,"topics":["cdc","data-stream","rust","rust-library"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/green-coder.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":"vincentcantin","tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2016-09-16T17:11:21.000Z","updated_at":"2025-02-20T22:24:34.000Z","dependencies_parsed_at":"2023-07-14T09:15:51.642Z","dependency_job_id":null,"html_url":"https://github.com/green-coder/cdc","commit_stats":{"total_commits":27,"total_committers":4,"mean_commits":6.75,"dds":0.2962962962962963,"last_synced_commit":"cc7ac65d211b518ef28a522f3327776f0a53a4bb"},"previous_names":["green-coder/cdc-rs"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/green-coder%2Fcdc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/green-coder%2F
cdc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/green-coder%2Fcdc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/green-coder%2Fcdc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/green-coder","download_url":"https://codeload.github.com/green-coder/cdc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243806046,"owners_count":20350775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdc","data-stream","rust","rust-library"],"created_at":"2024-10-08T15:04:49.965Z","updated_at":"2025-03-16T07:32:13.496Z","avatar_url":"https://github.com/green-coder.png","language":"Rust","readme":"cdc\n========\n\nA library for performing *Content-Defined Chunking* (CDC) on data streams. It is implemented using generic iterators and is very easy to use.\n\n- [API Documentation](https://docs.rs/cdc/)\n\n## Example\n\n```rust\n  let reader: BufReader\u003cFile\u003e = BufReader::new(file);\n  let byte_iter = reader.bytes().map(|b| b.unwrap());\n\n  // Find and iterate over the separators.\n  for separator in SeparatorIter::new(byte_iter) {\n    println!(\"Index: {}, hash: {:016x}\", separator.index, separator.hash);\n  }\n```\n\nEach module is documented by an example, which you can find in the `examples/` folder.\n\nTo run them, use a command like:\n\n    cargo run --example separator --release\n\n**Note:** Some examples look for a file named `myLargeFile.bin`, which I didn't upload to GitHub. 
Please use your own files for testing.\n\n## What's in the crate\n\nFrom low level to high level:\n\n* A `RollingHash64` trait, for rolling hashes with a 64-bit hash value.\n\n* `Rabin64`, an implementation of the Rabin fingerprint rolling hash with a 64-bit hash value.\n\n* `Separator`, a struct that describes a place in a data stream identified as a separator.\n\n* `SeparatorIter`, an adaptor that takes an `Iterator\u003cItem=u8\u003e` as input and enumerates all the separators found.\n\n* `Chunk`, a struct that describes a piece of the data stream (index and size).\n\n* `ChunkIter`, an adaptor that takes an `Iterator\u003cItem=Separator\u003e` as input and enumerates chunks.\n\n## Implementation details\n\n* The library does not cut any files; it only provides the information needed to do so.\n\n* You can change the default window size used by `Rabin64` and how `SeparatorIter` chooses the separators.\n\n* The design of this crate may change in the future. I am waiting for some features of Rust to mature, especially the [`impl Trait`](https://github.com/rust-lang/rust/issues/34511) feature.\n\n## Performance\n\nThere is a **huge** performance difference between the debug build and the release build. When you test the lib, remember to use `cargo run --release`.\n\nI may try to improve the performance of the lib at some point, but for now it is good enough for most use cases.\n\n## License\n\nCoded with ❤️, licensed under the terms of the [MIT license](LICENSE.txt).\n","funding_links":["https://ko-fi.com/vincentcantin"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreen-coder%2Fcdc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreen-coder%2Fcdc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreen-coder%2Fcdc/lists"}