https://github.com/green-coder/cdc
A library for performing Content-Defined Chunking (CDC) on data streams.
cdc data-stream rust rust-library
- Host: GitHub
- URL: https://github.com/green-coder/cdc
- Owner: green-coder
- License: mit
- Created: 2016-09-16T17:11:21.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2023-03-08T06:01:57.000Z (almost 3 years ago)
- Last Synced: 2025-02-27T05:50:56.614Z (10 months ago)
- Topics: cdc, data-stream, rust, rust-library
- Language: Rust
- Homepage:
- Size: 28.3 KB
- Stars: 24
- Watchers: 5
- Forks: 5
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
- License: LICENSE.txt
cdc
========
A library for performing *Content-Defined Chunking* (CDC) on data streams. Implemented using generic iterators, very easy to use.
- [API Documentation](https://docs.rs/cdc/)
## Example
```rust
use std::fs::File;
use std::io::{BufReader, Read};
use cdc::SeparatorIter;

// `file` is an already opened `File`.
let reader: BufReader<File> = BufReader::new(file);
let byte_iter = reader.bytes().map(|b| b.unwrap());

// Finds and iterates on the separators.
for separator in SeparatorIter::new(byte_iter) {
    println!("Index: {}, hash: {:016x}", separator.index, separator.hash);
}
```
Each module is documented via an example which you can find in the `examples/` folder.
To run them, use a command like:

    cargo run --example separator --release

**Note:** Some examples look for a file named `myLargeFile.bin`, which I didn't upload to GitHub. Please use your own files for testing.
## What's in the crate
From low level to high level:
* A `RollingHash64` trait, for rolling hashes with a 64-bit hash value.
* `Rabin64`, an implementation of the Rabin Fingerprint rolling hash with a 64-bit hash value.
* `Separator`, a struct which describes a place in a data stream identified as a separator.
* `SeparatorIter`, an adaptor which takes an `Iterator` as input and which enumerates all the separators found.
* `Chunk`, a struct which describes a piece of the data stream (index and size).
* `ChunkIter`, an adaptor which takes an `Iterator` as input and which enumerates chunks.
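
To illustrate how a rolling hash leads to separators, here is a minimal, self-contained sketch. It is not the crate's code: it assumes a simple polynomial rolling hash instead of the Rabin fingerprint, and the `find_separators` function, its `window` and `mask_bits` parameters, and the `BASE` constant are all hypothetical names chosen for this example.

```rust
/// Hypothetical helper, not part of the `cdc` crate.
/// Slides a window of `window` bytes over `data` and returns every position
/// (the index just after the window's last byte) where the low `mask_bits`
/// bits of the window's hash are all zero.
fn find_separators(data: &[u8], window: usize, mask_bits: u32) -> Vec<usize> {
    // Arbitrary odd multiplier, chosen for illustration only.
    const BASE: u64 = 257;
    assert!(window > 0 && mask_bits < 64);
    let mask = (1u64 << mask_bits) - 1;

    // BASE^(window - 1): the weight of the oldest byte in the window.
    let mut top = 1u64;
    for _ in 1..window {
        top = top.wrapping_mul(BASE);
    }

    let mut hash = 0u64;
    let mut separators = Vec::new();
    for (i, &byte) in data.iter().enumerate() {
        if i >= window {
            // Slide the window: remove the contribution of the oldest byte.
            let old = data[i - window] as u64;
            hash = hash.wrapping_sub(old.wrapping_mul(top));
        }
        // Shift the remaining bytes by one position and add the new byte.
        hash = hash.wrapping_mul(BASE).wrapping_add(byte as u64);

        // Only test the hash once the window is full.
        if i + 1 >= window && (hash & mask) == 0 {
            separators.push(i + 1);
        }
    }
    separators
}
```

Because the hash depends only on the bytes inside the window, inserting bytes at the start of the stream shifts the later separators by the same amount instead of moving them relative to the content. That property is what makes the chunking content-defined.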
## Implementation details
* The library does not cut any files; it only provides information on where to cut them.
* You can change the default window size used by `Rabin64`, and how `SeparatorIter` chooses the separators.
* The design of this crate may be subject to change in the future. I am waiting for some features of `Rust` to mature, especially the [`impl Trait`](https://github.com/rust-lang/rust/issues/34511) feature.
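
Since the cutting itself is left to the caller, the sketch below shows one possible way to do it for data held in memory. It assumes the separator positions are sorted byte offsets. `ChunkInfo`, `chunks_from_separators` and `cut` are hypothetical names for this example, not the crate's `Chunk` type or API.

```rust
/// Hypothetical stand-in for the information a chunk carries (index and size).
/// This is not the crate's `Chunk` type.
#[derive(Debug, PartialEq)]
struct ChunkInfo {
    index: usize,
    size: usize,
}

/// Turns sorted separator positions into contiguous chunks covering `len` bytes.
/// Positions that are out of range or not strictly increasing are ignored.
fn chunks_from_separators(separators: &[usize], len: usize) -> Vec<ChunkInfo> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for &sep in separators {
        if sep > start && sep <= len {
            chunks.push(ChunkInfo { index: start, size: sep - start });
            start = sep;
        }
    }
    // The bytes after the last separator form the final chunk.
    if start < len {
        chunks.push(ChunkInfo { index: start, size: len - start });
    }
    chunks
}

/// Performs the actual cut by borrowing each chunk's bytes from the buffer.
fn cut<'a>(data: &'a [u8], chunks: &[ChunkInfo]) -> Vec<&'a [u8]> {
    chunks
        .iter()
        .map(|c| &data[c.index..c.index + c.size])
        .collect()
}
```

For example, separators at positions 3 and 7 in the ten bytes `abcdefghij` give the chunks `abc`, `defg` and `hij`. For large files, the same index and size pairs could drive seek-and-read calls instead of slicing a buffer.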
## Performance
There is a **huge** performance difference between the debug build and the release build. When you test the lib, remember to use `cargo run --release`.
I may try to improve the performance of the lib at some point, but for now it is good enough for most use cases.
## License
Coded with ❤️, licensed under the terms of the [MIT license](LICENSE.txt).