https://github.com/noahgift/rdedupe
A Rust-based deduplication tool
- Host: GitHub
- URL: https://github.com/noahgift/rdedupe
- Owner: noahgift
- License: cc0-1.0
- Created: 2022-12-24T10:29:54.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-06-26T10:12:20.000Z (11 months ago)
- Last Synced: 2025-06-26T11:29:08.982Z (11 months ago)
- Topics: clap, command-line, deduplication, filesystem, multithreading, rust, rust-lang
- Language: Rust
- Homepage:
- Size: 60.5 KB
- Stars: 34
- Watchers: 2
- Forks: 28
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
## RDedupe
A Rust-based deduplication tool.
### Goals
* Build a multiplatform, fast deduplication tool that uses Rust parallelization.
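One way to picture the goal above is a minimal sketch that groups files by a hash of their contents and spreads the hashing across threads. This is not rdedupe's actual implementation: the names `hash_file` and `find_duplicates`, the `workers` parameter, and the use of the standard library's scoped threads and `DefaultHasher` are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hash a file's full contents.
/// Assumption: DefaultHasher is only a stand-in for whatever checksum a real tool uses.
fn hash_file(path: &Path) -> io::Result<u64> {
    let bytes = fs::read(path)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&bytes);
    Ok(hasher.finish())
}

/// Group paths by content hash, hashing chunks of the list on separate threads.
/// Returns only the groups that contain more than one file; unreadable files are skipped.
fn find_duplicates(paths: &[PathBuf], workers: usize) -> Vec<Vec<PathBuf>> {
    let workers = workers.max(1);
    // Ceiling division so every path lands in some chunk; never zero.
    let chunk_size = ((paths.len() + workers - 1) / workers).max(1);
    let mut groups: HashMap<u64, Vec<PathBuf>> = HashMap::new();

    thread::scope(|scope| {
        // Spawn one hashing thread per chunk of paths.
        let handles: Vec<_> = paths
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .filter_map(|p| hash_file(p).ok().map(|h| (h, p.clone())))
                        .collect::<Vec<_>>()
                })
            })
            .collect();

        // Merge each thread's results into a single hash -> paths map.
        for handle in handles {
            for (hash, path) in handle.join().expect("hashing thread panicked") {
                groups.entry(hash).or_default().push(path);
            }
        }
    });

    // Keep only hashes shared by two or more files, in a stable order.
    let mut dups: Vec<Vec<PathBuf>> = groups.into_values().filter(|g| g.len() > 1).collect();
    for group in dups.iter_mut() {
        group.sort();
    }
    dups.sort();
    dups
}
```

Calling `find_duplicates(&paths, 4)` on a list of file paths would return one sorted group per set of files with matching hashes. Because two different files can share a 64-bit hash, a more careful tool would confirm each match with a byte-for-byte comparison before removing anything.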

#### Current Status
* Added 
* Added [progress bar](https://github.com/console-rs/indicatif)
* Added [Polars](https://github.com/pola-rs/polars) DataFrame
* Added statistics about files with optional CSV report.
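As a rough illustration of the statistics and CSV report mentioned above, the sketch below summarizes duplicate groups and renders them as CSV text. It uses only the standard library rather than Polars, and the `DuplicateGroup` type, the `summarize` and `to_csv` functions, and the column names are hypothetical; rdedupe's real report format may differ.

```rust
use std::fmt::Write;

/// One set of identical files: their shared size and their paths.
/// Hypothetical type for illustration only.
struct DuplicateGroup {
    size_bytes: u64,
    paths: Vec<String>,
}

/// Return (redundant file count, reclaimable bytes).
/// One file per group is treated as the copy to keep.
fn summarize(groups: &[DuplicateGroup]) -> (usize, u64) {
    groups.iter().fold((0, 0), |(files, bytes), group| {
        let extra = group.paths.len().saturating_sub(1);
        (files + extra, bytes + group.size_bytes * extra as u64)
    })
}

/// Render one CSV row per file, with a header row.
/// The column names are an assumption, not rdedupe's actual output.
fn to_csv(groups: &[DuplicateGroup]) -> String {
    let mut out = String::from("group,path,size_bytes\n");
    for (index, group) in groups.iter().enumerate() {
        for path in &group.paths {
            // Quote the path and double any embedded quotes.
            let quoted = path.replace('"', "\"\"");
            // Writing to a String cannot fail, so the result is ignored.
            let _ = writeln!(out, "{},\"{}\",{}", index, quoted, group.size_bytes);
        }
    }
    out
}
```

The string returned by `to_csv` could be written to disk with `std::fs::write`, while `summarize` gives the headline numbers for a terminal summary.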
#### Future Improvements
* Add a GUI
* Add a web interface
* Fix the GitHub Actions build process so that it does not fail silently
* Store logs about actions performed across multiple runs
### Building and Running
* Build: `cd` into the `rdedupe` directory and run `make all`
* Run: `cargo run -- dedupe --path tests --pattern .txt`
* Run tests: `make test`
### OS X Install
* Install Rust via [rustup](https://rustup.rs/)
* Add the following to `~/.cargo/config.toml` (Cargo also reads the older `~/.cargo/config` name, but recent versions prefer the `.toml` extension):
```toml
[target.x86_64-apple-darwin]
rustflags = [
    "-C", "link-arg=-undefined",
    "-C", "link-arg=dynamic_lookup",
]
[target.aarch64-apple-darwin]
rustflags = [
    "-C", "link-arg=-undefined",
    "-C", "link-arg=dynamic_lookup",
]
```
* Run `make all` in the `rdedupe` directory