{"id":13586936,"url":"https://github.com/sreedevk/deduplicator","last_synced_at":"2025-06-21T03:42:24.778Z","repository":{"id":65444465,"uuid":"581938342","full_name":"sreedevk/deduplicator","owner":"sreedevk","description":"Filter, Sort \u0026 Delete Duplicate Files Recursively","archived":false,"fork":false,"pushed_at":"2024-07-05T04:34:21.000Z","size":342,"stargazers_count":344,"open_issues_count":8,"forks_count":19,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-14T11:27:10.822Z","etag":null,"topics":["deduplication","duplicate-detection","duplicate-files","duplicatefilefinder","filesystem","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sreedevk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-25T00:17:32.000Z","updated_at":"2025-05-12T19:24:47.000Z","dependencies_parsed_at":"2024-11-06T05:33:11.332Z","dependency_job_id":"a0785b73-505c-4e7c-adf2-e0aa12bedd33","html_url":"https://github.com/sreedevk/deduplicator","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/sreedevk/deduplicator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sreedevk%2Fdeduplicator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sreedevk%2Fdeduplicator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sreedevk%2Fdeduplicator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/sreedevk%2Fdeduplicator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sreedevk","download_url":"https://codeload.github.com/sreedevk/deduplicator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sreedevk%2Fdeduplicator/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261059409,"owners_count":23103950,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deduplication","duplicate-detection","duplicate-files","duplicatefilefinder","filesystem","rust"],"created_at":"2024-08-01T15:05:55.292Z","updated_at":"2025-06-21T03:42:19.762Z","avatar_url":"https://github.com/sreedevk.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eDeduplicator\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  Find, Sort, Filter \u0026 Delete duplicate files \n\u003c/p\u003e\n\n## Usage\n\n```bash\nUsage: deduplicator [OPTIONS] [scan_dir_path]\n\nArguments:\n  [scan_dir_path]  Run Deduplicator on dir different from pwd (e.g., ~/Pictures )\n\nOptions:\n  -t, --types \u003cTYPES\u003e          Filetypes to deduplicate [default = all]\n  -i, --interactive            Delete files interactively\n  -s, --min-size \u003cMIN_SIZE\u003e    Minimum filesize of duplicates to scan (e.g., 100B/1K/2M/3G/4T) [default: 1b]\n  -d, --max-depth \u003cMAX_DEPTH\u003e  Max Depth to scan while looking for duplicates\n      --min-depth \u003cMIN_DEPTH\u003e  Min Depth to scan while looking for duplicates\n  -f, 
--follow-links           Follow links while scanning directories\n  -h, --help                   Print help information\n  -V, --version                Print version information\n      --json                    \n```\n### Examples\n\n```bash\n# Scan for duplicates recursively from the current dir, only look for png, jpg \u0026 pdf file types \u0026 interactively delete files\ndeduplicator -t pdf,jpg,png -i\n\n# Scan for duplicates recursively from the ~/Pictures dir, only look for png, jpeg, jpg \u0026 pdf file types \u0026 interactively delete files\ndeduplicator ~/Pictures/ -t png,jpeg,jpg,pdf -i\n\n# Scan for duplicates in the ~/Pictures without recursing into subdirectories\ndeduplicator ~/Pictures --max-depth 0\n\n# look for duplicates in the ~/.config directory while also recursing into symbolic link paths\ndeduplicator ~/.config --follow-links\n\n# scan for duplicates that are greater than 100mb in the ~/Media directory\ndeduplicator ~/Media --min-size 100mb\n```\n\n## Installation\n\n### Cargo Install\n\n#### Stable\n\n\u003e [!WARNING] Note from GxHash: GxHash relies on aes hardware acceleration, you must make sure the aes feature is enabled when building (otherwise it won't build). 
This can be done by setting the RUSTFLAGS environment variable to -C target-feature=+aes or -C target-cpu=native (the latter should work if your CPU is properly recognized by rustc, which is the case most of the time).\n\u003e Please install version `0.2.1` if you are unable to install `0.2.2`.\n\n```bash\n$ RUSTFLAGS=\"-C target-cpu=native\" cargo install deduplicator\n```\n\n#### Nightly\n\nIf you'd like to install with nightly features, you can use:\n\n```bash\n$ cargo install --git https://github.com/sreedevk/deduplicator\n```\nPlease note that if you use a version manager to install Rust (like asdf), you need to reshim (`asdf reshim rust`).\n\n### Linux (Pre-built Binary)\n\nYou can download the pre-built binary from the [Releases](https://github.com/sreedevk/deduplicator/releases) page.\nDownload the `deduplicator-x86_64-unknown-linux-gnu.tar.gz` tarball for Linux. Once you have the tarball file with the executable,\nyou can follow these steps to install:\n\n```bash\n$ tar -zxvf deduplicator-x86_64-unknown-linux-gnu.tar.gz\n$ sudo mv deduplicator /usr/bin/\n```\n\n### Mac OS (Pre-built Binary)\n\nYou can download the pre-built binary from the [Releases](https://github.com/sreedevk/deduplicator/releases) page.\nDownload the `deduplicator-x86_64-apple-darwin.tar.gz` tarball for Mac OS. Once you have the tarball file with the executable, you can follow these steps to install:\n\n```bash\n$ tar -zxvf deduplicator-x86_64-apple-darwin.tar.gz\n$ sudo mv deduplicator /usr/bin/\n```\n\n### Windows (Pre-built Binary)\n\nYou can download the pre-built binary from the [Releases](https://github.com/sreedevk/deduplicator/releases) page.\nDownload the `deduplicator-x86_64-pc-windows-msvc.zip` zip file for Windows. 
Unzip the `zip` file \u0026 move `deduplicator.exe` to a location listed in the PATH system environment variable.\n\nNote: If you run into an MSVC error, please install the Microsoft Visual C++ Redistributable from [here](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170).\n\n## Performance\n\nDeduplicator uses size comparison and fxhash (a non-cryptographic hashing algorithm) to quickly scan through a large number of files to find duplicates. It's also highly parallel (uses rayon and dashmap). I was able to scan through 120GB of files (Videos, PDFs, Images) in ~300ms. Check out the benchmarks below.\n\n## Benchmarks\n\n| Command | Dirsize | Filecount | Mean [ms] | Min [ms] | Max [ms] | Relative |\n|:---|:---|---:|---:|---:|---:|---:|\n| `deduplicator ~/Data/tmp` | (~120G) | 721 files | 33.5 ± 28.6 | 25.3 | 151.5 | 1.87 ± 1.60 |\n| `deduplicator ~/Data/books` | (~8.6G) | 1419 files | 24.5 ± 1.0 | 22.9 | 28.1 | 1.37 ± 0.08 |\n| `deduplicator ~/Data/books --min-size 10M` | (~8.6G) | 1419 files | 17.9 ± 0.7 | 16.8 | 20.0 | 1.00 |\n| `deduplicator ~/Data/ --types pdf,jpg,png,jpeg` | (~290G) | 104222 files | 1207.2 ± 37.0 | 1172.2 | 1287.7 | 67.27 ± 3.33 |\n\n* The last entry is slower because of the number of files deduplicator had to go through (~660895 Files). The average size of the files rarely affects the performance of deduplicator.\n\nThese benchmarks were run using [hyperfine](https://github.com/sharkdp/hyperfine). 
Here are the specs of the machine used to benchmark deduplicator:\n\n```\nOS: Arch Linux x86_64 \nHost: Precision 5540\nKernel: 5.15.89-1-lts \nUptime: 4 hours, 44 mins \nShell: zsh 5.9                        \nTerminal: kitty \nCPU: Intel i9-9880H (16) @ 4.800GHz \nGPU: NVIDIA Quadro T2000 Mobile / Max-Q \nGPU: Intel CoffeeLake-H GT2 [UHD Graphics 630] \nMemory: 31731MiB (~32GiB)\n```\n\n## Screenshots\n\n![](https://user-images.githubusercontent.com/36154121/213618143-e5182e39-731e-4817-87dd-1a6a0f38a449.gif)\n\n## Roadmap\n    - Tree format output for duplicate file listing\n    - GUI\n    - Packages for different operating system repositories (currently only installable via cargo) \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsreedevk%2Fdeduplicator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsreedevk%2Fdeduplicator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsreedevk%2Fdeduplicator/lists"}