{"id":19846506,"url":"https://github.com/hekmon/deduper","last_synced_at":"2026-05-14T06:32:04.931Z","repository":{"id":206087320,"uuid":"714964717","full_name":"hekmon/deduper","owner":"hekmon","description":"Analyse 2 paths to found identical files and hard link them to save space","archived":false,"fork":false,"pushed_at":"2024-04-19T11:52:28.000Z","size":155,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-28T22:52:10.737Z","etag":null,"topics":["dedup","deduplicate","deduplication","deduplicator","duplicate-files","filesystem","hardlink","hardlinking","hardlinks","linux","saving","scan","scanner","space","tool","utility"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hekmon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-06T08:00:54.000Z","updated_at":"2024-04-19T23:03:18.000Z","dependencies_parsed_at":"2023-11-13T15:03:54.616Z","dependency_job_id":"8730a3e8-25eb-46f8-90d9-95fa7ad21685","html_url":"https://github.com/hekmon/deduper","commit_stats":null,"previous_names":["hekmon/deduper"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/hekmon/deduper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hekmon%2Fdeduper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hekmon%2Fdeduper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hekmon%2Fdeduper/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hekmon%2Fdeduper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hekmon","download_url":"https://codeload.github.com/hekmon/deduper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hekmon%2Fdeduper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33013235,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dedup","deduplicate","deduplication","deduplicator","duplicate-files","filesystem","hardlink","hardlinking","hardlinks","linux","saving","scan","scanner","space","tool","utility"],"created_at":"2024-11-12T13:11:41.220Z","updated_at":"2026-05-14T06:32:04.910Z","avatar_url":"https://github.com/hekmon.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# deduper\n\nAnalyse 2 paths on the same file system to find identical files and hard link them to save space.\n\n## How it works\n\n* Indexing: both paths will be analyzed and their directory tree structure, along with the corresponding inodes, will be mapped in memory (files \u0026 directories)\n* Then the structure of path A will be walked and, for each regular file, the in-memory structure of path B will be searched to find potential candidates\n  * First, a list of all files in B having exactly the same size as the A file being analyzed will be compiled (empty files will be ignored)\n  * Then this list will be pruned based on several criteria\n    * Candidates in B that are already hardlinks of the reference A file will be removed from the list\n    * Files that do not have the same inode metadata (ownership [uid, gid] and file mode) will be removed from the candidates list to avoid breaking potential current access to these files (as hardlinks share the same metadata by design)\n      * Unless the `-force` flag is set, in which case candidates are kept (but will have their metadata changed once hardlinking is done)\n    * For candidates that are still on the list, a SHA256 checksum will be computed to ensure they indeed have the same content as the reference file in A currently being processed\n* For candidates that have passed all the tests and are still on the candidates list:\n  * if the `-apply` flag has been set\n    * They will be removed (in order to free their path)\n    * The reference file in A will be hard linked to the path that the B candidate had, making that path available once again but deduplicated with A this time\n  * if the `-apply` flag has not been set\n    * A report will be printed of what would have been done (and saved) with the flag on\n\n## Usage\n\n```\nUsage of ./deduper:\n  -apply\n        By default deduper run in dry run mode: set this flag to actually apply changes\n  -debug\n        Show debug logs during the analysis phase\n  -dirA string\n        Referential directory\n  -dirB string\n        Second directory to compare dirA against\n  -force\n        Dedup files that have the same content even if their inode metadata (ownership and mode) are not the same\n  -minSize string\n        Set the minimum size a file must have to be kept for analysis (ex: 100MiB)\n  -workers int\n        Set the maximum numbers of workers that will perform IO tasks (default 6)\n```\n\n### Example\n\n```bash\n./deduper -minSize 10MiB -workers 8 -dirA \"$(pwd)/example/dirA\" -dirB \"$(pwd)/example/dirB\" -apply\n```\n\n![Example GIF](https://github.com/hekmon/deduper/raw/main/example/example.gif)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhekmon%2Fdeduper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhekmon%2Fdeduper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhekmon%2Fdeduper/lists"}