{"id":18012033,"url":"https://github.com/platypusguy/filededupe","last_synced_at":"2025-10-26T09:06:49.044Z","repository":{"id":54267800,"uuid":"157662756","full_name":"platypusguy/FileDedupe","owner":"platypusguy","description":"Utility to list duplicate files in one or more directories based on the file contents ","archived":false,"fork":false,"pushed_at":"2024-09-23T01:14:24.000Z","size":1619,"stargazers_count":24,"open_issues_count":0,"forks_count":9,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-26T07:46:22.668Z","etag":null,"topics":["filemanager","filesystem","java","utility"],"latest_commit_sha":null,"homepage":"https://github.com/platypusguy/FileDedupe/wiki","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/platypusguy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-11-15T06:26:51.000Z","updated_at":"2025-02-13T19:33:51.000Z","dependencies_parsed_at":"2022-08-13T10:31:00.728Z","dependency_job_id":null,"html_url":"https://github.com/platypusguy/FileDedupe","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/platypusguy%2FFileDedupe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/platypusguy%2FFileDedupe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/platypusguy%2FFileDedupe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/platypusguy%2FFileDedupe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/platypusguy","download
_url":"https://codeload.github.com/platypusguy/FileDedupe/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245692562,"owners_count":20656967,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["filemanager","filesystem","java","utility"],"created_at":"2024-10-30T03:14:07.753Z","updated_at":"2025-10-26T09:06:44.009Z","avatar_url":"https://github.com/platypusguy.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FileDedupe\nUtility to list duplicate files in one or more directories based on the file contents (rather than the name). \n\n## What it is\n\nFileDedupe is a utility that checks one or more directories for duplicate files. Just run it with a list of directories on the command line. By default, all subdirectories are checked; this can be changed (see below). The output, which is written to stdout, is text consisting of the names of files that have duplicates. Each such file is listed, followed by its duplicates.\n\nAn article on this utility and how it was designed and written appears in [Oracle's Java Magazine](https://blogs.oracle.com/javamagazine/the-joy-of-writing-command-line-utilities-finding-duplicate-files-part-1).\n\nVersion 1.0 used a brute-force approach of running checksums on every file in the user-specified directories and then comparing the checksums to identify duplicates. This worked well, but was slow. \n\nVersion 2.0 uses comparisons of file sizes to greatly reduce the number of files that require checksums. It runs 9x-11x faster on the test directories. 
Use this version for your own needs. The optimization that delivered this benefit is described in [this article in Oracle's Java Magazine](https://blogs.oracle.com/javamagazine/the-joy-of-writing-command-line-utilities-part-2-the-souped-up-way-to-find-duplicate-files).\n\n## How to run\nFileDedupe is written in Java 8. To run it, run the JAR file with the directory or directories to scan for duplicates. Note that a directory of `.` is supported.\nOptions:\n\n`-nosubdirs`: prevents FileDedupe from checking subdirectories for duplicates.\n\n`-help` or `-h` or `--h`: shows this usage information.\n\nSo, to run the utility on the current directory:\n\n`java -jar filededupe-2.0.jar .`\n\n## Testing\nThe tests included here provide code coverage of 80%, and FileDedupe has been tested repeatedly on directories of more than 600,000 files. \n\n## Extension: HTML Report\nDavid V. Saraiva forked the code presented here and added the ability to generate an HTML report of duplicates. His repository is [here](https://github.com/davidvsaraiva/FileDedupe). \n\n## Thanks\nThanks to Oracle's _Java Magazine_ for publishing the articles on this utility. \n\nThanks to JetBrains for supporting open source by providing a license to [IntelliJ IDEA](https://www.jetbrains.com/idea/), which is an IDE that I have used since version 3.5.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplatypusguy%2Ffilededupe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fplatypusguy%2Ffilededupe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplatypusguy%2Ffilededupe/lists"}