Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/platypusguy/filededupe
Utility to list duplicate files in one or more directories.
filemanager filesystem java utility
- Host: GitHub
- URL: https://github.com/platypusguy/filededupe
- Owner: platypusguy
- Created: 2018-11-15T06:26:51.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-02-28T08:56:13.000Z (almost 4 years ago)
- Last Synced: 2023-03-10T05:11:15.116Z (almost 2 years ago)
- Topics: filemanager, filesystem, java, utility
- Language: Java
- Homepage: https://github.com/platypusguy/FileDedupe/wiki
- Size: 1.54 MB
- Stars: 23
- Watchers: 4
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Code of conduct: CODE_OF_CONDUCT.md
# FileDedupe
Utility to list duplicate files in one or more directories based on the file contents (rather than the name).
## What it is
FileDedupe is a utility that checks one or more directories for duplicate files. Just run it with a list of directories on the command line. By default, it checks all subdirectories; this can be turned off with the `-nosubdirs` option (see below). The output is plain text, written to stdout, that lists the names of files that have duplicates: each file is given, followed by its duplicates.
An article on this utility and how it was designed and written appears in [Oracle's Java Magazine](https://blogs.oracle.com/javamagazine/the-joy-of-writing-command-line-utilities-finding-duplicate-files-part-1).
Version 1.0 used a brute-force approach of running checksums on every file in the user-specified directories and then comparing the checksums to identify duplicates. This worked well, but was slow.
Version 2.0 uses comparisons of file sizes to greatly reduce the number of files that require checksums. It runs 9x-11x faster on the test directories. Use this version for your own needs. The optimization that delivered this benefit is described in [this article in Oracle's Java Magazine](https://blogs.oracle.com/javamagazine/the-joy-of-writing-command-line-utilities-part-2-the-souped-up-way-to-find-duplicate-files).
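The size-first idea can be illustrated with a minimal sketch. This is not FileDedupe's actual code: the class and method names are hypothetical, and SHA-256 is an assumed checksum, since this README does not say which one the utility uses. Files are first bucketed by size, and only files that share a size with at least one other file are checksummed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of size-first duplicate detection; not FileDedupe's real implementation. */
class SizeFirstDedupeSketch {

    /** Returns groups of files with identical checksums; every group has at least two files. */
    static List<List<Path>> findDuplicates(Collection<Path> files) {
        // Pass 1: bucket by size. A file whose size is unique cannot have a duplicate.
        Map<Long, List<Path>> bySize = new HashMap<>();
        for (Path file : files) {
            bySize.computeIfAbsent(sizeOf(file), k -> new ArrayList<>()).add(file);
        }

        // Pass 2: checksum only files that share their size with another file.
        List<List<Path>> duplicates = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) {
                continue; // unique size, so the expensive checksum is skipped
            }
            Map<String, List<Path>> byChecksum = new HashMap<>();
            for (Path file : sameSize) {
                byChecksum.computeIfAbsent(checksum(file), k -> new ArrayList<>()).add(file);
            }
            for (List<Path> sameContent : byChecksum.values()) {
                if (sameContent.size() > 1) {
                    duplicates.add(sameContent);
                }
            }
        }
        return duplicates;
    }

    static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hex-encoded SHA-256 of the file contents (the algorithm is an assumption). */
    static String checksum(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a directory tree where most files have distinct sizes, the second pass touches only a small fraction of the files, which is consistent with the kind of speedup reported for version 2.0.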
## How to run
FileDedupe is written in Java 8. To run it, run the JAR file with the directory or directories to scan for duplicates. Note that a directory of `.` (the current directory) is supported.
Options:
- `-nosubdirs`: prevents FileDedupe from checking subdirectories for duplicates.
- `-help` or `-h` or `--h`: shows this usage information.
So, to run the utility on the current directory:
`java -jar filededupe-2.0.jar .`
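To make the effect of `-nosubdirs` concrete, here is a minimal sketch of how a scan could collect candidate files with and without subdirectories. It is an illustration only: the class name, method name, and `includeSubdirs` parameter are hypothetical, and FileDedupe's own traversal code may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of directory scanning; not FileDedupe's real implementation. */
class DirectoryScanSketch {

    /**
     * Lists the regular files under dir. When includeSubdirs is false (the
     * behavior a flag like -nosubdirs would request), only files directly
     * inside dir are returned.
     */
    static List<Path> listFiles(Path dir, boolean includeSubdirs) {
        // A depth of 1 visits only the immediate children of dir.
        int maxDepth = includeSubdirs ? Integer.MAX_VALUE : 1;
        try (Stream<Path> entries = Files.walk(dir, maxDepth)) {
            return entries.filter(Files::isRegularFile).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Limiting the walk depth, rather than filtering results afterwards, avoids descending into subdirectories at all when they are not wanted.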
## Testing
The tests included here generate code coverage of 80%. In addition, FileDedupe has been tested repeatedly on directories of more than 600,000 files.
## Extension: HTML Report
David V. Saraiva forked the code presented here and added the ability to generate an HTML report of duplicates. His repository is [here](https://github.com/davidvsaraiva/FileDedupe).
## Thanks
Thanks to Oracle's _Java Magazine_ for publishing the articles on this utility.
Thanks to JetBrains for supporting open source by providing a license to [IntelliJ IDEA](https://www.jetbrains.com/idea/), which is an IDE that I have used since version 3.5.