https://github.com/platypusguy/filededupe

Utility to list duplicate files in one or more directories based on the file contents

filemanager filesystem java utility

Host: GitHub
URL: https://github.com/platypusguy/filededupe
Owner: platypusguy
Created: 2018-11-15T06:26:51.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2024-09-23T01:14:24.000Z (10 months ago)
Last Synced: 2025-03-26T07:46:22.668Z (4 months ago)
Topics: filemanager, filesystem, java, utility
Language: Java
Homepage: https://github.com/platypusguy/FileDedupe/wiki
Size: 1.54 MB
Stars: 24
Watchers: 4
Forks: 9
Open Issues: 0
Metadata Files:
- Readme: README.md
- Code of conduct: CODE_OF_CONDUCT.md

README

# FileDedupe
Utility to list duplicate files in one or more directories based on the file contents (rather than the name).

## What it is

FileDedupe is a utility that checks one or more directories for duplicate files. Just run it with a list of directories on the command line. By default, it checks all subdirectories; this can be controlled (see below). The output, written to stdout, is plain text listing the names of files that have duplicates. Each such file is listed, followed by its duplicates.

An article on this utility and how it was designed and written appears in [Oracle's Java Magazine](https://blogs.oracle.com/javamagazine/the-joy-of-writing-command-line-utilities-finding-duplicate-files-part-1).

Version 1.0 used a brute-force approach of running checksums on every file in the user-specified directories and then comparing the checksums to identify duplicates. This worked well, but was slow.
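The brute-force idea can be sketched as follows. This is a minimal illustration, not FileDedupe's actual source: the class name `BruteForceDupes`, its method names, and the choice of SHA-256 as the checksum are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of the brute-force approach; not FileDedupe's own code.
public class BruteForceDupes {

    // Hex-encoded SHA-256 of a file's contents (SHA-256 is an assumption;
    // this simple version reads the whole file into memory).
    static String checksum(Path file) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(Files.readAllBytes(file));
            return String.format("%064x", new BigInteger(1, digest));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Checksums every regular file under root, then returns only the groups
    // in which two or more files share a checksum.
    static List<List<Path>> findDuplicates(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            Map<String, List<Path>> byChecksum = paths
                .filter(Files::isRegularFile)
                .sorted()
                .collect(Collectors.groupingBy(
                    BruteForceDupes::checksum, TreeMap::new, Collectors.toList()));
            return byChecksum.values().stream()
                .filter(group -> group.size() > 1)
                .collect(Collectors.toList());
        }
    }
}
```

Every file is read in full here, which is why this approach is slow on large directory trees.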

Version 2.0 compares file sizes first, which greatly reduces the number of files that require checksums. It runs 9x-11x faster on the test directories. Use this version for your own needs. The optimization that delivered this benefit is described in [this article in Oracle's Java Magazine](https://blogs.oracle.com/javamagazine/the-joy-of-writing-command-line-utilities-part-2-the-souped-up-way-to-find-duplicate-files).
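The size-first idea rests on a simple fact: two files of different sizes cannot have identical contents, so only files that share a size need a checksum. A minimal sketch of that filtering is shown here. The class `SizeFirstDupes`, its methods, the SHA-256 checksum, and the `checksumCalls` counter are assumptions for illustration and are not taken from FileDedupe's source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of the size-first approach; not FileDedupe's own code.
public class SizeFirstDupes {

    // Instrumentation for the example only: counts how many files were checksummed.
    static int checksumCalls = 0;

    static long size(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hex-encoded SHA-256 of a file's contents (SHA-256 is an assumption).
    static String checksum(Path file) {
        checksumCalls++;
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return String.format("%064x", new BigInteger(1, md.digest(Files.readAllBytes(file))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static List<List<Path>> findDuplicates(Path root) throws IOException {
        // Pass 1: group files by size, which needs only file metadata.
        Map<Long, List<Path>> bySize;
        try (Stream<Path> paths = Files.walk(root)) {
            bySize = paths.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.groupingBy(SizeFirstDupes::size));
        }
        // Pass 2: checksum only the files that share a size with another file.
        List<List<Path>> duplicates = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) {
                continue; // a unique size means the file cannot have a duplicate
            }
            Map<String, List<Path>> byChecksum = sameSize.stream()
                .collect(Collectors.groupingBy(SizeFirstDupes::checksum));
            for (List<Path> group : byChecksum.values()) {
                if (group.size() > 1) {
                    duplicates.add(group);
                }
            }
        }
        return duplicates;
    }
}
```

In a directory where most files have unique sizes, the second pass touches only a small fraction of them, which is consistent with the kind of speedup reported above.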

## How to run
FileDedupe is written in Java 8. To run it, run the JAR file with the directory or directories to scan for duplicates. Note that `.` (the current directory) is supported.

Options:

`-nosubdirs`: prevents FileDedupe from checking subdirectories for duplicates.

`-help` or `-h` or `--h`: shows this usage information

So, to run the utility on the current directory:

`java -jar filededupe-2.0.jar .`
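For readers curious how a switch like `-nosubdirs` might translate into code, one plausible approach is to cap the traversal depth passed to `Files.walk`. The sketch below is an assumption about how this could be done, not a description of FileDedupe's implementation; the class `ScanDepth` and the `includeSubdirs` parameter are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not FileDedupe's own code.
public class ScanDepth {

    // Lists regular files in dir. A depth of 1 visits only the directory's
    // direct entries, while Integer.MAX_VALUE descends into every subdirectory.
    static List<Path> listFiles(Path dir, boolean includeSubdirs) throws IOException {
        int maxDepth = includeSubdirs ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(dir, maxDepth)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```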

## Testing
The tests included here generate code coverage of 80%, and FileDedupe has been tested repeatedly on directories of more than 600,000 files.

## Extension: HTML Report
David V. Saraiva forked the code presented here and added the ability to generate an HTML report of duplicates. His repository is [here](https://github.com/davidvsaraiva/FileDedupe).

## Thanks
Thanks to Oracle's _Java Magazine_ for publishing the articles on this utility.

Thanks to JetBrains for supporting open source by providing a license to [IntelliJ IDEA](https://www.jetbrains.com/idea/), which is an IDE that I have used since version 3.5.
