https://github.com/umstek/dupkiller
Slow, but more reliable duplicate files cleaner.
- Host: GitHub
- URL: https://github.com/umstek/dupkiller
- Owner: umstek
- License: mit
- Created: 2017-01-04T06:35:13.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2021-03-14T10:33:46.000Z (over 4 years ago)
- Last Synced: 2024-11-17T15:21:11.257Z (11 months ago)
- Topics: cleaner, duplicate-files, storage
- Language: C#
- Size: 372 KB
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DupKiller
Slow, but more reliable duplicate files cleaner.

A perfect duplicate file finder would have to compare the content of each file with the
content of every other file. That is O(n^2) in the number of files, and since files can be
large, it is not practical and would take a very long time to complete. If a dictionary,
i.e. a hash map, could be built with file contents as keys and paths as values, duplicates
could be identified quickly, but this is not a viable option either: all file contents
would have to be held in memory, and a dictionary does not work well with keys that large.
Most current software uses file size and extension to find duplicate files, but this can
be inaccurate in various cases, e.g. uncompressed images that happen to be the same size.
So, the best option is to consider multiple factors and allow the user to select which of
them to use. These factors should include a hash function. If the user suspects a hash
collision, there should be a way to compare the files byte by byte; since possible
duplicates have already been filtered by several other means, such an incident should
rarely occur.

Files are grouped by extension (optional, default yes), file name (optional, default no),
file size, a shorter hash (MD5), and a longer hash (SHA512). A sketch of this staged
grouping is shown below.
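
The following C# sketch illustrates the staged grouping idea under the assumptions above.
The type and method names are illustrative only and are not taken from the DupKiller
codebase; the optional extension/name grouping and the final byte-by-byte comparison are
omitted for brevity. Files are first grouped by size, then by MD5, then by SHA512, and
only groups that still contain more than one file are reported as probable duplicates.

```csharp
// Illustrative sketch only: these names are not from the DupKiller codebase.
// Demonstrates staged grouping (size -> MD5 -> SHA512) of candidate duplicates.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class DuplicateSketch
{
    // Hash a file's content with the given algorithm and return a hex string key.
    static string HashFile(string path, HashAlgorithm algorithm)
    {
        using var stream = File.OpenRead(path);
        return Convert.ToHexString(algorithm.ComputeHash(stream));
    }

    // Yield groups of paths whose size, MD5, and SHA512 all match.
    static IEnumerable<List<string>> FindDuplicates(IEnumerable<string> paths)
    {
        // Stage 1: group by file size (cheap, no file content is read).
        var bySize = paths.GroupBy(p => new FileInfo(p).Length)
                          .Where(g => g.Count() > 1);

        foreach (var sizeGroup in bySize)
        {
            // Stage 2: within each size group, group by a shorter hash (MD5).
            using var md5 = MD5.Create();
            var byMd5 = sizeGroup.GroupBy(p => HashFile(p, md5))
                                 .Where(g => g.Count() > 1);

            foreach (var md5Group in byMd5)
            {
                // Stage 3: confirm with a longer hash (SHA512).
                using var sha512 = SHA512.Create();
                var bySha = md5Group.GroupBy(p => HashFile(p, sha512))
                                    .Where(g => g.Count() > 1);
                foreach (var dupGroup in bySha)
                    yield return dupGroup.ToList();
            }
        }
    }

    static void Main(string[] args)
    {
        var files = Directory.EnumerateFiles(args[0], "*", SearchOption.AllDirectories);
        foreach (var group in FindDuplicates(files))
            Console.WriteLine(string.Join(Environment.NewLine, group) + Environment.NewLine);
    }
}
```

Because each stage only runs inside groups that survived the previous one, hashes are
computed only for files that already share a size, which keeps the expensive work
proportional to the number of plausible duplicates rather than to all files.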