https://github.com/raspi/duplikaatti
Remove duplicate files.
- Host: GitHub
- URL: https://github.com/raspi/duplikaatti
- Owner: raspi
- License: apache-2.0
- Created: 2018-07-09T16:33:11.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2021-10-26T00:30:03.000Z (about 4 years ago)
- Last Synced: 2025-04-04T07:23:37.951Z (9 months ago)
- Topics: duplicate-files, duplicates, files, go, golang
- Language: Go
- Size: 33.2 KB
- Stars: 17
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# duplikaatti



Remove duplicate files and do it fast. `duplikaatti` is designed to go through 50 TiB+ of data and hundreds of thousands of files and find duplicate files in a few minutes.
## Algorithm
* Create a list of files from the given directories:
  * do not add a file whose identifier is already in the list (Windows: file ID, *nix: inode)
  * do not add 0-byte files
  * directories listed first have higher priority than those listed later
* Remove all files from the list whose size is not shared by any other file (i.e. if there is only one 1000-byte file, it cannot have a duplicate, so remove it)
* Read the first bytes of each file and generate a SHA-256 sum of those bytes
* Remove all files from the list whose hash occurred only once
* Read the last bytes of each file and generate a SHA-256 sum of those bytes
* Remove all files from the list whose hash occurred only once
* Hash the whole contents of the files that are left
* Remove all files from the list whose hash occurred only once
* Generate a list of files to keep and files to remove:
  * use directory priority and file age to decide what to keep
  * the oldest, highest-priority files are kept
* Finally, remove the files from the filesystem(s)
## Usage
```
Duplicate file remover (version 1.0.0)
Removes duplicate files. Algorithm idea from rdfind.

Usage of duplikaatti [options] :

Parameters:
  -remove
        Actually remove files.

Examples:
  Test what would be removed:
    duplikaatti /home/raspi/storage /mnt/storage
  Remove files:
    duplikaatti -remove /home/raspi/storage /mnt/storage
```
Idea inspired by https://github.com/pauldreik/rdfind