https://github.com/binbash23/dupfinder
Identify similar files
https://github.com/binbash23/dupfinder
bash duplicates linux photos pictures script videos
Last synced: about 1 year ago
JSON representation
Identify similar files
- Host: GitHub
- URL: https://github.com/binbash23/dupfinder
- Owner: binbash23
- License: gpl-3.0
- Created: 2020-07-09T21:00:46.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-10-17T08:37:04.000Z (over 3 years ago)
- Last Synced: 2023-08-09T16:38:06.616Z (over 2 years ago)
- Topics: bash, duplicates, linux, photos, pictures, script, videos
- Language: Shell
- Size: 104 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README
- License: LICENSE
Awesome Lists containing this project
README
NAME
dupfinder
DESCRIPTION
Use dupfinder to find duplicate files in big archives of files.
Dupfinder searches the given path for files, creates a hash database for
all files and searches for files which have the same hash. So if you
have a picture or video with 2 diffenrent names or in different folders,
you can identify them and create an archive script and a delete script.
Dupfinder can be used to search for duplicate pictures in huge picture
collections for example.
SYNOPSIS
dupfinder OPTIONS
OPTIONS
-a Search in path for duplicates (requires -r to set the root path).
This option scans the root path for files, creates hashes for them
and searches for duplicate files. The filehash database will also
be cleaned from files that do not exist any more in the filesystem.
You can use dupfinder -g
later to generate scripts for moving the duplicates to an archive
folder. So if you have a picture or video with 2 different names or
in different folders, you can identify them and create an archive
script and a delete script.
The delete script will only delete duplicates - one version of the
duplicate files will be kept in the root path.
-c Calculate hashes for files in filelist file and add them to
filehash database
-C Create duplicate files file from hash database
-D Show contents of duplicates file
-d Delete not existing files from filehash database
-f search root path for files and create plain files file (requires -r)
-g Generate scripts to copy duplicates and to delete duplicates
-h Show usage information
-H Show contents of hash database
-L Show contents of filelist database
-n Calculate new hash for all files found, even if they exist in hash db
-N Skip calculating sum of filesizes (size calculation may take some time
in big archives)
-p Create filelist database for root path (requires -r)
-r [PATH] set root path
-R Delete filehash database files
-s Show statistics of database
-v Verbose mode
-V Show program version
EXAMPLES
1) To search duplicate files in path /tmp/picture use:
> ./dupfinder -avr /tmp/Pictures
3 files will be created:
> filelist.db - list of files in root path
> filehash.db - list of all files with every hash value
> duplicates.db - list of duplicate files in
2) To get help for all options use:
> ./dupfinder -h
3) To generate 2 scripts which can be used to archive the duplicates to
an archive folder and to delete the duplicates use:
> ./dupfinder -g
2 scripts will be generated:
> copy_dups_from_root_path_to_archive.sh
This script which will copy all duplicates in root path to
an archive folder in your current directory called "duplicates".
> delete_dups_from_root_path.sh
This script can be used to delete all duplicate files from the
root path. If you want to exclude several folders, you can modify
the script with grep for example.
The delete script contents only the duplicates. If there are 3
similar files, the delete script will only have 2 of them deleted.
Just have a look into the delete script:
> tail delete_dups_from_root_path.sh
rm -v '/tmp/2015/921.JPG'
rm -v '/tmp/2015/.thumbs/923.JPG'
If you don't want to delete file from the thumbnail folders, create
a new script:
> grep -v ".thumbs" > delete_dups_from_root_path_custom.sh
4) To find out about the duplicates statistics use:
> ./dupfinder -s
Dupfinder Statistics
Files in filehash database : 4774 (537 Kb) modified: 2020-10-03 17:06:59.953627326 +0200
Files in plain file list : 4774 (378 Kb) modified: 2020-10-03 17:06:08.454297780 +0200
Files in duplicates file : 1529 (130 Kb) modified: 2020-10-03 17:07:04.149572704 +0200
Size of all duplicates : 1.489 GB
REPORTING BUGS
Report bugs to
AUTHOR
dupfinder by Jens Heine 2020
and Dennis Brossat
and Benjamin Heine