https://github.com/binbash23/fscan
python project for finding duplicate files in a filesystem and moving them to an archive folder. Stats and stuff are stored in an sqlite database
https://github.com/binbash23/fscan
duplicate files filesystem-library find python sqlite
Last synced: 3 months ago
JSON representation
python project for finding duplicate files in a filesystem and moving them to an archive folder. Stats and stuff are stored in an sqlite database
- Host: GitHub
- URL: https://github.com/binbash23/fscan
- Owner: binbash23
- License: gpl-3.0
- Created: 2022-10-15T12:58:37.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2022-12-11T11:03:55.000Z (about 3 years ago)
- Last Synced: 2025-03-23T04:29:02.678Z (9 months ago)
- Topics: duplicate, files, filesystem-library, find, python, sqlite
- Language: Python
- Homepage:
- Size: 14.9 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README
- License: LICENSE
Awesome Lists containing this project
README
20221211
fscan by Jens Heine
Scan a filesystem/path and store the result in a sqlite db. View statistics of
the scanned files. Find duplicate files and move them to a configurable archive
fodler. Browse the sqlite database file with your favorite sqlite browser
(i.e. https://sqlitebrowser.org/dl/) and find out more statistics of your
files (changes, new/deleted files, ...)
If you do not want to mess around with the python code, you can use the
precompiled binaries for linux and windows:
https://github.com/binbash23/fscan/tree/master/source/dist
Best regards, Jens
./fscan.py -h
(or with the precompiled binaries under windows: fscan.exe or under linux: ./fscan)
Usage: fscan.py [options]
Options:
-h, --help show this help message and exit
-s, --scan Start filesystem scan
-a, --analyze Start filesystem analysis (hashing and mime type
detection)
-m, --moveduplicates Move duplicate files to archive folder
-c, --config Show configuration
-M, --mimetypes Show mime type statistics
-d, --duplicates Show duplicate files
-S, --stats Show database statistics
-l LOGLEVEL, --loglevel=LOGLEVEL
Set loglevel (DEBUG, INFO, WARN, ERROR, CRITICAL)
-V, --version Show fscan version info
Set configuration parameter:
-p PROPERTY, --property=PROPERTY
Set property
-v VALUE, --value=VALUE
Set value
HOWTO start using fscan
-----------------------
1. Show configuration
> ./fscan.py -c
FILESCAN_COMMIT_BATCH_SIZE 15000 The commit size for the file scanning process. Default is 50000.
WORKPATH /path_to/Pictures The path where the filescanner will work.
FILEHASH_COMMIT_BATCH_SIZE 100 The commit size for the process that calculates filehashes and mime types. Default is 50.
FILEHASH_BLOCK_SIZE 1073741824 The block size that the hash calculator process will use. Default is 1073741824.
DUPLICATES_ARCHIVE_PATH /path_to/duplicates The path where the duplicates will be stored. This path can be inside the WORKPATH
LOG_LEVEL INFO Loglevel of the application: CRITICAL, ERROR, WARN, INFO, DEBUG
2. Set scan path in configuration
> ./fscan -p WORKPATH -v /home/myusername/Bilder
3. Set archive path in configuration
> ./fscan -p DUPLICATES_ARCHIVE_PATH -v /home/myusername/Bilder_duplicates
4. Start scanning and hashing
> ./fscan.py -sa
5. Show statistics
> ./fscan.py -S
Filecount 116521 Total number of files in workpath.
Filecount 0 byte files 5 Total number of files with zero byte size.
Filesize 583.147 GB Total size of all files in workpath.
Files hashed / not hashed 116517 / 0 Total number of files which have been hashed / not hashed (yet).
Filesize hashed / not hashed 583.147 GB / 0.0 GB Total size in bytes of files that have been hashed / not hashed (yet).
Percent bytes hashed 100.0 % Percentage of bytes that have been hashed.
Duplicate files 704 Total number of files that can be deleted from the workpath because they are duplicates.
Duplicates filesize 11.432 GB Total size in bytes of the duplicate files which can be safely deleted.
Percent duplicate bytes 1.96 % Percentage of space from the workpath space usage that is used by duplicates.
Different mime types 42 Number of distinct mime types that have been found inside the workpath.
6. Show mime type statistics
> ./fscan.py -M
image/jpeg 100491
video/mp4 5207
text/html 3152
video/quicktime 1354
audio/x-hx-aac-adts 1346
audio/x-m4a 894
audio/ogg 759
application/json 622
video/x-msvideo 479
image/png 450
application/dicom 426
image/g3fax 302
image/gif 227
application/octet-stream 227
text/xml 145
text/plain 93
...
7. Show duplicates
> ./fscan.py -d
8. Move duplicates into the previous configured folder for duplicates (step 3.)
> ./fscan.py -m
Info for installing magic (mime type detection) modules under windows:
pip install python-libmagic
pip install python-magic-bin