{"id":26578205,"url":"https://github.com/binbash23/fscan","last_synced_at":"2026-05-14T20:32:35.390Z","repository":{"id":187262055,"uuid":"551943553","full_name":"binbash23/fscan","owner":"binbash23","description":"python project for finding duplicate files in a filesystem and moving them to an archive folder. Stats and stuff are stored in an sqlite database","archived":false,"fork":false,"pushed_at":"2022-12-11T11:03:55.000Z","size":15575,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-09-12T19:00:43.630Z","etag":null,"topics":["duplicate","files","filesystem-library","find","python","sqlite"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/binbash23.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-10-15T12:58:37.000Z","updated_at":"2022-10-17T08:35:49.000Z","dependencies_parsed_at":"2023-08-09T16:51:20.563Z","dependency_job_id":null,"html_url":"https://github.com/binbash23/fscan","commit_stats":null,"previous_names":["binbash23/fscan"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/binbash23/fscan","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binbash23%2Ffscan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binbash23%2Ffscan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binbash23%2Ffscan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binbash23%2Ffscan/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/binbash23","download_url":"https://codeload.github.com/binbash23/fscan/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binbash23%2Ffscan/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33042156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["duplicate","files","filesystem-library","find","python","sqlite"],"created_at":"2025-03-23T04:28:10.350Z","updated_at":"2026-05-14T20:32:35.367Z","avatar_url":"https://github.com/binbash23.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"20221211\n\nfscan by Jens Heine \u003cbinbash@gmx.net\u003e\n\nScan a filesystem/path and store the result in a sqlite db. View statistics of\nthe scanned files. Find duplicate files and move them to a configurable archive\nfolder. Browse the sqlite database file with your favorite sqlite browser \n(e.g. 
https://sqlitebrowser.org/dl/) and find out more statistics of your\nfiles (changes, new/deleted files, ...)\n\nIf you do not want to mess around with the python code, you can use the \nprecompiled binaries for linux and windows: \nhttps://github.com/binbash23/fscan/tree/master/source/dist\n\nBest regards, Jens\n\n\n./fscan.py -h \n(or with the precompiled binaries under windows: fscan.exe or under linux: ./fscan)\n\nUsage: fscan.py [options]\n\nOptions:\n  -h, --help            show this help message and exit\n  -s, --scan            Start filesystem scan\n  -a, --analyze         Start filesystem analysis (hashing and mime type\n                        detection)\n  -m, --moveduplicates  Move duplicate files to archive folder\n  -c, --config          Show configuration\n  -M, --mimetypes       Show mime type statistics\n  -d, --duplicates      Show duplicate files\n  -S, --stats           Show database statistics\n  -l LOGLEVEL, --loglevel=LOGLEVEL\n                        Set loglevel (DEBUG, INFO, WARN, ERROR, CRITICAL)\n  -V, --version         Show fscan version info\n\n  Set configuration parameter:\n    -p PROPERTY, --property=PROPERTY\n                        Set property\n    -v VALUE, --value=VALUE\n                        Set value\n\n\nHOWTO start using fscan\n-----------------------\n\n1. Show configuration\n\n\u003e ./fscan.py -c\n\nFILESCAN_COMMIT_BATCH_SIZE   15000                          The commit size for the file scanning process. Default is 50000.\nWORKPATH                     /path_to/Pictures              The path where the filescanner will work.\nFILEHASH_COMMIT_BATCH_SIZE   100                            The commit size for the process that calculates filehashes and mime types. Default is 50.\nFILEHASH_BLOCK_SIZE          1073741824                     The block size that the hash calculator process will use. Default is 1073741824.\nDUPLICATES_ARCHIVE_PATH      /path_to/duplicates            The path where the duplicates will be stored. 
This path can be inside the WORKPATH\nLOG_LEVEL                    INFO                           Loglevel of the application: CRITICAL, ERROR, WARN, INFO, DEBUG\n\n2. Set scan path in configuration\n\n\u003e ./fscan -p WORKPATH -v /home/myusername/Bilder\n\n3. Set archive path in configuration\n\n\u003e ./fscan -p DUPLICATES_ARCHIVE_PATH -v /home/myusername/Bilder_duplicates\n\n4. Start scanning and hashing\n\n\u003e ./fscan.py -sa\n\n5. Show statistics\n\n\u003e ./fscan.py -S\n\nFilecount                    116521                         Total number of files in workpath.\nFilecount 0 byte files       5                              Total number of files with zero byte size.\nFilesize                     583.147 GB                     Total size of all files in workpath.\nFiles hashed / not hashed    116517 / 0                     Total number of files which have been hashed / not hashed (yet).\nFilesize hashed / not hashed 583.147 GB / 0.0 GB            Total size in bytes of files that have been hashed / not hashed (yet).\nPercent bytes hashed         100.0 %                        Percentage of bytes that have been hashed.\nDuplicate files              704                            Total number of files that can be deleted from the workpath because they are duplicates.\nDuplicates filesize          11.432 GB                      Total size in bytes of the duplicate files which can be safely deleted.\nPercent duplicate bytes      1.96 %                         Percentage of space from the workpath space usage that is used by duplicates.\nDifferent mime types         42                             Number of distinct mime types that have been found inside the workpath.\n\n6. 
Show mime type statistics\n\n\u003e ./fscan.py -M\n\nimage/jpeg                                                   100491    \nvideo/mp4                                                    5207      \ntext/html                                                    3152      \nvideo/quicktime                                              1354      \naudio/x-hx-aac-adts                                          1346      \naudio/x-m4a                                                  894       \naudio/ogg                                                    759       \napplication/json                                             622       \nvideo/x-msvideo                                              479       \nimage/png                                                    450       \napplication/dicom                                            426       \nimage/g3fax                                                  302       \nimage/gif                                                    227       \napplication/octet-stream                                     227       \ntext/xml                                                     145       \ntext/plain                                                   93        \n...\n\n7. Show duplicates\n\n\u003e ./fscan.py -d\n\n8. Move duplicates into the previously configured folder for duplicates (step 3)\n\n\u003e ./fscan.py -m\n\n\n\n\n\nInfo for installing the magic (mime type detection) modules under Windows:\n\npip install python-libmagic\npip install python-magic-bin\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinbash23%2Ffscan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbinbash23%2Ffscan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinbash23%2Ffscan/lists"}