https://github.com/fabricesalvaire/filewalker
A Python library to scan a file system, find duplicated files, etc.
- Host: GitHub
- URL: https://github.com/fabricesalvaire/filewalker
- Owner: FabriceSalvaire
- License: gpl-3.0
- Created: 2020-12-03T00:03:42.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2024-04-21T02:08:34.000Z (about 2 years ago)
- Last Synced: 2025-02-28T23:22:46.338Z (over 1 year ago)
- Topics: duplicate-detection, duplicate-files, python-library, python3
- Language: Python
- Homepage:
- Size: 123 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Some File System Tools implemented in Python
## Purposes
* provides tools to scan a file system, find duplicated files on disk and implement an mlocate-like tool
* highly and easily customisable
* not heavily optimised, but designed to run with a low IO / memory footprint,
  i.e. it should never freeze a computer as many desktop file indexers do (Baloo, etc.),
  i.e. it favours sleeping over spawning threads
* thus it could be implemented with asyncio, or better in Go, but ... cf. infra
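The low-footprint goal above can be illustrated with a minimal sketch, assuming a plain single-threaded walk: an explicit directory stack with `os.scandir` and a short `time.sleep` after every batch of entries. The function `throttled_walk` and its `batch_size` / `pause` parameters are hypothetical names for illustration, not part of the filewalker API.

```python
import os
import time
from typing import Iterator


def throttled_walk(top: str, batch_size: int = 1000, pause: float = 0.01) -> Iterator[os.DirEntry]:
    """Yield the regular files under *top* from a single thread.

    After every *batch_size* directory entries the walker sleeps for *pause*
    seconds, leaving IO and CPU time to other processes.
    (batch_size and pause are hypothetical knobs, not filewalker options.)
    """
    stack = [top]  # directories still to visit
    seen = 0
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    seen += 1
                    if seen % batch_size == 0:
                        time.sleep(pause)  # sleep rather than spawn threads
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            yield entry
                    except OSError:
                        continue  # entry vanished or cannot be inspected
        except OSError:
            continue  # permission denied, directory removed during the scan, ...
```

Using an explicit stack rather than recursion keeps memory proportional to the number of pending directories, and unreadable directories are skipped instead of aborting the whole scan.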
## Features
* a module to get mounted file systems on Linux
* a base module to scan a file system
* an improved object-oriented File API to get the stat, the allocated size on disk and the SHA-1 checksum, with a method to check whether two files are identical, etc.
* a module to find duplicated files on disk, inspired by the C++ command-line tool [rdfind](https://github.com/pauldreik/rdfind).
  It implements a mark/commit duplicate API and security checks.
  It also implements a loader for the **rdfind** `results.txt` file and can cross-check those results.
  It can dump and load a JSON result file.
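On Linux, the list of mounted file systems can be read from `/proc/mounts`, whose whitespace-separated fields are the device, the mount point, the file system type and the mount options; special characters in paths are written as octal escapes such as `\040` for a space. The sketch below assumes that source; filewalker's own module may read something else (e.g. `/proc/self/mountinfo`), and `Mount`, `parse_mounts` and `mounted_file_systems` are hypothetical names.

```python
import re
from typing import List, NamedTuple

# /proc/mounts writes space, tab, newline and backslash as 3-digit octal escapes
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


class Mount(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: List[str]


def _unescape(field: str) -> str:
    """Turn octal escapes like '\\040' back into the characters they encode."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text: str) -> List[Mount]:
    """Parse the content of a /proc/mounts-style file."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        mounts.append(Mount(_unescape(device), _unescape(mount_point), fs_type, options.split(",")))
    return mounts


def mounted_file_systems(path: str = "/proc/mounts") -> List[Mount]:
    """Read and parse the mount table (Linux only)."""
    with open(path, encoding="utf-8") as f:
        return parse_mounts(f.read())
```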
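The File API features listed above map onto standard-library calls. The following is a minimal sketch, assuming a POSIX system where `st_blocks` is reported in 512-byte units (as on Linux); `allocated_size`, `sha1sum` and `are_identical` are hypothetical helper names, not filewalker's actual File class.

```python
import filecmp
import hashlib
import os


def allocated_size(path: str) -> int:
    """Bytes allocated on disk (POSIX only).

    st_blocks counts 512-byte units on Linux, so the result can be smaller
    than st_size for sparse files, or larger because of block rounding.
    """
    return os.stat(path).st_blocks * 512


def sha1sum(path: str, chunk_size: int = 1 << 16) -> str:
    """SHA-1 hex digest, reading fixed-size chunks to keep memory use constant."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def are_identical(path1: str, path2: str) -> bool:
    """True if both files have the same content.

    With shallow=False, filecmp.cmp rejects files of different sizes and
    otherwise compares the contents byte by byte.
    """
    return filecmp.cmp(path1, path2, shallow=False)
```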
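rdfind eliminates duplicate candidates in stages of increasing cost (file size, then the first bytes, then the last bytes, then a checksum), so that most files are never read in full. A simplified sketch of that idea, with a JSON dump/load of the resulting groups, could look like this. It omits the last-bytes stage as well as filewalker's mark/commit API and security checks, and all names are hypothetical.

```python
import hashlib
import json
import os
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List


def _refine(groups: Iterable[List[str]], key: Callable[[str], Hashable]) -> List[List[str]]:
    """Split each candidate group by *key* and drop groups left with a single file."""
    refined = []
    for group in groups:
        buckets: Dict[Hashable, List[str]] = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        refined.extend(g for g in buckets.values() if len(g) > 1)
    return refined


def _head(path: str, size: int = 4096) -> bytes:
    with open(path, "rb") as f:
        return f.read(size)


def _sha1(path: str) -> str:
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths: Iterable[str]) -> List[List[str]]:
    groups = [list(paths)]
    groups = _refine(groups, os.path.getsize)  # stat only, no read
    groups = _refine(groups, _head)  # read the first bytes only
    groups = _refine(groups, _sha1)  # full read, only for the remaining candidates
    return sorted(sorted(g) for g in groups)


def dump_json(groups: List[List[str]], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(groups, f, indent=2)


def load_json(path: str) -> List[List[str]]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

This sketch would report two hard links to the same inode as duplicates; a real tool should also compare `(st_dev, st_ino)` before proposing any removal.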