https://github.com/madebyjake/md5sift
Generate MD5 checksum reports (CSV) for large directories with filtering by file extension or lists.
- Host: GitHub
- URL: https://github.com/madebyjake/md5sift
- Owner: madebyjake
- License: mit
- Created: 2024-10-07T17:20:52.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-04T12:41:00.000Z (over 1 year ago)
- Last Synced: 2024-11-30T18:09:04.978Z (over 1 year ago)
- Topics: csv, md5-checksum, mit-license, python, python3, reporting
- Language: Python
- Homepage:
- Size: 11.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# md5sift
**md5sift** is a CLI tool written in Python designed to generate checksum reports for files across local directories or network shares. It offers **filtering by file extensions or predefined file lists** and produces reports in **CSV format**.
**NOTICE**: I have ended development on this project and moved over to [hashreport](https://github.com/madebyjake/hashreport), a complete rewrite that introduces additional features and significant improvements. Please consider either exploring the new project, or feel free to fork and continue working on this one.
## Features
- **Bulk Checksum Generation:** Calculate hashes for multiple files in a directory.
- **File Filtering:** Filter files by extension or from a provided file list.
- **Multi-threaded Processing:** Faster checksum generation with multi-threading.
- **CSV Output:** Generate comprehensive CSV reports including file paths, MD5 hashes, and timestamps.
- **Algorithm Options:** Supports multiple hashing algorithms (`md5`, `sha1`, `sha256`).
- **Verbose Mode:** Real-time progress updates.
- **Test Mode:** Process a subset of files for quick validation.
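The features above can be illustrated with a minimal sketch of chunked hashing on a thread pool. This is not taken from the md5sift source: the function names (`hash_file`, `find_files`, `write_report`), the 1 MiB chunk size, and the CSV column names are all assumptions made for illustration.

```python
import csv
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


def hash_file(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of one file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_files(root, extension=None):
    """Yield file paths under root, optionally keeping only one extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if extension is None or name.lower().endswith(extension.lower()):
                yield os.path.join(dirpath, name)


def write_report(root, output, extension=None, algorithm="md5", threads=None):
    """Hash matching files on a thread pool and write one CSV row per file."""
    # The file list is collected before the report is created, so a report
    # written inside the scanned directory is not hashed itself.
    paths = list(find_files(root, extension))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        digests = list(pool.map(lambda p: hash_file(p, algorithm), paths))
    with open(output, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "hash", "modified_utc"])  # assumed column names
        for path, digest in zip(paths, digests):
            modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            writer.writerow([path, digest, modified.isoformat()])
    return len(paths)
```

Reading in chunks keeps memory use flat for large files, and threads suit this workload because most of the time is spent waiting on disk or network reads.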
## Installation
md5sift can be installed and run as a Python script or via an RPM package.
### Python Installation
#### Requirements:
- Python 3.x
- Git (optional)
#### Setup:
**Option 1:** Clone from GitHub:
```bash
git clone https://github.com/madebyjake/md5sift.git && cd md5sift
```
**Option 2:** Download ZIP and extract.
### RPM Installation
To install via RPM package:
1. Download the RPM from the [Releases](https://github.com/madebyjake/md5sift/releases) page.
2. Install using a package manager:
```bash
sudo rpm -ivh md5sift-<version>-1.noarch.rpm # using RPM
sudo yum install md5sift-<version>-1.noarch.rpm # using YUM
sudo dnf install md5sift-<version>-1.noarch.rpm # using DNF
```
*Replace `<version>` with the package version.*
To build the RPM package from source, refer to the [Building the RPM Package](#building-the-rpm-package) section.
## Usage
Depending on the chosen installation method, md5sift can be run as a Python script or via the command-line interface.
**NOTE:**
- Default scan path is the current directory if `-s`/`--scan-path` isn’t provided.
- Default output file is `hash_report.csv` in the current directory if `-o`/`--output` isn’t specified.
### Python Script Execution
Run directly using Python:
```bash
python3 md5sift.py -s <scan-path> -o <output> [OPTIONS]
```
### Command-Line Interface (CLI)
After RPM installation, run:
```bash
md5sift -s <scan-path> -o <output> [OPTIONS]
```
### Arguments
| Argument | Description |
|-------------------|----------------------------------------------------------------------------|
| `-s, --scan-path` | Path to the directory to scan. Defaults to the current directory. |
| `-o, --output` | Path to the output CSV file. Defaults to `md5_report.csv`. |
| `-e, --extension` | Filter files by specific extension (e.g., `.txt`). |
| `-f, --filelist` | Path to a CSV file containing specific file names to process. |
| `-v, --verbose` | Enable verbose mode for progress updates. |
| `-t, --threads` | Number of threads (default: CPU core count). |
| `--test` | Run in test mode and process only a limited number of files (e.g., `--test 10`). |
| `-a, --algorithm` | Hashing algorithm (`md5`, `sha1`, `sha256`). Defaults to `md5`. |
| `--exclude` | Paths or directories to exclude from scanning. |
| `-h, --help` | Show help message. |
| `--version` | Show version information. |
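For readers who want to see how an interface like this maps onto Python's `argparse`, here is a rough sketch. It is a reconstruction from the table only, not the project's actual parser: the `build_parser` name, the integer count for `--test`, the multi-value `--exclude`, and the placeholder version string are assumptions. The `-o` default is left unset.

```python
import argparse
import os


def build_parser():
    """Declare options that mirror the argument table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="md5sift")
    parser.add_argument("-s", "--scan-path", default=".")
    parser.add_argument("-o", "--output")
    parser.add_argument("-e", "--extension")
    parser.add_argument("-f", "--filelist")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-t", "--threads", type=int, default=os.cpu_count())
    # Assumption: --test takes a file count, as in `--test 10`.
    parser.add_argument("--test", type=int, metavar="COUNT")
    parser.add_argument("-a", "--algorithm",
                        choices=["md5", "sha1", "sha256"], default="md5")
    # Assumption: several paths may follow a single --exclude.
    parser.add_argument("--exclude", nargs="+", default=[])
    # Placeholder version string, not the project's real version.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser


args = build_parser().parse_args(["-s", "/path/to/scan", "-a", "sha256"])
print(args.scan_path, args.algorithm)
```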
### Examples
The examples below use the `md5sift` command installed by the RPM package. When running the script directly, replace `md5sift` with `python3 md5sift.py`.
**Scan a Directory and Save to CSV**
```bash
md5sift -s /path/to/scan -o /path/to/output/report.csv
```
**Filter by File Extension**
```bash
md5sift -s /path/to/scan -o /path/to/output/report.csv -e .txt
```
**Use a File List and Verbose Mode**
```bash
md5sift -s /path/to/scan -o /path/to/output/report.csv -f /path/to/filelist.csv -v
```
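The file passed to `-f` could be produced with a few lines of Python. The sketch below assumes the list holds one bare file name per row with no header, which the README does not specify, so check the layout against a list the tool accepts before relying on it. `write_filelist` is a hypothetical helper name.

```python
import csv
import os


def write_filelist(names, destination):
    """Write one file name per row; return how many rows were written."""
    # Assumption: the list stores bare file names, so directories are dropped.
    unique = sorted({os.path.basename(name) for name in names})
    with open(destination, "w", newline="") as handle:
        writer = csv.writer(handle)
        for name in unique:
            writer.writerow([name])
    return len(unique)
```

Duplicates are collapsed and the names are sorted so that repeated runs produce the same file.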
**Test Mode (Process First 10 Files)**
```bash
md5sift -s /path/to/scan -o /path/to/output/report.csv --test 10
```
**Use SHA-256 and Exclude Directories**
```bash
md5sift -s /path/to/scan -o /path/to/output/report.csv -a sha256 --exclude /path/to/exclude_dir
```
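When these invocations are driven from a script, assembling the arguments as a list avoids shell-quoting problems. The following sketch uses only flags listed in the arguments table. The `build_command` helper and its parameter names are hypothetical and not part of md5sift.

```python
import shlex


def build_command(scan_path, output, extension=None, filelist=None,
                  algorithm=None, exclude=None, test=None, verbose=False):
    """Return an md5sift argument list built from the documented flags."""
    command = ["md5sift", "-s", scan_path, "-o", output]
    if extension:
        command += ["-e", extension]
    if filelist:
        command += ["-f", filelist]
    if algorithm:
        command += ["-a", algorithm]
    if exclude:
        command += ["--exclude", exclude]
    if test is not None:
        command += ["--test", str(test)]
    if verbose:
        command.append("-v")
    return command


command = build_command("/path/to/scan", "/path/to/output/report.csv",
                        algorithm="sha256", exclude="/path/to/exclude_dir")
print(shlex.join(command))  # shell-safe rendering of the same arguments
```

On a machine where md5sift is installed, the list can be passed to `subprocess.run(command, check=True)`.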
## Logging
- By default, `INFO` level logging is enabled.
- Use `-v` (`--verbose`) for real-time progress updates.
## Building the RPM Package
1. To build the RPM package, install the required dependencies:
```bash
sudo dnf install rpm-build python3-devel python3-setuptools
```
2. From the project root directory, generate the `md5sift.spec` file:
```bash
python3 setup.py genspec
```
3. Build the RPM package:
```bash
python3 setup.py bdist_rpm
```
The RPM package will be generated in the `dist/` directory.
## License
This project is licensed under the [MIT License](LICENSE).
## Contributing
Contributions are welcome! Please refer to the [CONTRIBUTING.md](CONTRIBUTING.md) file for guidance.
## Support
Please open an [issue](https://github.com/madebyjake/md5sift/issues) for support or feedback.