{"id":22544959,"url":"https://github.com/madebyjake/md5sift","last_synced_at":"2025-08-04T07:32:29.868Z","repository":{"id":257843600,"uuid":"869078458","full_name":"madebyjake/md5sift","owner":"madebyjake","description":"Generate MD5 checksum reports (CSV) for large directories with filtering by file extension or lists.","archived":false,"fork":false,"pushed_at":"2024-11-04T12:41:00.000Z","size":12,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-30T18:09:04.978Z","etag":null,"topics":["csv","md5-checksum","mit-license","python","python3","reporting"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/madebyjake.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-07T17:20:52.000Z","updated_at":"2024-11-04T12:38:14.000Z","dependencies_parsed_at":"2024-10-16T17:16:33.824Z","dependency_job_id":"f09a434d-91ed-4fb3-a8a8-68999bf6f4e3","html_url":"https://github.com/madebyjake/md5sift","commit_stats":null,"previous_names":["madebyjake/md5sift"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madebyjake%2Fmd5sift","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madebyjake%2Fmd5sift/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madebyjake%2Fmd5sift/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madebyjake%2Fmd5sift/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/madebyjake","download_url":"https://codeload.github.com/madebyjake/md5sift/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228610888,"owners_count":17945330,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","md5-checksum","mit-license","python","python3","reporting"],"created_at":"2024-12-07T14:08:32.739Z","updated_at":"2025-08-04T07:32:29.811Z","avatar_url":"https://github.com/madebyjake.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# md5sift\n\n**md5sift** is a CLI tool written in Python designed to generate checksum reports for files across local directories or network shares. It offers **filtering by file extensions or predefined file lists** and produces reports in **CSV format**.\n\n**NOTICE**: I have ended development on this project and moved over to [hashreport](https://github.com/madebyjake/hashreport), a complete rewrite that introduces additional features and significant improvements. 
Please consider either exploring the new project, or feel free to fork and continue working on this one.\n\n## Features\n\n- **Bulk Checksum Generation:** Calculate hashes for multiple files in a directory.\n- **File Filtering:** Filter files by extension or from a provided file list.\n- **Multi-threaded Processing:** Faster checksum generation with multi-threading.\n- **CSV Output:** Generate comprehensive CSV reports including file paths, MD5 hashes, and timestamps.\n- **Algorithm Options:** Supports hashing algorithms (`md5`, `sha1`, `sha256`).\n- **Verbose Mode:** Real-time progress updates.\n- **Test Mode:** Process a subset of files for quick validation.\n\n## Installation\n\nmd5sift can be installed and run as a Python script or via an RPM package.\n\n### Python Installation\n\n#### Requirements:\n- Python 3.x  \n- Git (optional)  \n\n#### Setup:\n\n**Option 1:** Clone from GitHub:\n```bash\ngit clone https://github.com/madebyjake/md5sift.git \u0026\u0026 cd md5sift\n```\n\n**Option 2:** Download ZIP and extract.\n\n### RPM Installation\n\nTo install via RPM package:\n\n1. Download the RPM from the [Releases](https://github.com/madebyjake/md5sift/releases) page.\n2. 
Install using a package manager:\n\n```bash\nsudo rpm -ivh md5sift-\u003cver\u003e-1.noarch.rpm      # using RPM\nsudo yum install md5sift-\u003cver\u003e-1.noarch.rpm   # using YUM\nsudo dnf install md5sift-\u003cver\u003e-1.noarch.rpm   # using DNF\n```\n\n*Replace `\u003cver\u003e` with the package version.*\n\nTo build the RPM package from source, refer to the [Building the RPM Package](#building-the-rpm-package) section.\n\n## Usage\n\nDepending on the chosen installation method, md5sift can be run as a Python script or via the command-line interface.\n\n**NOTE:**\n- Default scan path is the current directory if `-s`/`--scan-path` isn’t provided.\n- Default output file is `hash_report.csv` in the current directory if `-o`/`--output` isn’t specified.\n\n### Python Script Execution\n\nRun directly using Python:\n\n```bash\npython3 md5sift.py -s \u003cscan_directory\u003e -o \u003coutput_file\u003e [OPTIONS]\n```\n\n### Command-Line Interface (CLI)\n\nAfter RPM installation, run:\n\n```bash\nmd5sift -s \u003cscan_directory\u003e -o \u003coutput_file\u003e [OPTIONS]\n```\n\n### Arguments\n\n| Argument          | Description                                                                |\n|-------------------|----------------------------------------------------------------------------|\n| `-s, --scan-path` | Path to the directory to scan. Defaults to the current directory.          |\n| `-o, --output`    | Path to the output CSV file. Defaults to `md5_report.csv`.                 |\n| `-e, --extension` | Filter files by specific extension (e.g., `.txt`).                         |\n| `-f, --filelist`  | Path to a CSV file containing specific file names to process.              |\n| `-v, --verbose`   | Enable verbose mode for progress updates.                                  |\n| `-t, --threads`   | Number of threads (default: CPU core count).                               |\n| `--test`          | Run in test mode and process a limited number of files.              
      |\n| `-a, --algorithm` | Hashing algorithm (`md5`, `sha1`, `sha256`). Defaults to `md5`.            |\n| `--exclude`       | Paths or directories to exclude from scanning.                             |\n| `-h, --help`      | Show help message.                                                         |\n| `--version`       | Show version information.                                                  |\n\n### Examples\n\nBelow are some examples of how to use md5sift (RPM package) with different options:\n\n**Scan a Directory and Save to CSV**\n```bash\nmd5sift -s /path/to/scan -o /path/to/output/report.csv\n```\n\n**Filter by File Extension**\n```bash\nmd5sift -s /path/to/scan -o /path/to/output/report.csv -e .txt\n```\n\n**Use a File List and Verbose Mode**\n```bash\nmd5sift -s /path/to/scan -o /path/to/output/report.csv -f /path/to/filelist.csv -v\n```\n\n**Test Mode (Process First 10 Files)**\n```bash\nmd5sift -s /path/to/scan -o /path/to/output/report.csv --test 10\n```\n\n**Use SHA-256 and Exclude Directories**\n```bash\nmd5sift -s /path/to/scan -o /path/to/output/report.csv -a sha256 --exclude /path/to/exclude_dir\n```\n\n## Logging\n- By default, `INFO` level logging is enabled.\n- Use `-v` (`--verbose`) for real-time progress updates.\n\n## Building the RPM Package\n\n1. To build the RPM package, install the required dependencies:\n\n```bash\nsudo dnf install rpm-build python3-devel python3-setuptools\n```\n\n2. From the project root directory, generate the `md5sift.spec` file:\n\n```bash\npython3 setup.py genspec\n```\n\n3. Build the RPM package:\n\n```bash\npython3 setup.py bdist_rpm\n```\n\nThe RPM package will be generated in the `dist/` directory.\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE).\n\n## Contributing\n\nContributions are welcome! 
Please refer to the [CONTRIBUTING.md](CONTRIBUTING.md) file for guidance.\n\n## Support\n\nPlease open an [issue](https://github.com/madebyjake/md5sift/issues) for support or feedback.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmadebyjake%2Fmd5sift","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmadebyjake%2Fmd5sift","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmadebyjake%2Fmd5sift/lists"}