{"id":17151246,"url":"https://github.com/soda480/mpgitleaks","last_synced_at":"2025-04-13T12:03:37.492Z","repository":{"id":37513787,"uuid":"382222016","full_name":"soda480/mpgitleaks","owner":"soda480","description":"A Python script that wraps the gitleaks tool to enable scanning of multiple repositories in parallel","archived":false,"fork":false,"pushed_at":"2022-06-21T17:57:37.000Z","size":646,"stargazers_count":8,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-05T17:24:15.468Z","etag":null,"topics":["docker","gitleaks","multiprocessing","pybuilder","python","terminal"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/soda480.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-02T03:42:01.000Z","updated_at":"2023-02-23T07:45:11.000Z","dependencies_parsed_at":"2022-08-18T19:32:09.802Z","dependency_job_id":null,"html_url":"https://github.com/soda480/mpgitleaks","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soda480%2Fmpgitleaks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soda480%2Fmpgitleaks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soda480%2Fmpgitleaks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soda480%2Fmpgitleaks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/soda480","download_url":"https://codeload.github.com/soda480/mpgitle
aks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240072098,"owners_count":19743527,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","gitleaks","multiprocessing","pybuilder","python","terminal"],"created_at":"2024-10-14T21:37:35.689Z","updated_at":"2025-02-23T10:32:05.209Z","avatar_url":"https://github.com/soda480.png","language":"Python","readme":"# mpgitleaks\n[![build](https://github.com/soda480/mpgitleaks/actions/workflows/main.yml/badge.svg)](https://github.com/soda480/mpgitleaks/actions/workflows/main.yml)\n[![Code Grade](https://api.codiga.io/project/24885/status/svg)](https://app.codiga.io/public/project/24885/mpgitleaks/dashboard)\n[![complexity](https://img.shields.io/badge/complexity-Simple:%205-brightgreen)](https://radon.readthedocs.io/en/latest/api.html#module-radon.complexity)\n[![vulnerabilities](https://img.shields.io/badge/vulnerabilities-None-brightgreen)](https://pypi.org/project/bandit/)\n[![python](https://img.shields.io/badge/python-3.9-teal)](https://www.python.org/downloads/)\n\nA Python script that wraps the [gitleaks](https://github.com/zricethezav/gitleaks) tool to enable scanning of multiple repositories in parallel. 
\n\nThe motivation behind writing this script was:\n* implement workaround for `gitleaks` intermittent failures when cloning very large repositories\n* implement ability to scan multiple repositories in parallel\n* implement ability to scan repositories for a user, a specified organization or read from a file\n\n**Notes**:\n* the script uses HTTPS to clone the repos\n  * you must set the `USERNAME` and `PASSWORD` environment variables - this credential needs to have access to the repos being scanned\n  * if using `--file` then HTTPS clone URLs must be supplied in the file\n* the maximum number of background processes (workers) that will be started is `35`\n  * if the number of repos to process is less than the maximum number of workers\n    * the script will start one worker per repository\n  * if the number of repos to process is greater than the maximum number of workers\n    * the repos will be added to a thread-safe queue and processed by all the workers\n* the Docker container must run with a bind mount to the working directory in order to access logs/reports\n  * the repos will be cloned to the `./scans/clones` folder in the working directory\n  * the reports will be written to the `./scans/reports/` folder in the working directory\n  * a summary report will be written to `mpgitleaks.csv`\n\n\n## Usage\n```bash\nusage: mpgitleaks [-h] [--file FILENAME] [--user] [--org ORG] [--exclude EXCLUDE] [--include INCLUDE] [--size SIZE] [--log]\n\nA Python script that wraps the gitleaks tool to enable scanning of multiple repositories in parallel\n\noptional arguments:\n  -h, --help         show this help message and exit\n  --file FILENAME    scan repos contained in the specified file\n  --user             scan repos for the authenticated GitHub user where user is owner or collaborator\n  --org ORG          scan repos for the specified GitHub organization\n  --exclude EXCLUDE  a regex to 
match name of repos to include in scanning\n  --size SIZE        scan repos less than specified size (in KB)\n  --log              log messages to log file\n```\n\n## Execution\n\nSet the required environment variables:\n```bash\nexport USERNAME='--username--'\nexport PASSWORD='--password-or-token--'\n```\n\nIf using `--user` or `--org` options and GitHub instance is not `api.github.com`:\n```bash\nexport GH_BASE_URL='--api-address-to-github-instance--'\n```\n\nExecute the Docker container:\n```bash\ndocker container run \\\n--rm \\\n-it \\\n-e http_proxy \\\n-e https_proxy \\\n-e GH_BASE_URL \\\n-e USERNAME \\\n-e PASSWORD \\\n-v $PWD:/opt/mpgitleaks \\\nsoda480/mpgitleaks:latest \\\n[MPGITLEAKS OPTIONS]\n```\n\n**Note**: the `http[s]_proxy` environment variables are only required if executing behind a proxy server\n\n### Examples\n\nScan all repos contained in the file `repos.txt` but exclude the repos that match the specified regex, an example of a `repos.txt` can be found [here](https://raw.githubusercontent.com/soda480/mpgitleaks/master/examples/repos.txt):\n```bash\nmpgitleaks --file 'repos.txt' --exclude 'soda480/mplogp'\n```\n![example](https://raw.githubusercontent.com/soda480/mpgitleaks/master/docs/images/example1.gif)\n\nScan all repos for the authenticated user but exclude the repos that match the specified regex:\n```bash\nmpgitleaks --user --exclude 'intel|edgexfoundry|soda480/openhack'\n```\n\nScan all repos in the specified organization but only include the repos that match the specified regex:\n```bash\nmpgitleaks --org 'myorg' --include '.*-go'\n```\n\n## Development\n\nClone the repository and ensure the latest version of Docker is installed on your development server.\n\nBuild the Docker image:\n```bash\ndocker image build \\\n--target build \\\n--build-arg http_proxy \\\n--build-arg https_proxy \\\n-t \\\nmpgitleaks:latest .\n```\n\nRun the Docker container:\n```bash\ndocker container run \\\n--rm \\\n-it \\\n-e http_proxy \\\n-e https_proxy 
\\\n-v $PWD:/code \\\nmpgitleaks:latest \\\n/bin/bash\n```\n\nBuild application:\n```bash\npyb -X\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoda480%2Fmpgitleaks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsoda480%2Fmpgitleaks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoda480%2Fmpgitleaks/lists"}