{"id":28905838,"url":"https://github.com/vasillieux/gitsens","last_synced_at":"2025-08-01T13:41:48.478Z","repository":{"id":293522692,"uuid":"977084520","full_name":"vasillieux/gitsens","owner":"vasillieux","description":"Scan thousands git repositories for leaked secrets","archived":false,"fork":false,"pushed_at":"2025-05-17T10:55:07.000Z","size":43,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-20T19:10:32.247Z","etag":null,"topics":["git","leaked-code","scanner","secrets","vulnerability"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vasillieux.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-03T11:46:17.000Z","updated_at":"2025-05-15T18:19:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"f7cadf1c-34a8-41a1-b12e-275617c1d4c2","html_url":"https://github.com/vasillieux/gitsens","commit_stats":null,"previous_names":["kn0tsu/gitsens","hypocycloidd/gitsens","vasillieux/gitsens"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vasillieux/gitsens","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vasillieux%2Fgitsens","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vasillieux%2Fgitsens/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vasillieux%2Fgitsens/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vasillieux%2Fgitsens/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vasillieux","download_url":"https://codeload.github.com/vasillieux/gitsens/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vasillieux%2Fgitsens/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261131225,"owners_count":23114094,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["git","leaked-code","scanner","secrets","vulnerability"],"created_at":"2025-06-21T13:39:36.972Z","updated_at":"2025-08-01T13:41:48.448Z","avatar_url":"https://github.com/vasillieux.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![img](https://github.com/user-attachments/assets/6c2dd151-ad90-41cf-ba69-60918faa9f31)\n\n## Gitsens\n\n**What is it?** A semantic scanner that checks thousands of Git (GitHub) repositories for leaked sensitive information.\n\n**The flow**:\n1. Parsing repositories\n2. Checking deleted and historical data (**blob** data) with filters:\n    - Codebase languages (`.html`, `.py`, `.rs`, `.json`, `.sol`)\n    - Codebase configuration files (`docker-compose.yaml`, `.env`, `.k8s`, `.js`, `.py`)\n    - Generic binary files\n        - Compiled `.exe` files with patched-in secrets\n        - `.pyc` and other language-specific precompiled/cache files\n\n3. 
For each blob file type, we run a different set of regex parsers plus trufflehog.\n\n## Installation (without Docker)\n\n### Prerequisites\n- Trufflehog\n- GH (GitHub CLI)\n    - You need to set up (authenticate) the GH CLI, e.g. with `gh auth login`, before running the program.\n\nThen, with Python 3.12, install the dependencies:\n- `pip install -r requirements.txt`\n\n## Using GH to locate the repositories you want to parse\n\n### Solidity projects with \"hardhat\"\n- `gh search repos --limit 500 --json fullName --jq '.[].fullName' 'hardhat language:Solidity' \u003e hardhat_solidity_repos.txt`\n\n### Python projects using web3.py\n- `gh search repos --limit 500 --json fullName --jq '.[].fullName' 'language:Python web3.py' \u003e python_web3py_repos.txt`\n\n### Files named .env that might contain PRIVATE_KEY\n- `gh search code --limit 500 --json repository,path --jq '.[] | .repository.nameWithOwner + \"/\" + .path' 'filename:.env PRIVATE_KEY' \u003e potential_dotenv_leaks.txt`\n\nMake sure that Redis is running. To start it:\n- `docker-compose up redis`\n\n### Usage (Manual)\n\nStart the crawler worker(s):\n- `rq worker -c config web3_crawler_queue --url redis://localhost:6379/0`\n\nStart the analyzer worker(s):\n- `rq worker -c config web3_analyzer_queue --url redis://localhost:6379/1`\n\nSubmit the initial jobs:\n- `python gitsens/submit_jobs.py`\n\n**Warning:** if you're submitting jobs directly to the analyzer, first populate the input file, commonly named `direct_repos_to_analyze.txt`. For details, see the `gitsens/submit_jobs.py` implementation.\n\n## Installation (Docker)\n\nMake sure you have Docker Engine and docker-compose installed. You should also log in to GitHub with the GitHub CLI (`gh auth login`). 
Docker-compose will mount your local GH config folder into the containers (read-only):\n\n```yaml\nvolumes:\n  - ~/.config/gh:/root/.config/gh:ro\n```\n\n### Usage (Docker)\n- `docker-compose up --build`\n\nCheck the logs:\n- `docker-compose logs -f crawler_worker`\n- `docker-compose logs -f analyzer_worker`\n\n## Process\nOnce started, the analyzer clones the specified repositories in batches into `./analysis_output`.\n\nThe resulting tree looks like this:\n\n```\nanalysis_output\n    ├── cloned_repos\n    ├── custom_regex_findings\n    ├── dangling_blobs\n    ├── restored_files\n    └── trufflehog_findings\n```\n\n## Constraints\n- GitHub API rate limit (5,000 requests/hour for authenticated users)\n- Limited heuristics\n- No post-analysis of output data\n\n## How to win\n\nIt's a secret. Stay tuned.\n\n## Credits\n\nThanks to trufflehog for a great security and reconnaissance tool!\nYou can find it at https://github.com/trufflesecurity/trufflehog\n\nP.S.: Honestly, I just wanted to get a better understanding of Git.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvasillieux%2Fgitsens","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvasillieux%2Fgitsens","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvasillieux%2Fgitsens/lists"}