{"id":22799190,"url":"https://github.com/dhammon/codescuttle","last_synced_at":"2025-10-10T23:33:31.566Z","repository":{"id":159043423,"uuid":"595405474","full_name":"dhammon/CodeScuttle","owner":"dhammon","description":null,"archived":false,"fork":false,"pushed_at":"2023-02-05T23:52:21.000Z","size":18,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-05T21:43:24.499Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dhammon.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-31T02:14:49.000Z","updated_at":"2023-02-05T23:54:04.000Z","dependencies_parsed_at":"2023-05-01T22:01:20.079Z","dependency_job_id":null,"html_url":"https://github.com/dhammon/CodeScuttle","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhammon%2FCodeScuttle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhammon%2FCodeScuttle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhammon%2FCodeScuttle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhammon%2FCodeScuttle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dhammon","download_url":"https://codeload.github.com/dhammon/CodeScuttle/tar.gz/refs/heads/main","host":{"name":"GitHub","url":
"https://github.com","kind":"github","repositories_count":246365650,"owners_count":20765549,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-12T07:08:15.155Z","updated_at":"2025-10-10T23:33:26.507Z","avatar_url":"https://github.com/dhammon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CodeScuttle\n*Find leaked source code*\n```\n                                                          0\n                                             ____         \n                                            /    |        o    0\n                                           /     |           o \n                                          |    o |__       0\n                                          |    o |  |___   o\n/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~|    o |   ___|o ~/\\~/\\~/\\~/\\~\n/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~|    o |  |   ~/\\~/\\~/\\~/\\~/\\~\n/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~\n/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~/\\~\n```\n\u003e Warning! This minimally viable product/project is in an alpha stage, use at your own discretion and check back for updates\n\n# Purpose\nSometimes an organization's private source code is pushed onto GitHub as public repositories.  
The purpose of this project is to give organizations an open-source tool to search GitHub for public repositories that match their source code's fingerprint.\n\n# How it works\nCodeScuttle searches GitHub public repositories using your configured queries and exclusions.  Utilizing GitHub's authenticated API, you can quickly and systematically identify when source code is leaked.\n\n\u003e Warning! The GitHub API sometimes fails to return results because of its indexing and search service.\n\n\u003e Warning! GitHub subjects users to API rate limits.  We've added logic to pause and retry when these limits are hit.  We are not responsible for your use of this software or any violations of GitHub's use policies.\n\n\n# Installation\n```\ngit clone https://github.com/dhammon/CodeScuttle\ncd CodeScuttle\npip install -r requirements.txt\n```\n\n# Configuration\n1. Rename `config.py.example` to `config.py`\n2. Insert your GitHub API token as the `token` value in `config.py`\n3. Insert your search parameters as `queries` in `config.py`\n4. Optional: Insert any exclude parameters as `excludes` in `config.py`\n\n\u003e Warning! GitHub doesn't search partial strings.  For example, if you are trying to find \"ThisExampleString\" and search for \"ThisExample\", GitHub won't return results containing \"ThisExampleString\".  GitHub search also doesn't allow for wildcards - so \"ThisExample*\" won't work either.\n\n# Use\n```\n./codescuttle.py\n```\n\n# Use Cases\n\u003e Warning! CodeScuttle only returns 30 GitHub API results per query in the `queries` section of `config.py`.  This includes results that may later be excluded using `excludes` settings in the `config.py` file.\n\n## Source Code\nInclude source code strings in double quotes with spaces between them and use as entries in the `config.py` file's `queries` section.  For example, suppose your source code file contains `someDescriptiveFunctionName` and `someDescriptiveVariableName`.  
A query entry in `config.py` would look something like this:\n```\n    queries = {\n        \"mySearch\": {\n            \"description\": \"Searching for my secret source code\",\n            \"query\": '\"someDescriptiveFunctionName\" \"someDescriptiveVariableName\"'\n        },\n```\n\n## Canary Tokens\nConsider generating a long random token (canary) and inserting it as a comment in files you wish to monitor.  Then add this token value as an entry in the `config.py` file's `queries` section.  A query entry in `config.py` might look something like this:\n```\n    queries = {\n        \"canary\": {\n            \"description\": \"Searching for my canary token\",\n            \"query\": '\"9edab40c7c70577cbc307c6d5894fe77\"'\n        },\n```\n\n\n## Secrets (or maybe not)\nYou 'could' use secret values as search parameters, but consider the following:\n1. Storing secrets in the config file isn't great\n2. Search parameters are sent to GitHub's API via the GET method and could be logged by intermediaries and/or GitHub\n\n\n## Exclude Results\nThere could be false positives that you'll want to remove from the output of CodeScuttle.  You can ignore such results by writing an `excludes` entry in the `config.py` file.  For example, say you wanted to ignore any GitHub search results that included the string `dhammon`; an exclude entry might look something like this:\n```\n    excludes = {\n        \"allowList\": {\n            \"username\": \"dhammon\",\n        },\n```\nOr perhaps you want to exclude all results that include `dhammon` and the term `CodeScuttle`; an entry might then look like the following:\n```\n    excludes = {\n        \"allowList\": {\n            \"username\": \"dhammon\",\n            \"project\": \"CodeScuttle\"\n        },\n```\n\n---\nThank you for checking out CodeScuttle and happy hunting! 
-Daniel","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdhammon%2Fcodescuttle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdhammon%2Fcodescuttle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdhammon%2Fcodescuttle/lists"}