{"id":13438598,"url":"https://github.com/rndinfosecguy/Scavenger","last_synced_at":"2025-03-20T06:30:48.257Z","repository":{"id":40668910,"uuid":"124684873","full_name":"rndinfosecguy/Scavenger","owner":"rndinfosecguy","description":"Crawler (Bot) searching for credential leaks on paste sites.","archived":false,"fork":false,"pushed_at":"2022-03-31T16:12:50.000Z","size":102,"stargazers_count":618,"open_issues_count":1,"forks_count":120,"subscribers_count":29,"default_branch":"master","last_synced_at":"2024-10-28T00:23:15.606Z","etag":null,"topics":["bot","crawler","credentials","leaks","osint","paste","pastebin","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rndinfosecguy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-10T18:10:03.000Z","updated_at":"2024-10-25T22:45:11.000Z","dependencies_parsed_at":"2022-07-14T05:00:35.138Z","dependency_job_id":null,"html_url":"https://github.com/rndinfosecguy/Scavenger","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rndinfosecguy%2FScavenger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rndinfosecguy%2FScavenger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rndinfosecguy%2FScavenger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rndinfosecguy%2FScavenger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rndinfosecguy","download_url":"https://codeload.github.com
/rndinfosecguy/Scavenger/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244564993,"owners_count":20473175,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bot","crawler","credentials","leaks","osint","paste","pastebin","python"],"created_at":"2024-07-31T03:01:06.754Z","updated_at":"2025-03-20T06:30:43.241Z","avatar_url":"https://github.com/rndinfosecguy.png","language":"Python","readme":"# Scavenger - OSINT Bot - REWORKED\n\n---\n\n[bot in action](https://twitter.com/leak_scavenger)\n\n---\n\n[![Anurag's GitHub stats](https://github-readme-stats.vercel.app/api?username=rndinfosecguy)](https://github.com/anuraghazra/github-readme-stats)\n\n---\n\n## Intro\nJust the code of my OSINT bot searching for sensitive data leaks on paste sites.\n\nSearch terms:\n- credentials\n- private RSA keys\n- Wordpress configuration files\n- MySQL connect strings\n- onion links\n- SQL dumps\n- API keys\n- complete emails\n\nSearch terms can be customized. 
You can learn more about it in the configuration section.\n\n## Articles About Scavenger\n- https://jakecreps.com/2019/05/08/osint-collection-tools-for-pastebin/\n- https://jakecreps.com/2019/01/08/scavenger/\n- https://youtu.be/VCwiZ2dh17Q?t=51 (the bot is mentioned here)\n\n## Main Features\n\nFor pastebin.com the bot has two modes:\n- looking for sensitive data in the archive via scraping\n- looking for sensitive data by tracking users who publish leaks\n\nAdditional features:\n- customizable search terms\n- scan folders with text files for sensitive information\n\n## Configuration\n\n1. Delete the README.md files in every subfolder, as they are only placeholders.\n2. The bot searches for email:password combinations and other kinds of sensitive data by default. If you want to add more search terms, edit the __configs/searchterms.txt__ file or use the -3 switch in the control script.\nDefault __configs/searchterms.txt__ configuration:\n```console\nmysqli_connect(\nBEGIN RSA PRIVATE KEY\nThe name of the database for WordPress\napiKey:\nReturn-Path:\ninsert into\nINSERT INTO\n.onion\n```\nIf you want to add other search terms, just add them to the file line by line.\nDo you know a useful search term that is missing here? Tell me! :-)\n3. 
For the user tracking module of pastebin.com you need to add the target users line by line to the __configs/users.txt__ file.\n\n## Usage\n\nProgram help:\n```console\n$ python3 scavenger.py -h\n\n  _________\n /   _____/ ____ _____ ___  __ ____   ____    ____   ___________\n \\_____  \\_/ ___\\\\__  \\\\  \\/ // __ \\ /    \\  / ___\\_/ __ \\_  __ \\\n /        \\  \\___ / __ \\\\   /\\  ___/|   |  \\/ /_/  \u003e  ___/|  | \\/\n/_______  /\\___  \u003e____  /\\_/  \\___  \u003e___|  /\\___  / \\___  \u003e__|\n        \\/     \\/     \\/          \\/     \\//_____/      \\/       Reworked\n\nusage: scavenger.py [-h] [-0] [-1] [-2] [-3] [-4]\n\ncontrol script\n\noptional arguments:\n  -h, --help           show this help message and exit\n  -0, --pbincom        Activate pastebin.com archive scraping module\n  -1, --pbincomTrack   Activate pastebin.com user tracking module\n  -2, --sensitivedata  Search a specific folder for sensitive data. This might\n                       be useful if you want to analyze some pastes which\n                       were not collected by the bot.\n  -3, --editsearch     Edit search terms file for additional search terms\n                       (email:password combinations will always be searched)\n  -4, --editusers      Edit user file of the pastebin.com user track module\n\nexample usage: python3 scavenger.py -0 -1\n```\n\nCrawled pastes are stored in different locations depending on their status.\n- Paste crawled but nothing was detected -\u003e __data/raw_pastes__\n- Paste crawled and an email:password combination was detected -\u003e __data/raw_pastes__ and __data/files_with_passwords__\n- Paste crawled and other sensitive data was detected -\u003e __data/raw_pastes__ and __data/otherSensitivePastes__\n\nPastes are stored in data/raw_pastes until they reach a limit of 48000 files.\nOnce there are more than 48000 pastes, they get zipped and moved to the archive folder.\n\n---\n\nStart the pastebin.com archive scraping 
module\n```console\n$ python3 scavenger.py -0\n```\nStart pastebin.com user tracking module\n```console\n$ python3 scavenger.py -1\n```\nWhen starting one of these modules, a tmux session with the running module is created in the background.\n\nList tmux sessions\n```console\n$ tmux ls\npastebincomArchive: 1 windows (created Sun Apr 14 06:33:32 2021) [204x58]\npastebincomTrack: 1 windows (created Sun Apr 14 06:33:32 2021) [204x58]\n```\nExample of interacting with a tmux session\n\n```console\n$ tmux a -t pastebincomArchive\n$ tmux a -t pastebincomTrack\n```\n\nTo detach from a session, press Ctrl+b d.\n\n---\n\nIf you want to start a module without using the control script, you can call it directly.\n\nPastebin.com archive scraper\n```console\n$ python3 pbincomArchiveScrape.py\n```\n\nPastebin.com user tracker\n```console\n$ python3 pbincomTrackUser.py\n```\n\nSearch specific folder for sensitive data:\n```console\n$ python3 findSensitiveData.py TARGET_FOLDER\n```\n\n---\n\n## To Do\n\nIf anything is missing and you want me to add features or make changes, just let me know via Twitter or a GitHub issue :-)\n\n","funding_links":[],"categories":["Asset Discovery","[↑](#contents)Data Leaks","Python","Python (1887)"],"sub_categories":["Data Leaks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frndinfosecguy%2FScavenger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frndinfosecguy%2FScavenger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frndinfosecguy%2FScavenger/lists"}