{"id":13537970,"url":"https://github.com/opsdisk/pagodo","last_synced_at":"2025-05-14T00:09:54.163Z","repository":{"id":37706219,"uuid":"66045666","full_name":"opsdisk/pagodo","owner":"opsdisk","description":"pagodo (Passive Google Dork) - Automate Google Hacking Database scraping and searching","archived":false,"fork":false,"pushed_at":"2025-04-27T22:51:05.000Z","size":1559,"stargazers_count":2971,"open_issues_count":2,"forks_count":508,"subscribers_count":86,"default_branch":"master","last_synced_at":"2025-05-13T10:07:05.107Z","etag":null,"topics":["bugbounty","dork","ghdb","google","google-dork","google-dorks","google-hacking-database","osint","osint-python","python","yagooglesearch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/opsdisk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-08-19T02:21:53.000Z","updated_at":"2025-05-12T16:26:35.000Z","dependencies_parsed_at":"2023-02-09T23:45:31.232Z","dependency_job_id":"d375cccd-5526-40da-a8e5-e345e1d88cb9","html_url":"https://github.com/opsdisk/pagodo","commit_stats":{"total_commits":99,"total_committers":12,"mean_commits":8.25,"dds":"0.31313131313131315","last_synced_commit":"e99e7b13e1c5762052e3bc8bc38197373d5c25f9"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opsdisk%2Fpagodo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opsdisk%2Fpagodo/tags","releases_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/opsdisk%2Fpagodo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opsdisk%2Fpagodo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/opsdisk","download_url":"https://codeload.github.com/opsdisk/pagodo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254044224,"owners_count":22005104,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bugbounty","dork","ghdb","google","google-dork","google-dorks","google-hacking-database","osint","osint-python","python","yagooglesearch"],"created_at":"2024-08-01T09:01:05.381Z","updated_at":"2025-05-14T00:09:49.153Z","avatar_url":"https://github.com/opsdisk.png","language":"Python","readme":"# pagodo - Passive Google Dork\n\n## Introduction\n\n`pagodo` automates Google searching for potentially vulnerable web pages and applications on the Internet.  It replaces\nmanually performing Google dork searches with a web GUI browser.\n\nThere are 2 parts.  The first is `ghdb_scraper.py` that retrieves the latest Google dorks and the second portion is\n`pagodo.py` that leverages the information gathered by `ghdb_scraper.py`.\n\nThe core Google search library now uses the more flexible [yagooglesearch](https://github.com/opsdisk/yagooglesearch)\ninstead of [googlesearch](https://github.com/MarioVilas/googlesearch).  
Check out the [yagooglesearch\nREADME](https://github.com/opsdisk/yagooglesearch/blob/master/README.md) for a more in-depth explanation of the library\ndifferences and capabilities.\n\nThis version of `pagodo` also has native HTTP(S) and SOCKS5 proxy support, so there is no need to wrap it in a tool\nlike `proxychains4` if you need to use proxies.  You can specify multiple proxies to use in a round-robin fashion by\nproviding a comma-separated string of proxies using the `-p` switch.\n\n## What are Google dorks?\n\nOffensive Security maintains the Google Hacking Database (GHDB) found here:\n\u003chttps://www.exploit-db.com/google-hacking-database\u003e.  It is a collection of Google searches, called dorks, that can be\nused to find potentially vulnerable boxes or other juicy info that is picked up by Google's search bots.\n\n## Terms and Conditions\n\nThe terms and conditions for `pagodo` are the same terms and conditions found in\n[yagooglesearch](https://github.com/opsdisk/yagooglesearch#terms-and-conditions).\n\nThis code is supplied as-is and you are fully responsible for how it is used.  Scraping Google Search results may\nviolate their [Terms of Service](https://policies.google.com/terms).  Another Python Google search library had some\ninteresting information/discussion on it:\n\n* [Original issue](https://github.com/aviaryan/python-gsearch/issues/1)\n* [A response](https://github.com/aviaryan/python-gsearch/issues/1#issuecomment-365581431)\n* The author created a separate [Terms and Conditions](https://github.com/aviaryan/python-gsearch/blob/master/T_AND_C.md)\n* ...that contained a link to this [blog](https://benbernardblog.com/web-scraping-and-crawling-are-perfectly-legal-right/)\n\nGoogle's preferred method is to use their [API](https://developers.google.com/custom-search/v1/overview).\n\n## Installation\n\nScripts are written for Python 3.6+.  
Clone the git repository and install the requirements.\n\n```bash\ngit clone https://github.com/opsdisk/pagodo.git\ncd pagodo\npython3 -m venv .venv  # If using a virtual environment.\nsource .venv/bin/activate  # If using a virtual environment.\npip install -r requirements.txt\n```\n\n## ghdb_scraper.py\n\nTo start off, `pagodo.py` needs a list of all the current Google dorks.  The repo contains a `dorks/` directory with the\ndorks that were current as of the last `ghdb_scraper.py` run.  It's advised to run `ghdb_scraper.py` to get the freshest data\nbefore running `pagodo.py`.  The `dorks/` directory contains:\n\n* the `all_google_dorks.txt` file, which contains all the Google dorks, one per line\n* the `all_google_dorks.json` file, which is the JSON response from GHDB\n* individual category dorks\n\nDork categories:\n\n```python\ncategories = {\n    1: \"Footholds\",\n    2: \"File Containing Usernames\",\n    3: \"Sensitives Directories\",\n    4: \"Web Server Detection\",\n    5: \"Vulnerable Files\",\n    6: \"Vulnerable Servers\",\n    7: \"Error Messages\",\n    8: \"File Containing Juicy Info\",\n    9: \"File Containing Passwords\",\n    10: \"Sensitive Online Shopping Info\",\n    11: \"Network or Vulnerability Data\",\n    12: \"Pages Containing Login Portals\",\n    13: \"Various Online devices\",\n    14: \"Advisories and Vulnerabilities\",\n}\n```\n\n### Using ghdb_scraper.py as a script\n\nWrite all dorks to `all_google_dorks.txt`, `all_google_dorks.json`, and individual categories if you want more\ncontextual data about each dork.\n\n```bash\npython ghdb_scraper.py -s -j -i\n```\n\n### Using ghdb_scraper as a module\n\nThe `ghdb_scraper.retrieve_google_dorks()` function returns a dictionary with the following data structure:\n\n```python\nghdb_dict = {\n    \"total_dorks\": total_dorks,\n    \"extracted_dorks\": extracted_dorks,\n    \"category_dict\": category_dict,\n}\n```\n\nUsing a Python shell (like `python` or `ipython`) to explore the 
data:\n\n```python\nimport ghdb_scraper\n\ndorks = ghdb_scraper.retrieve_google_dorks(save_all_dorks_to_file=True)\ndorks.keys()\ndorks[\"total_dorks\"]\n\ndorks[\"extracted_dorks\"]\n\ndorks[\"category_dict\"].keys()\n\ndorks[\"category_dict\"][1][\"category_name\"]\n```\n\n## \u003cspan\u003epagodo.py\u003c/span\u003e\n\n### Using \u003cspan\u003epagodo.py\u003c/span\u003e as a script\n\n```bash\npython pagodo.py -d example.com -g dorks.txt\n```\n\n### Using pagodo as a module\n\nThe `pagodo.Pagodo.go()` function returns a dictionary with the data structure below (the dorks used are made-up examples):\n\n```python\n{\n    \"dorks\": {\n        \"inurl:admin\": {\n            \"urls_size\": 3,\n            \"urls\": [\n                \"https://github.com/marmelab/ng-admin\",\n                \"https://github.com/settings/admin\",\n                \"https://github.com/akveo/ngx-admin\",\n            ],\n        },\n        \"inurl:gist\": {\n            \"urls_size\": 3,\n            \"urls\": [\n                \"https://gist.github.com/\",\n                \"https://gist.github.com/index\",\n                \"https://github.com/defunkt/gist\",\n            ],\n        },\n    },\n    \"initiation_timestamp\": \"2021-08-27T11:35:30.638705\",\n    \"completion_timestamp\": \"2021-08-27T11:36:42.349035\",\n}\n```\n\nUsing a Python shell (like `python` or `ipython`) to explore the data:\n\n```python\nimport pagodo\n\npg = pagodo.Pagodo(\n    google_dorks_file=\"dorks.txt\",\n    domain=\"github.com\",\n    max_search_result_urls_to_return_per_dork=3,\n    save_pagodo_results_to_json_file=None,  # None = Auto-generate file name, otherwise pass a string for path and filename.\n    save_urls_to_file=None,  # None = Auto-generate file name, otherwise pass a string for path and filename.\n    verbosity=5,\n)\npagodo_results_dict = pg.go()\n\npagodo_results_dict.keys()\n\npagodo_results_dict[\"initiation_timestamp\"]\n\npagodo_results_dict[\"completion_timestamp\"]\n\nfor 
key, value in pagodo_results_dict[\"dorks\"].items():\n    print(f\"dork: {key}\")\n    for url in value[\"urls\"]:\n        print(url)\n```\n\n## Tuning Results\n\n### Scope to a specific domain\n\nThe `-d` switch can be used to scope the results to a specific domain and functions as the Google `site:` search operator, for example:\n\n```none\nsite:github.com\n```\n\n### Wait time between Google dork searches\n\n* `-i` - Specify the **minimum** delay between dork searches, in seconds.  Don't make this too small, or your IP will\nquickly get rate-limited with HTTP 429 responses.\n* `-x` - Specify the **maximum** delay between dork searches, in seconds.  Don't make this too big, or the searches will\ntake a long time.\n\nThe values provided by `-i` and `-x` are used to generate a list of 20 random wait times, and one of them is randomly\nselected between each Google dork search.\n\n### Number of results to return\n\n`-m` - The maximum number of search results to return per Google dork.  Each Google search request can pull back at most 100\nresults at a time, so if you pick `-m 500`, 5 separate search queries will have to be made for each Google dork search,\nwhich will increase the amount of time to complete.\n\n### Save Output\n\n`-o [optional/path/to/results.json]` - Save output to a JSON file.  If you do not specify a filename, a datetimestamped\none will be generated.\n\n`-s [optional/path/to/results.txt]` - Save URLs to a text file.  If you do not specify a filename, a datetimestamped one\nwill be generated.\n\n### Save logs\n\n`--log [optional/path/to/file.log]` - Save logs to the specified file.  If you do not specify a filename, the default\nfile `pagodo.py.log` at the root of the pagodo directory will be used.\n\n## Google is blocking me!\n\nPerforming 7300+ search requests to Google as fast as possible will simply not work.  Google will rightfully detect it\nas a bot and block your IP for a set period of time.  
One solution is to use a bank of HTTP(S)/SOCKS proxies and pass\nthem to `pagodo`.\n\n### Native proxy support\n\nPass a comma-separated string of proxies to `pagodo` using the `-p` switch.\n\n```bash\npython pagodo.py -g dorks.txt -p http://myproxy:8080,socks5h://127.0.0.1:9050,socks5h://127.0.0.1:9051\n```\n\nYou could even decrease the `-i` and `-x` values because you will be leveraging different proxy IPs.  The proxies passed\nto `pagodo` are selected in round-robin order.\n\n### proxychains4 support\n\nAnother solution is to use `proxychains4` to round-robin the lookups.\n\nInstall `proxychains4`:\n\n```bash\napt install proxychains4 -y\n```\n\nEdit the `/etc/proxychains4.conf` configuration file to round-robin the lookups through different proxy servers.  In\nthe example below, 2 different dynamic SOCKS proxies have been set up with different local listening ports (9050 and\n9051).\n\n```bash\nvim /etc/proxychains4.conf\n```\n\n```ini\nround_robin\nchain_len = 1\nproxy_dns\nremote_dns_subnet 224\ntcp_read_time_out 15000\ntcp_connect_time_out 8000\n[ProxyList]\nsocks4 127.0.0.1 9050\nsocks4 127.0.0.1 9051\n```\n\nThrow `proxychains4` in front of the `pagodo.py` script and each *request* lookup will go through a different proxy (and\nthus originate from a different IP).\n\n```bash\nproxychains4 python pagodo.py -g dorks/all_google_dorks.txt -o [optional/path/to/results.json] -s [optional/path/to/results.txt]\n```\n\nNote that rotating proxies per request may not appear natural to Google, since a single dork search could:\n\n1) Simulate \"browsing\" to `google.com` from IP #1\n2) Make the first search query from IP #2\n3) Simulate clicking \"Next\" to make the second search query from IP #3\n4) Simulate clicking \"Next\" to make the third search query from IP #1\n\nFor that reason, using the built-in `-p` proxy support is preferred because, as stated in the `yagooglesearch`\ndocumentation, the \"provided proxy is used for the entire life cycle of the search to make it look more human, instead\nof rotating through various 
proxies for different portions of the search.\"\n\n## License\n\nDistributed under the GNU General Public License v3.0. See [LICENSE](./LICENSE) for more information.\n\n## Contact\n\n[@opsdisk](https://twitter.com/opsdisk)\n\nProject Link: [https://github.com/opsdisk/pagodo](https://github.com/opsdisk/pagodo)\n","funding_links":[],"categories":["Open Sources Intelligence (OSINT)","\u003ca id=\"9eee96404f868f372a6cbc6769ccb7f8\"\u003e\u003c/a\u003e新添加的","[↑](#-table-of-contents) Google Dorks","Python","Weapons","\u003ca id=\"9eee96404f868f372a6cbc6769ccb7f8\"\u003e\u003c/a\u003e工具","Tools","OSINT Tools"],"sub_categories":["Dorking tools","\u003ca id=\"31185b925d5152c7469b963809ceb22d\"\u003e\u003c/a\u003e新添加的","Tools","OSINT Tools","Web Vulnerability Scanners"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopsdisk%2Fpagodo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopsdisk%2Fpagodo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopsdisk%2Fpagodo/lists"}