{"id":13680479,"url":"https://github.com/opsdisk/metagoofil","last_synced_at":"2025-05-15T18:03:38.442Z","repository":{"id":41559681,"uuid":"45323146","full_name":"opsdisk/metagoofil","owner":"opsdisk","description":"Search Google and download specific file types","archived":false,"fork":false,"pushed_at":"2025-02-08T21:09:10.000Z","size":84,"stargazers_count":474,"open_issues_count":4,"forks_count":100,"subscribers_count":21,"default_branch":"master","last_synced_at":"2025-05-15T18:03:19.227Z","etag":null,"topics":["dork","google","information-leakage","metagoofil","osint","penetration-testing","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/opsdisk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-11-01T00:34:44.000Z","updated_at":"2025-05-11T08:26:01.000Z","dependencies_parsed_at":"2025-03-10T20:44:18.937Z","dependency_job_id":null,"html_url":"https://github.com/opsdisk/metagoofil","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opsdisk%2Fmetagoofil","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opsdisk%2Fmetagoofil/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opsdisk%2Fmetagoofil/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opsdisk%2Fmetagoofil/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/opsdisk","
download_url":"https://codeload.github.com/opsdisk/metagoofil/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254394719,"owners_count":22063984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dork","google","information-leakage","metagoofil","osint","penetration-testing","python"],"created_at":"2024-08-02T13:01:17.513Z","updated_at":"2025-05-15T18:03:38.377Z","avatar_url":"https://github.com/opsdisk.png","language":"Python","readme":"# Metagoofil\n\n## Introduction\n\nmetagoofil searches Google for specific types of files being publicly hosted on a web site and optionally downloads them\nto your local box.  This is useful for Open Source Intelligence gathering, penetration tests, or determining what files\nyour organization is leaking to search indexers like Google.  As an example, it uses the Google query below to find all\nthe `.pdf` files being hosted on `example.com` and optionally downloads a local copy.\n\n```none\nsite:example.com filetype:pdf\n```\n\nThis is a maintained fork of the original \u003chttps://github.com/laramies/metagoofil\u003e and is currently installed by default\non the Kali Operating System \u003chttps://gitlab.com/kalilinux/packages/metagoofil\u003e.  Unlike the original, a design decision\nwas made to not do metadata analysis and instead defer to other tools like `exiftool`.\n\n```bash\nexiftool -r *.doc | egrep -i \"Author|Creator|Email|Producer|Template\" | sort -u\n```\n\nComments, suggestions, and improvements are always welcome.  
Be sure to follow [@opsdisk](https://twitter.com/opsdisk)\non Twitter for the latest updates.\n\n## Installation\n\nClone the git repository and install the requirements.\n\n```bash\ngit clone https://github.com/opsdisk/metagoofil\ncd metagoofil\nvirtualenv -p python3 .venv  # If using a virtual environment.\nsource .venv/bin/activate  # If using a virtual environment.\npip install -r requirements.txt\n```\n\n## Docker Installation \u0026 Usage\n\n```bash\ngit clone https://github.com/opsdisk/metagoofil\ncd metagoofil\ndocker build -t metagoofil .\n\n# This will save the files in the host ./data directory.\ndocker run -v $PWD/data:/data metagoofil -d github.com -f -n 10 -r 4 -t pdf -w\n```\n\n## Google is blocking me!\n\nIf you start getting HTTP 429 errors, Google has rightfully detected you as a bot and will block your IP for a set\nperiod of time.  One solution is to use proxychains and a bank of proxies to round robin the lookups.\n\nInstall proxychains4\n\n```bash\napt install proxychains4 -y\n```\n\nEdit the `/etc/proxychains4.conf` configuration file to round robin the lookups through different proxy servers.  In\nthe example below, 2 different dynamic SOCKS proxies have been set up with different local listening ports (9050 and\n9051).  Don't know how to utilize SSH and dynamic SOCKS proxies?  Do yourself a favor and pick up a copy of [Cyber\nPlumber's Handbook and interactive lab](https://gumroad.com/l/cph_book_and_lab) to learn all about Secure Shell (SSH)\ntunneling, port redirection, and bending traffic like a boss.\n\n```bash\nvim /etc/proxychains4.conf\n```\n\n```bash\nround_robin\nchain_len = 1\nproxy_dns\nremote_dns_subnet 224\ntcp_read_time_out 15000\ntcp_connect_time_out 8000\n[ProxyList]\nsocks4 127.0.0.1 9050\nsocks4 127.0.0.1 9051\n```\n\nThrow `proxychains4` in front of the Python script and each lookup will go through a different proxy (and thus source\nfrom a different IP).  
You could even tune down the `-e` delay time because you will be leveraging different proxy\nboxes.\n\n```bash\nproxychains4 python metagoofil.py -d github.com -f -t pdf,doc,xls\n```\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopsdisk%2Fmetagoofil","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopsdisk%2Fmetagoofil","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopsdisk%2Fmetagoofil/lists"}