{"id":20899999,"url":"https://github.com/the404hacking/urlextractor","last_synced_at":"2025-06-21T23:06:11.928Z","repository":{"id":109047952,"uuid":"120496900","full_name":"The404Hacking/URLExtractor","owner":"The404Hacking","description":"Information Gathering \u0026 WebSite ReConnaissance.","archived":false,"fork":false,"pushed_at":"2018-02-08T21:06:38.000Z","size":466,"stargazers_count":22,"open_issues_count":0,"forks_count":7,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-05-13T01:37:27.383Z","etag":null,"topics":["bash","information-gathering","information-gathering-tools","kali","kali-linux","linux","reconnaissance","scan","scaner","scanner","script","sh","site","the404hacking","urlextractor","web","website"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/The404Hacking.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-06T17:32:14.000Z","updated_at":"2025-04-18T19:39:58.000Z","dependencies_parsed_at":"2023-05-04T02:32:57.599Z","dependency_job_id":null,"html_url":"https://github.com/The404Hacking/URLExtractor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/The404Hacking/URLExtractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/The404Hacking%2FURLExtractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/The404Hacking%2FURLExtractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
The404Hacking%2FURLExtractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/The404Hacking%2FURLExtractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/The404Hacking","download_url":"https://codeload.github.com/The404Hacking/URLExtractor/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/The404Hacking%2FURLExtractor/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261206115,"owners_count":23124838,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bash","information-gathering","information-gathering-tools","kali","kali-linux","linux","reconnaissance","scan","scaner","scanner","script","sh","site","the404hacking","urlextractor","web","website"],"created_at":"2024-11-18T11:17:18.614Z","updated_at":"2025-06-21T23:06:06.915Z","avatar_url":"https://github.com/The404Hacking.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# URLExtractor\n\nInformation gathering \u0026 website reconnaissance\n\n## Screenshot\n![Screenshot](Screenshot.jpg?raw=ture \"Screenshot\")\n\n------\n\n**Usage:**\n`./extractor.sh http://www.target.org/`\n\n\nFeatures:\n------\n\n* IP and hosting info like city and country (using [FreegeoIP](http://freegeoip.net/))\n* DNS servers (using [dig](http://packages.ubuntu.com/precise/dnsutils))\n* ASN, Network range, ISP name (using [RISwhois](https://www.ripe.net/analyse/archived-projects/ris-tools-web-interfaces/riswhois))\n* Load balancer test\n* Whois for abuse mail 
(using [Spamcop](https://www.spamcop.net/))\n* PAC (Proxy Auto Configuration) file\n* Compares hashes to diff code\n* robots.txt (recursively looking for hidden stuff)\n* Source code (looking for passwords and users)\n* External links (frames from other websites)\n* Directory FUZZ (like Dirbuster and Wfuzz - using the [Dirbuster](https://www.owasp.org/index.php/Category:OWASP_DirBuster_Project) directory list)\n* [URLvoid](http://www.urlvoid.com/) API - checks Google page rank, Alexa rank and possible blacklists\n* Provides useful links at other websites to correlate with IP/ASN\n* Option to open ALL results in browser at the end\n\nChangelog to version 0.1.9:\n------\n\n* Abuse mail using lynx instead of ~~curl~~\n* Target server name parsing fixed\n* More verbose about HTTP codes and directory discovery\n* MD5 collection for IP fixed\n* Links found now show unique URLs from array\n* [New feature] **Google** results\n* [New feature] **Bing** IP check for other hosts/vhosts\n* [New feature] Open ports from **Shodan**\n* [New feature] **VirusTotal** information about IP\n* [New feature] **Alexa Rank** information about $TARGET_HOST\n\nRequirements:\n------\n\nTested on Kali light mini AND OSX 10.11.3 with brew\n```\nsudo apt-get install bc curl dnsutils libxml2-utils whois md5sha1sum lynx -y\n```\n\n**Configuration file:**\n```\nCURL_TIMEOUT=15 #timeout in --connect-timeout\nCURL_UA=Mozilla #user-agent (keep it simple)\nINTERNAL=NO #YES OR NO (show internal network info)\nURLVOID_KEY=your_API_key #using API from http://www.urlvoid.com/\nFUZZ_LIMIT=10 #how many lines it will read from fuzz file\nOPEN_TARGET_URLS=NO #open found URLs at the end of script\nOPEN_EXTERNAL_LINKS=NO #open external links (frames) at the end of script\n```\n\nTodo list:\n------\n\n* [x] Upload to GitHub :)\n* [ ] Integration with other APIs\n* [ ] Add host regex validation\n* [ ] Use GNU parallel to fuzz URLs\n* [ ] Export to CSV\n* [ ] Possible migration to Python\n* [ ] Integration with 
JoomScan/WPScan/CMSmap\n* [ ] Integration with CipherScan\n* [ ] Check for installed packages\n\n## Download and Clone\n\u003e Download: Click [Here](https://github.com/The404Hacking/URLExtractor/archive/master.zip) (URLExtractor-master.zip)\n\n\u003e Clone: git clone [https://github.com/The404Hacking/URLExtractor.git](https://github.com/The404Hacking/URLExtractor.git)\n\n## The404Hacking | Digital Security ReSearch Group\n[The404Hacking](https://T.me/The404Hacking)\n\n## Follow us !\n[The404Hacking](https://T.me/The404Hacking) - [The404Cracking](https://T.me/The404Cracking)\n\n[Instagram](https://instagram.com/The404Hacking) - [GitHub](https://github.com/The404Hacking)\n\n[YouTube](http://yon.ir/youtube404) - [Aparat](http://www.aparat.com/The404Hacking)\n\n[Weblog](http://the404hacking.blogsky.com) - [Email](mailto:The404Hacking.Team@Gmail.Com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthe404hacking%2Furlextractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthe404hacking%2Furlextractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthe404hacking%2Furlextractor/lists"}