{"id":25848009,"url":"https://github.com/cr0mb/python-web-scanner","last_synced_at":"2026-05-25T16:30:58.331Z","repository":{"id":234588015,"uuid":"789195773","full_name":"Cr0mb/Python-Web-Scanner","owner":"Cr0mb","description":"How to Identify Active Servers \u0026 Organize IPs","archived":false,"fork":false,"pushed_at":"2025-03-16T21:43:33.000Z","size":8403,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-16T22:29:39.545Z","etag":null,"topics":["ip","ip-address-geolocation","parsing","python","redirect-urls","regex","rtsp","web","web-scanning","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Cr0mb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-19T22:43:56.000Z","updated_at":"2025-03-16T21:43:36.000Z","dependencies_parsed_at":"2024-06-26T00:27:27.754Z","dependency_job_id":"eaea8c58-ffb0-4dc6-a9d9-2aba37546d57","html_url":"https://github.com/Cr0mb/Python-Web-Scanner","commit_stats":null,"previous_names":["cr0mb/python-web-scanner"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Cr0mb/Python-Web-Scanner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cr0mb%2FPython-Web-Scanner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cr0mb%2FPython-Web-Scanner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cr0mb%2FPython-Web-Scanner/releases","manifests_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cr0mb%2FPython-Web-Scanner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Cr0mb","download_url":"https://codeload.github.com/Cr0mb/Python-Web-Scanner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cr0mb%2FPython-Web-Scanner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33483732,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-25T14:31:05.219Z","status":"ssl_error","status_checked_at":"2026-05-25T14:31:02.878Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ip","ip-address-geolocation","parsing","python","redirect-urls","regex","rtsp","web","web-scanning","web-scraping"],"created_at":"2025-03-01T10:38:48.973Z","updated_at":"2026-05-25T16:30:58.325Z","avatar_url":"https://github.com/Cr0mb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python-Web-Scanner\n\n[Watch the YouTube Video](https://www.youtube.com/watch?v=pT0DIE-ReMk\u0026t=4s)\n\nPython-based toolset for conducting advanced reconnaissance tasks on web resources associated with IP addresses. 
It leverages asyncio for concurrent scanning, aiohttp for asynchronous HTTP requests, and integrates various utilities for handling files, terminal interactions, and network-related tasks.\n\n## Prerequisites\n\nBefore getting started, make sure you have the following installed:\n\n- Python 3.x\n- Pip (Python package manager)\n  - `asyncio` (Standard library for asynchronous programming)\n  - `aiohttp` (For asynchronous HTTP requests)\n  - `colorama` (For colored terminal output)\n  - `pyfiglet` (For ASCII art text rendering)\n  - `requests` (For making HTTP requests)\n  - `re` (Standard library for regular expressions)\n  - `concurrent.futures` (For concurrent execution of tasks)\n  - `time` (Standard library for time-related functions)\n  - `socket` (Standard library for low-level networking interfaces)\n\n## Installation\n\nYou can install the required third-party Python packages using pip (the standard-library modules listed above ship with Python and need no installation):\n\n```\npip install aiohttp colorama pyfiglet requests\n```\n\nNmap also needs to be installed. On Windows, download the installer from here: [Nmap](https://nmap.org/download.html#windows)\nOr use [Chocolatey](https://chocolatey.org/install)\n\n```\nchoco install nmap\n```\n\nOn Linux, install Nmap with your distribution's package manager, for example on Debian or Ubuntu:\n\n```\nsudo apt install nmap\n```\n\n## Components of the Script\n\n1. Web Scanner (breadscan.py)\n\n- Generates random IP addresses.\n- Scans for potential index pages on active web servers.\n- Logs active URLs in a file named \"sites.txt\".\n\n```\npython breadscan.py [-u] [-n NUM_ADDRESSES] [-i NUM_INSTANCES]\n\n-u: Scan unlimited IP addresses.\n-n: Number of IP addresses to scan (default: 0).\n-i: Number of instances to run concurrently (default: 1).\n```\n\n![image](https://github.com/Cr0mb/Python-Web-Scanner/assets/137664526/30698e54-aee9-4194-915f-84210bda2d89)\n\n2. URL Organizer (httplistorganizer.py)\n\n- Reads URLs from \"sites.txt\".\n- Extracts unique IP addresses and sorts them.\n- Writes sorted addresses to \"clean_sites.txt\".\n\n```\npython httplistorganizer.py\n```\n\n3. 
Redirection Checker (checker.py)\n\n- Checks redirections for websites listed in \"clean_sites.txt\".\n- Uses requests for handling HTTP requests and concurrent.futures for managing concurrent execution.\n  - check_redirect: Sends an HTTP GET request to the provided URL and follows any redirects (allow_redirects=True). It returns the final URL unless it contains the word 'login'.\n    - Excluding URLs that contain 'login' slightly reduces the number of routers in the results (not by much).\n- Logs redirected URLs in \"output.txt\", excluding sites containing 'login'.\n\n```\npython checker.py\n```\n![capture](https://github.com/user-attachments/assets/b05033c7-e9db-47e3-bcab-abf21230d8d1)\n\n\n4. Regex for Domain Validation (cleanoutput.py)\n\n- Creates filtered_output.txt containing only the valid URL redirection pairs that point to domain names, excluding those that redirect to IP addresses.\n\n```\npython cleanoutput.py\n```\n\n5. Sitemap Redirect Checker (sitemap.py)\n\n- Checks whether a given site has a sitemap.xml.\n\n```\npython sitemap.py\n```\n\n6. RTSP Port Checker (rtsp.py)\n\n- Identifies IP addresses with an open RTSP (Real Time Streaming Protocol) port. RTSP ports are commonly used by streaming media servers, so detecting open ports can help in discovering accessible media services.\n\n```\npython rtsp.py\n```\n\n![image](https://github.com/Cr0mb/Python-Web-Scanner/assets/137664526/d6d052a9-896e-43f4-a2df-e5297fd5c6c8)\n\n7. 
IP Location Finder (location.py)\n\n- Looks up IP addresses listed in clean_sites.txt using the ipinfo.io API, retrieving location information such as:\n  \u003e Country, State, Region, ISP, Latitude, Longitude, and Organization\n- Uses the ip-api.com API.\n- Requests are throttled slightly to avoid API rate limiting (HTTP error 429).\n  - To make it faster, edit 'time.sleep(1)' on line 54.\n\n![image](https://github.com/Cr0mb/Python-Web-Scanner/assets/137664526/f3502e19-2ab1-480c-a06f-65e7e110955e)\n\n8. Network Port Scanner (port.py)\n\n- Performs a basic network port scan on a list of URLs or IP addresses provided in an input file (clean_sites.txt).\n- Uses multithreading to scan for open ports on common services efficiently.\n\n![image](https://github.com/Cr0mb/Python-Web-Scanner/assets/137664526/19077671-5b88-446d-aa02-ad0e2102c862)\n\n9. SSH Address Extractor (ssh.py)\n   \u003e organize_ports.py will do the same thing, I should have thought of it first.\n\n- Extracts SSH addresses from a file (ports.txt) containing port scan results.\n- Identifies addresses that have port 22 (SSH) open and saves them to another file (ssh.txt).\n\n10. SSH Scanner using Nmap (nmap.py)\n\n- Automates the scanning of SSH services on a list of IP addresses or hostnames provided in an input file (ssh.txt).\n- Uses Nmap to gather information about the SSH service running on port 22 and saves the results to nmap.txt.\n\n![image](https://github.com/Cr0mb/Python-Web-Scanner/assets/137664526/d3db342d-7c18-43ca-bd6b-4a915cd0afcd)\n\n11. 
Port Organizer (organize_ports.py)\n\n- Extracts IP addresses associated with specific ports from a given input file and saves them into separate text files based on the service name, similar to ssh.py.\n- Displays a menu of common ports and their corresponding service names.\n- Allows users to select a port and extracts IP addresses associated with that port into a text file named after the service.\n- Requires 'ports.txt'.\n\n![image](https://github.com/Cr0mb/Python-Web-Scanner/assets/137664526/23278e19-5f60-4e22-adc5-8c8fb22df831)\n\n## How to Use\n\n1. Web Scanner:\n\n- Run breadscan.py with desired options to scan for active web servers.\n  Adjust the number of addresses and instances based on your requirements.\n\n2. URL Organizer:\n\n- Execute httplistorganizer.py to extract and organize unique IP addresses.\n\n3. Redirection Checker:\n\n- Run checker.py to check redirections for websites.\n\n4. Domain Validator\n\n- Run cleanoutput.py to collect the domain links in a separate file.\n\n5. Sitemap Checker\n\n- Run sitemap.py to index sites that contain /sitemap.xml.\n\n6. RTSP Checker\n\n- Run rtsp.py to find sites with port 554 open.\n\n7. Location Finder\n\n- Run location.py to find more information on an IP address.\n\n8. Port Scanner\n\n- Run port.py to find out which of the most commonly used ports are open on a given address.\n\n9. SSH Extractor\n\n- Run ssh.py to list sites that have SSH enabled in a separate file.\n\n10. Nmap Scanner\n\n- After running ssh.py or organize_ports.py, run nmap.py to find out information about the SSH type and protocol; the results are written to nmap.txt.\n\n11. 
Port Organizer\n\n- Extracts IP addresses associated with specific ports from a given input file and saves them into separate text files based on the service name.\n\n## Updates\n\n```\nv1.9\n\u003e nmap.py now scans for all different ssh protocols, (\"OpenSSH\", \"Dropbear\", \"libssh\", \"libssh2\", \"Tectia\", \"PuTTY\")\n\u003e nmap.py now scans IP addresses listed in 'ssh.txt' regardless of whether they specify the :22 port explicitly in the file.\n```\n\n```\nv1.8\n\u003e Realized I could make a script (port_organizer.py) that can completely organize and separate all of the ports associated with addresses; consolidating these ports and corresponding addresses to their own text files.\n```\n\n```\nv1.7\n\u003e Added 'nmap.py' that scans ssh.txt for the ssh type and protocol, and saves the organized results into nmap.txt.\n```\n\n```\nV1.65\n\u003e Added 'ssh.py' to organize sites that have ssh enabled separately into ssh.txt.\n```\n\n```\nV1.6\n\u003e Added a port scanner to scan for the most used ports, (works pretty fast.)\n```\n\n```\nv1.5\n\u003e Added a rtsp checker script to find out which sites contain media streaming under port 554.\n```\n\n```\nv1.45\n\u003e Enhanced Error Handling\n\u003e Validation for Empty or Invalid URLs\n\u003e Logging and Error / Timeout Indication\n```\n\n```\nv1.4\n\u003e added a sitemap checker script to find out which sites found contain a sitemap.\n```\n\n```\nv1.3\n\u003e Added 'Total Amount of sites: ' to read total amount of sites that are written in sites.txt\n\u003e Now any site will not be duplicated in sites.txt if scanned the same randomly generated address twice.\n```\n\n```\nV1.2\n\u003e Introduced a 'TIMEOUT' constant, so if a request takes longer than the specified timeout period, it will raise a 'requests.Timeout' exception.\n   \u003e Crucial when dealing with potentially slow or unresponsive web servers.\n```\n\n```\nV1.12\n\u003e Added \"cleanoutput.py\"; will show all website URLs ignoring IP addresses that 
don't link to a domain.\n\u003e You can use this after you use checker.py, this python script will grab all of the website link URLs from output.txt, ignoring the ones that redirect to an IP address.\n```\n\n```\nV1.1\n\u003e Updated \"checker.py\" so that chrome driver is no longer needed.\n\u003e Makes finding the redirected sites exponentially faster and less power hungry.\n\u003e Also no longer uses selenium.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcr0mb%2Fpython-web-scanner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcr0mb%2Fpython-web-scanner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcr0mb%2Fpython-web-scanner/lists"}