{"id":21204394,"url":"https://github.com/acidvegas/httpz","last_synced_at":"2026-03-16T09:32:34.511Z","repository":{"id":212597579,"uuid":"731873111","full_name":"acidvegas/httpz","owner":"acidvegas","description":"Hyper-fast HTTP Scraping Tool","archived":false,"fork":false,"pushed_at":"2025-02-14T06:32:43.000Z","size":5360,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-15T00:37:07.038Z","etag":null,"topics":["http-scanner","httpx","scanner","web-scanner","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/acidvegas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-15T04:39:38.000Z","updated_at":"2025-02-12T08:04:07.000Z","dependencies_parsed_at":"2024-11-20T20:34:19.973Z","dependency_job_id":"4c96c4c0-e7ef-416f-9509-6d46c74e87d9","html_url":"https://github.com/acidvegas/httpz","commit_stats":null,"previous_names":["acidvegas/httpz"],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acidvegas%2Fhttpz","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acidvegas%2Fhttpz/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acidvegas%2Fhttpz/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acidvegas%2Fhttpz/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/acidvegas","download_url":"https://codeload.github.com/acidvegas/httpz/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252957006,"owners_count":21831420,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["http-scanner","httpx","scanner","web-scanner","web-scraping"],"created_at":"2024-11-20T20:32:17.708Z","updated_at":"2026-03-16T09:32:34.448Z","avatar_url":"https://github.com/acidvegas.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HTTPZ Web Scanner\n\n![](./.screens/preview.gif)\n\nA high-performance concurrent web scanner written in Python. HTTPZ efficiently scans domains for HTTP/HTTPS services, extracting valuable information like status codes, titles, SSL certificates, and more.\n\n## Requirements\n\n- [Python](https://www.python.org/downloads/)\n  - [aiohttp](https://pypi.org/project/aiohttp/)\n  - [beautifulsoup4](https://pypi.org/project/beautifulsoup4/)\n  - [cryptography](https://pypi.org/project/cryptography/)\n  - [dnspython](https://pypi.org/project/dnspython/)\n  - [mmh3](https://pypi.org/project/mmh3/)\n  - [python-dotenv](https://pypi.org/project/python-dotenv/)\n\n## Installation\n\n### Via pip *(recommended)*\n```bash\n# Install from PyPI\npip install httpz_scanner\n\n# The 'httpz' command will now be available in your terminal\nhttpz --help\n```\n\n### From source\n```bash\n# Clone the repository\ngit clone https://github.com/acidvegas/httpz\ncd httpz\npip install -r requirements.txt\n```\n\n## Usage\n\n### Command Line Interface\n\nBasic usage:\n```bash\npython -m httpz_scanner domains.txt\n```\n\nScan with all flags enabled and output to JSONL:\n```bash\npython -m httpz_scanner domains.txt -all -c 100 -o results.jsonl -j -p\n```\n\nRead from stdin:\n```bash\ncat domains.txt | python -m httpz_scanner - -all -c 100\necho \"example.com\" | python -m httpz_scanner - -all\n```\n\nFilter by status codes and follow redirects:\n```bash\npython -m httpz_scanner domains.txt -mc 200,301-399 -ec 404,500 -fr -p\n```\n\nShow specific fields with custom timeout and resolvers:\n```bash\npython -m httpz_scanner domains.txt -sc -ti -i -tls -to 10 -r resolvers.txt\n```\n\nFull scan with all options:\n```bash\npython -m httpz_scanner domains.txt -c 100 -o output.jsonl -j -all -to 10 -mc 200,301 -ec 404,500 -p -ax -r resolvers.txt\n```\n\n### Distributed Scanning\nSplit scanning across multiple machines using the `--shard` argument:\n\n```bash\n# Machine 1\nhttpz domains.txt --shard 1/3\n\n# Machine 2\nhttpz domains.txt --shard 2/3\n\n# Machine 3\nhttpz domains.txt --shard 3/3\n```\n\nEach machine will process a different subset of domains without overlap. For example, with 3 shards:\n- Machine 1 processes lines 0,3,6,9,...\n- Machine 2 processes lines 1,4,7,10,...\n- Machine 3 processes lines 2,5,8,11,...\n\nThis allows efficient distribution of large scans across multiple machines.\n\n### Python Library\n```python\nimport asyncio\nimport urllib.request\nfrom httpz_scanner import HTTPZScanner\n\nasync def scan_from_list() -\u003e list:\n    with urllib.request.urlopen('https://example.com/domains.txt') as response:\n        content = response.read().decode()\n        return [line.strip() for line in content.splitlines() if line.strip()][:20]\n    \nasync def scan_from_url():\n    with urllib.request.urlopen('https://example.com/domains.txt') as response:\n        for line in response:\n            if line := line.strip():\n                yield line.decode().strip()\n\nasync def scan_from_file():\n    with open('domains.txt', 'r') as file:\n        for line in file:\n            if line := line.strip():\n                yield line\n\nasync def main():\n    # Initialize scanner with all possible options (showing defaults)\n    scanner = HTTPZScanner(\n        concurrent_limit=100,   # Number of concurrent requests\n        timeout=5,              # Request timeout in seconds\n        follow_redirects=False, # Follow redirects (max 10)\n        check_axfr=False,       # Try AXFR transfer against nameservers\n        resolver_file=None,     # Path to custom DNS resolvers file\n        output_file=None,       # Path to JSONL output file\n        show_progress=False,    # Show progress counter\n        debug_mode=False,       # Show error states and debug info\n        jsonl_output=False,     # Output in JSONL format\n        shard=None,             # Tuple of (shard_index, total_shards) for distributed scanning\n        \n        # Control which fields to show (all False by default unless show_fields is None)\n        show_fields={\n            'status_code': True,      # Show status code\n            'content_type': True,     # Show content type\n            'content_length': True,   # Show content length\n            'title': True,            # Show page title\n            'body': True,             # Show body preview\n            'ip': True,               # Show IP addresses\n            'favicon': True,          # Show favicon hash\n            'headers': True,          # Show response headers\n            'follow_redirects': True, # Show redirect chain\n            'cname': True,            # Show CNAME records\n            'tls': True               # Show TLS certificate info\n        },\n        \n        # Filter results\n        match_codes={200,301,302},  # Only show these status codes\n        exclude_codes={404,500,503} # Exclude these status codes\n    )\n\n    # Example 1: Process file\n    print('\\nProcessing file:')\n    async for result in scanner.scan(scan_from_file()):\n        print(f\"{result['domain']}: {result['status']}\")\n\n    # Example 2: Stream URLs\n    print('\\nStreaming URLs:')\n    async for result in scanner.scan(scan_from_url()):\n        print(f\"{result['domain']}: {result['status']}\")\n\n    # Example 3: Process list\n    print('\\nProcessing list:')\n    domains = await scan_from_list()\n    async for result in scanner.scan(domains):\n        print(f\"{result['domain']}: {result['status']}\")\n\nif __name__ == '__main__':\n    asyncio.run(main())\n```\n\nThe scanner accepts various input types:\n- File paths (string)\n- Lists/tuples of domains\n- stdin (using '-')\n- Async generators that yield domains\n\nAll inputs support sharding for distributed scanning using the `shard` parameter.\n\n## Arguments\n\n| Argument      | Long Form        | Description                                                 |\n|---------------|------------------|-------------------------------------------------------------|\n| `file`        |                  | File containing domains *(one per line)*, use `-` for stdin |\n| `-d`          | `--debug`        | Show error states and debug information                     |\n| `-c N`        | `--concurrent N` | Number of concurrent checks *(default: 100)*                |\n| `-o FILE`     | `--output FILE`  | Output file path *(JSONL format)*                           |\n| `-j`          | `--jsonl`        | Output JSON Lines format to console                         |\n| `-all`        | `--all-flags`    | Enable all output flags                                     |\n| `-sh`         | `--shard N/T`    | Process shard N of T total shards *(e.g., 1/3)*             |\n\n### Output Field Flags\n\n| Flag   | Long Form            | Description                      |\n|--------| ---------------------|----------------------------------|\n| `-sc`  | `--status-code`      | Show status code                 |\n| `-ct`  | `--content-type`     | Show content type                |\n| `-ti`  | `--title`            | Show page title                  |\n| `-b`   | `--body`             | Show body preview                |\n| `-i`   | `--ip`               | Show IP addresses                |\n| `-f`   | `--favicon`          | Show favicon hash                |\n| `-hr`  | `--headers`          | Show response headers            |\n| `-cl`  | `--content-length`   | Show content length              |\n| `-fr`  | `--follow-redirects` | Follow redirects *(max 10)*      |\n| `-cn`  | `--cname`            | Show CNAME records               |\n| `-tls` | `--tls-info`         | Show TLS certificate information |\n\n### Other Options\n\n| Option      | Long Form               | Description                                         |\n|-------------|-------------------------|-----------------------------------------------------|\n| `-to N`     | `--timeout N`           | Request timeout in seconds *(default: 5)*           |\n| `-mc CODES` | `--match-codes CODES`   | Only show specific status codes *(comma-separated)* |\n| `-ec CODES` | `--exclude-codes CODES` | Exclude specific status codes *(comma-separated)*   |\n| `-p`        | `--progress`            | Show progress counter                               |\n| `-ax`       | `--axfr`                | Try AXFR transfer against nameservers               |\n| `-r FILE`   | `--resolvers FILE`      | File containing DNS resolvers *(one per line)*      |","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Facidvegas%2Fhttpz","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Facidvegas%2Fhttpz","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Facidvegas%2Fhttpz/lists"}