{"id":34030874,"url":"https://github.com/proxymesh/scrapy-proxy-headers","last_synced_at":"2026-02-11T00:09:58.406Z","repository":{"id":275191251,"uuid":"925360144","full_name":"proxymesh/scrapy-proxy-headers","owner":"proxymesh","description":"Handle custom proxy headers when making HTTPS requests through proxies in scrapy","archived":false,"fork":false,"pushed_at":"2026-02-09T19:26:01.000Z","size":26,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-09T22:59:41.388Z","etag":null,"topics":["crawlers","crawling","crawling-python","headers","http","http-proxy","https","proxy","proxy-headers","proxy-server","proxymesh","python","python3","scrapers","scraping","scraping-python","scraping-websites","scrapy","spiders","webscraping"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/scrapy-proxy-headers/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/proxymesh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-31T18:20:53.000Z","updated_at":"2026-02-09T19:26:04.000Z","dependencies_parsed_at":"2025-02-21T21:35:27.287Z","dependency_job_id":"589b037c-2d50-4fe3-829e-4271cb172a2c","html_url":"https://github.com/proxymesh/scrapy-proxy-headers","commit_stats":null,"previous_names":["proxymesh/scrapy-proxy-headers"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/proxymesh/scrapy-proxy-header
s","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/proxymesh%2Fscrapy-proxy-headers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/proxymesh%2Fscrapy-proxy-headers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/proxymesh%2Fscrapy-proxy-headers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/proxymesh%2Fscrapy-proxy-headers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/proxymesh","download_url":"https://codeload.github.com/proxymesh/scrapy-proxy-headers/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/proxymesh%2Fscrapy-proxy-headers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29322779,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-10T20:44:44.282Z","status":"ssl_error","status_checked_at":"2026-02-10T20:44:43.393Z","response_time":65,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawlers","crawling","crawling-python","headers","http","http-proxy","https","proxy","proxy-headers","proxy-server","proxymesh","python","python3","scrapers","scraping","scraping-python","scraping-websites","scrapy","spiders","webscraping"],"created_at":"2025-12-13T18:04:04.017Z","updated_at":"2026-02-11T00:09:58.384Z","avatar_url":"https://github.com/proxymesh.png","language":"Python","readme":"# Scrapy Proxy Headers\n\n[![PyPI version](https://badge.fury.io/py/scrapy-proxy-headers.svg)](https://badge.fury.io/py/scrapy-proxy-headers)\n[![Documentation](https://readthedocs.org/projects/scrapy-proxy-headers/badge/?version=latest)](https://scrapy-proxy-headers.readthedocs.io/)\n\n**Send custom headers to proxies and receive proxy response headers in Scrapy.**\n\n## The Problem\n\nWhen making HTTPS requests through a proxy, Scrapy cannot send custom headers to the proxy itself. 
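As a plain-Python sketch (a hypothetical helper, not this package's code), the single unencrypted message a proxy ever sees during an HTTPS request looks like this:

```python
# Illustration only: the CONNECT handshake is the one place where
# headers intended for the proxy itself can travel unencrypted.
def build_connect_request(host, port, proxy_headers):
    lines = ['CONNECT %s:%d HTTP/1.1' % (host, port),
             'Host: %s:%d' % (host, port)]
    for name, value in proxy_headers.items():
        lines.append('%s: %s' % (name, value))
    return '\r\n'.join(lines) + '\r\n\r\n'

print(build_connect_request('api.ipify.org', 443,
                            {'X-ProxyMesh-Country': 'US'}))
```
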
This is because HTTPS requests create an encrypted tunnel (via HTTP CONNECT) - any headers you add to `request.headers` are encrypted and only visible to the destination server, not the proxy.\n\n```\n┌──────────┐     CONNECT      ┌───────┐     Encrypted     ┌────────────┐\n│  Scrapy  │ ───────────────► │ Proxy │ ════════════════► │ Target URL │\n└──────────┘  (unencrypted)   └───────┘    (tunnel)       └────────────┘\n                  │                              │\n           Proxy headers             request.headers\n           go HERE                   go here (encrypted)\n```\n\nThis extension solves the problem by:\n1. Sending custom headers to the proxy during the CONNECT handshake\n2. Capturing response headers from the proxy's CONNECT response\n3. Making those headers available in your spider\n\n## Installation\n\n```bash\npip install scrapy-proxy-headers\n```\n\n## Quick Start\n\n### 1. Configure the Download Handler\n\nIn your Scrapy `settings.py`:\n\n```python\nDOWNLOAD_HANDLERS = {\n    \"https\": \"scrapy_proxy_headers.HTTP11ProxyDownloadHandler\"\n}\n```\n\nOr in your spider's `custom_settings`:\n\n```python\nclass MySpider(scrapy.Spider):\n    custom_settings = {\n        \"DOWNLOAD_HANDLERS\": {\n            \"https\": \"scrapy_proxy_headers.HTTP11ProxyDownloadHandler\"\n        }\n    }\n```\n\n### 2. 
Send Proxy Headers\n\nUse `request.meta[\"proxy_headers\"]` to send headers to the proxy:\n\n```python\nimport scrapy\n\nclass MySpider(scrapy.Spider):\n    name = \"example\"\n    \n    def start_requests(self):\n        yield scrapy.Request(\n            url=\"https://api.ipify.org?format=json\",\n            meta={\n                \"proxy\": \"http://your-proxy:port\",\n                \"proxy_headers\": {\"X-ProxyMesh-Country\": \"US\"}\n            }\n        )\n    \n    def parse(self, response):\n        # Proxy response headers are available in response.headers\n        proxy_ip = response.headers.get(\"X-ProxyMesh-IP\")\n        self.logger.info(f\"Proxy IP: {proxy_ip}\")\n```\n\n### 3. Receive Proxy Response Headers\n\nHeaders from the proxy's CONNECT response are automatically merged into `response.headers`:\n\n```python\ndef parse(self, response):\n    # Access headers sent by the proxy\n    proxy_ip = response.headers.get(b\"X-ProxyMesh-IP\")\n    if proxy_ip:\n        print(f\"Request made through IP: {proxy_ip.decode()}\")\n```\n\n## Complete Example\n\n```python\nimport scrapy\n\nclass ProxyHeadersSpider(scrapy.Spider):\n    name = \"proxy_headers_demo\"\n    \n    custom_settings = {\n        \"DOWNLOAD_HANDLERS\": {\n            \"https\": \"scrapy_proxy_headers.HTTP11ProxyDownloadHandler\"\n        }\n    }\n    \n    def start_requests(self):\n        yield scrapy.Request(\n            url=\"https://api.ipify.org?format=json\",\n            meta={\n                \"proxy\": \"http://us.proxymesh.com:31280\",\n                \"proxy_headers\": {\"X-ProxyMesh-Country\": \"US\"}\n            },\n            callback=self.parse_ip\n        )\n    \n    def parse_ip(self, response):\n        data = response.json()\n        proxy_ip = response.headers.get(b\"X-ProxyMesh-IP\")\n        \n        self.logger.info(f\"Public IP: {data['ip']}\")\n        if proxy_ip:\n            self.logger.info(f\"Proxy IP: {proxy_ip.decode()}\")\n        \n        
yield {\n            \"public_ip\": data[\"ip\"],\n            \"proxy_ip\": proxy_ip.decode() if proxy_ip else None\n        }\n```\n\n## How It Works\n\n1. **HTTP11ProxyDownloadHandler** - Custom download handler that manages proxy header caching\n2. **ScrapyProxyHeadersAgent** - Agent that reads `proxy_headers` from request meta\n3. **TunnelingHeadersAgent** - Sends custom headers in the CONNECT request\n4. **TunnelingHeadersTCP4ClientEndpoint** - Captures proxy response headers from CONNECT response\n\nThe handler also caches proxy response headers by proxy URL. This ensures headers remain available even when Scrapy reuses existing tunnel connections for subsequent requests.\n\n## Test Harness\n\nA test harness is included to verify proxy header functionality:\n\n```bash\n# Basic test\nPROXY_URL=http://your-proxy:port TEST_URL=https://api.ipify.org python test_proxy_headers.py\n\n# With custom proxy header\nPROXY_URL=http://your-proxy:port \\\nPROXY_HEADER=X-ProxyMesh-IP \\\nSEND_PROXY_HEADER=X-ProxyMesh-Country \\\nSEND_PROXY_VALUE=US \\\npython test_proxy_headers.py\n\n# Verbose output\npython test_proxy_headers.py -v\n```\n\n### Environment Variables\n\n| Variable | Description | Default |\n|----------|-------------|---------|\n| `PROXY_URL` | Proxy URL (also checks `HTTPS_PROXY`) | Required |\n| `TEST_URL` | URL to request | `https://api.ipify.org?format=json` |\n| `PROXY_HEADER` | Response header to check for | `X-ProxyMesh-IP` |\n| `SEND_PROXY_HEADER` | Header name to send to proxy | Optional |\n| `SEND_PROXY_VALUE` | Value for the send header | Optional |\n\n## Documentation\n\nFull documentation is available at [scrapy-proxy-headers.readthedocs.io](https://scrapy-proxy-headers.readthedocs.io/).\n\n## Use Cases\n\n- **Geographic targeting**: Send `X-ProxyMesh-Country` to route through specific countries\n- **Session consistency**: Request the same IP across multiple requests\n- **Debugging**: Capture proxy response headers to see which IP was assigned\n- 
**Load balancing**: Use proxy headers to control request distribution\n\n## Requirements\n\n- Python 3.8+\n- Scrapy 2.0+\n\n## License\n\nBSD 3-Clause License - see [LICENSE](LICENSE) for details.\n\n## Links\n\n- [PyPI](https://pypi.org/project/scrapy-proxy-headers/)\n- [Documentation](https://scrapy-proxy-headers.readthedocs.io/)\n- [GitHub](https://github.com/proxymesh/scrapy-proxy-headers)\n- [ProxyMesh](https://proxymesh.com)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproxymesh%2Fscrapy-proxy-headers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fproxymesh%2Fscrapy-proxy-headers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproxymesh%2Fscrapy-proxy-headers/lists"}
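The per-proxy header caching described in the README's "How It Works" section can be sketched in plain Python (hypothetical names, not the package's actual API):

```python
# Hypothetical sketch: cache proxy response headers by proxy URL, so
# responses served over a reused tunnel connection still expose the
# headers captured from the original CONNECT response.
class ProxyHeaderCache:
    def __init__(self):
        self._by_proxy = {}

    def store(self, proxy_url, headers):
        # Called when a CONNECT response arrives carrying proxy headers.
        self._by_proxy[proxy_url] = dict(headers)

    def merge_into(self, proxy_url, response_headers):
        # Called for every response, including ones on reused tunnels.
        response_headers.update(self._by_proxy.get(proxy_url, {}))
        return response_headers

cache = ProxyHeaderCache()
cache.store('http://us.proxymesh.com:31280', {'X-ProxyMesh-IP': '10.0.0.1'})
merged = cache.merge_into('http://us.proxymesh.com:31280',
                          {'Content-Type': 'application/json'})
```
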