{"id":50055815,"url":"https://github.com/fawadss1/scrapy-stealth","last_synced_at":"2026-05-21T13:05:38.892Z","repository":{"id":353319825,"uuid":"1218726070","full_name":"fawadss1/scrapy-stealth","owner":"fawadss1","description":"Stealthy Crawling. Maximum Results. A pluggable anti-bot and stealth framework for Scrapy.","archived":false,"fork":false,"pushed_at":"2026-05-18T06:50:18.000Z","size":299,"stargazers_count":6,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-05-18T08:38:47.634Z","etag":null,"topics":["anti-bot","cloudflare-bypass","framework","proxy-rotation","scraping-python","scrapy"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/scrapy-stealth","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fawadss1.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-23T06:46:06.000Z","updated_at":"2026-05-18T06:48:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/fawadss1/scrapy-stealth","commit_stats":null,"previous_names":["fawadss1/scrapy-stealth"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/fawadss1/scrapy-stealth","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fawadss1%2Fscrapy-stealth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fawadss1%2Fscrapy-stealth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fawadss1%2Fscrapy-stealth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fawadss1%2Fscrapy-stealth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fawadss1","download_url":"https://codeload.github.com/fawadss1/scrapy-stealth/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fawadss1%2Fscrapy-stealth/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33301534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-21T12:23:38.849Z","status":"ssl_error","status_checked_at":"2026-05-21T12:22:11.673Z","response_time":62,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anti-bot","cloudflare-bypass","framework","proxy-rotation","scraping-python","scrapy"],"created_at":"2026-05-21T13:05:35.613Z","updated_at":"2026-05-21T13:05:38.883Z","avatar_url":"https://github.com/fawadss1.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/fawadss1/scrapy-stealth/master/docs/static/logo.png\" alt=\"scrapy-stealth logo\" width=\"925\"/\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003escrapy-stealth\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\u003cstrong\u003eStealthy Crawling. Maximum Results.\u003c/strong\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003eA pluggable anti-bot and stealth framework for Scrapy.\u003c/p\u003e\n\n[![PyPI version](https://img.shields.io/pypi/v/scrapy-stealth?color=blue)](https://pypi.org/project/scrapy-stealth/)\n[![Python versions](https://img.shields.io/pypi/pyversions/scrapy-stealth)](https://pypi.org/project/scrapy-stealth/)\n[![Downloads](https://static.pepy.tech/badge/scrapy-stealth)](https://pepy.tech/project/scrapy-stealth)\n[![GitHub release](https://img.shields.io/github/v/release/fawadss1/scrapy-stealth)](https://github.com/fawadss1/scrapy-stealth/releases)\n[![License: MIT](https://img.shields.io/badge/license-MIT-green)](https://github.com/fawadss1/scrapy-stealth/blob/master/LICENSE)\n[![Changelog](https://img.shields.io/badge/changelog-releases-informational)](https://github.com/fawadss1/scrapy-stealth/releases)\n\n`scrapy-stealth` extends Scrapy with browser impersonation, proxy rotation, fingerprint cycling, and intelligent retry strategies —\ndesigned for large-scale, production-grade crawling.\n\n---\n\n## 🧠 Why scrapy-stealth?\n\nScrapy is fast and powerful, but modern websites use advanced anti-bot protections such as:\n\n* TLS fingerprinting\n* Browser behavior detection\n* Rate limiting and IP blocking\n\n`scrapy-stealth` helps by adding:\n\n* 🧬 Browser-level impersonation (TLS + HTTP/2 fingerprints)\n* 🔁 Smarter retry strategies\n* 🌐 Proxy and fingerprint rotation\n* 🛡️ Anti-bot detection\n\n### Result\n\n* Higher success rate\n* Lower proxy cost\n* More stable crawls\n\n---\n\n## 📊 Comparison\n\n| Feature                      | scrapy-stealth | scrapy-impersonate | scrapy-playwright | scrapy-splash | Scrapy (default) |\n|------------------------------|:--------------:|:------------------:|:-----------------:|:-------------:|:----------------:|\n| TLS fingerprint spoofing     |       ✅        |         ✅          |         ❌         |       ❌       |        ❌         |\n| HTTP/2 support               |       ✅        |         ✅          |         ✅         |       ❌       |        ❌         |\n| Browser impersonation        |       ✅        |         ✅          |    ⚠️ partial     |       ❌       |        ❌         |\n| Proxy rotation (built-in)    |       ✅        |         ❌          |         ❌         |       ❌       |        ❌         |\n| Fingerprint rotation         |       ✅        |         ❌          |         ❌         |       ❌       |        ❌         |\n| Anti-bot detection           |       ✅        |         ❌          |         ❌         |       ❌       |        ❌         |\n| Smart retry logic            |       ✅        |         ❌          |         ❌         |       ❌       |        ❌         |\n| Per-request engine switching |       ✅        |         ❌          |         ❌         |       ❌       |        ❌         |\n| Headless browser required    |       ✅        |         ❌          |         ✅         |       ✅       |        ❌         |\n| JavaScript rendering         |       ️✅       |         ❌          |         ✅         |       ✅       |        ❌         |\n| Screenshot / snapshot        |       ✅        |         ❌          |         ✅         |       ✅       |        ❌         |\n| Native Scrapy integration    |       ✅        |         ✅          |         ✅         |       ✅       |        ✅         |\n| Memory footprint             |     🟢 Low     |       🟢 Low       |      🔴 High      |    🔴 High    |      🟢 Low      |\n\n\u003e ⚠️ `scrapy-playwright` passes real browser TLS but does not spoof fingerprint profiles like `scrapy-stealth` does.\n\u003e `scrapy-impersonate` provides TLS/HTTP2 impersonation via `curl_cffi` but lacks built-in rotation, detection, or per-request engine switching.\n\u003e JavaScript rendering is available via the optional `browser` driver — use it selectively for pages that require a full browser.\n\n---\n\n## ✨ Features\n\n* 🔌 Pluggable engine system (`scrapy`, `stealth`)\n* 🧠 Per-request engine selection via `request.meta`\n* 🌐 Proxy support and rotation\n* 🧬 Browser fingerprint rotation\n* 🔁 Smart retry logic\n* 🛡️ Anti-bot detection (status + content-based, Cloudflare, Akamai)\n* ⚡  Thread-safe async integration\n* 🖥️ Real-browser engine (CDP) for JS-heavy pages\n* 📸 Built-in snapshot decorator (`scrapy_stealth.decorators.snapshot`)\n\n---\n\n## 📦 Installation\n\n```bash\npip install scrapy-stealth\n```\n\n\u003e Requires Python 3.11+ and Scrapy 2.12–2.x\n\n---\n\n## ⚙️ Setup\n\n### Option 1 — Global (`settings.py`)\n\n```python\n# 1. Enable the middleware\nDOWNLOADER_MIDDLEWARES = {\n    \"scrapy_stealth.StealthDownloaderMiddleware\": 950,\n}\n\n# 2. (Optional) Route ALL requests through stealth automatically — no meta needed per request\nSTEALTH_ENABLED = True\nSTEALTH_DRIVER  = \"turbo\"   # \"basic\" (default), \"turbo\", or \"browser\"\n\n# 3. (Optional) Proxy list for automatic rotation\n#    Used when rotate_proxy=True (per-request) or when STEALTH_ENABLED=True with rotate_proxy\n#    Supported schemes: http, https, socks4, socks5\nSTEALTH_PROXIES = [\n    \"http://proxy1:8080\",\n    \"http://proxy2:8080\",\n    \"http://user:pass@proxy3:8080\",  # with authentication\n    \"socks5://proxy4:1080\",\n]\n```\n\n### Option 2 — Per-spider (`custom_settings`)\n\nConfigure the middleware and all stealth settings directly on the spider — no changes to `settings.py` required.\n\n```python\nclass MySpider(scrapy.Spider):\n    name = \"example\"\n\n    custom_settings = {\n        \"DOWNLOADER_MIDDLEWARES\": {\n            \"scrapy_stealth.StealthDownloaderMiddleware\": 950,\n        },\n        \"STEALTH_ENABLED\": True,\n        \"STEALTH_DRIVER\": \"turbo\",\n        \"STEALTH_PROXIES\": [\n            \"http://proxy1:8080\",\n            \"http://user:pass@proxy2:8080\",\n            \"socks5://proxy3:1080\",\n        ],\n    }\n```\n\n\u003e Proxies are validated at startup — invalid format or unsupported scheme raises `ValueError` immediately.\n\n---\n\n## 🚀 Quick Start\n\n**Option A — Per-request** (stealth only on specific requests):\n\n```python\nyield scrapy.Request(\n    url=\"https://example.com\",\n    meta={\"stealth\": {}},\n)\n```\n\n**Option B — Global mode** (stealth on every request automatically):\n\n```python\n# settings.py or custom_settings\nSTEALTH_ENABLED = True\nSTEALTH_DRIVER  = \"turbo\"\n```\n\n```python\n# No meta needed — all requests go through stealth\nyield scrapy.Request(url=\"https://example.com\")\n\n# Opt out for a specific request\nyield scrapy.Request(url=\"https://api.internal/health\", meta={\"stealth\": False})\n```\n\n---\n\n## 🔧 Global Configuration\n\nCustomise package-wide defaults via the shared `config` instance.\nAll settings must be applied **at module level**, before the spider class — the engine client is\ncreated at middleware initialisation, so changes inside `start_requests` or `parse` will have no effect.\n\n```python\n# myspider.py\nimport scrapy\nfrom scrapy_stealth.config import config\n\nconfig.DEFAULT_ENGINE  = \"stealth\"      # \"scrapy\" (native) or \"stealth\" (browser impersonation)\nconfig.DEFAULT_PROFILE = \"chrome_147\"   # browser profile when meta[\"stealth\"][\"profile\"] is not set\nconfig.DEFAULT_TIMEOUT = 30             # stealth request timeout in seconds\nconfig.STEALTH_DRIVER  = \"turbo\"        # \"basic\" (default), \"turbo\", or \"browser\"\nconfig.HTTP2           = True           # False for servers that only support HTTP/1.1\nconfig.BLOCK_CODES    |= {407}          # extend blocked status codes (|= keeps defaults)\nconfig.BLOCK_KEYWORDS.append(\"banned\")  # extend blocked body-text patterns\nconfig.BROWSER_HEADLESS = True          # browser driver: headless mode (False = visible window, more stealthy)\nconfig.BROWSER_SETTLE_S = 4.0          # browser driver: seconds to wait after navigation for JS to finish\n\n\nclass MySpider(scrapy.Spider):\n    name = \"example\"\n    ...\n```\n\n```python\n# ❌ wrong — too late, the engine client is already created\nclass MySpider(scrapy.Spider):\n    def start_requests(self):\n        config.HTTP2 = False  # has no effect\n        ...\n```\n\nYou can also read any value programmatically:\n\n```python\nconfig.get(\"DEFAULT_ENGINE\")          # \"scrapy\"\nconfig.get(\"MISSING_KEY\", \"default\")  # \"default\"\n```\n\n| Attribute          | Type             | Default                           | Description                                                                                                  |\n|--------------------|------------------|-----------------------------------|--------------------------------------------------------------------------------------------------------------|\n| `DEFAULT_ENGINE`   | `str`            | `\"scrapy\"`                        | Engine used when `request.meta[\"stealth\"]` key is absent                                                     |\n| `DEFAULT_PROFILE`  | `str`            | `\"chrome_147\"`                    | Browser profile used when none is specified                                                                  |\n| `DEFAULT_TIMEOUT`  | `int`            | `30`                              | Request timeout in seconds                                                                                   |\n| `STEALTH_DRIVER`   | `str`            | `\"basic\"`                         | Default driver: `\"basic\"`, `\"turbo\"`, or `\"browser\"`. Also readable from Scrapy settings as `STEALTH_DRIVER` |\n| `HTTP2`            | `bool`           | `True`                            | HTTP/2 mode; overridable per-request via `meta[\"stealth\"][\"http2\"]`                                          |\n| `BLOCK_CODES`      | `frozenset[int]` | `{403, 429, 503}`                 | HTTP status codes considered blocked                                                                         |\n| `BLOCK_KEYWORDS`   | `list[str]`      | `[\"captcha\", \"access denied\", …]` | Body-text patterns considered blocked                                                                        |\n| `BROWSER_HEADLESS` | `bool`           | `True`                            | Browser driver: headless mode (`False` = visible window, more stealthy)                                      |\n| `BROWSER_SETTLE_S` | `float`          | `4.0`                             | Browser driver: seconds to wait after navigation for JS to finish rendering                                  |\n\nFor one-off overrides on a single request, set `meta[\"stealth\"][\"driver\"]` or `meta[\"stealth\"][\"http2\"]` (see Per-Request Configuration below).\n\n---\n\n## ⚙️ Per-Request Configuration\n\nAll options are passed via `request.meta[\"stealth\"]`.\n\nThe presence of `meta[\"stealth\"]` (a dict) activates the stealth engine. Omit the key to use the default Scrapy engine.\nWhen `STEALTH_ENABLED = True`, all requests are stealth by default — pass `meta={\"stealth\": False}` to opt out for a specific request.\n\n```python\nyield scrapy.Request(\n    url,\n    meta={\n        \"stealth\": {\n            \"driver\": \"turbo\",\n            \"profile\": \"chrome_147\",\n            \"proxy\": \"http://user:pass@proxy:8080\",\n            \"stealth_timeout\": 60,\n            \"http2\": True,\n            \"rotate_proxy\": True,\n            \"rotate_profile\": True,\n        }\n    },\n)\n```\n\n| Key               | Type    | Description                                                                                                     |\n|-------------------|---------|-----------------------------------------------------------------------------------------------------------------|\n| `driver`          | `str`   | `\"basic\"`, `\"turbo\"`, or `\"browser\"` — overrides `config.STEALTH_DRIVER` per-request                            |\n| `profile`         | `str`   | Browser profile (e.g. `\"chrome_147\"`, `\"safari_ios_18_1_1\"`)                                                    |\n| `proxy`           | `str`   | Explicit proxy URL                                                                                              |\n| `stealth_timeout` | `int`   | Per-request timeout in seconds (overrides default 30s)                                                          |\n| `http2`           | `bool`  | `True` = HTTP/2, `False` = HTTP/1.1 (overrides `config.HTTP2` for this request)                                 |\n| `rotate_proxy`    | `bool`  | Auto-pick a proxy from `STEALTH_PROXIES`                                                                        |\n| `rotate_profile`  | `bool`  | Auto-pick a random browser profile                                                                              |\n| `headless`        | `bool`  | Browser driver only: `True` = headless, `False` = visible window (more stealthy)                                |\n| `settle`          | `float` | Browser driver only: seconds to wait for JS after navigation (default `4.0`)                                    |\n| `snapshot`        | `bool`  | Browser driver only: capture a PNG snapshot — result available as `response.meta[\"snapshot_content\"]` (`bytes`) |\n\n---\n\n## 🖥️ Browser Engine\n\nFor sites protected by Cloudflare JS challenges or heavy JavaScript rendering, use the `browser` driver.\nIt runs a real Chrome instance via the DevTools Protocol (no WebDriver), keeping one persistent browser\nand opening a new tab per request.\n\n**Per-request (most common):**\n\n```python\nyield scrapy.Request(\n    url,\n    meta={\n        \"stealth\": {\n            \"driver\": \"browser\",\n            \"headless\": False,   # visible window — harder to detect (default: True)\n            \"settle\": 4.0,       # seconds to wait for JS after page load\n        }\n    },\n)\n```\n\n**Heavy Cloudflare sites — increase settle time:**\n\n```python\nmeta={\"stealth\": {\"driver\": \"browser\", \"headless\": False, \"settle\": 12}}\n```\n\n**Global default (all stealth requests use browser engine):**\n\n```python\nfrom scrapy_stealth.config import config\n\nconfig.STEALTH_DRIVER   = \"browser\"\nconfig.BROWSER_HEADLESS = False   # more stealthy\nconfig.BROWSER_SETTLE_S = 6.0    # longer wait for JS\n```\n\n\u003e **Performance note**: the browser engine is slower than `basic`/`turbo` (~5-15s per page vs \u003c2s).\n\u003e Use it selectively — route only JS-protected URLs to `\"browser\"` and keep everything else on `\"turbo\"`.\n\n---\n\n## 📸 Screenshots\n\nCapture a PNG screenshot of any page rendered by the `browser` driver and save it to disk.\n\n### Enable on the request\n\n```python\nyield scrapy.Request(\n    url,\n    meta={\n        \"stealth\": {\n            \"driver\": \"browser\",\n            \"snapshot\": True,\n        }\n    },\n    callback=self.parse,\n)\n```\n\nThe raw PNG bytes are available at `response.meta[\"snapshot_content\"]` inside your callback.\n\n### Auto-save with `snapshot` decorator\n\n```python\nfrom scrapy_stealth.decorators import snapshot\n\nclass MySpider(scrapy.Spider):\n\n    @snapshot\n    def parse(self, response): ...\n\n    @snapshot(path=\"stealth_shots/page.png\")\n    def parse(self, response): ...\n\n    @snapshot(path=lambda r: r.url.split(\"/\")[-1] + \".png\")\n    def parse(self, response): ...\n```\n\n\u003e **Note:** Requires `driver=\"browser\"` and `snapshot=True` in the request meta.\n\u003e Logs an error if no snapshot data is found in the response.\n\n### Custom handling (without the built-in helper)\n\nThe screenshot is just `bytes` in `response.meta[\"snapshot_content\"]` — do anything you like with it:\n\n```python\ndef parse(self, response):\n    shot: bytes | None = response.meta.get(\"snapshot_content\")\n    if shot is None:\n        return  # screenshot was not requested or capture failed\n\n    # Save manually\n    with open(\"page.png\", \"wb\") as f:\n        f.write(shot)\n\n    # Pass to a pipeline via item\n    yield {\"url\": response.url, \"screenshot\": shot}\n```\n\n---\n\n## 🔁 Automatic Rotation\n\n```python\nyield scrapy.Request(\n    url,\n    meta={\n        \"stealth\": {\n            \"rotate_proxy\": True,\n            \"rotate_profile\": True,\n        }\n    },\n)\n```\n\n---\n\n## 🧩 Strategies\n\n### Proxy Rotation\n\n```python\nfrom scrapy_stealth.strategies.proxy import ProxyRotator\n\nproxy_rotator = ProxyRotator([\n    \"http://proxy1:8080\",\n    \"http://proxy2:8080\",\n])\n\nyield scrapy.Request(\n    url,\n    meta={\n        \"stealth\": {\n            \"proxy\": proxy_rotator.get(),\n        }\n    },\n)\n```\n\n---\n\n### Fingerprint Rotation\n\n```python\nfrom scrapy_stealth.strategies.fingerprint import ProfileRotator\n\nfp = ProfileRotator()\n\nyield scrapy.Request(\n    url,\n    meta={\n        \"stealth\": {\n            \"profile\": fp.get(),\n        }\n    },\n)\n```\n\n---\n\n### Intelligent Retry\n\n```python\nfrom scrapy_stealth.strategies.retry import RetryHandler\n\nretry = RetryHandler()\n\n\ndef parse(self, response):\n    if retry.should_retry(response):\n        yield retry.build(response.request)\n        return\n```\n\n---\n\n## 🛡️ Anti-Bot Detection\n\n```python\nfrom scrapy_stealth.detectors.antibot import AntiBotDetector\n\ndetector = AntiBotDetector()\n\nif detector.is_blocked(response):\n    print(\"Blocked!\")\n```\n\n---\n\n## 📊 Example\n\n```python\nimport scrapy\n\n\nclass ExampleSpider(scrapy.Spider):\n    name = \"example\"\n\n    def start_requests(self):\n        yield scrapy.Request(\n            \"https://example.com\",\n            meta={\n                \"stealth\": {\n                    \"rotate_proxy\": True,\n                    \"rotate_profile\": True,\n                }\n            },\n        )\n\n    def parse(self, response):\n        yield {\n            \"title\": response.css(\"title::text\").get(),\n            \"url\": response.url,\n        }\n```\n\n---\n\n## ⚡ Performance Insight\n\nUsing stealth selectively:\n\n* ⚡ Faster crawling (Scrapy for simple pages)\n* 💰 Lower proxy cost\n* 🛡️ Better success rate on protected pages\n\n---\n\n## 📜 Changelog\n\nSee [CHANGELOG.md](https://github.com/fawadss1/scrapy-stealth/blob/master/CHANGELOG.md) for a full history of changes, or browse [GitHub Releases](https://github.com/fawadss1/scrapy-stealth/releases).\n\n---\n\n## 🤝 Contributing\n\nSee [CONTRIBUTING.md](https://github.com/fawadss1/scrapy-stealth/blob/master/CONTRIBUTING.md) for guidelines on how to contribute.\n\n---\n\n## 📄 License\n\nThis project is licensed under the **MIT License** — free to use, modify, and distribute.\nSee [LICENSE](https://github.com/fawadss1/scrapy-stealth/blob/master/LICENSE) for the full text.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffawadss1%2Fscrapy-stealth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffawadss1%2Fscrapy-stealth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffawadss1%2Fscrapy-stealth/lists"}