{"id":13464648,"url":"https://github.com/spider-rs/spider","last_synced_at":"2026-04-02T13:28:27.522Z","repository":{"id":37890937,"uuid":"116590428","full_name":"spider-rs/spider","owner":"spider-rs","description":"Web crawler and scraper for Rust","archived":false,"fork":false,"pushed_at":"2026-02-01T13:18:25.000Z","size":8357,"stargazers_count":2216,"open_issues_count":0,"forks_count":178,"subscribers_count":17,"default_branch":"main","last_synced_at":"2026-02-01T13:51:00.457Z","etag":null,"topics":["automation","crawler","headless-chrome","indexer","rust","scraping","spider"],"latest_commit_sha":null,"homepage":"https://spider.cloud","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spider-rs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2018-01-07T18:49:20.000Z","updated_at":"2026-02-01T13:18:29.000Z","dependencies_parsed_at":"2024-02-19T15:30:14.870Z","dependency_job_id":"3cc3622c-bdf7-4544-8ce7-a413ab840796","html_url":"https://github.com/spider-rs/spider","commit_stats":{"total_commits":832,"total_committers":13,"mean_commits":64.0,"dds":0.08533653846153844,"last_synced_commit":"42d5b20a2f6cc43316829740a918e36e085fa2ac"},"previous_names":["madeindjs/spider"],"tags_count":795,"template":false,"template_full_name":null,"purl":"pkg:github/spider-rs/spider","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fspider","tags_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/spider-rs%2Fspider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fspider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fspider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spider-rs","download_url":"https://codeload.github.com/spider-rs/spider/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fspider/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29048698,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-03T15:43:47.601Z","status":"ssl_error","status_checked_at":"2026-02-03T15:43:46.709Z","response_time":96,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","crawler","headless-chrome","indexer","rust","scraping","spider"],"created_at":"2024-07-31T14:00:47.982Z","updated_at":"2026-04-02T13:28:27.517Z","avatar_url":"https://github.com/spider-rs.png","language":"Rust","funding_links":[],"categories":["All","Tools","Rust","🤖 AI-Powered Scraping","Web Crawler","GUI \u0026 Computer Control AI Agents","Tooling"],"sub_categories":["Browser \u0026 Web Automation"],"readme":"# Spider\n\n[![Build 
Status](https://github.com/spider-rs/spider/actions/workflows/rust.yml/badge.svg)](https://github.com/spider-rs/spider/actions)\n[![Crates.io](https://img.shields.io/crates/v/spider.svg)](https://crates.io/crates/spider)\n[![Downloads](https://img.shields.io/crates/d/spider.svg)](https://crates.io/crates/spider)\n[![Documentation](https://docs.rs/spider/badge.svg)](https://docs.rs/spider)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Discord](https://img.shields.io/discord/1254585814021832755.svg?logo=discord\u0026style=flat-square)](https://discord.spider.cloud)\n\n[Website](https://spider.cloud) |\n[Guides](https://spider.cloud/guides) |\n[API Docs](https://docs.rs/spider/latest/spider) |\n[Examples](./examples/) |\n[Discord](https://discord.spider.cloud)\n\nA high-performance web crawler and scraper for Rust. [200-1000x faster](#benchmarks) than popular alternatives, with HTTP, headless Chrome, and WebDriver rendering in a single library.\n\n- **Crawl 100k+ pages in minutes** on a single machine. [See benchmarks.](#benchmarks)\n- **HTTP, Chrome CDP, WebDriver, and [AI automation](./spider_agent/)** in one dependency.\n- **Production-ready** with caching, proxy rotation, anti-bot bypass, and [distributed crawling](./spider_worker/). 
[Feature-gated](https://doc.rust-lang.org/cargo/reference/features.html) so you only compile what you use.\n\n## Quick Start\n\n### Command Line\n\n```bash\ncargo install spider_cli\nspider --url https://example.com\n```\n\n### Rust\n\n```toml\n[dependencies]\nspider = \"2\"\n```\n\n```rust\nuse spider::tokio;\nuse spider::website::Website;\n\n#[tokio::main]\nasync fn main() {\n    let mut website = Website::new(\"https://example.com\");\n    website.crawl().await;\n    println!(\"Pages found: {}\", website.get_links().len());\n}\n```\n\n### Streaming\n\nProcess each page the moment it's crawled, not after:\n\n```rust\nuse spider::tokio;\nuse spider::website::Website;\n\n#[tokio::main]\nasync fn main() {\n    let mut website = Website::new(\"https://example.com\");\n    let mut rx = website.subscribe(0).unwrap();\n\n    tokio::spawn(async move {\n        while let Ok(page) = rx.recv().await {\n            println!(\"- {}\", page.get_url());\n        }\n    });\n\n    website.crawl().await;\n    website.unsubscribe();\n}\n```\n\n### Headless Chrome\n\nAdd one feature flag to render JavaScript-heavy pages:\n\n```toml\n[dependencies]\nspider = { version = \"2\", features = [\"chrome\"] }\n```\n\n```rust\nuse spider::features::chrome_common::RequestInterceptConfiguration;\nuse spider::tokio;\nuse spider::website::Website;\n\n#[tokio::main]\nasync fn main() {\n    let mut website = Website::new(\"https://example.com\")\n        .with_chrome_intercept(RequestInterceptConfiguration::new(true))\n        .with_stealth(true)\n        .build()\n        .unwrap();\n\n    website.crawl().await;\n}\n```\n\n\u003e Also supports [WebDriver](./examples/webdriver.rs) (Selenium Grid, remote browsers) and [AI-driven automation](./spider_agent/). 
See [examples](./examples/) for more.\n\n## Benchmarks\n\nCrawling 185 pages ([source](./benches/BENCHMARKS.md), 10 samples averaged):\n\n**Apple M1 Max** (10-core, 64 GB RAM):\n\n| Crawler | Language | Time | vs Spider |\n|---------|----------|-----:|----------:|\n| **spider** | **Rust** | **73 ms** | **baseline** |\n| node-crawler | JavaScript | 15 s | 205x slower |\n| colly | Go | 32 s | 438x slower |\n| wget | C | 70 s | 959x slower |\n\n**Linux** (2-core, 7 GB RAM):\n\n| Crawler | Language | Time | vs Spider |\n|---------|----------|-----:|----------:|\n| **spider** | **Rust** | **50 ms** | **baseline** |\n| node-crawler | JavaScript | 3.4 s | 68x slower |\n| colly | Go | 30 s | 600x slower |\n| wget | C | 60 s | 1200x slower |\n\nThe gap grows with site size. Spider handles 100k+ pages in minutes where other crawlers take hours. This comes from Rust's async runtime ([tokio](https://tokio.rs)), lock-free data structures, and optional [io_uring](https://en.wikipedia.org/wiki/Io_uring) on Linux. [Full details](./benches/BENCHMARKS.md)\n\n## Why Spider?\n\nMost crawlers force a choice between fast HTTP-only or slow-but-flexible browser automation. Spider supports both, and you can mix them in the same crawl.\n\n**Supports HTTP, Chrome, and WebDriver.** Switch rendering modes with a feature flag. Use HTTP for speed, Chrome CDP for JavaScript-heavy pages, and WebDriver for Selenium Grid or cross-browser testing.\n\n**Built for production.** Caching (memory, disk, hybrid), proxy rotation, anti-bot fingerprinting, ad blocking, depth budgets, cron scheduling, and distributed workers. 
All of this has been hardened through [Spider Cloud](https://spider.cloud).\n\n**AI automation included.** [spider_agent](./spider_agent/) adds multimodal LLM-driven automation: navigate pages, fill forms, solve challenges, and extract structured data with OpenAI or any compatible API.\n\n## Features\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eCrawling\u003c/strong\u003e\u003c/summary\u003e\n\n- Concurrent and streaming crawls with backpressure\n- [Decentralized crawling](./spider_worker/) for horizontal scaling\n- Caching: memory, disk (SQLite), or [hybrid Chrome cache](./examples/cache_chrome_hybrid.rs)\n- Proxy support with rotation\n- Cron job scheduling\n- Depth budgeting, blacklisting, whitelisting\n- Smart mode that auto-detects JS-rendered content and upgrades to Chrome\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eBrowser Automation\u003c/strong\u003e\u003c/summary\u003e\n\n- [Chrome DevTools Protocol](https://github.com/spider-rs/chromey): headless or headed, stealth mode, screenshots, request interception\n- [WebDriver](./examples/webdriver.rs): Selenium Grid, remote browsers, cross-browser testing\n- AI-powered challenge solving (deterministic + [Chrome built-in AI](https://developer.chrome.com/docs/ai/prompt-api))\n- [Anti-bot fingerprinting](https://github.com/spider-rs/spider_fingerprint), [ad blocking](https://github.com/spider-rs/spider_network_blocker), [firewall](https://github.com/spider-rs/spider_firewall)\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eData Processing\u003c/strong\u003e\u003c/summary\u003e\n\n- [HTML transformations](https://github.com/spider-rs/spider_transformations) (Markdown, text, structured extraction)\n- CSS/XPath scraping with [spider_utils](./spider_utils/README.md#CSS_Scraping)\n- [OpenAI](./examples/openai.rs) and [Gemini](./examples/gemini.rs) integration for content 
analysis\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eAI Agent\u003c/strong\u003e\u003c/summary\u003e\n\n- [spider_agent](./spider_agent/): concurrent-safe multimodal web automation agent\n- Multiple LLM providers (OpenAI, any OpenAI-compatible API, Chrome built-in AI)\n- Web research with search providers (Serper, Brave, Bing, Tavily)\n- 110 built-in automation skills for web challenges\n\n\u003c/details\u003e\n\n## Spider Cloud\n\nFor managed proxy rotation, anti-bot bypass, and CAPTCHA handling, [Spider Cloud](https://spider.cloud) plugs in with one line:\n\n```rust\nlet mut website = Website::new(\"https://protected-site.com\")\n    .with_spider_cloud(\"your-api-key\")  // enable with features = [\"spider_cloud\"]\n    .build()\n    .unwrap();\n```\n\n| Mode | Strategy | Best For |\n|------|----------|----------|\n| **Proxy** (default) | All traffic through Spider Cloud proxy | General crawling with IP rotation |\n| **Smart** (recommended) | Proxy + auto-fallback on bot detection | Production (speed + reliability) |\n| **Fallback** | Direct first, API on failure | Cost-efficient, most sites work without help |\n| **Unblocker** | All requests through unblocker | Aggressive bot protection |\n\n\u003e Free credits on signup. 
[Get started at spider.cloud](https://spider.cloud)\n\n### Spider Browser Cloud\n\nConnect to a remote Rust-based browser via CDP over WebSocket for automation, scraping, and AI extraction:\n\n```rust\nuse spider::configuration::SpiderBrowserConfig;\n\n// Simple — just an API key\nlet mut website = Website::new(\"https://example.com\")\n    .with_spider_browser(\"your-api-key\")  // features = [\"spider_cloud\", \"chrome\"]\n    .build()\n    .unwrap();\n\n// Full config — stealth, country targeting, custom options\nlet browser_cfg = SpiderBrowserConfig::new(\"your-api-key\")\n    .with_stealth(true)\n    .with_country(\"us\");\n\nlet mut website = Website::new(\"https://example.com\")\n    .with_spider_browser_config(browser_cfg)\n    .build()\n    .unwrap();\n```\n\nWebSocket endpoint: `wss://browser.spider.cloud/v1/browser` — supports CDP and WebDriver BiDi protocols.\n\n### Parallel Backends\n\nRace alternative browser engines alongside the primary crawl. The best HTML response wins — higher reliability and coverage for JS-heavy pages.\n\n```rust\nuse spider::configuration::{BackendEndpoint, BackendEngine, ParallelBackendsConfig};\n\nlet mut website = Website::new(\"https://example.com\");\n\n// Race a secondary browser engine alongside the primary crawl.\nwebsite.configuration.parallel_backends = Some(ParallelBackendsConfig {\n    backends: vec![BackendEndpoint {\n        engine: BackendEngine::default(),\n        endpoint: Some(\"ws://127.0.0.1:9222\".to_string()),\n        binary_path: None,\n        protocol: None,\n        proxy: None, // inherits from website proxies config\n    }],\n    grace_period_ms: 500,       // wait up to 500ms for a better result\n    fast_accept_threshold: 80,  // accept immediately if quality \u003e= 80\n    ..Default::default()\n});\n\nwebsite.crawl().await;\n```\n\n## Get Spider\n\n| Package | Language | Install |\n|---------|----------|---------|\n| [spider](https://crates.io/crates/spider) | Rust | `cargo add spider` |\n| 
[spider_cli](./spider_cli/) | CLI | `cargo install spider_cli` |\n| [spider-nodejs](https://github.com/spider-rs/spider-nodejs) | Node.js | `npm i @spider-rs/spider-rs` |\n| [spider-py](https://github.com/spider-rs/spider-py) | Python | `pip install spider_rs` |\n| [spider_agent](./spider_agent/) | Rust | `cargo add spider --features agent` |\n| [spider_mcp](./spider_mcp/) | MCP | `cargo install spider_mcp` |\n\n### MCP Server\n\nUse Spider as tools in Claude Code, Claude Desktop, or any MCP client:\n\n```bash\ncargo install spider_mcp\n```\n\n```json\n{ \"mcpServers\": { \"spider\": { \"command\": \"spider-mcp\" } } }\n```\n\nThen ask: *\"Scrape https://example.com as markdown\"* or *\"Crawl https://example.com up to 5 pages\"*\n\n### Cloud and Remote\n\n| Package | Description |\n|---------|-------------|\n| [Spider Cloud](https://spider.cloud) | Managed crawling infrastructure, no setup needed |\n| [spider-clients](https://github.com/spider-rs/spider-clients) | SDKs for Spider Cloud in multiple languages |\n| [spider-browser](https://github.com/spider-rs/spider-browser) | Remote access to Spider's Rust browser |\n\n## Resources\n\n- [64 examples](./examples/) covering crawling, Chrome, WebDriver, AI, caching, and more\n- [API documentation](https://docs.rs/spider/latest/spider)\n- [Benchmarks](./benches/BENCHMARKS.md)\n- [Changelog](CHANGELOG.md)\n\n## Contributing\n\nContributions welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for setup and guidelines.\n\nSpider has been actively developed for the past 4 years. Join the [Discord](https://discord.spider.cloud) for questions and discussion.\n\n## License\n\n[MIT](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspider-rs%2Fspider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspider-rs%2Fspider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspider-rs%2Fspider/lists"}