# Spider
[Build Status](https://github.com/spider-rs/spider/actions) |
[Crates.io](https://crates.io/crates/spider) |
[Docs.rs](https://docs.rs/spider) |
[GitHub](https://github.com/spider-rs/spider) |
[Discord](https://discord.spider.cloud)
[Website](https://spider.cloud) |
[Guides](https://spider.cloud/guides) |
[API Docs](https://docs.rs/spider/latest/spider) |
[Chat](https://discord.spider.cloud)
The fastest web crawler written in Rust. Primitives for data curation workloads at scale.
## Why Spider?
- **Fast by Design**: Concurrent crawling with streaming responses at scale
- **Flexible Rendering**: HTTP, Chrome DevTools Protocol (CDP), or WebDriver (Selenium/remote browsers)
- **Production Ready**: Battle-tested with anti-bot mitigation, caching, and distributed crawling
## Features
### Core
- Concurrent & streaming crawls
- [Decentralized crawling](./spider_worker/) for horizontal scaling
- Caching (memory, disk, or hybrid)
- Proxy support with rotation (see the configuration sketch after this list)
- Cron job scheduling
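
A minimal sketch combining a few of these options is shown below. The builder methods (`with_respect_robots_txt`, `with_proxies`, `with_delay`) follow the spider 2.x `Website` builder; check [docs.rs/spider](https://docs.rs/spider) for the exact signatures in your release, and treat the proxy address as a placeholder:

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        // Honor robots.txt rules during the crawl.
        .with_respect_robots_txt(true)
        // Rotate requests through a proxy pool (illustrative address).
        .with_proxies(Some(vec!["http://127.0.0.1:8080".into()]))
        // Throttle requests (milliseconds).
        .with_delay(250)
        .build()
        .unwrap();

    website.crawl().await;
}
```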
### Browser Automation
- [Chrome DevTools Protocol (CDP)](https://github.com/spider-rs/chromey) for local Chrome
- **WebDriver** support for Selenium Grid, remote browsers, and cross-browser testing
- AI-powered automation workflows
- Web challenge solving (deterministic + [AI built-in](https://developer.chrome.com/docs/ai/prompt-api))
### Data Processing
- [HTML transformations](https://github.com/spider-rs/spider_transformations)
- CSS/XPath scraping with [spider_utils](./spider_utils/README.md#CSS_Scraping)
- Smart mode for JS-rendered content detection (sketched below)
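
As an illustration, smart mode (behind the `smart` feature flag; the `crawl_smart` method name is taken from the spider docs) only escalates to headless rendering when a page looks JavaScript-driven:

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud");
    // Plain HTTP first; falls back to a headless browser only for
    // pages detected as JavaScript-rendered.
    website.crawl_smart().await;
}
```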
### Security & Control
- [Anti-bot mitigation](https://github.com/spider-rs/spider_fingerprint)
- [Ad blocking](https://github.com/spider-rs/spider_network_blocker)
- [Firewall](https://github.com/spider-rs/spider_firewall)
- Blacklisting, whitelisting, and depth budgeting (see the scoping sketch below)
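
The sketch below scopes a crawl with a blacklist, a depth limit, and per-path page budgets. Method names match the spider 2.x builder; the budget map uses spider's re-exported hashbrown, which may differ by release:

```rust
use spider::hashbrown::HashMap;
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        // Never visit URLs matching these patterns.
        .with_blacklist_url(Some(vec!["/login".into()]))
        // Stop following links more than 3 hops from the start page.
        .with_depth(3)
        // Page budgets: "*" caps the whole crawl, paths cap subtrees.
        .with_budget(Some(HashMap::from([("*", 500), ("/blog", 50)])))
        .build()
        .unwrap();

    website.crawl().await;
}
```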
## Quick Start
The fastest way to get started is with [Spider Cloud](https://spider.cloud) - no infrastructure to manage. Pay-per-use at $1/GB data transfer, designed to keep crawling costs low.
For local development:
```toml
[dependencies]
spider = "2"
```
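
A minimal end-to-end crawl then looks like this (`get_links` returns the set of URLs discovered during the crawl):

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud");
    website.crawl().await;

    // Every URL discovered during the crawl.
    for link in website.get_links() {
        println!("- {}", link.as_ref());
    }
}
```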
### Streaming Pages
Process pages as they're crawled with real-time subscriptions:
```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud");
    let mut rx = website.subscribe(0).unwrap();

    // Handle pages concurrently while the crawl is still in flight.
    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}
```
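The subscription runs alongside the crawl, so page handling overlaps network I/O; call `unsubscribe` once crawling finishes to close the stream. If you need several consumers, the channel follows broadcast semantics in recent releases - verify against [docs.rs/spider](https://docs.rs/spider) before relying on that.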
### Chrome (CDP)
Render JavaScript-heavy pages with stealth mode and request interception:
```toml
[dependencies]
spider = { version = "2", features = ["chrome"] }
```
```rust
use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        // Enable network request interception in the browser.
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        // Reduce common headless-Chrome fingerprints.
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}
```
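Stealth mode layers on the fingerprint-evasion work from [spider_fingerprint](https://github.com/spider-rs/spider_fingerprint), while the intercept configuration gives the crawler a hook to block or rewrite requests before the browser issues them.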
### WebDriver (Selenium Grid)
Connect to remote browsers, Selenium Grid, or any W3C WebDriver-compatible service:
```toml
[dependencies]
spider = { version = "2", features = ["webdriver"] }
```
```rust
use spider::features::webdriver_common::{WebDriverBrowser, WebDriverConfig};
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        .with_webdriver(
            // Any W3C WebDriver endpoint works, e.g. a Selenium Grid hub.
            WebDriverConfig::new()
                .with_server_url("http://localhost:4444")
                .with_browser(WebDriverBrowser::Chrome)
                .with_headless(true),
        )
        .build()
        .unwrap();

    website.crawl().await;
}
```
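The server URL must point at a running WebDriver endpoint. For local testing, Selenium's `standalone-chrome` Docker image exposes a compatible server on the default port: `docker run -p 4444:4444 selenium/standalone-chrome`.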
## Get Spider
| Method | Best For |
|--------|----------|
| [Spider Cloud](https://spider.cloud) | Production workloads, no setup required |
| [spider](./spider/README.md) | Rust applications |
| [spider_cli](./spider_cli/README.md) | Command-line usage |
| [spider-nodejs](https://github.com/spider-rs/spider-nodejs) | Node.js projects |
| [spider-py](https://github.com/spider-rs/spider-py) | Python projects |
## Resources
- [Examples](./examples/) - Code samples for common use cases
- [Benchmarks](./benches/BENCHMARKS.md) - Performance comparisons
- [Changelog](CHANGELOG.md) - Version history
## License
[MIT](https://github.com/spider-rs/spider/blob/main/LICENSE)
## Contributing
See [CONTRIBUTING](CONTRIBUTING.md).