https://github.com/spider-rs/spider

Web crawler and scraper for Rust
https://github.com/spider-rs/spider

automation crawler headless-chrome indexer rust scraping spider

Last synced: about 1 month ago
JSON representation

Web crawler and scraper for Rust

Host: GitHub
URL: https://github.com/spider-rs/spider
Owner: spider-rs
License: mit
Created: 2018-01-07T18:49:20.000Z (about 8 years ago)
Default Branch: main
Last Pushed: 2026-02-01T13:18:25.000Z (about 2 months ago)
Last Synced: 2026-02-01T13:51:00.457Z (about 2 months ago)
Topics: automation, crawler, headless-chrome, indexer, rust, scraping, spider
Language: Rust
Homepage: https://spider.cloud
Size: 7.97 MB
Stars: 2,216
Watchers: 17
Forks: 178
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Awesome Lists containing this project

awesome-rust-list - Spider - rs/spider?style=social"/> : Spider: The fastest web crawler and indexer. [docs.rs/spider/](https://docs.rs/spider/latest/spider/) (Web Crawler)
awesome-crawler - spider - The fastest web crawler and indexer. (Rust)
awesome-ai-agents - spider-rs/spider - spider-rs/spider is a web crawler and scraper library for Rust, designed for high-performance and extensive web data extraction. (GUI & Computer Control AI Agents / Browser & Web Automation)

README

          # Spider

[![Build Status](https://github.com/spider-rs/spider/actions/workflows/rust.yml/badge.svg)](https://github.com/spider-rs/spider/actions)

[![Crates.io](https://img.shields.io/crates/v/spider.svg)](https://crates.io/crates/spider)

[![Documentation](https://docs.rs/spider/badge.svg)](https://docs.rs/spider)

[![Rust](https://img.shields.io/badge/rust-1.56.1%2B-blue.svg?maxAge=3600)](https://github.com/spider-rs/spider)

[![Discord chat](https://img.shields.io/discord/1254585814021832755.svg?logo=discord&style=flat-square)](https://discord.spider.cloud)

[Website](https://spider.cloud) |

[Guides](https://spider.cloud/guides) |

[API Docs](https://docs.rs/spider/latest/spider) |

[Chat](https://discord.spider.cloud)

The fastest web crawler written in Rust. Primitives for data curation workloads at scale.

## Why Spider?

- **Fast by Design**: Concurrent crawling with streaming responses at scale

- **Flexible Rendering**: HTTP, Chrome DevTools Protocol (CDP), or WebDriver (Selenium/remote browsers)

- **Production Ready**: Battle-tested with anti-bot mitigation, caching, and distributed crawling

## Features

### Core

- Concurrent & streaming crawls

- [Decentralized crawling](./spider_worker/) for horizontal scaling

- Caching (memory, disk, or hybrid)

- Proxy support with rotation

- Cron job scheduling

### Browser Automation

- [Chrome DevTools Protocol (CDP)](https://github.com/spider-rs/chromey) for local Chrome

- **WebDriver** support for Selenium Grid, remote browsers, and cross-browser testing

- AI-powered automation workflows

- Web challenge solving (deterministic + [AI built-in](https://developer.chrome.com/docs/ai/prompt-api))

### Data Processing

- [HTML transformations](https://github.com/spider-rs/spider_transformations)

- CSS/XPath scraping with [spider_utils](./spider_utils/README.md#CSS_Scraping)

- Smart mode for JS-rendered content detection

### Security & Control

- [Anti-bot mitigation](https://github.com/spider-rs/spider_fingerprint)

- [Ad blocking](https://github.com/spider-rs/spider_network_blocker)

- [Firewall](https://github.com/spider-rs/spider_firewall)

- Blacklisting, whitelisting, and depth budgeting

## Quick Start

The fastest way to get started is with [Spider Cloud](https://spider.cloud) - no infrastructure to manage. Pay-per-use at $1/GB data transfer, designed to keep crawling costs low.

For local development:

```toml

[dependencies]

spider = "2"

```

### Streaming Pages

Process pages as they're crawled with real-time subscriptions:

```rust

use spider::tokio;

use spider::website::Website;

#[tokio::main]

async fn main() {

    let mut website = Website::new("https://spider.cloud");

    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {

        while let Ok(page) = rx.recv().await {

            println!("- {}", page.get_url());

        }

    });

    website.crawl().await;

    website.unsubscribe();

}

```

### Chrome (CDP)

Render JavaScript-heavy pages with stealth mode and request interception:

```toml

[dependencies]

spider = { version = "2", features = ["chrome"] }

```

```rust

use spider::features::chrome_common::RequestInterceptConfiguration;

use spider::website::Website;

#[tokio::main]

async fn main() {

    let mut website = Website::new("https://spider.cloud")

        .with_chrome_intercept(RequestInterceptConfiguration::new(true))

        .with_stealth(true)

        .build()

        .unwrap();

    website.crawl().await;

}

```

### WebDriver (Selenium Grid)

Connect to remote browsers, Selenium Grid, or any W3C WebDriver-compatible service:

```toml

[dependencies]

spider = { version = "2", features = ["webdriver"] }

```

```rust

use spider::features::webdriver_common::{WebDriverConfig, WebDriverBrowser};

use spider::website::Website;

#[tokio::main]

async fn main() {

    let mut website = Website::new("https://spider.cloud")

        .with_webdriver(

            WebDriverConfig::new()

                .with_server_url("http://localhost:4444")

                .with_browser(WebDriverBrowser::Chrome)

                .with_headless(true)

        )

        .build()

        .unwrap();

    website.crawl().await;

}

```

## Get Spider

| Method | Best For |

|--------|----------|

| [Spider Cloud](https://spider.cloud) | Production workloads, no setup required |

| [spider](./spider/README.md) | Rust applications |

| [spider_cli](./spider_cli/README.md) | Command-line usage |

| [spider-nodejs](https://github.com/spider-rs/spider-nodejs) | Node.js projects |

| [spider-py](https://github.com/spider-rs/spider-py) | Python projects |

## Resources

- [Examples](./examples/) - Code samples for common use cases

- [Benchmarks](./benches/BENCHMARKS.md) - Performance comparisons

- [Changelog](CHANGELOG.md) - Version history

## License

[MIT](https://github.com/spider-rs/spider/blob/main/LICENSE)

## Contributing

See [CONTRIBUTING](CONTRIBUTING.md).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/spider-rs/spider

Awesome Lists containing this project

README