https://github.com/spider-rs/spider
A web crawler and scraper for Rust
- Host: GitHub
- URL: https://github.com/spider-rs/spider
- Owner: spider-rs
- License: MIT
- Created: 2018-01-07T18:49:20.000Z (about 8 years ago)
- Default Branch: main
- Last Pushed: 2025-05-12T02:33:25.000Z (9 months ago)
- Last Synced: 2025-05-13T00:11:43.448Z (9 months ago)
- Topics: crawler, data, headless-chrome, indexer, rust, scraping, spider
- Language: Rust
- Homepage: https://spider.cloud
- Size: 6.37 MB
- Stars: 1,724
- Watchers: 16
- Forks: 144
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-rust-list - Spider: The fastest web crawler and indexer. [docs.rs/spider](https://docs.rs/spider/latest/spider/) (Web Crawler)
- awesome-crawler - spider - The fastest web crawler and indexer. (Rust)
README
# Spider
[Build](https://github.com/spider-rs/spider/actions)
[Crates.io](https://crates.io/crates/spider)
[Docs](https://docs.rs/spider)
[GitHub](https://github.com/spider-rs/spider)
[Discord](https://discord.spider.cloud)
[Website](https://spider.cloud) |
[Guides](https://spider.cloud/guides) |
[API Docs](https://docs.rs/spider/latest/spider) |
[Chat](https://discord.spider.cloud)
A web crawler and scraper, building blocks for data curation workloads.
- Concurrent
- Streaming
- [Decentralization](./spider_worker/)
- [CDP Automation](https://github.com/spider-rs/chromey)
- [Anti-Bot mitigation](https://github.com/spider-rs/spider_fingerprint)
- [HTML transformations](https://github.com/spider-rs/spider_transformations)
- [Adblocker](https://github.com/spider-rs/spider_network_blocker)
- [Firewall](https://github.com/spider-rs/spider_firewall)
- Smart Mode
- Proxy Support
- Subscriptions
- Disk persistence
- Caching: memory, disk, or a hybrid remote cache shared between HTTP and Chrome
- Blacklisting, Whitelisting, and Depth Budgeting
- Dynamic AI Prompt Scripting for Headless Browsing with Step Caching
- CSS/Xpath Scraping with [spider_utils](./spider_utils/README.md#CSS_Scraping)
- Cron Jobs
- [Changelog](CHANGELOG.md)
## Getting Started
The simplest way to get started is to use the [Spider Cloud](https://spider.cloud) hosted service. View the [spider](./spider/README.md) or [spider_cli](./spider_cli/README.md) directory for local installations. You can also use spider with Node.js using [spider-nodejs](https://github.com/spider-rs/spider-nodejs) and Python using [spider-py](https://github.com/spider-rs/spider-py).
## Benchmarks
See [BENCHMARKS](./benches/BENCHMARKS.md).
## Examples
See [EXAMPLES](./examples/).
## License
This project is licensed under the [MIT license].
[MIT license]: https://github.com/spider-rs/spider/blob/main/LICENSE
## Contributing
See [CONTRIBUTING](CONTRIBUTING.md).