https://github.com/spider-rs/spider
A web crawler and scraper for Rust
- Host: GitHub
- URL: https://github.com/spider-rs/spider
- Owner: spider-rs
- License: mit
- Created: 2018-01-07T18:49:20.000Z (over 7 years ago)
- Default Branch: main
- Last Pushed: 2025-05-12T02:33:25.000Z (5 months ago)
- Last Synced: 2025-05-13T00:11:43.448Z (5 months ago)
- Topics: crawler, data, headless-chrome, indexer, rust, scraping, spider
- Language: Rust
- Homepage: https://spider.cloud
- Size: 6.37 MB
- Stars: 1,724
- Watchers: 16
- Forks: 144
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-rust-list - Spider: The fastest web crawler and indexer. [docs.rs/spider/](https://docs.rs/spider/latest/spider/) (Web Crawler)
README
# Spider
[Website](https://spider.cloud) |
[Guides](https://spider.cloud/guides) |
[API Docs](https://docs.rs/spider/latest/spider) |
[Chat](https://discord.spider.cloud)

A web crawler and scraper providing building blocks for data curation workloads.
- Concurrent
- Streaming
- Decentralization
- Headless Chrome Rendering
- HTTP Proxies
- Cron Jobs
- Subscriptions
- Smart Mode
- Anti-Bot mitigation
- Disk persistence
- Privacy and Efficiency through Ad, Analytics, and Custom Tiered Network Blocking
- Blacklisting, Whitelisting, and Budgeting Depth
- Dynamic AI Prompt Scripting for Headless Chrome with Step Caching
- CSS/XPath Scraping with [spider_utils](./spider_utils/README.md#CSS_Scraping)
- HTML to Markdown, text, and other transformations with [spider_transformations](./spider_transformations/README.md)

- [Changelog](CHANGELOG.md)
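The blacklisting, whitelisting, and depth-limiting features listed above can be pictured with a small standard-library-only sketch. This is not spider's API: `CrawlRules`, `allows`, and `crawl` are hypothetical names, simple URL prefix matching is an assumption standing in for whatever matching spider actually performs, and an in-memory link graph replaces real HTTP fetching.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical crawl rules for illustration only; not spider's API.
struct CrawlRules {
    /// URL prefixes that must never be visited (assumed prefix matching).
    blacklist: Vec<String>,
    /// If non-empty, only URLs starting with one of these prefixes are visited.
    whitelist: Vec<String>,
    /// Maximum link depth from the start URL (the start URL is depth 0).
    max_depth: usize,
}

impl CrawlRules {
    /// Blacklist wins over whitelist; an empty whitelist allows everything else.
    fn allows(&self, url: &str) -> bool {
        if self.blacklist.iter().any(|p| url.starts_with(p.as_str())) {
            return false;
        }
        self.whitelist.is_empty() || self.whitelist.iter().any(|p| url.starts_with(p.as_str()))
    }
}

/// Breadth-first crawl over an in-memory link graph that stands in for HTTP fetches.
/// Returns the visited URLs in visit order.
fn crawl(start: &str, graph: &HashMap<&str, Vec<&str>>, rules: &CrawlRules) -> Vec<String> {
    let mut visited = Vec::new();
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();

    if rules.allows(start) {
        seen.insert(start.to_string());
        queue.push_back((start.to_string(), 0));
    }

    while let Some((url, depth)) = queue.pop_front() {
        visited.push(url.clone());
        // Pages at the depth limit are recorded but their links are not followed.
        if depth >= rules.max_depth {
            continue;
        }
        if let Some(links) = graph.get(url.as_str()) {
            for &link in links {
                // `insert` returns false for URLs already queued, so each page is visited once.
                if rules.allows(link) && seen.insert(link.to_string()) {
                    queue.push_back((link.to_string(), depth + 1));
                }
            }
        }
    }
    visited
}

fn main() {
    let mut graph: HashMap<&str, Vec<&str>> = HashMap::new();
    graph.insert(
        "https://example.com/",
        vec!["https://example.com/docs", "https://example.com/admin", "https://other.org/"],
    );
    graph.insert("https://example.com/docs", vec!["https://example.com/docs/deep", "https://example.com/"]);
    graph.insert("https://example.com/docs/deep", vec!["https://example.com/docs/deeper"]);

    let rules = CrawlRules {
        blacklist: vec!["https://example.com/admin".to_string()],
        whitelist: vec!["https://example.com/".to_string()],
        max_depth: 2,
    };

    for url in crawl("https://example.com/", &graph, &rules) {
        println!("{}", url);
    }
}
```

With `max_depth: 2`, this sketch reaches `/docs/deep` but not `/docs/deeper`, skips the blacklisted `/admin` page, and never leaves the whitelisted `https://example.com/` prefix. Checking the blacklist before the whitelist means an explicit exclusion always wins.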
## Getting Started
The simplest way to get started is to use the [Spider Cloud](https://spider.cloud) hosted service. For local installation, see the [spider](./spider/README.md) (library) or [spider_cli](./spider_cli/README.md) (command-line) directories. Spider is also available for Node.js via [spider-nodejs](https://github.com/spider-rs/spider-nodejs) and for Python via [spider-py](https://github.com/spider-rs/spider-py).
## Benchmarks
See [BENCHMARKS](./benches/BENCHMARKS.md).
## Examples
See [EXAMPLES](./examples/).
## License
This project is licensed under the [MIT license].
[MIT license]: https://github.com/spider-rs/spider/blob/main/LICENSE
## Contributing
See [CONTRIBUTING](CONTRIBUTING.md).