Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/spider-rs/spider
A web crawler and scraper for Rust
- Host: GitHub
- URL: https://github.com/spider-rs/spider
- Owner: spider-rs
- License: mit
- Created: 2018-01-07T18:49:20.000Z (about 7 years ago)
- Default Branch: main
- Last Pushed: 2025-01-07T23:56:50.000Z (30 days ago)
- Last Synced: 2025-01-08T09:17:19.739Z (29 days ago)
- Topics: crawler, data, headless-chrome, indexer, rust, spider, web-crawler, web-scraper, web-scraping
- Language: Rust
- Homepage: https://spider.cloud
- Size: 4.78 MB
- Stars: 1,345
- Watchers: 15
- Forks: 113
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-rust-list - Spider: The fastest web crawler and indexer. [docs.rs/spider/](https://docs.rs/spider/latest/spider/) (Web Crawler)
README
# Spider
[![Build Status](https://github.com/spider-rs/spider/actions/workflows/rust.yml/badge.svg)](https://github.com/spider-rs/spider/actions)
[![Crates.io](https://img.shields.io/crates/v/spider.svg)](https://crates.io/crates/spider)
[![Documentation](https://docs.rs/spider/badge.svg)](https://docs.rs/spider)
[![Rust](https://img.shields.io/badge/rust-1.56.1%2B-blue.svg?maxAge=3600)](https://github.com/spider-rs/spider)
[![Discord chat](https://img.shields.io/discord/1254585814021832755.svg?logo=discord&style=flat-square)](https://discord.spider.cloud)

[Website](https://spider.cloud) |
[Guides](https://spider.cloud/guides) |
[API Docs](https://docs.rs/spider/latest/spider) |
[Chat](https://discord.spider.cloud)

A web crawler and scraper, building blocks for data curation workloads.
- Concurrent
- Streaming
- Decentralization
- Headless Chrome Rendering
- HTTP Proxies
- Cron Jobs
- Subscriptions
- Smart Mode
- Anti-Bot mitigation
- Disk persistence
- Privacy and Efficiency through Ad, Analytics, and Custom Tiered Network Blocking
- Blacklisting, Whitelisting, and Budgeting Depth
- Dynamic AI Prompt Scripting Headless with Step Caching
- CSS/XPath Scraping with [spider_utils](./spider_utils/README.md#CSS_Scraping)
- HTML to Markdown, text, and other transformations with [spider_transformations](./spider_transformations/README.md)

- [Changelog](CHANGELOG.md)
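
To make the "Blacklisting, Whitelisting, and Budgeting Depth" feature above more concrete, here is a minimal, standard-library-only sketch of the general idea. It is not spider's API or implementation: `CrawlRules`, `crawl`, and the in-memory link graph are hypothetical names invented for illustration, and the graph stands in for real HTTP fetches.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical crawl limits; these names are illustrative, not spider's API.
struct CrawlRules<'a> {
    /// URL substrings that must never be visited.
    blacklist: Vec<&'a str>,
    /// Maximum number of link hops from the start page.
    max_depth: usize,
}

/// Breadth-first traversal over an in-memory link graph
/// (an assumption standing in for fetching and parsing pages).
fn crawl<'a>(
    graph: &HashMap<&'a str, Vec<&'a str>>,
    start: &'a str,
    rules: &CrawlRules<'_>,
) -> Vec<&'a str> {
    let mut visited: HashSet<&'a str> = HashSet::new();
    let mut order: Vec<&'a str> = Vec::new();
    let mut queue: VecDeque<(&'a str, usize)> = VecDeque::new();
    queue.push_back((start, 0));

    while let Some((url, depth)) = queue.pop_front() {
        // Skip blacklisted pages and pages that were already visited.
        if rules.blacklist.iter().any(|b| url.contains(*b)) || !visited.insert(url) {
            continue;
        }
        order.push(url);
        // Do not follow links once the depth budget is spent.
        if depth == rules.max_depth {
            continue;
        }
        if let Some(links) = graph.get(url) {
            for &next in links {
                queue.push_back((next, depth + 1));
            }
        }
    }
    order
}

fn main() {
    let mut graph = HashMap::new();
    graph.insert(
        "https://example.com/",
        vec!["https://example.com/a", "https://example.com/admin"],
    );
    graph.insert("https://example.com/a", vec!["https://example.com/a/b"]);
    graph.insert("https://example.com/a/b", vec!["https://example.com/a/b/c"]);

    let rules = CrawlRules { blacklist: vec!["/admin"], max_depth: 2 };
    for url in crawl(&graph, "https://example.com/", &rules) {
        println!("{url}");
    }
}
```

With these rules the sketch visits the start page, `/a`, and `/a/b`: `/admin` is dropped by the blacklist and `/a/b/c` lies beyond the depth budget. A whitelist could be modeled the same way by inverting the substring check.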
## Getting Started
The simplest way to get started is to use the [Spider Cloud](https://spider.cloud) hosted service. For a local installation, see the [spider](./spider/README.md) or [spider_cli](./spider_cli/README.md) directory. Spider is also available for Node.js through [spider-nodejs](https://github.com/spider-rs/spider-nodejs) and for Python through [spider-py](https://github.com/spider-rs/spider-py).
## Benchmarks
See [BENCHMARKS](./benches/BENCHMARKS.md).
## Examples
See [EXAMPLES](./examples/).
## License
This project is licensed under the [MIT license].
[MIT license]: https://github.com/spider-rs/spider/blob/main/LICENSE
## Contributing
See [CONTRIBUTING](CONTRIBUTING.md).