Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/spider-rs/spider
The fastest web crawler written in Rust. Maintained by @a11ywatch.
- Host: GitHub
- URL: https://github.com/spider-rs/spider
- Owner: spider-rs
- License: MIT
- Created: 2018-01-07T18:49:20.000Z (almost 7 years ago)
- Default Branch: main
- Last Pushed: 2024-07-24T18:56:02.000Z (4 months ago)
- Last Synced: 2024-07-25T10:53:18.888Z (4 months ago)
- Topics: ai-scraping, crawler, headless-chrome, indexer, llm-crawler, rust, scraping, spider, web-crawler
- Language: Rust
- Homepage: https://spider.cloud
- Size: 2.06 MB
- Stars: 842
- Watchers: 13
- Forks: 78
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-rust-list - Spider: The fastest web crawler and indexer. [docs.rs/spider/](https://docs.rs/spider/latest/spider/) (Web Crawler)
README
# Spider
[![Build Status](https://github.com/spider-rs/spider/actions/workflows/rust.yml/badge.svg)](https://github.com/spider-rs/spider/actions)
[![Crates.io](https://img.shields.io/crates/v/spider.svg)](https://crates.io/crates/spider)
[![Documentation](https://docs.rs/spider/badge.svg)](https://docs.rs/spider)
[![Rust](https://img.shields.io/badge/rust-1.56.1%2B-blue.svg?maxAge=3600)](https://github.com/spider-rs/spider)
[![Discord chat](https://img.shields.io/discord/1254585814021832755.svg?logo=discord&style=flat-square)](https://discord.spider.cloud)

[Website](https://spider.cloud) |
[Guides](https://spider.cloud/guides) |
[API Docs](https://docs.rs/spider/latest/spider) |
[Chat](https://discord.spider.cloud)

A web crawler and scraper, building blocks for data curation workloads.
- Concurrent
- Streaming
- Decentralization
- Headless Chrome Rendering
- HTTP Proxies
- Cron Jobs
- Subscriptions
- Smart Mode
- Anti-Bot mitigation
- Blacklisting, Whitelisting, and Budgeting Depth
- Dynamic AI Prompt Scripting Headless with Step Caching
- CSS/XPath Scraping with [spider_utils](./spider_utils/README.md#CSS_Scraping)
- HTML to markdown, text, and other transformations with [spider_transformations](./spider_transformations/README.md)

- [Changelog](CHANGELOG.md)
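
To make the "Blacklisting, Whitelisting, and Budgeting Depth" feature more concrete, here is a minimal sketch of how such rules could gate a crawl. It uses only the Rust standard library and is not spider's actual API: `CrawlPolicy`, `allows`, and `crawl` are hypothetical names, and an in-memory link graph stands in for real HTTP fetches.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical crawl policy; the field names are illustrative, not spider's API.
struct CrawlPolicy {
    blacklist: Vec<String>, // path prefixes that must never be visited
    whitelist: Vec<String>, // if non-empty, only these path prefixes are allowed
    max_depth: usize,       // maximum link distance from the start page
    max_pages: usize,       // total page budget for the whole crawl
}

impl CrawlPolicy {
    /// Returns true when `path`, found at link distance `depth`, may be visited.
    fn allows(&self, path: &str, depth: usize) -> bool {
        if depth > self.max_depth {
            return false;
        }
        if self.blacklist.iter().any(|p| path.starts_with(p.as_str())) {
            return false;
        }
        self.whitelist.is_empty()
            || self.whitelist.iter().any(|p| path.starts_with(p.as_str()))
    }
}

/// Breadth-first walk over an in-memory link graph (a stand-in for fetching pages).
/// Returns the visited paths in visit order.
fn crawl(start: &str, links: &HashMap<&str, Vec<&str>>, policy: &CrawlPolicy) -> Vec<String> {
    let mut visited: Vec<String> = Vec::new();
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    queue.push_back((start.to_string(), 0));
    seen.insert(start.to_string());

    while let Some((path, depth)) = queue.pop_front() {
        // Stop as soon as the page budget is spent.
        if visited.len() >= policy.max_pages {
            break;
        }
        // Skip pages rejected by the blacklist, whitelist, or depth limit.
        if !policy.allows(&path, depth) {
            continue;
        }
        visited.push(path.clone());
        if let Some(children) = links.get(path.as_str()) {
            for child in children {
                // Queue each link only once.
                if seen.insert(child.to_string()) {
                    queue.push_back((child.to_string(), depth + 1));
                }
            }
        }
    }
    visited
}
```

In this sketch the blacklist takes precedence over the whitelist, and the page budget is checked before each visit, so the crawl never exceeds `max_pages`. A real crawler would also need URL normalization and per-host rules, which are left out here.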
## Getting Started
The simplest way to get started is to use the [Spider Cloud](https://spider.cloud) hosted service. View the [spider](./spider/README.md) or [spider_cli](./spider_cli/README.md) directory for local installations. You can also use spider with Node.js using [spider-nodejs](https://github.com/spider-rs/spider-nodejs) and Python using [spider-py](https://github.com/spider-rs/spider-py).
## Benchmarks
See [BENCHMARKS](./benches/BENCHMARKS.md).
## Examples
See [EXAMPLES](./examples/).
## License
This project is licensed under the [MIT license].
[MIT license]: https://github.com/spider-rs/spider/blob/main/LICENSE
## Contributing
See [CONTRIBUTING](CONTRIBUTING.md).