https://github.com/spider-rs/spider

The fastest web crawler written in Rust. Maintained by @a11ywatch.

ai-scraping crawler headless-chrome indexer llm-crawler rust scraping spider web-crawler


# Spider

[![Build Status](https://github.com/spider-rs/spider/actions/workflows/rust.yml/badge.svg)](https://github.com/spider-rs/spider/actions)
[![Crates.io](https://img.shields.io/crates/v/spider.svg)](https://crates.io/crates/spider)
[![Documentation](https://docs.rs/spider/badge.svg)](https://docs.rs/spider)
[![Rust](https://img.shields.io/badge/rust-1.56.1%2B-blue.svg?maxAge=3600)](https://github.com/spider-rs/spider)
[![Discord chat](https://img.shields.io/discord/1254585814021832755.svg?logo=discord&style=flat-square)](https://discord.spider.cloud)

[Website](https://spider.cloud) |
[Guides](https://spider.cloud/guides) |
[API Docs](https://docs.rs/spider/latest/spider) |
[Chat](https://discord.spider.cloud)

A web crawler and scraper that provides building blocks for data curation workloads.

- Concurrent
- Streaming
- Decentralized
- Headless Chrome Rendering
- HTTP Proxies
- Cron Jobs
- Subscriptions
- Smart Mode
- Anti-Bot mitigation
- Blacklisting, Whitelisting, and Budgeting Depth
- Dynamic AI Prompt Scripting Headless with Step Caching
- CSS/XPath Scraping with [spider_utils](./spider_utils/README.md#CSS_Scraping)
- HTML to Markdown, text, and other transformations with [spider_transformations](./spider_transformations/README.md)
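To make the blacklisting, whitelisting, and depth budgeting item above more concrete, here is a minimal, standard-library-only sketch of how such rules could filter candidate paths. It is not the spider crate's API: `CrawlPolicy`, `allows`, `select_paths`, and every field name are hypothetical names chosen for illustration, and the prefix matching is deliberately naive (for example, `/admin` would also match `/administrator`).

```rust
use std::collections::HashSet;

/// Hypothetical crawl policy for illustration only; not a type from the spider crate.
struct CrawlPolicy {
    /// Path prefixes that must never be visited.
    blacklist: Vec<String>,
    /// If non-empty, only paths starting with one of these prefixes are allowed.
    whitelist: Vec<String>,
    /// Maximum number of non-empty path segments.
    max_depth: usize,
    /// Maximum number of pages to accept in total.
    page_budget: usize,
}

impl CrawlPolicy {
    /// Counts the non-empty segments of a path, e.g. "/docs/intro" has depth 2.
    fn depth(path: &str) -> usize {
        path.split('/').filter(|segment| !segment.is_empty()).count()
    }

    /// Returns true when the path passes the blacklist, whitelist, and depth checks.
    fn allows(&self, path: &str) -> bool {
        if self.blacklist.iter().any(|p| path.starts_with(p.as_str())) {
            return false;
        }
        if !self.whitelist.is_empty()
            && !self.whitelist.iter().any(|p| path.starts_with(p.as_str()))
        {
            return false;
        }
        Self::depth(path) <= self.max_depth
    }
}

/// Keeps allowed, not-yet-seen paths in order and stops once the page budget is reached.
fn select_paths(policy: &CrawlPolicy, candidates: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut accepted = Vec::new();
    for &path in candidates {
        if accepted.len() >= policy.page_budget {
            break;
        }
        // `insert` returns false for duplicates, so each path is accepted at most once.
        if policy.allows(path) && seen.insert(path) {
            accepted.push(path.to_string());
        }
    }
    accepted
}

fn main() {
    let policy = CrawlPolicy {
        blacklist: vec!["/admin".to_string()],
        whitelist: vec!["/docs".to_string(), "/blog".to_string()],
        max_depth: 3,
        page_budget: 3,
    };
    let candidates = ["/docs/intro", "/admin/login", "/blog/post", "/docs/intro", "/docs/api"];
    for path in select_paths(&policy, &candidates) {
        println!("{}", path);
    }
}
```

The blacklist is checked before the whitelist so that an explicit exclusion always wins, and the budget is checked before any other work so the loop stops as soon as the limit is hit. For the real configuration options, refer to the crate's API docs.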

- [Changelog](CHANGELOG.md)

## Getting Started

The simplest way to get started is to use the [Spider Cloud](https://spider.cloud) hosted service. For a local installation, see the [spider](./spider/README.md) or [spider_cli](./spider_cli/README.md) directory. Spider is also available for Node.js through [spider-nodejs](https://github.com/spider-rs/spider-nodejs) and for Python through [spider-py](https://github.com/spider-rs/spider-py).

## Benchmarks

See [BENCHMARKS](./benches/BENCHMARKS.md).

## Examples

See [EXAMPLES](./examples/).

## License

This project is licensed under the [MIT license].

[MIT license]: https://github.com/spider-rs/spider/blob/main/LICENSE

## Contributing

See [CONTRIBUTING](CONTRIBUTING.md).