https://github.com/pmuens/crawler

# `Crawler`

Multi-threaded Web crawler with support for custom fetching and persisting logic.

## Usage

**NOTE:** See the crate's documentation for more details.

### As a binary

The following command runs the crawler with `10` threads, starting at the URL `http://example.com` and storing the visited pages as files in the `./crawlings` directory.

```shell
cargo run --bin crawler http://example.com ./crawlings 10
```
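Judging from the command above, the binary reads three positional arguments in order: the start URL, the output directory, and the thread count. As a hypothetical illustration only (this is not the crate's actual `main`, and `CliArgs` and `parse_args` are made-up names), parsing arguments in that order with the standard library could look like this:

```rust
use std::env;
use std::path::PathBuf;

// Hypothetical argument struct; not taken from the crate.
#[derive(Debug)]
struct CliArgs {
    url: String,
    out_dir: PathBuf,
    num_threads: usize,
}

// Parses `<url> <out_dir> <num_threads>` (program name already skipped).
fn parse_args<I: Iterator<Item = String>>(mut args: I) -> Result<CliArgs, String> {
    let url = args.next().ok_or("missing <url>")?;
    let out_dir = PathBuf::from(args.next().ok_or("missing <out_dir>")?);
    let num_threads = args
        .next()
        .ok_or("missing <num_threads>")?
        .parse::<usize>()
        .map_err(|e| format!("invalid <num_threads>: {}", e))?;
    // A crawler with zero workers would never fetch anything.
    if num_threads == 0 {
        return Err("<num_threads> must be at least 1".to_string());
    }
    Ok(CliArgs { url, out_dir, num_threads })
}

fn main() {
    match parse_args(env::args().skip(1)) {
        Ok(cli) => println!(
            "crawling {} with {} threads into {}",
            cli.url,
            cli.num_threads,
            cli.out_dir.display()
        ),
        Err(msg) => {
            eprintln!("usage: crawler <url> <out_dir> <num_threads>\nerror: {}", msg);
            std::process::exit(1);
        }
    }
}
```

In this sketch a thread count of `0` is rejected; the real binary may handle that case differently.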

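### As a library

Implement the `Fetch` and `Persist` traits for your own types, pass instances of them together with the number of threads to `Crawler::new`, and call `start` with the start URL: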

```rust
extern crate crawler;

use crawler::traits::{Fetch, Persist};
use crawler::crawler::Crawler;

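// ... your implementations of the `Fetch` and `Persist` traits,
// e.g. `YourFetcher` and `YourPersister`

fn main() {
    let url = "http://example.com";
    let num_threads: usize = 2;

    // User-defined types implementing `Persist` and `Fetch`.
    let persister = YourPersister::new();
    let fetcher = YourFetcher::new();

    // Create a crawler that uses `num_threads` threads and start crawling at `url`.
    let mut crawler = Crawler::new(persister, fetcher, num_threads);
    let _result = crawler.start(url);
}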
```
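The split between pluggable fetching, pluggable persisting, and a shared multi-threaded crawl loop can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's implementation: the `Fetch` and `Persist` traits here are hypothetical stand-ins with assumed signatures, and `MapFetcher`, `MemPersister`, `State`, and `crawl` are made-up names. Workers share a URL queue and a set of seen URLs behind a `Mutex`, and a `Condvar` lets idle workers sleep until new links arrive or the crawl is finished.

```rust
use std::collections::{HashMap, HashSet, VecDeque};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical stand-ins for the crate's traits; the real signatures may differ.
trait Fetch: Send + Sync {
    // Returns the page body and its outgoing links, or `None` on failure.
    fn fetch(&self, url: &str) -> Option<(String, Vec<String>)>;
}
trait Persist: Send + Sync {
    fn persist(&self, url: &str, body: &str);
}

// Fake fetcher backed by an in-memory link graph (no network access).
struct MapFetcher(HashMap<String, Vec<String>>);
impl Fetch for MapFetcher {
    fn fetch(&self, url: &str) -> Option<(String, Vec<String>)> {
        self.0.get(url).map(|links| (format!("<html>{}</html>", url), links.clone()))
    }
}

// Persister that records visited URLs in memory instead of writing files.
#[derive(Default)]
struct MemPersister(Mutex<Vec<String>>);
impl Persist for MemPersister {
    fn persist(&self, url: &str, _body: &str) {
        self.0.lock().unwrap().push(url.to_string());
    }
}

struct State {
    queue: VecDeque<String>, // URLs waiting to be fetched
    seen: HashSet<String>,   // URLs ever queued, so none is crawled twice
    in_flight: usize,        // URLs a worker is currently processing
}

// Crawls from `start` with `num_threads` workers sharing one queue.
fn crawl<F: Fetch + 'static, P: Persist + 'static>(
    start: &str, fetcher: Arc<F>, persister: Arc<P>, num_threads: usize,
) {
    let state = State {
        queue: VecDeque::from(vec![start.to_string()]),
        seen: std::iter::once(start.to_string()).collect(),
        in_flight: 0,
    };
    let shared = Arc::new((Mutex::new(state), Condvar::new()));
    let mut workers = Vec::new();
    for _ in 0..num_threads {
        let (shared, fetcher, persister) = (shared.clone(), fetcher.clone(), persister.clone());
        workers.push(thread::spawn(move || loop {
            let (lock, cvar) = &*shared;
            // Take the next URL; exit once the queue is empty and no worker is busy.
            let url = {
                let mut st = lock.lock().unwrap();
                loop {
                    if let Some(u) = st.queue.pop_front() {
                        st.in_flight += 1;
                        break u;
                    }
                    if st.in_flight == 0 {
                        cvar.notify_all();
                        return;
                    }
                    st = cvar.wait(st).unwrap();
                }
            };
            // Fetch and persist outside the lock so workers run in parallel.
            let links = fetcher.fetch(&url).map_or_else(Vec::new, |(body, links)| {
                persister.persist(&url, &body);
                links
            });
            let mut st = lock.lock().unwrap();
            for link in links {
                if st.seen.insert(link.clone()) {
                    st.queue.push_back(link);
                }
            }
            st.in_flight -= 1;
            cvar.notify_all(); // wake idle workers: new work, or the crawl is done
        }));
    }
    for w in workers {
        w.join().unwrap();
    }
}

fn main() {
    // Tiny fake site: "b" links back to "a", and "d" is a dead link.
    let mut graph = HashMap::new();
    graph.insert("a".to_string(), vec!["b".to_string(), "c".to_string()]);
    graph.insert("b".to_string(), vec!["a".to_string()]);
    graph.insert("c".to_string(), vec!["d".to_string()]);
    let persister = Arc::new(MemPersister::default());
    crawl("a", Arc::new(MapFetcher(graph)), persister.clone(), 2);
    let mut pages = persister.0.lock().unwrap().clone();
    pages.sort();
    println!("{:?}", pages); // ["a", "b", "c"]
}
```

Fetching and persisting happen outside the lock so that slow network or disk I/O in one worker does not block the others. The `in_flight` counter is what lets an idle worker tell a queue that is only temporarily empty apart from a crawl that is actually finished.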