Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pmuens/crawler
Multi-threaded Web crawler with support for custom fetching and persisting logic
- Host: GitHub
- URL: https://github.com/pmuens/crawler
- Owner: pmuens
- Created: 2020-10-18T09:31:56.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-10-18T09:32:16.000Z (over 4 years ago)
- Last Synced: 2024-10-20T04:44:46.430Z (3 months ago)
- Topics: crawler, crawler-engine, rust, rust-lang, web-crawler, web-crawling
- Language: Rust
- Homepage:
- Size: 53.7 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# `Crawler`
Multi-threaded Web crawler with support for custom fetching and persisting logic.
## Usage
**NOTE:** See the crate's documentation for more information.
### As a binary
The following command will run the crawler with `10` threads, starting with the URL `http://example.com` and storing the visited websites as files in the `./crawlings` directory.
```shell
cargo run --bin crawler http://example.com ./crawlings 10
```
### As a library
```rust
extern crate crawler;

use crawler::traits::{Fetch, Persist};
use crawler::crawler::Crawler;

// ... trait implementations for `Fetch` and `Persist`

fn main() {
    let url = "http://example.com";
    let num_threads: usize = 2;
    let persister = YourPersister::new();
    let fetcher = YourFetcher::new();

    let mut crawler = Crawler::new(persister, fetcher, num_threads);
    let _result = crawler.start(url);
}
```
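For the example above to compile, `YourFetcher` and `YourPersister` need to implement the crate's `Fetch` and `Persist` traits. The sketch below shows one possible shape for such implementations; the method names and signatures (`fetch`, `persist`) are assumptions made for illustration and are not taken from the crate, so consult the crate's documentation for the actual trait definitions.
```rust
use crawler::traits::{Fetch, Persist};

// Hypothetical implementations; the trait methods below (`fetch`, `persist`)
// are assumed for illustration and may not match the crate's real API.
struct YourFetcher;
struct YourPersister;

impl YourFetcher {
    fn new() -> Self { YourFetcher }
}

impl YourPersister {
    fn new() -> Self { YourPersister }
}

impl Fetch for YourFetcher {
    // Assumed shape: turn a URL into the fetched page body,
    // e.g. by issuing an HTTP GET request.
    fn fetch(&self, url: &str) -> String {
        format!("<html>contents of {}</html>", url)
    }
}

impl Persist for YourPersister {
    // Assumed shape: store a fetched page, e.g. as a file on disk.
    fn persist(&self, url: &str, content: &str) {
        println!("storing {} ({} bytes)", url, content.len());
    }
}
```
With implementations along these lines in place, the `main` function above can construct the `Crawler` and call `start` as shown.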