# Maman

Maman is a Rust Web Crawler saving pages on Redis.

Pages are sent to the Redis list `:queue:maman` using the
[Sidekiq job format](https://github.com/mperham/sidekiq/wiki/Job-Format):

``` json
{
  "class": "Maman",
  "jid": "b4a577edbccf1d805744efa9",
  "retry": true,
  "created_at": 1461789979,
  "enqueued_at": 1461789979,
  "args": {
    "document": "",
    "urls": ["https://example.net/new"],
    "headers": {"content-type": "text/html"},
    "url": "https://example.net/"
  }
}
```
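
As a rough illustration of the payload shape above, the following std-only Rust sketch assembles an equivalent JSON string. It is not taken from Maman's source: the `Page` struct and the `json_escape` and `job_json` helpers are hypothetical names introduced here, and a real implementation would more likely use a JSON library.

```rust
/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical in-memory view of one crawled page (not Maman's own type).
struct Page<'a> {
    url: &'a str,
    document: &'a str,
    content_type: &'a str,
    urls: &'a [&'a str],
}

/// Build a job payload with the same fields as the README example.
/// `now` is a Unix timestamp used for both `created_at` and `enqueued_at`.
fn job_json(jid: &str, now: u64, page: &Page) -> String {
    let urls: Vec<String> = page
        .urls
        .iter()
        .map(|u| format!("\"{}\"", json_escape(u)))
        .collect();
    format!(
        "{{\"class\":\"Maman\",\"jid\":\"{jid}\",\"retry\":true,\
         \"created_at\":{now},\"enqueued_at\":{now},\
         \"args\":{{\"document\":\"{doc}\",\"urls\":[{urls}],\
         \"headers\":{{\"content-type\":\"{ct}\"}},\"url\":\"{url}\"}}}}",
        jid = json_escape(jid),
        now = now,
        doc = json_escape(page.document),
        urls = urls.join(","),
        ct = json_escape(page.content_type),
        url = json_escape(page.url),
    )
}
```

Calling `job_json("b4a577edbccf1d805744efa9", 1461789979, &page)` with a `Page` for `https://example.net/` yields a compact, single-line version of the example payload.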

## Dependencies

* [Redis](http://redis.io/)

## Installation

### With cargo

```
cargo install maman
```

### With [just](https://github.com/casey/just)

```
PREFIX=~/.local just install
```

## Usage

```
maman URL [LIMIT] [MIME_TYPES]
```

`LIMIT` must be an integer; the default is `0`, meaning no limit.
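
The `LIMIT` rule can be sketched in Rust as follows. This is only an illustration of the behaviour described above, not Maman's actual argument handling: `parse_limit` and `limit_reached` are hypothetical names, and treating the limit as an unsigned integer (so negative values are rejected) is an assumption.

```rust
use std::num::ParseIntError;

/// Parse the optional LIMIT argument; a missing argument falls back to 0.
fn parse_limit(arg: Option<&str>) -> Result<u64, ParseIntError> {
    match arg {
        Some(raw) => raw.trim().parse::<u64>(),
        None => Ok(0),
    }
}

/// With a limit of 0 the page count never stops the crawl.
fn limit_reached(limit: u64, crawled: u64) -> bool {
    limit != 0 && crawled >= limit
}
```

In a binary, the argument could be read with `std::env::args().nth(2)` and passed in as `parse_limit(arg.as_deref())`.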

## Environment variables

### Defaults

* MAMAN_ENV=development
* REDIS_URL="redis://127.0.0.1/"
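
A minimal Rust sketch of how such defaults might be resolved, assuming an unset variable simply falls back to the value listed above. The `setting_or` and `from_process_env` helpers and the constant names are hypothetical and not part of Maman.

```rust
use std::env;

/// Defaults as listed in this README.
const DEFAULT_MAMAN_ENV: &str = "development";
const DEFAULT_REDIS_URL: &str = "redis://127.0.0.1/";

/// Return the value found by `lookup`, or `default` when the key is unset.
/// Taking the lookup as a parameter keeps the function easy to test.
fn setting_or(lookup: impl Fn(&str) -> Option<String>, key: &str, default: &str) -> String {
    lookup(key).unwrap_or_else(|| default.to_string())
}

/// Look a variable up in the real process environment.
fn from_process_env(key: &str) -> Option<String> {
    env::var(key).ok()
}
```

For example, `setting_or(from_process_env, "REDIS_URL", DEFAULT_REDIS_URL)` returns the exported value when `REDIS_URL` is set and `redis://127.0.0.1/` otherwise.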

### Others

* RUST_LOG=maman=info

## LICENSE

The MIT License

Copyright (c) 2016-2021 Laurent Arnoud

---
[![Build](https://img.shields.io/github/workflow/status/spk/maman/CI/master.svg)](https://github.com/spk/maman/actions)
[![Version](https://img.shields.io/crates/v/maman.svg)](https://crates.io/crates/maman)
[![Documentation](https://img.shields.io/badge/doc-rustdoc-blue.svg)](https://docs.rs/maman/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT "MIT")
[![Dependency status](https://deps.rs/repo/github/spk/maman/status.svg)](https://deps.rs/repo/github/spk/maman)