https://github.com/spk/maman
Rust Web Crawler saving pages on Redis
- Host: GitHub
- URL: https://github.com/spk/maman
- Owner: spk
- License: mit
- Created: 2016-05-02T22:04:44.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2021-06-13T12:14:10.000Z (over 3 years ago)
- Last Synced: 2024-10-03T12:24:01.307Z (about 1 month ago)
- Topics: crawler, http, spider, web, web-crawler
- Language: Rust
- Size: 203 KB
- Stars: 43
- Watchers: 5
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# Maman
Maman is a Rust Web Crawler saving pages on Redis.
Pages are sent to the list `:queue:maman` using the
[Sidekiq job format](https://github.com/mperham/sidekiq/wiki/Job-Format):

```json
{
  "class": "Maman",
  "jid": "b4a577edbccf1d805744efa9",
  "retry": true,
  "created_at": 1461789979,
  "enqueued_at": 1461789979,
  "args": {
    "document": "",
    "urls": ["https://example.net/new"],
    "headers": {"content-type": "text/html"},
    "url": "https://example.net/"
  }
}
```

## Dependencies
* [Redis](http://redis.io/)
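
Redis stores each job as a plain JSON string in the shape of the example job above. As a rough illustration of that shape, the following std-only sketch assembles an equivalent payload by hand. The `Job` and `PageArgs` types, the `json_string` helper, and `example_job` are hypothetical names introduced here; they are not taken from Maman's source, and a real program would more likely use a JSON library.

```rust
// Hypothetical types mirroring the fields of the example job in this README.
// They are an illustration only and are NOT Maman's actual data structures.
struct PageArgs {
    document: String,
    urls: Vec<String>,
    headers: Vec<(String, String)>,
    url: String,
}

struct Job {
    class: String,
    jid: String,
    retry: bool,
    created_at: u64,
    enqueued_at: u64,
    args: PageArgs,
}

/// Render `s` as a JSON string literal, escaping quotes, backslashes
/// and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl Job {
    /// Serialize the job as compact JSON, keeping the key order of the example.
    fn to_json(&self) -> String {
        let urls: Vec<String> = self.args.urls.iter().map(|u| json_string(u)).collect();
        let headers: Vec<String> = self
            .args
            .headers
            .iter()
            .map(|(k, v)| format!("{}:{}", json_string(k), json_string(v)))
            .collect();
        format!(
            "{{\"class\":{},\"jid\":{},\"retry\":{},\"created_at\":{},\"enqueued_at\":{},\"args\":{{\"document\":{},\"urls\":[{}],\"headers\":{{{}}},\"url\":{}}}}}",
            json_string(&self.class),
            json_string(&self.jid),
            self.retry,
            self.created_at,
            self.enqueued_at,
            json_string(&self.args.document),
            urls.join(","),
            headers.join(","),
            json_string(&self.args.url)
        )
    }
}

/// Build a job with the same values as the example job in this README.
fn example_job() -> Job {
    Job {
        class: "Maman".to_string(),
        jid: "b4a577edbccf1d805744efa9".to_string(),
        retry: true,
        created_at: 1461789979,
        enqueued_at: 1461789979,
        args: PageArgs {
            document: String::new(),
            urls: vec!["https://example.net/new".to_string()],
            headers: vec![("content-type".to_string(), "text/html".to_string())],
            url: "https://example.net/".to_string(),
        },
    }
}

fn main() {
    println!("{}", example_job().to_json());
}
```

A consumer reading from the list would do the reverse: pop a string from Redis and decode these fields with whatever JSON parser it prefers.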
## Installation
### With cargo
```
cargo install maman
```

### With [just](https://github.com/casey/just)
```
PREFIX=~/.local just install
```

## Usage
```
maman URL [LIMIT] [MIME_TYPES]
```

`LIMIT` must be an integer; it defaults to `0`, which means no limit.
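
To make the `LIMIT` semantics concrete, here is a minimal sketch of how an optional integer argument with a "0 means unlimited" default could be handled. The functions `parse_limit` and `limit_reached` are hypothetical and only illustrate the documented behaviour; Maman's own argument handling may differ.

```rust
use std::num::ParseIntError;

/// Parse the optional LIMIT argument; a missing argument falls back to 0.
fn parse_limit(arg: Option<&str>) -> Result<usize, ParseIntError> {
    match arg {
        Some(raw) => raw.trim().parse::<usize>(),
        None => Ok(0),
    }
}

/// A limit of 0 never cuts the crawl off; any other value stops it once
/// that many pages have been crawled.
fn limit_reached(limit: usize, pages_crawled: usize) -> bool {
    limit != 0 && pages_crawled >= limit
}

fn main() {
    // Assumed positions, following `maman URL [LIMIT] [MIME_TYPES]`:
    // index 0 is the program name, index 1 the URL, index 2 the LIMIT.
    let limit_arg = std::env::args().nth(2);
    match parse_limit(limit_arg.as_deref()) {
        Ok(limit) => println!(
            "limit = {}, stop after 10 pages: {}",
            limit,
            limit_reached(limit, 10)
        ),
        Err(e) => eprintln!("LIMIT must be an integer: {}", e),
    }
}
```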
## Environment variables
### Defaults
* MAMAN_ENV=development
* REDIS_URL="redis://127.0.0.1/"

### Others
* RUST_LOG=maman=info
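
For readers wiring up their own tooling around these variables, the sketch below shows one common way to read an environment variable with a fallback in Rust. The helper `env_or` is a hypothetical name; only the variable names and default values come from the list above, and this is not necessarily how Maman reads them.

```rust
use std::env;

/// Return the value of `key`, or `default` when the variable is unset
/// or does not contain valid Unicode.
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn main() {
    // Defaults as documented in this README.
    let maman_env = env_or("MAMAN_ENV", "development");
    let redis_url = env_or("REDIS_URL", "redis://127.0.0.1/");
    println!("MAMAN_ENV={} REDIS_URL={}", maman_env, redis_url);
}
```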
## LICENSE
The MIT License
Copyright (c) 2016-2021 Laurent Arnoud
---
[![Build](https://img.shields.io/github/workflow/status/spk/maman/CI/master.svg)](https://github.com/spk/maman/actions)
[![Version](https://img.shields.io/crates/v/maman.svg)](https://crates.io/crates/maman)
[![Documentation](https://img.shields.io/badge/doc-rustdoc-blue.svg)](https://docs.rs/maman/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT "MIT")
[![Dependency status](https://deps.rs/repo/github/spk/maman/status.svg)](https://deps.rs/repo/github/spk/maman)