
https://github.com/howie6879/ruia




# Ruia


πŸ•ΈοΈ Async Python 3.6+ web scraping micro-framework based on asyncio.


⚡ Write less, run faster.



![](https://raw.githubusercontent.com/howie6879/ruia/master/docs/images/ruia_demo.png)

## Overview

Ruia is an async web scraping micro-framework, written with `asyncio` and `aiohttp`,
that aims to make crawling URLs as convenient as possible.
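To illustrate why an `asyncio`-based crawler is fast, here is a minimal, stdlib-only sketch of bounded concurrent fetching. It is not Ruia's API: `fake_fetch` is a hypothetical stand-in for an `aiohttp` request (it sleeps instead of touching the network), and `crawl` and its `concurrency` parameter are illustrative names. `asyncio.run` requires Python 3.7+.

```python
import asyncio
from typing import Dict, List


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an aiohttp GET; sleeps instead of doing I/O."""
    await asyncio.sleep(0.1)
    return f"<html>{url}</html>"


async def crawl(urls: List[str], concurrency: int = 3) -> Dict[str, str]:
    """Fetch all URLs concurrently, with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded_fetch(url: str) -> str:
        # Each task waits here until one of the `concurrency` slots is free.
        async with semaphore:
            return await fake_fetch(url)

    # gather() preserves input order, so results line up with `urls`.
    pages = await asyncio.gather(*(bounded_fetch(url) for url in urls))
    return dict(zip(urls, pages))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(6)]
    results = asyncio.run(crawl(urls))
    print(len(results))  # 6
```

With six URLs and three slots, the whole run takes roughly two "round trips" (about 0.2 s) instead of six, because waiting on one request does not block the others.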

**Write less, run faster**:

- Documentation: [Chinese documentation][doc_cn] | [English documentation][doc_en]
- Organization: [python-ruia][Organization]
- Plugins: [awesome-ruia](https://github.com/python-ruia/awesome-ruia) (any contributions you make are **greatly appreciated**!)

## Features

- **Easy**: Declarative programming
- **Fast**: Powered by asyncio
- **Extensible**: Middlewares and plugins
- **Powerful**: JavaScript support
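"Declarative programming" here means describing *what* to extract as class attributes and letting the framework do the extraction. The sketch below shows that pattern with the standard library only. `RegexField`, `SketchItem`, `from_html` and `PostItem` are hypothetical names for illustration, not Ruia classes; Ruia's own item definitions are covered in the "Define Data Items" tutorial.

```python
import re
from typing import Dict


class RegexField:
    """Hypothetical field: captures group 1 of a regex from raw HTML."""

    def __init__(self, pattern: str, default: str = "") -> None:
        self.regex = re.compile(pattern, re.S)
        self.default = default

    def extract(self, html: str) -> str:
        match = self.regex.search(html)
        return match.group(1).strip() if match else self.default


class SketchItem:
    """Hypothetical base class: subclasses declare fields as class attributes."""

    _fields: Dict[str, RegexField] = {}

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Start from the parent's fields so declarations are inherited,
        # then add the fields declared directly on this subclass.
        fields = dict(cls._fields)
        fields.update(
            {name: value for name, value in vars(cls).items() if isinstance(value, RegexField)}
        )
        cls._fields = fields

    @classmethod
    def from_html(cls, html: str) -> "SketchItem":
        item = cls()
        for name, field in cls._fields.items():
            # Instance attribute shadows the class-level field declaration.
            setattr(item, name, field.extract(html))
        return item


class PostItem(SketchItem):
    title = RegexField(r"<h1>(.*?)</h1>")
    author = RegexField(r'<span class="author">(.*?)</span>', default="anonymous")


if __name__ == "__main__":
    post = PostItem.from_html("<h1> Hello Ruia </h1><p>body</p>")
    print(post.title, post.author)  # Hello Ruia anonymous
```

Regexes keep the sketch dependency-free; a real scraper would normally use CSS or XPath selectors on a parsed document instead.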

## Installation

``` shell
# For Linux & Mac
pip install -U ruia[uvloop]

# For Windows
pip install -U ruia

# New features
pip install git+https://github.com/howie6879/ruia
```
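The `uvloop` extra is only suggested for Linux and macOS because uvloop does not support Windows. A common pattern for scripts that should run on both is to switch to uvloop's event loop policy when it is importable and otherwise keep the default `asyncio` loop. This is a generic sketch; `configure_event_loop` is a hypothetical helper, and we do not claim Ruia selects its loop this way.

```python
import asyncio

try:
    import uvloop  # present when installed via the `ruia[uvloop]` extra
except ImportError:  # e.g. on Windows, where uvloop is unavailable
    uvloop = None


def configure_event_loop() -> str:
    """Use uvloop's policy if available; otherwise keep asyncio's default loop."""
    if uvloop is not None:
        asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
        return "uvloop"
    return "asyncio"


if __name__ == "__main__":
    print(f"event loop: {configure_event_loop()}")
```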

## Tutorials

1. [Overview](https://docs.python-ruia.org/en/tutorials/overview.html)
2. [Installation](https://docs.python-ruia.org/en/tutorials/installation.html)
3. [Define Data Items](https://docs.python-ruia.org/en/tutorials/item.html)
4. [Spider Control](https://docs.python-ruia.org/en/tutorials/spider.html)
5. [Request & Response](https://docs.python-ruia.org/en/tutorials/request.html)
6. [Customize Middleware](https://docs.python-ruia.org/en/tutorials/middleware.html)
7. [Write a Plugin](https://docs.python-ruia.org/en/tutorials/plugins.html)

## TODO

- [x] Cache responses while debugging, so repeated runs send fewer requests: [ruia-cache](https://github.com/python-ruia/ruia-cache)
- [x] Provide an easy way to debug scripts: [ruia-shell](https://github.com/python-ruia/ruia-shell)
- [ ] Distributed crawling/scraping
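The idea behind caching for debugging is simple: the first request for a URL goes to the network, and later requests for the same URL are answered locally. A minimal in-memory sketch follows; `cached` and `fake_fetch` are hypothetical names, and this is not ruia-cache's implementation or API.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Fetcher = Callable[[str], Awaitable[str]]


def cached(fetch: Fetcher) -> Fetcher:
    """Wrap an async fetcher so repeated URLs are served from an in-memory dict."""
    cache: Dict[str, str] = {}

    async def wrapper(url: str) -> str:
        if url not in cache:
            cache[url] = await fetch(url)  # only cache misses hit the network
        return cache[url]

    return wrapper


async def fake_fetch(url: str) -> str:
    """Hypothetical network call, used here to keep the example offline."""
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


async def main() -> None:
    fetch = cached(fake_fetch)
    first = await fetch("https://example.com")
    second = await fetch("https://example.com")  # served from the cache
    print(first == second)  # True


if __name__ == "__main__":
    asyncio.run(main())
```

Two concurrent requests for the same uncached URL would both reach `fetch`; a persistent, debugging-oriented cache would also store responses on disk so they survive between runs.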

## Contribution

Ruia is still under development; feel free to open issues and pull requests:

- Report or fix bugs
- Request or publish plugins
- Write or fix documentation
- Add test cases

**Note:** We use [black](https://github.com/psf/black) to format the code.

## Thanks

- [aiohttp](https://github.com/aio-libs/aiohttp/)
- [demiurge](https://github.com/matiasb/demiurge)

[doc_cn]: https://www.howie6879.cn/ruia/
[doc_en]: https://howie6879.github.io/ruia/
[Awesome]: https://github.com/python-ruia/awesome-ruia
[Organization]: https://github.com/python-ruia