Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/howie6879/ruia
Async Python 3.6+ web scraping micro-framework based on asyncio
aiohttp asyncio asyncio-spider crawler crawling-framework middlewares python python-ruia ruia spider uvloop
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/howie6879/ruia
- Owner: howie6879
- License: apache-2.0
- Created: 2018-07-10T01:12:54.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-07-01T08:23:28.000Z (over 1 year ago)
- Last Synced: 2024-10-29T15:38:00.701Z (3 months ago)
- Topics: aiohttp, asyncio, asyncio-spider, crawler, crawling-framework, middlewares, python, python-ruia, ruia, spider, uvloop
- Language: Python
- Homepage: https://www.howie6879.com/ruia/
- Size: 4.46 MB
- Stars: 1,749
- Watchers: 42
- Forks: 181
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-asyncio - ruia - An async web scraping micro-framework based on asyncio. (Misc)
README
Ruia
🕸️ Async Python 3.6+ web scraping micro-framework based on asyncio.
⚡ Write less, run faster.
![](https://raw.githubusercontent.com/howie6879/ruia/master/docs/images/ruia_demo.png)
## Overview
Ruia is an async web scraping micro-framework written with `asyncio` and `aiohttp`.
It aims to make crawling URLs as convenient as possible. **Write less, run faster**.
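
To illustrate why building on `asyncio` helps a crawler, here is a minimal sketch of concurrent fetching using only the standard library. It is not Ruia's API: `fetch`, `crawl`, the `concurrency` parameter, and the `example.com` URLs are hypothetical names chosen for this illustration, and the network request is simulated with `asyncio.sleep` so the example runs offline.

```python
import asyncio
import time


async def fetch(url: str, delay: float = 0.1) -> str:
    """Stand-in for an HTTP request (hypothetical): wait, then return fake HTML."""
    await asyncio.sleep(delay)
    return f"<title>{url}</title>"


async def crawl(urls, concurrency: int = 3):
    """Fetch all URLs concurrently, with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded_fetch(url: str) -> str:
        # The semaphore caps how many fetches run at the same time.
        async with semaphore:
            return await fetch(url)

    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(6)]
    start = time.perf_counter()
    pages = asyncio.run(crawl(urls))
    elapsed = time.perf_counter() - start
    print(f"Fetched {len(pages)} pages in {elapsed:.2f}s")
```

Because the waits overlap, six simulated requests with a limit of three should finish in roughly two delay periods instead of six. A real crawler would replace `fetch` with an actual HTTP client call.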
- Documentation: [Chinese documentation][doc_cn] | [English documentation][doc_en]
- Organization: [python-ruia][Organization]
- Plugin: [awesome-ruia](https://github.com/python-ruia/awesome-ruia) (any contributions you make are **greatly appreciated**!)

## Features
- **Easy**: Declarative programming
- **Fast**: Powered by asyncio
- **Extensible**: Middlewares and plugins
- **Powerful**: JavaScript support

## Installation
``` shell
# For Linux & Mac
pip install -U ruia[uvloop]

# For Windows
pip install -U ruia

# New features
pip install git+https://github.com/howie6879/ruia
```

## Tutorials
1. [Overview](https://docs.python-ruia.org/en/tutorials/overview.html)
2. [Installation](https://docs.python-ruia.org/en/tutorials/installation.html)
3. [Define Data Items](https://docs.python-ruia.org/en/tutorials/item.html)
4. [Spider Control](https://docs.python-ruia.org/en/tutorials/spider.html)
5. [Request & Response](https://docs.python-ruia.org/en/tutorials/request.html)
6. [Customize Middleware](https://docs.python-ruia.org/en/tutorials/middleware.html)
7. [Write a Plugin](https://docs.python-ruia.org/en/tutorials/plugins.html)

## TODO
- [x] Cache for debugging, to reduce repeated requests: [ruia-cache](https://github.com/python-ruia/ruia-cache)
- [x] Provide an easy way to debug the script, [ruia-shell](https://github.com/python-ruia/ruia-shell)
- [ ] Distributed crawling/scraping

## Contribution
Ruia is still under development; feel free to open issues and pull requests:
- Report or fix bugs
- Request or publish plugins
- Write or fix documentation
- Add test cases

Notice: We use [black](https://github.com/psf/black) to format the code.
## Thanks
- [aiohttp](https://github.com/aio-libs/aiohttp/)
- [demiurge](https://github.com/matiasb/demiurge)

[doc_cn]: https://www.howie6879.cn/ruia/
[doc_en]: https://howie6879.github.io/ruia/
[Awesome]: https://github.com/python-ruia/awesome-ruia
[Organization]: https://github.com/python-ruia