https://github.com/eugen1j/aioscrapy
Python asynchronous library for web scraping
- Host: GitHub
- URL: https://github.com/eugen1j/aioscrapy
- Owner: eugen1j
- License: mit
- Created: 2019-05-12T14:51:30.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-08-24T09:34:22.000Z (over 4 years ago)
- Last Synced: 2025-10-09T09:16:32.109Z (3 months ago)
- Topics: asyncio, crawler, python-crawler, python37, webscraper
- Language: Python
- Size: 39.1 KB
- Stars: 10
- Watchers: 3
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Python async library for web scraping
[PyPI version](https://badge.fury.io/py/aioscrapy)
[License](https://github.com/eugen1j/aioscrapy/blob/master/LICENSE)
[Build status](https://travis-ci.com/github/eugen1j/aioscrapy)
[Coverage](https://codecov.io/gh/eugen1j/aioscrapy)
[codebeat](https://codebeat.co/projects/github-com-eugen1j-aioscrapy-master)
[Codacy](https://www.codacy.com/app/eugen1j/aioscrapy?utm_source=github.com&utm_medium=referral&utm_content=eugen1j/aioscrapy&utm_campaign=Badge_Grade)
## Installing
```sh
pip install aioscrapy
```
## Usage
Plain text scraping (fetch a page as text, then parse it as JSON):
```python
import asyncio
import json

from aioscrapy import Client, WebTextClient, SingleSessionPool, Dispatcher, SimpleWorker


class CustomClient(Client[str, dict]):
    """Fetch a URL as text via WebTextClient and parse the body as JSON."""

    def __init__(self, client: WebTextClient):
        self._client = client

    async def fetch(self, key: str) -> dict:
        data = await self._client.fetch(key)
        return json.loads(data)


async def main():
    pool = SingleSessionPool()
    dispatcher = Dispatcher(['https://httpbin.org/get'])  # URLs to scrape
    client = CustomClient(WebTextClient(pool))
    worker = SimpleWorker(dispatcher, client)
    result = await worker.run()  # maps each URL to its parsed JSON
    return result


# asyncio.run replaces the deprecated get_event_loop()/run_until_complete pattern
print(asyncio.run(main()))
```
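To see how the pieces fit together without touching the network, here is a minimal offline sketch of the same pattern. It assumes only what the example above shows: a client is anything with an async `fetch(key)`, a wrapper client can post-process another client's output, and a worker maps each dispatched key to its fetched result. `SketchClient`, `FakeTextClient`, `JsonClient` and `run_all` are hypothetical names, not aioscrapy's API, and `run_all` uses plain `asyncio.gather` rather than whatever scheduling `SimpleWorker` actually does.

```python
import asyncio
import json
from typing import Dict, Generic, List, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class SketchClient(Generic[K, V]):
    """Hypothetical stand-in for aioscrapy's Client[K, V]: one async fetch per key."""

    async def fetch(self, key: K) -> V:
        raise NotImplementedError


class FakeTextClient(SketchClient[str, str]):
    """Hypothetical offline client: serves canned text instead of doing HTTP."""

    def __init__(self, pages: Dict[str, str]):
        self._pages = pages

    async def fetch(self, key: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return self._pages[key]


class JsonClient(SketchClient[str, dict]):
    """Same shape as CustomClient above: wrap a text client, parse its output as JSON."""

    def __init__(self, client: SketchClient[str, str]):
        self._client = client

    async def fetch(self, key: str) -> dict:
        data = await self._client.fetch(key)
        return json.loads(data)


async def run_all(keys: List[K], client: SketchClient[K, V]) -> Dict[K, V]:
    """Hypothetical worker model: fetch every key concurrently, map key -> result."""
    results = await asyncio.gather(*(client.fetch(k) for k in keys))
    return dict(zip(keys, results))


pages = {"https://example.test/a": '{"id": 1}', "https://example.test/b": '{"id": 2}'}
result = asyncio.run(run_all(list(pages), JsonClient(FakeTextClient(pages))))
print(result)
```

Keeping the parsing in a thin wrapper like `JsonClient` means the JSON handling can be exercised against canned text, while the real text client is swapped in only when actually scraping.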
Byte content downloading
```python
import asyncio

from aioscrapy import Client, WebByteClient, SingleSessionPool, Dispatcher, SimpleWorker


class CustomClient(Client[str, bytes]):
    """Fetch a URL and return the raw response body as bytes."""

    def __init__(self, client: WebByteClient):
        self._client = client

    async def fetch(self, key: str) -> bytes:
        return await self._client.fetch(key)


async def main():
    pool = SingleSessionPool()
    dispatcher = Dispatcher(['https://httpbin.org/image'])  # URLs to download
    client = CustomClient(WebByteClient(pool))
    worker = SimpleWorker(dispatcher, client)
    return await worker.run()


data: dict = asyncio.run(main())  # maps each URL to its downloaded bytes
for url, byte_content in data.items():
    print(f"{url}: {len(byte_content)} bytes")
```
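The result above is a plain dict mapping each URL to its bytes, so persisting the downloads needs nothing further from aioscrapy. A minimal sketch using only the standard library; `save_downloads`, the `0000_` index prefix and the `download_N` fallback name are illustrative choices, not part of the library.

```python
import os
import tempfile
from typing import Dict
from urllib.parse import urlparse


def save_downloads(data: Dict[str, bytes], out_dir: str) -> Dict[str, str]:
    """Write each URL's bytes into out_dir and return a url -> file path mapping."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for i, (url, content) in enumerate(data.items()):
        # Use the last URL path segment as the file name; fall back to a numbered name.
        name = os.path.basename(urlparse(url).path) or f"download_{i}"
        # Prefix with the index so two URLs ending in the same segment don't collide.
        path = os.path.join(out_dir, f"{i:04d}_{name}")
        with open(path, "wb") as fh:
            fh.write(content)
        paths[url] = path
    return paths


# Demo with placeholder bytes (no network), written to a throwaway directory.
demo = {"https://httpbin.org/image": b"abc", "https://example.test/": b"xyz"}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_downloads(demo, tmp)
    names = sorted(os.path.basename(p) for p in saved.values())
    with open(saved["https://httpbin.org/image"], "rb") as fh:
        roundtrip = fh.read()
print(names)
```

In the download example, `save_downloads(data, "downloads")` could stand in for the `print` loop.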