https://github.com/python-ruia/ruia-motor
A Ruia plugin that uses the motor to store data to MongoDB
- Host: GitHub
- URL: https://github.com/python-ruia/ruia-motor
- Owner: python-ruia
- License: mit
- Created: 2019-02-14T05:15:32.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2022-06-30T08:05:15.000Z (over 2 years ago)
- Last Synced: 2024-10-12T18:11:28.893Z (3 months ago)
- Topics: motor, ruia
- Language: Python
- Homepage:
- Size: 11.7 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ruia-motor
A [Ruia](https://github.com/howie6879/ruia) plugin that uses Motor to store data in MongoDB.
```text
Notice: Works on ruia >= 0.8.0
```

### Installation
```shell
pip install -U ruia-motor
```

### Usage
`ruia-motor` automatically stores data in MongoDB:
```python
from ruia import AttrField, Item, Response, Spider, TextField
from ruia_motor import RuiaMotorInsert, RuiaMotorUpdate, init_spider


class HackerNewsItem(Item):
    target_item = TextField(css_select="tr.athing")
    title = TextField(css_select="a.storylink")
    url = AttrField(css_select="a.storylink", attr="href")

    async def clean_title(self, value):
        return value.strip()


class HackerNewsSpider(Spider):
    start_urls = ["https://news.ycombinator.com/news?p=1"]
    aiohttp_kwargs = {"proxy": "http://0.0.0.0:1087"}

    async def parse(self, response: Response):
        async for item in HackerNewsItem.get_items(html=await response.text()):
            # Update data
            # https://motor.readthedocs.io/en/stable/api-asyncio/asyncio_motor_collection.html#motor.motor_asyncio.AsyncIOMotorCollection.update_one
            yield RuiaMotorUpdate(
                collection="hn_demo",
                filter={"title": item.title},
                update={"$set": item.results},
                upsert=True,
            )
            # Insert data
            # https://motor.readthedocs.io/en/stable/api-asyncio/asyncio_motor_collection.html#motor.motor_asyncio.AsyncIOMotorCollection.insert_one
            # yield RuiaMotorInsert(collection="hn_demo", data=item.results)


async def init_plugins_after_start(spider_ins):
    spider_ins.mongodb_config = {"host": "127.0.0.1", "port": 27017, "db": "ruia_motor"}
    init_spider(spider_ins=spider_ins)


if __name__ == "__main__":
    HackerNewsSpider.start(after_start=init_plugins_after_start)
```

Enjoy it :)
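
The usage example yields `RuiaMotorUpdate` with `upsert=True` and leaves `RuiaMotorInsert` commented out. The sketch below is a rough illustration of why that choice matters when a spider runs more than once. It does not use `ruia-motor`, Motor, or MongoDB at all: `ToyCollection` and `crawl_once` are hypothetical names invented for this illustration, and the toy models only equality filters and the `$set` operator (a real MongoDB collection also assigns an `_id` to each document).

```python
from typing import Any, Dict, List


class ToyCollection:
    """In-memory stand-in for a MongoDB collection (illustration only)."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def insert_one(self, data: Dict[str, Any]) -> None:
        # Always appends, so re-crawling the same page creates duplicates.
        self.docs.append(dict(data))

    def update_one(
        self, filter: Dict[str, Any], update: Dict[str, Any], upsert: bool = False
    ) -> None:
        # Only equality filters and the "$set" operator are modelled here.
        fields = update.get("$set", {})
        for doc in self.docs:
            if all(doc.get(key) == value for key, value in filter.items()):
                doc.update(fields)  # first match found: update it in place
                return
        if upsert:
            # No match: create a document from the filter plus the $set fields.
            self.docs.append({**filter, **fields})


def crawl_once() -> List[Dict[str, str]]:
    # Hypothetical scraped results, standing in for `item.results`.
    return [
        {"title": "Story A", "url": "https://example.com/a"},
        {"title": "Story B", "url": "https://example.com/b"},
    ]


inserted, upserted = ToyCollection(), ToyCollection()
for _ in range(2):  # simulate running the spider twice
    for results in crawl_once():
        inserted.insert_one(results)
        upserted.update_one({"title": results["title"]}, {"$set": results}, upsert=True)

print(len(inserted.docs), len(upserted.docs))  # 4 2
```

Under these assumptions, plain inserts grow the collection on every run, while filtering on `title` with an upsert keeps one document per story and refreshes its fields.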