Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/brucedone/scrapy_demo
all kinds of scrapy demo
cnbeta demo douban-image example imagespipeline kafak kafka mongodb oss pipeline scrapy scrapy-demo spider sqlalchemy
Last synced: 7 days ago
- Host: GitHub
- URL: https://github.com/brucedone/scrapy_demo
- Owner: BruceDone
- Created: 2015-11-17T12:28:46.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2023-01-31T11:24:21.000Z (about 2 years ago)
- Last Synced: 2025-01-17T11:10:40.746Z (15 days ago)
- Topics: cnbeta, demo, douban-image, example, imagespipeline, kafak, kafka, mongodb, oss, pipeline, scrapy, scrapy-demo, spider, sqlalchemy
- Language: Python
- Homepage: http://brucedone.com
- Size: 67.4 KB
- Stars: 163
- Watchers: 8
- Forks: 57
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Scrapy_demo
This project scrapes a list of websites that I used to crawl most often.
If this project helped you, please give it a star, thanks :)

# Spider list
* douban
* douban_oss
* googleplay
* cnbeta
* ka
* cnblogs

# Project Feature
* `google play` uses Scrapy's `CrawlSpider` and pymongo.
* `douban` uses the images pipeline to download images (sending custom headers to avoid being banned); when the crawl finishes, it writes the item information to a txt file.
* `cnbeta` uses SQLAlchemy to save items to a MySQL database (or any other database that SQLAlchemy supports).
* `ka` uses Kafka. It is a demo of how to use Scrapy and Kafka together: the spider does not close, and when you push a message to Kafka, it starts crawling the URL you just gave.
* `cnblogs` uses a signal handler.
* `douban_oss` uses the Aliyun OSS SDK to upload the images downloaded by the images pipeline to OSS storage.

# How to use
For each project there is a `run_spider.py` script; just run it and enjoy :)

```
python run_spider.py
```
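
The README does not show what `run_spider.py` contains. As a rough idea only, here is a minimal sketch of such a launcher, assuming it does nothing more than start one spider through Scrapy's `scrapy crawl` command. The spider name `cnbeta` comes from the spider list above; the helper names `build_crawl_command` and `run_spider` and the use of `subprocess` are illustrative assumptions, not the repository's actual code.

```python
import subprocess
import sys


def build_crawl_command(spider_name):
    """Return the argument list that runs `scrapy crawl <spider_name>`."""
    # `python -m scrapy` starts Scrapy's command line with the current interpreter,
    # so the spider runs in the same environment as this script.
    return [sys.executable, "-m", "scrapy", "crawl", spider_name]


def run_spider(spider_name):
    """Run the spider in a child process and return its exit code."""
    # Assumption: this is called from the project directory (the one with scrapy.cfg),
    # otherwise Scrapy cannot find the project's spiders.
    return subprocess.call(build_crawl_command(spider_name))
```

Calling `run_spider("cnbeta")` from inside a project directory would then be roughly equivalent to typing `scrapy crawl cnbeta` by hand, assuming the spider is registered under that name.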