https://github.com/kuaikuaikim/scrapy-redis-examples
A Scrapy project integrated with Redis (an example of Scrapy-Redis integration)
- Host: GitHub
- URL: https://github.com/kuaikuaikim/scrapy-redis-examples
- Owner: kuaikuaikim
- Created: 2015-02-03T15:26:56.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2015-02-16T06:59:08.000Z (over 10 years ago)
- Last Synced: 2025-04-30T20:05:27.427Z (5 months ago)
- Language: Python
- Homepage:
- Size: 445 KB
- Stars: 4
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
scrapy-redis-examples
==============

A Scrapy project integrated with Redis. Redis can help Scrapy with many tasks during a crawl.

**Remember: do not use it for anything illegal!**

####Usage

    sudo apt-get update
    sudo apt-get install scrapy-0.2x
    sudo apt-get install redis-server
    git clone ...
    # Better to try scrapy shell before crawling
    scrapy shell http://specify_address/xxx.html
    cd scrapy-redis-examples/hrtencent && scrapy crawl hrtencent

Finally, you can find the result files in the storage folder.

####New Features
1. Remember the crawl status, so that every page is crawled only once.
2. Improve Scrapy performance: it runs faster with Redis. I rewrote some of the core spider logic to use Redis.
3. Write results to many small files, to avoid losing all results if the crawl is interrupted.
4. Automatically download page images.

==============
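
Feature 1 above (crawling every page only once) is commonly built on a Redis set: `SADD` returns 1 when the member is new and 0 when it already exists, so one call both records and tests a URL. The sketch below illustrates that idea only and is not this project's actual code. `SeenFilter`, `InMemorySet`, and the `crawler:seen` key are hypothetical names. `InMemorySet` stands in for a Redis client so the example runs without a server; a `redis.Redis()` client from redis-py offers a `sadd` method with the same return convention.

```python
import hashlib


class InMemorySet:
    """Hypothetical stand-in for a Redis client; mimics only SADD."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *values):
        # Like Redis SADD: return how many values were newly added.
        members = self._sets.setdefault(key, set())
        added = 0
        for value in values:
            if value not in members:
                members.add(value)
                added += 1
        return added


class SeenFilter:
    """Remembers which URLs were already crawled (hypothetical helper)."""

    def __init__(self, client, key="crawler:seen"):
        self.client = client
        self.key = key

    def is_new(self, url):
        # Store a fixed-size fingerprint instead of the raw URL.
        fingerprint = hashlib.sha1(url.encode("utf-8")).hexdigest()
        return self.client.sadd(self.key, fingerprint) == 1


seen = SeenFilter(InMemorySet())
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/a"]
# Only URLs not seen before are kept; the repeated URL is dropped.
to_crawl = [u for u in urls if seen.is_new(u)]
print(to_crawl)
```

Because the set lives in Redis rather than in the spider process, the record can outlive a spider restart and be shared by several crawler processes.
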
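An example of integrating Scrapy with Redis. Redis can support many of Scrapy's functional modules, for example filtering, storage, and performance.

####New Features
1. Remember the crawler state, so that each page is fetched only once.
2. Improved Scrapy crawl performance; I rewrote Scrapy's core modules using Redis.
3. Split crawl results into many small files, so that an interrupted run does not lose them.
4. Automatically download page images.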
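
Feature 3 (splitting results across many small files) can be illustrated with a writer that starts a new file every N items, so an interruption puts at most the current unfinished file at risk. This is a sketch under assumed details, not the project's code: `ChunkedWriter`, the `part-00000.jl` naming, and the JSON-lines format are all hypothetical.

```python
import json
import os
import tempfile


class ChunkedWriter:
    """Write items as JSON lines, rotating to a new file every `chunk_size` items."""

    def __init__(self, directory, chunk_size=100):
        self.directory = directory
        self.chunk_size = chunk_size
        self._count = 0   # items written to the current file
        self._index = 0   # number of the next file to open
        self._file = None

    def write(self, item):
        if self._file is None:
            name = "part-%05d.jl" % self._index
            path = os.path.join(self.directory, name)
            self._file = open(path, "w", encoding="utf-8")
            self._index += 1
        self._file.write(json.dumps(item, ensure_ascii=False) + "\n")
        self._count += 1
        if self._count >= self.chunk_size:
            # The chunk is full: close it so it is safely on disk.
            self.close()

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None
            self._count = 0


out_dir = tempfile.mkdtemp()
writer = ChunkedWriter(out_dir, chunk_size=2)
for i in range(5):
    writer.write({"id": i})
writer.close()
files = sorted(os.listdir(out_dir))
print(files)
```

A smaller `chunk_size` reduces how much an interruption can cost, at the price of more files to manage afterwards.
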