https://github.com/kuaikuaikim/scrapy-redis-examples

A scrapy project integrated with redis(scrapy集成redis实例)
https://github.com/kuaikuaikim/scrapy-redis-examples

Last synced: 5 months ago
JSON representation

A scrapy project integrated with redis(scrapy集成redis实例)

Host: GitHub
URL: https://github.com/kuaikuaikim/scrapy-redis-examples
Owner: kuaikuaikim
Created: 2015-02-03T15:26:56.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2015-02-16T06:59:08.000Z (over 10 years ago)
Last Synced: 2025-04-30T20:05:27.427Z (5 months ago)
Language: Python
Homepage:
Size: 445 KB
Stars: 4
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

scrapy-redis-examples
==============

A scrapy project integrated with redis. we can use redis to do many things during scrapy work.
**Rember dont use it to do anything illegal!**

####Usage
sudo apt-get install scrapy-0.2x
sudo apt-get install redis-server
sudo apt-get update
git clone ...

#better use scrapy shell before craw
scrapy shell http://specify_address/xxx.html

cd scrapy-redis-examples/hrtencent & scrapy crawl hrtencent
Finally you can see result files in the storage folder

####New Features
1.Rembere the scrapy crawled status. Make sure every page we just craw once.
2.Improve the scrapy performance. It works faster with redis. I rewrite some core module spider logic to the redis.
3.Ouput scrapy results with distributed small files.Avoid losing all results during craw pages when been interrupted.
4.automatically downloading page images

==============
一个scrapy集成redis的实例。我们可以用redis辅助scrapy很多功能模块，例如过滤，存储，性能。

####新特性
1.记住爬虫状态，确保每个页面只抓取一次。
2.提高了scrapy爬虫性能，本人利用redis重写了scrapy的核心模块。
3.爬虫结果分成多个小文件,防止程序中断丢失爬虫结果。
4.自动下载页面图片

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/kuaikuaikim/scrapy-redis-examples

Awesome Lists containing this project

README