Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sivagao/jp-av-crawler
a scrapy crawler for jav library
- Host: GitHub
- URL: https://github.com/sivagao/jp-av-crawler
- Owner: sivagao
- Created: 2014-04-28T13:26:45.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2014-04-28T13:27:28.000Z (over 10 years ago)
- Last Synced: 2024-04-11T12:59:36.046Z (9 months ago)
- Language: Python
- Size: 105 KB
- Stars: 14
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# QUICK NOTE
## USAGE
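- Install the dependencies: `pip install -r requirments.txt`
- Install MongoDB and start it: `nohup mongod&`
- Run the crawler: `scrapy crawl jv_most_wanted_item` (a download delay, `download_delayer`, of about 1.2 s is configured; adjust it as needed)

## Scraped data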
- actor - performers, type: list
- title - title of the release, type: string
- category - genres, type: list
- slug - identification code, type: string
- downloadurl - magnet download address, type: string `magnet link`
- preview - cover image, type: string `image src`

## Customizing a spider
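- Copy one of the files in the `spiders` directory, then modify it
- SgmlLinkExtractor - extracts the links to process (for example, detail pages)
- process handler - the handler that does the actual data extraction (tip: try your expressions in `scrapy shell` first); use `hxs` with XPath or regular expressions to match the data you need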
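
The extraction step can be prototyped outside Scrapy before it is moved into a handler. The following is a minimal sketch using only the Python standard library and the regular-expression approach; it is not the code from this repository. The record fields follow the list in this README, while `MovieRecord`, `parse_detail`, `first`, and the HTML in `SAMPLE_HTML` are hypothetical names and markup made up for illustration. A real page will need different patterns.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class MovieRecord:
    """One scraped record; field names follow the list in this README."""
    title: str = ""
    slug: str = ""
    downloadurl: str = ""  # magnet link
    preview: str = ""      # cover image src
    actor: List[str] = field(default_factory=list)
    category: List[str] = field(default_factory=list)


# Hypothetical detail-page markup, invented for this example only.
SAMPLE_HTML = """
<h3 class="title">ABC-123 Sample Title</h3>
<img id="cover" src="http://example.com/covers/abc123.jpg">
<span class="star"><a href="/star/1">Actor One</a></span>
<span class="star"><a href="/star/2">Actor Two</a></span>
<span class="genre"><a href="/genre/1">Drama</a></span>
<a href="magnet:?xt=urn:btih:0123456789abcdef">download</a>
"""


def first(pattern: str, text: str) -> str:
    """Return the first capture group of pattern in text, or '' if absent."""
    match = re.search(pattern, text)
    return match.group(1).strip() if match else ""


def parse_detail(html: str) -> MovieRecord:
    """Build a MovieRecord from one detail page (assumed markup, see above)."""
    heading = first(r'<h3 class="title">([^<]+)</h3>', html)
    # Assumption: the heading is "<slug> <title>", separated by one space.
    slug, _, title = heading.partition(" ")
    return MovieRecord(
        title=title,
        slug=slug,
        downloadurl=first(r'href="(magnet:\?[^"]+)"', html),
        preview=first(r'<img id="cover" src="([^"]+)"', html),
        actor=re.findall(r'<span class="star"><a [^>]*>([^<]+)</a>', html),
        category=re.findall(r'<span class="genre"><a [^>]*>([^<]+)</a>', html),
    )


record = parse_detail(SAMPLE_HTML)
```

Once the patterns work on a saved page, the same expressions can be moved into the spider's handler, where the page body would come from the response object instead of a string constant.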