https://github.com/auxten/douban-crawler
- Host: GitHub
- URL: https://github.com/auxten/douban-crawler
- Owner: auxten
- License: apache-2.0
- Created: 2016-11-11T08:24:36.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-11-14T06:48:01.000Z (over 8 years ago)
- Last Synced: 2025-01-09T07:29:50.378Z (5 months ago)
- Size: 5.86 KB
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# douban-crawler
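## Features
* Scrape the avatars of all posters on the first page of the Douban group https://www.douban.com/group/beijingzufang/ and save each avatar using the avatar owner's id as the file name.
* For example, the poster of https://www.douban.com/group/topic/49517104/ is "梅子"; the avatar should be https://img3.doubanio.com/icon/u84097702-2.jpg and should be saved as "梅子.jpg".
## Bonus items
* Network requests made by the crawler may block, so use multithreading, multiprocessing, or async I/O to speed up the crawl.
* Consider packaging the crawler with Docker to make it easier to use.
* Thorough usage documentation.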
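
The requirements above could be approached roughly as follows. This is a minimal sketch, not the repository's actual implementation. It uses only the Python standard library and a thread pool for concurrency. The URL patterns are inferred solely from the two example URLs in this README, and the assumptions that the first matching `<img>` on a topic page belongs to the poster and that its `alt` attribute holds the display name are unverified guesses about Douban's markup. All function names (`fetch`, `topic_urls`, `find_avatar`, `save_avatar`, `crawl`) are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import Request, urlopen

GROUP_URL = "https://www.douban.com/group/beijingzufang/"
# Assumed patterns, inferred only from the example URLs in this README.
TOPIC_RE = re.compile(r"https://www\.douban\.com/group/topic/\d+/")
AVATAR_RE = re.compile(r"https://img\d+\.doubanio\.com/icon/u[\w-]+\.jpg")


class AvatarFinder(HTMLParser):
    """Records (alt text, src) of the first <img> whose src looks like an avatar."""

    def __init__(self):
        super().__init__()
        self.found = None

    def handle_starttag(self, tag, attrs):
        if tag != "img" or self.found is not None:
            return
        attr = dict(attrs)
        src = attr.get("src") or ""
        if AVATAR_RE.fullmatch(src):
            # Assumption: the alt attribute carries the poster's display name.
            self.found = (attr.get("alt") or "unknown", src)


def fetch(url):
    """Download a URL and return the raw bytes."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read()


def topic_urls(list_html):
    """Return the unique topic URLs found in the list page, in page order."""
    return list(dict.fromkeys(TOPIC_RE.findall(list_html)))


def find_avatar(topic_html):
    """Return (name, avatar_url) for the first avatar-like image, or None."""
    finder = AvatarFinder()
    finder.feed(topic_html)
    return finder.found


def save_avatar(topic_url, out_dir, fetcher=fetch):
    """Fetch one topic page and store its avatar as <name>.jpg."""
    found = find_avatar(fetcher(topic_url).decode("utf-8", "replace"))
    if found is None:
        return None
    name, src = found
    path = Path(out_dir) / f"{name.replace('/', '_')}.jpg"
    path.write_bytes(fetcher(src))
    return path


def crawl(out_dir="avatars", workers=8, fetcher=fetch):
    """Download avatars for every topic on the first page using a thread pool."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    urls = topic_urls(fetcher(GROUP_URL).decode("utf-8", "replace"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda u: save_avatar(u, out_dir, fetcher), urls)
        return [p for p in results if p is not None]
```

Calling `crawl()` would write the images into an `avatars/` directory. Threads are chosen here because the work is dominated by waiting on the network; the `fetcher` parameter exists only so the download step can be swapped out, for example in tests. A real crawler would also need to handle rate limiting and login walls, which this sketch ignores.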