https://github.com/weizhiwen/spider_star

A collection of web crawler projects

# spider_star
**A personal collection of other people's crawler projects, gathered for study.**

---

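**1. WeChat Official Account Crawler**

A crawler interface for WeChat official accounts, built on Sogou's WeChat search. It can be extended into a general Sogou-search crawler. Results are returned as a list in which each item is a dictionary holding the details of one official account.

GitHub: https://github.com/Chyroc/WechatSogou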

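**2. Douban Books Crawler**

Crawls every book under a Douban Books tag and stores the results in Excel, ranked by rating, which makes filtering and browsing easy, for example picking out highly rated books with more than 1,000 ratings. Different topics can be written to different sheets of the same Excel workbook. The crawler sends a browser-like User Agent and adds random delays to better imitate browser behavior and avoid being blocked.

GitHub: https://github.com/lanbing510/DouBanSpider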
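
The two anti-blocking measures described above (a browser-like User Agent and random delays) can be sketched roughly as follows. This is a minimal standard-library illustration, not the DouBanSpider code itself: the `USER_AGENTS` values and the `build_request`, `polite_delay`, and `fetch` helpers are hypothetical names chosen for this example.

```python
import random
import time
import urllib.request

# Hypothetical pool of browser-like User-Agent strings (illustrative values only).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def build_request(url):
    """Build a request that carries a randomly chosen browser-like User-Agent."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)


def polite_delay(min_s=1.0, max_s=3.0, sleep=time.sleep):
    """Sleep for a random interval in [min_s, max_s] and return the chosen delay."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def fetch(url, timeout=10):
    """Wait a random interval, then download the page and return its raw bytes."""
    polite_delay()
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch(url)` once per page keeps the request rate irregular; the delay bounds are assumptions and would need tuning for any real site.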

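**3. Zhihu Crawler**

This project crawls Zhihu user information together with the social graph that connects users. It uses Scrapy as the crawling framework and MongoDB for data storage.

GitHub: https://github.com/LiuRoy/zhihu_spider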
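
Collecting a social graph of this kind is commonly done as a breadth-first walk from a seed user. The sketch below shows that idea only; it is not taken from zhihu_spider and uses neither Scrapy nor MongoDB. The `crawl_graph` function and its `get_followees` callback are hypothetical names, with the callback standing in for whatever request returns the accounts a user follows.

```python
from collections import deque


def crawl_graph(seed, get_followees, max_users=100):
    """Breadth-first walk over a follow graph.

    get_followees(user) is a caller-supplied (hypothetical) function that
    returns the users followed by `user`. Returns (users, edges), where
    edges is a list of (follower, followee) pairs.
    """
    seen = {seed}
    queue = deque([seed])
    edges = []
    while queue:
        user = queue.popleft()
        for other in get_followees(user):
            edges.append((user, other))
            # Only enqueue users not visited yet, and stop growing at the cap.
            if other not in seen and len(seen) < max_users:
                seen.add(other)
                queue.append(other)
    return seen, edges
```

For a quick check without network access, `get_followees` can be backed by a plain dictionary, for example `lambda u: fake_graph.get(u, [])`.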

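**4. Bilibili User Crawler**

Total records: 20,119,918. Fields collected: user ID, nickname, gender, avatar, level, experience points, follower count, birthday, location, registration time, signature, and so on. After crawling, the project generates a Bilibili user data report.

GitHub: https://github.com/airingursb/bilibili-user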

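**5. CNKI (China National Knowledge Infrastructure) Crawler**

After setting the search criteria, run src/CnkiSpider.py to fetch the data. The results are stored in the /data directory, and the first line of each data file contains the field names.

GitHub: https://github.com/yanzhou/CnkiSpider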
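
Because the first line of each data file holds the field names, the files can be loaded as records keyed by those names. The sketch below assumes the files are tab-separated UTF-8 text; the source does not state the delimiter or encoding, so both are assumptions, and `read_records` and `load_file` are hypothetical helper names.

```python
import csv


def read_records(lines, delimiter="\t"):
    """Parse lines whose first row holds the field names into a list of dicts."""
    return list(csv.DictReader(lines, delimiter=delimiter))


def load_file(path, delimiter="\t", encoding="utf-8"):
    """Read one data file from disk (delimiter and encoding are assumptions)."""
    with open(path, encoding=encoding, newline="") as f:
        return read_records(f, delimiter=delimiter)
```

If the files turn out to use a different separator, only the `delimiter` argument needs to change.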

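**6. QQ Group Crawler**

Batch-crawls QQ group information, including group name, group number, member count, group owner, and group description, and produces an XLS(X) / CSV result file.

GitHub: https://github.com/caspartse/QQ-Groups-Spider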

**7. Flight Ticket Crawler**

A Scrapy-based flight ticket crawler that currently covers two major Chinese ticket sites (Qunar + Ctrip).

GitHub: https://github.com/fankcoder/findtrip

**8. Qzone (QQ Space) Crawler**

Covers blog posts, status updates, profile information, and more. It can collect 4 million records per day.

GitHub: https://github.com/LiuXingMing/QQSpider

**9. Baidu Cloud Drive Crawler**

GitHub: https://github.com/k1995/BaiduyunSpider

**10. NetEase Cloud Music Crawler**

GitHub: https://github.com/RitterHou/music-163

**11. CSDN Blog Crawler**

GitHub: https://github.com/Kevinsss/csdn-spider

**12. iMOOC Video Crawler**

GitHub: https://github.com/qiyeboy/spider_smooc