https://github.com/librauee/reptile
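🏀 Python3 web scraping in practice (some projects include detailed tutorials): Maoyan, Tencent Video, Douban, Yanzhao (CHSI graduate admissions), Weibo, Biquge novels, Baidu Hot Topics, Bilibili, CSDN, NetEase Cloud Reading, Ali Literature, Baidu Stocks, Toutiao, WeChat official accounts, NetEase Cloud Music, Lagou, Youdao, unsplash, Shixiseng, Autohome, LoL Box, Dianping, Lianjia, LPL schedule, typhoons, Fantasy Westward Journey and Onmyoji Treasure Pavilion (CBG), weather, Nowcoder, Baidu Wenku, bedtime stories, Zhihu, Wish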
python3 requests scrapy spider
Last synced: 25 days ago
- Host: GitHub
- URL: https://github.com/librauee/reptile
- Owner: librauee
- Created: 2018-04-01T07:51:54.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2021-04-19T23:39:02.000Z (about 4 years ago)
- Last Synced: 2025-05-15T23:03:53.715Z (25 days ago)
- Topics: python3, requests, scrapy, spider
- Language: Python
- Homepage:
- Size: 7.08 MB
- Stars: 1,655
- Watchers: 53
- Forks: 514
- Open Issues: 2
Metadata Files:
- Readme: README.md


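# Spider Learning
* **Language** : Python3
* **Content** : A collection of web-scraping study examples and my own hands-on scraping projects, covering two stages of practice: beginner and intermediate. Techniques include XPath, BeautifulSoup, regular expressions, Ajax asynchronous loading, proxy IPs, multithreading, packet-capture tools, font-based anti-scraping, JS reverse engineering, the Scrapy framework, anti-debugging, and CAPTCHAs.
* **Notice** : You are welcome to follow my [WeChat official account](https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=MzkyMTAwMjQ4NA==&scene=124#wechat_redirect) and grow together with me.
* It offers plenty of Python learning resources, e-books, and videos; scan the QR code to follow.
## Beginner Stage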
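* For getting started, I recommend Professor Song Tian's Python language course and web-scraping course. The MOOC links are:
  * [Python Language Programming](https://www.icourse163.org/course/BIT-268001)
  * [Python Web Crawling and Information Extraction](https://www.icourse163.org/course/BIT-1001870001)
* Because the target web pages have changed, some of the crawlers from the course no longer scrape content correctly; use them to understand and learn the techniques.
* [Click here for the course's crawler code](https://github.com/librauee/Reptile/tree/master/BITcourse)
* Below are some important scraping techniques. Some of the code is accompanied by an article; scroll to the table at the bottom to look them up.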
### XPath
* [LPL live match information](https://github.com/librauee/Reptile/blob/master/LPL/lpl.py)
* [Weather information](https://github.com/librauee/Reptile/blob/master/story/weather.py)
* [Yanzhao (CHSI graduate admissions)](https://github.com/librauee/Reptile/tree/master/%E7%A0%94%E6%8B%9B%E7%BD%91)
* [Nowcoder](https://github.com/librauee/Reptile/tree/master/%E7%89%9B%E5%AE%A2%E7%BD%91)
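
For readers new to the technique, here is a minimal sketch of XPath-style extraction. It is not taken from the scripts linked above: it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup, whereas real pages usually need a forgiving HTML parser. The sample markup and the `extract_matches` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
SAMPLE = """
<html><body>
  <ul id="matches">
    <li class="match"><span class="team">Team A</span><span class="score">2</span></li>
    <li class="match"><span class="team">Team B</span><span class="score">1</span></li>
  </ul>
</body></html>
"""


def extract_matches(markup):
    """Return (team, score) pairs selected with XPath-style expressions."""
    root = ET.fromstring(markup)
    rows = []
    # ".//li[@class='match']" selects every <li class="match"> at any depth.
    for li in root.findall(".//li[@class='match']"):
        # A relative path with a predicate picks a direct child by attribute.
        team = li.find("span[@class='team']").text
        score = int(li.find("span[@class='score']").text)
        rows.append((team, score))
    return rows


if __name__ == "__main__":
    for team, score in extract_matches(SAMPLE):
        print(team, score)
```

The same path expressions should carry over to a fuller XPath engine, which also adds functions such as `text()` and `contains()` that `ElementTree` does not provide.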
### BeautifulSoup
* [Bedtime stories](https://github.com/librauee/Reptile/blob/master/story/story.py)
* [English short essays](https://github.com/librauee/Reptile/blob/master/story/English_story.py)
* [Lagou](https://github.com/librauee/Reptile/blob/master/拉钩)
* [Baidu Hot Topics](https://github.com/librauee/Reptile/blob/master/%E7%99%BE%E5%BA%A6%E7%83%AD%E7%82%B9/baidu_hotspot.py)
* [Biquge](https://github.com/librauee/Reptile/blob/master/%E7%AC%94%E8%B6%A3%E9%98%81/Novel.py)
### Regular Expressions
* [Bus information](https://github.com/librauee/Reptile/tree/master/%E5%85%AC%E4%BA%A4)
* [NetEase Cloud Reading](https://github.com/librauee/Reptile/tree/master/%E7%BD%91%E6%98%93%E4%BA%91%E9%98%85%E8%AF%BB)
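
As an illustration of regex-based extraction, the sketch below pulls link targets and titles out of an HTML fragment with named groups. The fragment, the pattern, and the `extract_links` function are all hypothetical and are not copied from the linked projects; a pattern like this is tied to one exact markup layout and will need adjusting for any real page.

```python
import re

# Hypothetical HTML fragment; real pages will differ.
SAMPLE = (
    '<a href="/x_1" title="Route 1">1</a>'
    '<a href="/x_2" title="Route 2">2</a>'
)

# Named groups keep the extraction readable; [^"]+ stops at the closing quote.
LINK_RE = re.compile(r'<a href="(?P<href>[^"]+)" title="(?P<title>[^"]+)">')


def extract_links(html):
    """Return (href, title) pairs for every anchor matching LINK_RE."""
    return [(m.group("href"), m.group("title")) for m in LINK_RE.finditer(html)]


if __name__ == "__main__":
    for href, title in extract_links(SAMPLE):
        print(href, title)
```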
### Ajax Asynchronous Loading
* [Typhoon history](https://github.com/librauee/Reptile/tree/master/Typhoon)
* [Bilibili leaderboard short videos](https://github.com/librauee/Reptile/blob/master/Bilibili)
* [NetEase Cloud Music comments](https://github.com/librauee/Reptile/tree/master/%E7%BD%91%E6%98%93%E4%BA%91%E9%9F%B3%E4%B9%90)
* [Tencent Video bullet comments](https://github.com/librauee/Reptile/tree/master/%E8%85%BE%E8%AE%AF%E8%A7%86%E9%A2%91)
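
Pages that load data via Ajax are typically scraped by calling the underlying JSON endpoint directly instead of parsing the rendered HTML. The sketch below shows that general pattern with the standard library only. The endpoint shape, the `page` and `size` query parameters, and the `comments`, `user`, and `text` field names are assumptions made for illustration; none of them comes from the sites listed above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_page_url(base_url, page, page_size=20):
    """Build the URL of one page of a hypothetical paginated JSON endpoint."""
    return f"{base_url}?{urlencode({'page': page, 'size': page_size})}"


def parse_comments(payload):
    """Extract (user, text) pairs from a JSON payload string."""
    data = json.loads(payload)
    # Assumed layout: {"comments": [{"user": ..., "text": ...}, ...]}
    return [(c["user"], c["text"]) for c in data.get("comments", [])]


def fetch_page(base_url, page):
    """Download and parse one page; needs network access and a real endpoint."""
    req = Request(build_page_url(base_url, page),
                  headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return parse_comments(resp.read().decode("utf-8"))
```

In practice the real URL and parameters are found by watching the XHR requests in the browser's network panel while the page loads more content.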
### Proxy IP
* [Free proxy IPs](https://github.com/librauee/Reptile/tree/master/%E4%BB%A3%E7%90%86IP)
* [Onmyoji Treasure Pavilion (CBG)](https://github.com/librauee/Reptile/blob/master/%E9%98%B4%E9%98%B3%E5%B8%88/yys_cbg.py)
### Multithreading
* [Toutiao images](https://github.com/librauee/Reptile/tree/master/%E4%BB%8A%E6%97%A5%E5%A4%B4%E6%9D%A1)
* [Video download](https://github.com/librauee/LuluHub)
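
Downloads are I/O-bound, so a thread pool is one common way to run many of them concurrently. The sketch below uses `concurrent.futures.ThreadPoolExecutor`; it is not how the linked projects are necessarily written, and `download_all` and `fake_fetch` are hypothetical names. `fake_fetch` stands in for a real download so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, max_workers=8):
    """Apply fetch to every URL on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the inputs,
        # even when the tasks finish out of order.
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    """Stand-in for a real download; returns the length of the URL string."""
    return len(url)


if __name__ == "__main__":
    print(download_all(["a", "bb", "ccc"], fake_fetch))
```

Passing the fetch function as a parameter keeps the pooling logic separate from the site-specific request code, so the same helper can wrap any downloader.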
### Packet-Capture Tool: Fiddler
* [WeChat official account articles](https://github.com/librauee/Reptile/tree/master/%E5%BE%AE%E4%BF%A1%E5%85%AC%E4%BC%97%E5%8F%B7)
* [LoL Box articles](https://github.com/librauee/Reptile/tree/master/%E8%8B%B1%E9%9B%84%E8%81%94%E7%9B%9F%E7%9B%92%E5%AD%90)
## Intermediate Stage
### Font-Based Anti-Scraping
* [Maoyan Movies](https://mp.weixin.qq.com/s/1aNU76w2m9vJWCcZTRpp_A)
* [Autohome](https://mp.weixin.qq.com/s/zIDHQ1iRSElfV5PBAokFJw)
* [Shixiseng](https://mp.weixin.qq.com/s/3tyPmarn_gcsn78cSKgnAQ)
* [Dianping](https://mp.weixin.qq.com/s/q-lIhCcaCZR9L1m9r_Jmyw)
### JS Reverse Engineering
* [Youdao Translate](https://mp.weixin.qq.com/s/a-ORkG5XGSAP_-6GNilBbQ)
* [NetEase Cloud Music](https://mp.weixin.qq.com/s/prahlIq527XkirDE51jMjg)
* [Ali Literature](https://mp.weixin.qq.com/s/7Z5qB8YG0oDI857N95Z0MQ)
* [Fantasy Westward Journey Treasure Pavilion (CBG)](https://github.com/librauee/Reptile/tree/master/%E6%A2%A6%E5%B9%BB%E8%A5%BF%E6%B8%B8)
### Scrapy Framework
* [Douban](https://mp.weixin.qq.com/s/FmZo2cjno1HrofWGiX4c-Q)
* [unsplash](https://mp.weixin.qq.com/s/mATihMoULt5wMYYuaJsq9A)
* [Lianjia](https://github.com/librauee/Reptile/tree/master/%E9%93%BE%E5%AE%B6)
* [Nationwide historical weather](https://github.com/librauee/Reptile/tree/master/%E5%85%A8%E5%9B%BD%E5%8E%86%E5%8F%B2%E5%A4%A9%E6%B0%94)
### Anti-Debugging
* [Anti-debugging issues](https://mp.weixin.qq.com/s/_09MQEhOP20cHIx7w_dFHw)
### CAPTCHAs
* [CNKI letter CAPTCHA](https://github.com/librauee/Reptile/tree/master/%E7%9F%A5%E7%BD%91)
* [Bilibili slider CAPTCHA](https://github.com/librauee/Reptile/tree/master/Bilibili)

***

| Number | Website | Article |
|:------:|:------:|:------:|
|1| [Douban](https://www.douban.com/) | [Douban movie rankings](https://mp.weixin.qq.com/s/FmZo2cjno1HrofWGiX4c-Q) |
|2| [University rankings](http://www.zuihaodaxue.cn/) | |
|3| [Weibo](https://m.weibo.cn/) | |
|4| [Yanzhao (CHSI graduate admissions)](https://yz.chsi.com.cn/) | [Scraping admission transfer information from Yanzhao](https://blog.csdn.net/lyc44813418/article/details/88739173) |
|5| [Proxy IP](https://www.kuaidaili.com/) | |
|6| [Taobao](https://www.taobao.com/) | |
|7| [Stocks](http://quote.eastmoney.com/stocklist.html) | |
|8| [Maoyan](https://m.maoyan.com/) | [Scraping tens of thousands of The Wandering Earth comments from Douban and Maoyan](https://blog.csdn.net/lyc44813418/article/details/87522369) |
|9| [Children's stories](http://www.tom61.com/) | [Sending my girlfriend a bedtime story on a schedule](https://blog.csdn.net/lyc44813418/article/details/88583021)|
|10| [CSDN](https://www.csdn.net/) | |
|11| [Baidu Hot Topics](http://top.baidu.com/) | |
|12| [Biquge](http://www.biqukan.com/) | |
|13| [Tencent Video](https://v.qq.com/) | [Scraping bullet comments from Tencent Video TV series](https://blog.csdn.net/lyc44813418/article/details/88930046) |
|14| [English short essays](http://www.zuihaodaxue.cn/) | |
|15| [Bus information](https://hangzhou.8684.cn/) | |
|16| [NetEase Cloud Reading](http://yuedu.163.com/book/category/category/2100/2110/1_0_1) | |
|17| [Toutiao](https://www.toutiao.com/search/?keyword=%E8%A1%97%E6%8B%8D) | |
|18| [NetEase Cloud Music](https://music.163.com/) | [JS reverse engineering: NetEase Cloud Music](https://mp.weixin.qq.com/s/prahlIq527XkirDE51jMjg) |
|19| [Lagou](https://www.lagou.com/) | |
|20| [Youdao Translate](http://fanyi.youdao.com/) | [A first look at JS reverse engineering: Youdao Translate](https://mp.weixin.qq.com/s/a-ORkG5XGSAP_-6GNilBbQ) |
|21| [Ali Literature](https://www.aliwx.com.cn/) | [JS reverse engineering: Ali Literature](https://mp.weixin.qq.com/s/7Z5qB8YG0oDI857N95Z0MQ) |
|22| [unsplash](https://unsplash.com/) | [Scrapy in practice: unsplash](https://mp.weixin.qq.com/s/mATihMoULt5wMYYuaJsq9A) |
|23| League of Legends mobile companion app (Zhangmeng) | [One-click scraping of Zhangmeng articles](https://mp.weixin.qq.com/s/_EyBV6i7UG2aRS1D1nZ8-Q) |
|24| WeChat official accounts | [Batch-downloading articles](https://mp.weixin.qq.com/s/5toJ6hh5Pj8P82yjPXH32Q) |
|25| [Lianjia](https://hz.lianjia.com/) | |
|26| [Shixiseng](https://www.shixiseng.com/) | [Font-based anti-scraping: Shixiseng](https://mp.weixin.qq.com/s/3tyPmarn_gcsn78cSKgnAQ) |
|27| [Autohome](https://www.autohome.com.cn/beijing/) | [Font-based anti-scraping: Autohome](https://mp.weixin.qq.com/s/zIDHQ1iRSElfV5PBAokFJw) |
|28| [Dianping](https://www.dianping.com/shop/563199) | [Font-based anti-scraping: Dianping](https://mp.weixin.qq.com/s/q-lIhCcaCZR9L1m9r_Jmyw) |
|29| [Onmyoji](https://yys.cbg.163.com/) | |
|30| [Fantasy Westward Journey](https://xyq.cbg.163.com/) | |
|31| [Typhoons](http://www.wztf121.com/) | |
|32| [Nationwide historical weather](https://lishi.tianqi.com/) | |
|33| [Nowcoder](https://www.nowcoder.com/) | [Scraping a large volume of interview write-ups with Python](https://mp.weixin.qq.com/s/5Q4--8KRBTrWwRLKPaMRkw) |
|34| [PentaQ Esports](https://data.pentaq.com/) | [Scraping League of Legends professional match data with Python](https://mp.weixin.qq.com/s/4ta-Irfa89ebG_ehyi9kjg) |
|35| [~~Baidu Wenku~~](https://wenku.baidu.com/) | Removed for reasons beyond my control |
|36| [Zhihu](https://www.zhihu.com/) | [A huge collection of Zhihu stickers](https://mp.weixin.qq.com/s/GtxegoJd8uW9ZzYIqytPoQ)|
|37| [wish](https://www.wish.com/) | |