https://github.com/xingag/spider_python
Python web crawlers
- Host: GitHub
- URL: https://github.com/xingag/spider_python
- Owner: xingag
- License: apache-2.0
- Created: 2018-09-12T09:20:54.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-12-31T04:49:40.000Z (over 1 year ago)
- Last Synced: 2024-11-11T08:43:50.459Z (7 months ago)
- Topics: bs4, python, python3, requests, scrapy, urllib, xpath
- Language: Python
- Homepage:
- Size: 3.62 MB
- Stars: 979
- Watchers: 33
- Forks: 447
- Open Issues: 15
Metadata Files:
- Readme: README.md
- License: LICENSE
# spider_python
## Preface
For detailed tutorials, follow the WeChat Official Account **AirPython**.

## Basic crawlers
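* [Scrape the latest movie data from Dytt (电影天堂) - xpath](./spiders/spider_dytt.py)
* [Scrape job postings from Tencent Recruitment (腾讯招聘) - xpath](./spiders/spider_tencent_recruit.py)
* [Scrape nationwide weather from China Weather (中国天气网) and generate a pie chart - bs4](./spiders/spider_china_weather.py)
* [Scrape data from Gushiwen (古诗词网) - re](./spiders/spider_gushiwen.py)
* [Scrape jokes from Qiushibaike (糗事百科) - re](./spiders/spider_qiu_shi_bai_ke.py)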
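The `re`-based spiders above pull fields out of raw HTML with regular expressions. A minimal sketch of that approach follows; it assumes a made-up HTML snippet, and the class names, fields and the `parse_items` helper are hypothetical, not the markup or code of any site or script listed here.

```python
import re

# Hypothetical HTML fragment; real pages use different markup.
SAMPLE_HTML = """
<div class="item">
  <a href="/poem/1.html">Quiet Night Thought</a>
  <p class="author">Li Bai</p>
</div>
<div class="item">
  <a href="/poem/2.html">Spring Dawn</a>
  <p class="author">Meng Haoran</p>
</div>
"""

# re.S (DOTALL) lets ".*?" match across newlines; the non-greedy ".*?"
# stops at the first closing tag, so each match stays inside one item.
ITEM_PATTERN = re.compile(
    r'<div class="item">.*?<a href="(?P<url>[^"]+)">(?P<title>.*?)</a>'
    r'.*?<p class="author">(?P<author>.*?)</p>',
    re.S,
)


def parse_items(html: str) -> list[dict[str, str]]:
    """Return one dict per item, with surrounding whitespace stripped."""
    return [
        {key: value.strip() for key, value in match.groupdict().items()}
        for match in ITEM_PATTERN.finditer(html)
    ]


if __name__ == "__main__":
    for item in parse_items(SAMPLE_HTML):
        print(item)
```

Regexes break easily when the markup changes, so parsers such as xpath (lxml) or bs4, used by the other spiders above, are usually more robust.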
## Multithreaded crawlers
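The spiders in this section spread downloads across several threads. One common way to structure that is a producer/consumer setup with `queue.Queue` and sentinel values; the sketch below assumes that pattern, which may differ from what the linked scripts actually do. `fetch`, `worker` and `crawl` are hypothetical names, and `fetch` is a placeholder for a real HTTP request so the example runs offline.

```python
import queue
import threading


def fetch(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request (e.g. requests.get)."""
    return f"<html>{url}</html>"


def worker(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Pull URLs until a None sentinel arrives; store (url, page length) pairs."""
    while True:
        url = tasks.get()
        if url is None:  # sentinel: no more work for this thread
            tasks.task_done()
            break
        page = fetch(url)
        with lock:  # guard the shared results list
            results.append((url, len(page)))
        tasks.task_done()


def crawl(urls: list[str], num_threads: int = 4) -> list[tuple[str, int]]:
    tasks: queue.Queue = queue.Queue()
    results: list[tuple[str, int]] = []
    lock = threading.Lock()

    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock), daemon=True)
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()

    # Producer: enqueue all URLs, then one sentinel per worker.
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)

    for t in threads:
        t.join()
    return sorted(results)  # threads finish in no fixed order
```

Because the queue is FIFO, every URL is handed out before any sentinel, and each worker exits after taking exactly one sentinel.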
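* [Scrape meme images from Doutula (斗图吧) with multiple threads and download them locally - xpath + threading](./spiders/spider_dou_tu_la.py)
* [Use itchat to send memes to a specific contact or WeChat group](./spiders/发表情/)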
* [Scrape text and image posts from Budejie (百思不得姐) with multiple threads and write them to a CSV file](./spiders/spider_bai_si_bu_de_jie.py)

## Selenium automated crawlers
* [Scrape job listings from Lagou (拉勾网) - selenium + requests + lxml](./spiders/spider_lagou.py)
* [Scrape job listings from Boss Zhipin (Boss 直聘) - selenium + lxml](./spiders/spider_boss.py)
## Scrapy framework crawlers
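* [Scrape jokes from Qiushibaike (糗事百科) and save them to a JSON file](./scrapy/qsbk/readme.MD)
* [Scrape data from the WeChat Mini Program forum (微信小程序论坛)](./scrapy/weixin_community/readme.MD)
* [Log in to Douban (豆瓣) and change the profile signature](./scrapy/douban_login/readme.MD)
* [Download high-resolution images from Autohome (汽车之家) to local disk](./scrapy/qczj/readme.MD)
* [Scrape all articles from Jianshu (简书)](./scrapy/jianshu_spider/)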
* [Scrape all housing listings from Fang.com (房天下), including new and second-hand homes](./scrapy/sfw_spider)

## feapder
* [feapder AirSpider example](./feapder/tophub_demo)
## Node.js crawlers
* [Use puppeteer to scrape Jianshu articles and save them locally](./js/jian_shu.js)
## Other
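* [Using Python to find my girlfriend's location](./获取女友的位置)
* [Behind my back, my girlfriend used Python to secretly hide her whereabouts](./ModifyLocation)
* [WeChat group chat history](./微信聊天记录)
* [Calling a JAR from Python](./Python调用JAR)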