https://github.com/camera-2018/douban-playwright-python
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/camera-2018/douban-playwright-python
- Owner: camera-2018
- Created: 2022-07-16T16:14:18.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2023-01-19T16:23:50.000Z (over 3 years ago)
- Last Synced: 2025-04-14T17:19:00.825Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 3.54 MB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Douban Book Scraper & Data Analysis Project
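File structure
```text
|- README.md # This file
|- 词频.png # Word-frequency image of the book descriptions, produced by the analysis
|- 科普图书数据.xlsx # Popular-science book data, 1000 records
|- 通信.csv # Communications book data, 27 records (without descriptions)
|- 通信detail.csv # Communications book data, 27 records (with descriptions)
|- 通信detailxlsx.xlsx # Communications book data in xlsx format, 27 records (with descriptions)
|- analyse.ipynb # Data analysis notebook
|- baidu_stopwords.txt # Baidu stopword list
|- bookdetail.py # Scrapes Douban book detail pages (includes descriptions)
|- booklist.py # Scrapes Douban book tag pages (no descriptions)
|- bookcomment.py # Scrapes reviews of the book 浪潮之巅
|- sentiment_classify.py # Sentiment analysis
|- sentiment_analysis.csv # Exported sentiment analysis results
```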
## Scraper usage
Installation
```bash
pip install pandas
pip install playwright
playwright install
```
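Once Playwright and its browsers are installed, a tag-page scraper in the spirit of `booklist.py` might look like the minimal sketch below. It is not taken from the repository: the URL pattern `https://book.douban.com/tag/<tag>?start=<offset>&type=T`, the page size of 20, the CSS selector `li.subject-item h2 a`, and the helper names `tag_page_url` and `scrape_tag_titles` are all assumptions for illustration, and Douban may change its markup or block automated traffic.

```python
from urllib.parse import quote
from typing import List

# Assumed Douban tag-page URL pattern and page size (not read from booklist.py).
TAG_URL = "https://book.douban.com/tag/{tag}?start={start}&type=T"
PAGE_SIZE = 20


def tag_page_url(tag: str, page: int) -> str:
    """Hypothetical helper: URL of the zero-based result page `page` for `tag`."""
    # Percent-encode the tag so Chinese tags such as 通信 form a valid URL.
    return TAG_URL.format(tag=quote(tag), start=page * PAGE_SIZE)


def scrape_tag_titles(tag: str, pages: int = 1) -> List[str]:
    """Hypothetical helper: collect book titles from the first `pages` tag pages."""
    # Imported lazily so tag_page_url can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    titles: List[str] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        for i in range(pages):
            page.goto(tag_page_url(tag, i))
            # Assumed selector for the title link of each book entry on a tag page.
            for link in page.locator("li.subject-item h2 a").all():
                titles.append(link.inner_text().strip())
        browser.close()
    return titles
```

For example, `scrape_tag_titles("通信", pages=2)` would visit the first two result pages of the 通信 tag. The titles could then be placed in a pandas `DataFrame` and written with `to_csv`, which is one plausible way files like `通信.csv` are produced; the actual columns depend on `booklist.py`.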
## Data analysis usage
Installation
```bash
pip install pandas
pip install matplotlib
pip install numpy
pip install jieba
pip install wordcloud
```
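With these packages installed, a word-frequency image like `词频.png` could be produced roughly as in the sketch below. This is an illustration under assumptions, not the code in `analyse.ipynb`: the helper names are hypothetical, tokenization defaults to `jieba.lcut`, dropping single-character tokens is an extra filtering choice made here, and `font_path` must point to a font with CJK glyphs on your machine (the default WordCloud font cannot render Chinese).

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Set


def load_stopwords(path: str) -> Set[str]:
    """Read one stopword per line, e.g. from baidu_stopwords.txt."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def word_frequencies(
    texts: Iterable[str],
    stopwords: Set[str],
    tokenize: Optional[Callable[[str], List[str]]] = None,
) -> Counter:
    """Count tokens across texts, skipping stopwords and tokens shorter than 2 chars."""
    if tokenize is None:
        # Lazy import: jieba is only needed when no tokenizer is supplied.
        import jieba
        tokenize = jieba.lcut
    counts: Counter = Counter()
    for text in texts:
        for token in tokenize(text):
            token = token.strip()
            # Assumption: single characters are mostly particles, so drop them.
            if len(token) > 1 and token not in stopwords:
                counts[token] += 1
    return counts


def render_wordcloud(counts: Counter, font_path: str, out_path: str) -> None:
    """Draw a word cloud from precomputed frequencies and save it as an image."""
    from wordcloud import WordCloud

    wc = WordCloud(font_path=font_path, width=800, height=600, background_color="white")
    wc.generate_from_frequencies(counts)
    wc.to_file(out_path)
```

A typical call would read the description column of `通信detail.csv` with pandas (the column name depends on how `bookdetail.py` writes the file), pass it to `word_frequencies` together with `load_stopwords("baidu_stopwords.txt")`, and hand the resulting counts to `render_wordcloud`.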