https://github.com/elliotxx/readnewspaper
Automatically fetches the electronic edition of a newspaper for convenient daily reading
crawler lxml newspaper pypdf2 python requests
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/elliotxx/readnewspaper
- Owner: elliotxx
- License: gpl-2.0
- Created: 2018-03-20T03:50:29.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-11-11T07:55:35.000Z (about 6 years ago)
- Last Synced: 2025-04-12T23:44:30.447Z (8 months ago)
- Topics: crawler, lxml, newspaper, pypdf2, python, requests
- Language: Python
- Size: 17.6 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## readNewspaper
Automatically fetches the electronic edition of a newspaper for convenient daily reading
## Usage
```shell
python BandaoNewspaper.py
```
## Currently supported newspapers
* 《半岛都市报》 (Bandao Dushi Bao)
Newspaper homepage: http://bddsb.bandao.cn/
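## Features
* [x] Automatic PDF merging
* [x] Proxy IP pool
* [ ] Fetch the proxy IP pool automatically when the script runs
* [ ] Select the newspaper for a given date via a command-line argument
* [ ] Check on a daily schedule for a new edition; if one exists, generate the PDF and send an email notification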
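
As an illustration of the proxy IP pool feature, the following is a minimal sketch, not the project's actual code. The names `PROXY_POOL`, `pick_proxy`, and `fetch` are hypothetical, the proxy addresses are placeholders, and the retry policy is an assumption. It shows one common way to route each `requests` call through a randomly chosen proxy and fall back to another when a request fails.

```python
import random

# Hypothetical pool: placeholder addresses, not taken from the project.
PROXY_POOL = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
]


def pick_proxy(pool, rng=random):
    """Return a requests-style proxies mapping for one randomly chosen proxy."""
    address = rng.choice(pool)
    return {"http": address, "https": address}


def fetch(url, pool, retries=3, timeout=10):
    """Try the URL through up to `retries` randomly chosen proxies."""
    # Third-party dependency, imported here so pick_proxy works without it.
    import requests

    last_error = None
    for _ in range(retries):
        try:
            response = requests.get(url, proxies=pick_proxy(pool), timeout=timeout)
            response.raise_for_status()
            return response.content
        except requests.RequestException as error:
            last_error = error  # this attempt failed; pick a proxy again
    raise RuntimeError(f"all {retries} attempts failed for {url}") from last_error
```

Calling `fetch("http://bddsb.bandao.cn/", PROXY_POOL)` would return the raw response body, provided at least one proxy in the pool is reachable.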
## Dependencies
* PyPDF2
* requests
* lxml
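
To show how the PyPDF2 dependency can support merging per-page PDFs into one file, here is a rough sketch under stated assumptions. The function names and the `page_N.pdf` file naming are hypothetical and are not taken from `BandaoNewspaper.py`. The merger class is called `PdfMerger` in newer PyPDF2 releases and `PdfFileMerger` in older ones, so the sketch tries both.

```python
import re
from pathlib import Path


def page_number(path):
    """Extract the first integer in a file name, e.g. 'A10.pdf' -> 10."""
    match = re.search(r"\d+", Path(path).stem)
    return int(match.group()) if match else 0


def ordered_pages(paths):
    """Sort page files numerically so that page 10 follows page 9, not page 1."""
    return sorted(paths, key=lambda p: (page_number(p), str(p)))


def merge_pages(paths, output_path):
    """Append every page PDF, in numeric order, into a single output file."""
    # Newer PyPDF2 releases expose PdfMerger; older ones call it PdfFileMerger.
    try:
        from PyPDF2 import PdfMerger
    except ImportError:
        from PyPDF2 import PdfFileMerger as PdfMerger

    merger = PdfMerger()
    for path in ordered_pages(paths):
        merger.append(str(path))
    merger.write(str(output_path))
    merger.close()
```

For example, `merge_pages(Path("pages").glob("*.pdf"), "newspaper.pdf")` would combine every PDF in a hypothetical `pages` folder. Sorting by the numeric part of the name avoids the plain string ordering that would place `page_10.pdf` before `page_2.pdf`.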
## References
* Installing PyPDF2 on Windows and merging the PDF files in a folder into a single PDF
http://blog.csdn.net/andy_blogs/article/details/78041679
* Sending email with attachments via SMTP in Python
https://blog.csdn.net/zm2714/article/details/7993732
* Problems sending email attachments with Python and SMTP
https://segmentfault.com/q/1010000009102883