Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mouday/freeipproxy
通过抓取免费代理ip维护一个有效的proxy代理池
https://github.com/mouday/freeipproxy
crawler proxy python spider
Last synced: about 1 month ago
JSON representation
通过抓取免费代理ip维护一个有效的proxy代理池
- Host: GitHub
- URL: https://github.com/mouday/freeipproxy
- Owner: mouday
- Created: 2018-09-05T10:23:16.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T02:52:29.000Z (about 2 years ago)
- Last Synced: 2023-03-04T03:23:11.484Z (almost 2 years ago)
- Topics: crawler, proxy, python, spider
- Language: Python
- Size: 130 KB
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 8
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# 项目说明:
FreeIpProxy抓取公开代理ip,并采用评分制维护代理ip池,对外提供使用接口推荐使用python3
![](项目模块示意图.jpg)
## 文件目录
```
├── README.md
├── check.py 代理ip有效性检查
├── config.py 全局参数配置
├── main.py 程序入口
├── models.py 数据库交互
├── proxy.db ip存储,采用sqlite,可以在models模块中替换为其他数据库
├── proxy_source.py 代理ip抓取网站xpath配置
├── scheduler.py 定时任务调度,抓取代理ip和检测代理ip有效性
├── spider.py 抓取代理ip
└── utils.py 小工具```
## 采用flask提供接口:
(1)获取1个代理ip
GET http://127.0.0.1:5000/get
按照分数值score排序,返回最高的一个值
```
{
id: 127,
ip: "119.27.177.169",
port: "80",
style: "透明",
protocol: "HTTP",
address: "福建省厦门市",
score: 14,
source: "89ip代理",
md5: "9352df7efc296b1bdad2047e0c15b3a6",
create_time: "2018-09-05 15:32:49",
update_time: "2018-09-05 17:52:04"
}
```(2)获取多个代理ip
GET http://127.0.0.1:5000/get_list/2
最后一位为数量
```
[
{
id: 127,
ip: "119.27.177.169",
port: "80",
style: "透明",
protocol: "HTTP",
address: "福建省厦门市",
score: 14,
source: "89ip代理",
md5: "9352df7efc296b1bdad2047e0c15b3a6",
create_time: "2018-09-05 15:32:49",
update_time: "2018-09-05 17:52:04"
},
{
id: 122,
ip: "47.100.196.200",
port: "3128",
style: "透明",
protocol: "HTTP",
address: "上海市",
score: 11,
source: "89ip代理",
md5: "715de3a3237e2116e6c8a1a17f27278b",
create_time: "2018-09-05 15:32:49",
update_time: "2018-09-05 17:52:09"
}
]
```
(3)手动抓取GET http://127.0.0.1:5000/crawl
(4)手动检查
GET http://127.0.0.1:5000/check
(5)手动删除
GET http://127.0.0.1:5000/delete/8
最后一位为返回数据中的id