{"id":13776045,"url":"https://github.com/awolfly9/ipproxytool","last_synced_at":"2025-10-18T14:51:58.001Z","repository":{"id":37431529,"uuid":"75624435","full_name":"awolfly9/IPProxyTool","owner":"awolfly9","description":"python ip proxy tool  scrapy crawl. 抓取大量免费代理 ip，提取有效 ip 使用","archived":false,"fork":false,"pushed_at":"2022-12-08T07:42:07.000Z","size":290,"stargazers_count":1995,"open_issues_count":12,"forks_count":415,"subscribers_count":74,"default_branch":"master","last_synced_at":"2025-05-22T13:30:15.955Z","etag":null,"topics":["ipproxy","proxy","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/awolfly9.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-12-05T12:52:21.000Z","updated_at":"2025-05-21T06:08:30.000Z","dependencies_parsed_at":"2023-01-25T09:01:14.783Z","dependency_job_id":null,"html_url":"https://github.com/awolfly9/IPProxyTool","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/awolfly9/IPProxyTool","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awolfly9%2FIPProxyTool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awolfly9%2FIPProxyTool/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awolfly9%2FIPProxyTool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awolfly9%2FIPProxyTool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/awolfly9","download_url":"https://codeload.github.com/awolfly9/IPProxyTool/tar.gz/refs/heads/mas
ter","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awolfly9%2FIPProxyTool/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279552896,"owners_count":26189905,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-18T02:00:06.492Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ipproxy","proxy","python"],"created_at":"2024-08-03T17:01:58.508Z","updated_at":"2025-10-18T14:51:57.974Z","avatar_url":"https://github.com/awolfly9.png","language":"Python","readme":"# IPProxyTool\nUses scrapy spiders to crawl proxy-listing websites and collect a large number of free proxy IPs. The usable IPs are filtered out and stored in a database for later use.\nYou can visit my personal site to see more of my interesting projects: [西瓜](http://xigua233.com/)\n\nThanks to [youngjeff](https://github.com/youngjeff) for maintaining this project with me.\n\n## Requirements\nInstall python3 and a MySQL database.\n\nBuild dependencies for the cryptography module:\n```\nsudo yum install gcc libffi-devel python-devel openssl-devel\n```\n\n\n```\n$ pip install -r requirements.txt\n```\n\n\n\n## Download and Usage\nClone the project locally\n\n```\n$ git clone https://github.com/awolfly9/IPProxyTool.git\n```\n\nEnter the project directory\n\n```\n$ cd IPProxyTool\n```\nEdit the MySQL settings in [config.py](https://github.com/awolfly9/IPProxyTool/blob/master/config.py): set the user and password in database_config to your database's username and password\n\n```\n$ vim config.py\n---------------\n\ndatabase_config = {\n\t'host': 'localhost',\n\t'port': 3306,\n\t'user': 'root',\n\t'password': '123456',\n\t'charset': 'utf8',\n}\n```\n\nMySQL: import the table schema\n```\n$ mysql\u003e create database 
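ipproxy;\nQuery OK, 1 row affected (0.00 sec)\n$ mysql\u003e use ipproxy;\nDatabase changed\n$ mysql\u003e source '/path/to/your/project/db.sql'\n\n```\n\n\nRun the startup script ipproxytool.py. You can also run the crawl, validation, and server API scripts separately; see the project description for how to run each one.\n\n```\n$ python ipproxytool.py\n```\n\nAn asynchronous validation mode has been added. Run it as follows:\n\n```\n$ python ipproxytool.py async\n```\n\u003cbr\u003e\n\n## Project Description\n#### Crawling proxy sites\nAll code for crawling proxy sites lives in [proxy](https://github.com/awolfly9/IPProxyTool/tree/master/ipproxytool/spiders/proxy)\u003cbr/\u003e\n##### Adding crawlers for other proxy sites\n1. Create a new script in the proxy directory and inherit from BaseSpider \u003cbr/\u003e\n2. Set name, urls, and headers\u003cbr/\u003e\n3. Override the parse_page method to extract the proxy data\u003cbr/\u003e\n4. Store the data in the database. For details, see [ip181](https://github.com/awolfly9/IPProxyTool/blob/master/ipproxytool/spiders/proxy/ip181.py) and [kuaidaili](https://github.com/awolfly9/IPProxyTool/blob/master/ipproxytool/spiders/proxy/kuaidaili.py)\u003cbr/\u003e\n5. To crawl a particularly complex proxy site, see [peuland](https://github.com/awolfly9/IPProxyTool/blob/master/ipproxytool/spiders/proxy/peuland.py)\u003cbr/\u003e\n\n##### Edit run_crawl_proxy.py to import the new spider and add it to the crawl queue\n\nYou can run run_crawl_proxy.py on its own to start crawling proxy sites\n\n```\n$ python run_crawl_proxy.py\n```\n\n#### Validating whether a proxy IP works\nCurrent validation process:\u003cbr\u003e\n1. Fetch all proxy IPs from the database populated in the previous step \u003cbr\u003e\n2. Use each proxy IP to request [httpbin](http://httpbin.org/get?show_env=1)\u003cbr\u003e\n3. From the response, determine whether the proxy IP works, whether it supports HTTPS, and its anonymity level, then store the result in the httpbin table\u003cbr\u003e\n4. Take proxies from the httpbin table and use them to access a target site, for example [Douban](https://www.douban.com/)\u003cbr\u003e\n5. If the request returns successful data within a reasonable time, the proxy IP is considered valid and is stored in the corresponding table\u003cbr\u003e\n\nEach target site has its own script. All code for validating proxy IPs lives in [validator](https://github.com/awolfly9/IPProxyTool/tree/master/ipproxytool/spiders/validator)\n##### Adding validators for other sites\n1. Create a new script in the validator directory and inherit from Validator \u003cbr\u003e\n2. Set name, timeout, urls, and headers \u003cbr\u003e\n3. Then call the init method. See [baidu](https://github.com/awolfly9/IPProxyTool/blob/master/ipproxytool/spiders/validator/baidu.py) and [douban](https://github.com/awolfly9/IPProxyTool/blob/master/ipproxytool/spiders/validator/douban.py)\u003cbr\u003e\n4. For a particularly complex validation method, see 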
[assetstore](https://github.com/awolfly9/IPProxyTool/blob/master/ipproxytool/spiders/validator/assetstore.py)\u003cbr\u003e\n##### Edit run_validator.py to import the new validator and add it to the validation queue\nYou can run run_validator.py on its own to start validating proxy IPs\n\n```\n$ python run_validator.py\n```\n\n### Server API for retrieving proxy IP data\nIn config.py, set the server port data_port (default: 8000).\nStart the server:\n\n```\n$ python run_server.py\n```\n\nThe server provides the following endpoints\n#### Select\n\u003chttp://127.0.0.1:8000/select?name=httpbin\u0026anonymity=1\u0026https=yes\u0026order=id\u0026sort=desc\u0026count=100\u003e\n\nParameters\n\n| Name | Type | Description | Required |\n| ---- | ---- | ---- | ---- |\n| name | str | Database name | Yes |\n| anonymity | int | 1: high anonymity, 2: anonymous, 3: transparent | No |\n| https | str | yes: https, no: http | No |\n| order | str | Table column to order by | No |\n| sort | str | asc: ascending, desc: descending | No |\n| count | int | Number of proxies to return, default 100 | No |\n\n\n\n\n#### Delete\n\u003chttp://127.0.0.1:8000/delete?name=httpbin\u0026ip=27.197.144.181\u003e\n\nParameters\n\n| Name | Type | Description | Required |\n| ---- | ---- | ---- | ---- |\n| name | str | Database name | Yes |\n| ip | str | IP to delete | Yes |\n\n#### Insert\n\u003chttp://127.0.0.1:8000/insert?name=httpbin\u0026ip=555.22.22.55\u0026port=335\u0026country=%E4%B8%AD%E5%9B%BD\u0026anonymity=1\u0026https=yes\u0026speed=5\u0026source=100\u003e\n\nParameters\n\n| Name | Type | Description | Required |\n| ---- | ---- | ---- | ---- |\n| name | str | Database name | Yes |\n| ip | str | IP address | Yes |\n| port | str | Port | Yes |\n| country | str | Country | No |\n| anonymity | int | 1: high anonymity, 2: anonymous, 3: transparent | No |\n| https | str | yes: https, no: http | No |\n| speed | float | Access speed | No |\n| source | str | Source of the IP | No |\n\n\n## TODO\n* Add support for multiple databases\n  * mysql\n  * redis TODO...\n  * sqlite TODO...\n* Add crawlers for more free proxy sites. The currently supported free proxy IP sites are listed here; connections to some of the overseas sites are unstable\n  * (overseas) \u003chttp://www.freeproxylists.net/\u003e\n  * (overseas) \u003chttp://gatherproxy.com/\u003e\n  * (domestic) \u003chttps://hidemy.name/en/proxy-list/\u003e\n  * (domestic) \u003chttp://www.ip181.com/\u003e\n  * (domestic) \u003chttp://www.kuaidaili.com/\u003e\n  * (overseas) 
\u003chttps://proxy.peuland.com/proxy_list_by_category.htm\u003e\n  * (overseas) \u003chttps://list.proxylistplus.com/\u003e\n  * (domestic) \u003chttp://m.66ip.cn\u003e\n  * (overseas) \u003chttp://www.us-proxy.org/\u003e\n  * (domestic) \u003chttp://www.xicidaili.com\u003e\n* Distributed deployment of the project\n* ~~Add more filter conditions to the server select endpoint~~\n* ~~Validate proxy IPs with multiple processes~~\n* ~~Add https support~~\n* ~~Add detection of proxy IP anonymity level~~\n\n\n## References\n* [IPProxyPool](https://github.com/qiyeboy/IPProxyPool)\n\n\n## Changelog\n-----------------------------2020-12-29----------------------------\u003cbr\u003e\n1. Fixed previously incorrect path names\n2. Changed the MySQL table schema\n\u003cbr\u003e\n-----------------------------2017-6-23----------------------------\u003cbr\u003e\n1.python2 -\u003e python3\u003cbr\u003e\n2.web.py -\u003e flask\u003cbr\u003e\n\u003cbr\u003e\n-----------------------------2017-5-17----------------------------\u003cbr\u003e\n1. Added Docker support on top of the existing system. See the Docker instructions in this README; to learn more about Docker, visit the official site http://www.docker.com.\u003cbr\u003e\n\u003cbr\u003e\n-----------------------------2017-3-30----------------------------\u003cbr\u003e\n1. Revised and improved the readme\u003cbr\u003e\n2. Data insertion now supports transactions\u003cbr\u003e\n\u003cbr\u003e\n-----------------------------2017-3-14----------------------------\u003cbr\u003e\n1. Changed the server API and added sort order\u003cbr\u003e\n2. Added multi-process validation of proxy IPs\u003cbr\u003e\n\u003cbr\u003e\n-----------------------------2017-2-20----------------------------\u003cbr\u003e\n1. Added more filter conditions to the server select endpoint\u003cbr\u003e\n\u003cbr\u003e\n\n-----------------------------2017-2-16----------------------------\u003cbr\u003e\n1. Validate the anonymity level of proxy IPs\u003cbr\u003e\n2. Validate HTTPS support of proxy IPs\u003cbr\u003e\n3. Added a concurrency setting for httpbin validation, default 4\n\n\n## Install Docker on your system to use this program:\n\nDownload the program\n```\ngit clone https://github.com/awolfly9/IPProxyTool\n```\n\nThen enter the directory:\n```\ncd IPProxyTool\n```\n\nBuild the image:\n```\ndocker build -t proxy .\n```\n\nRun the container:\n```\ndocker run -it proxy\n```\n\n## Edit the configuration in config.py to suit your needs\n```\ndatabase_config = {\n    'host': 'localhost',\n    'port': 3306,\n    'user': 'root',\n    'password': 'root',\n    'charset': 
'utf8',\n}\n```\n","funding_links":[],"categories":["\u003ca id=\"d03d494700077f6a65092985c06bf8e8\"\u003e\u003c/a\u003e工具"],"sub_categories":["\u003ca id=\"b2241c68725526c88e69f1d71405c6b2\"\u003e\u003c/a\u003e代理爬取\u0026\u0026代理池"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawolfly9%2Fipproxytool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fawolfly9%2Fipproxytool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawolfly9%2Fipproxytool/lists"}