{"id":13776046,"url":"https://github.com/tidesec/proxy_pool","last_synced_at":"2025-04-07T12:08:24.661Z","repository":{"id":38326175,"uuid":"182956424","full_name":"TideSec/Proxy_Pool","owner":"TideSec","description":"Proxy_Pool (proxy resource pool): a compact all-in-one tool for scraping, evaluating, storing, and displaying proxy IPs, including a web display and a web API.","archived":false,"fork":false,"pushed_at":"2020-06-22T05:23:32.000Z","size":1130,"stargazers_count":322,"open_issues_count":4,"forks_count":121,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-03-31T10:11:09.891Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TideSec.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-23T07:14:10.000Z","updated_at":"2025-03-24T02:41:05.000Z","dependencies_parsed_at":"2022-07-12T17:24:48.051Z","dependency_job_id":null,"html_url":"https://github.com/TideSec/Proxy_Pool","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TideSec%2FProxy_Pool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TideSec%2FProxy_Pool/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TideSec%2FProxy_Pool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TideSec%2FProxy_Pool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TideSec","download_url":"https://codeload.github.com/TideSec/Proxy_Pool/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_co
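unt":247648978,"owners_count":20972945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T17:01:58.604Z","updated_at":"2025-04-07T12:08:24.633Z","avatar_url":"https://github.com/TideSec.png","language":"JavaScript","readme":"# Proxy_Pool\n\nProxy_Pool is a compact, all-in-one tool for scraping, evaluating, storing, and displaying proxy IPs. It automatically collects and tests available proxies, scores them, and provides a web display and a web API.\n\n# Installation\n\n1. Clone the repository from GitHub and place the code in your web root directory.\n\n```\ngit clone https://github.com/TideSec/Proxy_Pool\n\n```\nOn Unix/Linux, the web server can be installed quickly with `https://github.com/teddysun/lamp`.\n\nOn Windows, [phpstudy](http://phpstudy.php.cn/) can be used for quick deployment.\n\n2. Create a database named proxy in MySQL, import the proxy.sql file, and change the database password in include/config.inc.php.\n\n3. Visit http://ip:port from the local machine; you should see the proxy web display page.\n\n4. Install the Python 2 dependencies:\n\n```\npip install lxml\npip install requests\npip install pymysql\n```\n5. Configure the database connection and other parameters in py_proxy_task/config.py.\n\n# Usage\n\nThe py_proxy_task directory contains two programs, `proxy_get.py` and `proxy_check.py`. The former scrapes IPs every day and stores them in the database; the latter cleans up and evaluates the IPs in the database.\n\n```bash\npython proxy_get.py\n# Wait until the scraper above has finished collecting results before running the checker\npython proxy_check.py\n```\nAfter that, with the default configuration, the two programs perform scraping and evaluation once a day, so they can simply be left running on a server.\n\n\n# Introduction\n\nThe original author's code is here: `https://github.com/chungminglu/Proxy`\n\nI modified parts of the code, improved some of the parsing code that extracts proxies, and added a web display and a web API so that other programs can call it easily.\n\nThe web pages were adapted from another scanner of mine, `https://github.com/TideSec/WDScanner/`, so some unused code may remain.\n\nMain features:\n\n1. Scrapes the latest high-anonymity IP data from several proxy IP sites every day.\n\n2. Stores the IPs that pass filtering in the database.\n\n3. Tests the IPs in the database every day, with a removal and scoring mechanism: IPs that fail repeatedly are deleted, and every IP is scored, so stable, low-latency, high-quality IPs can be obtained by ranking on score.\n\nThe web display is shown below:\n\u003cdiv align=center\u003e\u003cimg src=images/001.png \u003e\u003c/div\u003e\n\nThe web API is shown below:\n\u003cdiv align=center\u003e\u003cimg src=images/002.png \u003e\u003c/div\u003e\n\n# 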
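Parameter settings\n\nThe proxy evaluation parameters can be set in py_proxy_task/config.py.\n\n```python\nUSELESS_TIME = 4   # maximum number of failures\nSUCCESS_RATE = 0.8  # success-rate threshold used for removal\nTIME_OUT_PENALTY = 10  # timeout penalty, in seconds\nCHECK_TIME_INTERVAL = 24*3600  # update once a day\n```\nApart from the database settings, the main parameters are:\n\n* ```USELESS_TIME``` and ```SUCCESS_RATE``` work together: an ```ip``` is removed when its cumulative failure count exceeds ```USELESS_TIME``` (4) and its success rate is below ```SUCCESS_RATE``` (0.8). Using both takes the IP's short-term and long-term test performance into account.\n* ```TIME_OUT_PENALTY```: when an IP fails a check but does not meet the removal condition above (for example, its first timeout after 100 checks), a penalty is applied to its ```response_time```; here it is 10 seconds.\n* ```CHECK_TIME_INTERVAL```: the check interval. With the value above (24*3600 seconds), every IP in the database is checked for availability once a day.\n\n# Strategy\n\n* Every day, the latest high-anonymity IP data is scraped from the following 5 proxy IP sites:\n  * ```mimi```\n  * ```66ip```\n  * ```xici```\n  * ```cn-proxy```\n  * ```kuaidaili```\n* N rounds of filtering\n  * The collected IPs go through N rounds of connection tests at interval t. An IP must pass all N rounds to be stored in the database. If few IPs enter the database on a given day, scraping pauses for a while (one day) before trying again.\n\n* Evaluation criteria for IPs in the database\n  * An IP is removed when its cumulative timeout count during checks \u003e ```USELESS_TIME``` \u0026\u0026 its success rate \u003c ```SUCCESS_RATE```.  \n  ```score = (success_rate + test_times / 500) / avg_response_time```  \n  The original idea was ```score = success_rate / avg_response_time```, that is, score = success rate / average response time. Because an old IP that has passed more than 100 checks is more valuable than a new one, the number of checks is also included in the score.\n\n\n\n# Follow us\n\n\n**TideSec security team:**\n\nThe Tide security team was formally founded in January 2019. It is a security team dedicated to research on Internet attack and defense techniques, and currently brings together more than ten professional security researchers focused on network attack and defense, Web security, mobile devices, secure development, and IoT/industrial control system security.\n\nTo learn more about the Tide security team, visit the team website: http://www.TideSec.net or follow the WeChat official account:\n\n\u003cdiv align=center\u003e\u003cimg src=images/ewm.png width=30% \u003e\u003c/div\u003e\n\n\n\n","funding_links":[],"categories":["\u003ca id=\"d03d494700077f6a65092985c06bf8e8\"\u003e\u003c/a\u003e工具"],"sub_categories":["\u003ca id=\"b2241c68725526c88e69f1d71405c6b2\"\u003e\u003c/a\u003e代理爬取\u0026\u0026代理池"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidesec%2Fproxy_pool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftidesec%2Fproxy_pool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidesec%2Fproxy_pool/lists"}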