{"id":13459230,"url":"https://github.com/jhao104/proxy_pool","last_synced_at":"2025-05-12T05:28:44.752Z","repository":{"id":37664908,"uuid":"74762106","full_name":"jhao104/proxy_pool","owner":"jhao104","description":"Python ProxyPool for web spider","archived":false,"fork":false,"pushed_at":"2025-02-13T05:35:58.000Z","size":540,"stargazers_count":22349,"open_issues_count":301,"forks_count":5290,"subscribers_count":447,"default_branch":"master","last_synced_at":"2025-05-12T02:43:42.900Z","etag":null,"topics":["crawler","http","proxy","redis","spider"],"latest_commit_sha":null,"homepage":"https://jhao104.github.io/proxy_pool/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jhao104.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-11-25T13:49:07.000Z","updated_at":"2025-05-12T02:32:40.000Z","dependencies_parsed_at":"2023-02-01T02:31:13.195Z","dependency_job_id":"a9ebeeec-58ee-4933-ab25-22e9401634c5","html_url":"https://github.com/jhao104/proxy_pool","commit_stats":{"total_commits":482,"total_committers":69,"mean_commits":"6.9855072463768115","dds":0.5477178423236515,"last_synced_commit":"183b2905433708ba6b94b6d9d8d364a6152f5d19"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhao104%2Fproxy_pool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhao104%2Fproxy_pool/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhao104%2Fproxy_pool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhao104%2Fproxy_pool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jhao104","download_url":"https://codeload.github.com/jhao104/proxy_pool/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253672707,"owners_count":21945481,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","http","proxy","redis","spider"],"created_at":"2024-07-31T09:01:11.326Z","updated_at":"2025-05-12T05:28:44.728Z","avatar_url":"https://github.com/jhao104.png","language":"Python","readme":"\nProxyPool 爬虫代理IP池\n=======\n[![Build Status](https://travis-ci.org/jhao104/proxy_pool.svg?branch=master)](https://travis-ci.org/jhao104/proxy_pool)\n[![](https://img.shields.io/badge/Powered%20by-@j_hao104-green.svg)](http://www.spiderpy.cn/blog/)\n[![Packagist](https://img.shields.io/packagist/l/doctrine/orm.svg)](https://github.com/jhao104/proxy_pool/blob/master/LICENSE)\n[![GitHub contributors](https://img.shields.io/github/contributors/jhao104/proxy_pool.svg)](https://github.com/jhao104/proxy_pool/graphs/contributors)\n[![](https://img.shields.io/badge/language-Python-green.svg)](https://github.com/jhao104/proxy_pool)\n\n    ______                        ______             _\n    | ___ \\_                      | ___ \\           | |\n    | |_/ / \\__ __   __  _ __   _ | |_/ /___   ___  | |\n    |  __/|  _// _ \\ \\ \\/ /| | | ||  __// _ \\ / _ \\ | |\n    | |   | | | (_) | \u003e  \u003c \\ |_| || |  | (_) | (_) || |___\n    \\_|   |_|  \\___/ /_/\\_\\ \\__  |\\_|   \\___/ \\___/ \\_____\\\n                           __ / /\n                          /___ /\n\n### ProxyPool\n\n爬虫代理IP池项目,主要功能为定时采集网上发布的免费代理验证入库，定时验证入库的代理保证代理的可用性，提供API和CLI两种使用方式。同时你也可以扩展代理源以增加代理池IP的质量和数量。\n\n* 文档: [document](https://proxy-pool.readthedocs.io/zh/latest/) [![Documentation Status](https://readthedocs.org/projects/proxy-pool/badge/?version=latest)](https://proxy-pool.readthedocs.io/zh/latest/?badge=latest)\n\n* 支持版本: [![](https://img.shields.io/badge/Python-2.7-green.svg)](https://docs.python.org/2.7/)\n[![](https://img.shields.io/badge/Python-3.5-blue.svg)](https://docs.python.org/3.5/)\n[![](https://img.shields.io/badge/Python-3.6-blue.svg)](https://docs.python.org/3.6/)\n[![](https://img.shields.io/badge/Python-3.7-blue.svg)](https://docs.python.org/3.7/)\n[![](https://img.shields.io/badge/Python-3.8-blue.svg)](https://docs.python.org/3.8/)\n[![](https://img.shields.io/badge/Python-3.9-blue.svg)](https://docs.python.org/3.9/)\n[![](https://img.shields.io/badge/Python-3.10-blue.svg)](https://docs.python.org/3.10/)\n[![](https://img.shields.io/badge/Python-3.11-blue.svg)](https://docs.python.org/3.11/)\n\n* 测试地址: http://demo.spiderpy.cn (勿压谢谢)\n\n* 付费代理推荐: [luminati-china](https://get.brightdata.com/github_jh). 国外的亮数据BrightData（以前叫luminati）被认为是代理市场领导者，覆盖全球的7200万IP，大部分是真人住宅IP，成功率扛扛的。付费套餐多种，需要高质量代理IP的可以注册后联系中文客服。[申请免费试用](https://get.brightdata.com/github_jh) 目前有50%折扣优惠活动。(PS:用不明白的同学可以参考这个[使用教程](https://www.cnblogs.com/jhao/p/15611785.html))。\n\n\n### 运行项目\n\n##### 下载代码:\n\n* git clone\n\n```bash\ngit clone git@github.com:jhao104/proxy_pool.git\n```\n\n* releases\n\n```bash\nhttps://github.com/jhao104/proxy_pool/releases 下载对应zip文件\n```\n\n##### 安装依赖:\n\n```bash\npip install -r requirements.txt\n```\n\n##### 更新配置:\n\n\n```python\n# setting.py 为项目配置文件\n\n# 配置API服务\n\nHOST = \"0.0.0.0\"               # IP\nPORT = 5000                    # 监听端口\n\n\n# 配置数据库\n\nDB_CONN = 'redis://:pwd@127.0.0.1:8888/0'\n\n\n# 配置 ProxyFetcher\n\nPROXY_FETCHER = [\n    \"freeProxy01\",      # 这里是启用的代理抓取方法名，所有fetch方法位于fetcher/proxyFetcher.py\n    \"freeProxy02\",\n    # ....\n]\n```\n\n#### 启动项目:\n\n```bash\n# 如果已经具备运行条件, 可用通过proxyPool.py启动。\n# 程序分为: schedule 调度程序 和 server Api服务\n\n# 启动调度程序\npython proxyPool.py schedule\n\n# 启动webApi服务\npython proxyPool.py server\n\n```\n\n### Docker Image\n\n```bash\ndocker pull jhao104/proxy_pool\n\ndocker run --env DB_CONN=redis://:password@ip:port/0 -p 5010:5010 jhao104/proxy_pool:latest\n```\n### docker-compose\n\n项目目录下运行: \n``` bash\ndocker-compose up -d\n```\n\n### 使用\n\n* Api\n\n启动web服务后, 默认配置下会开启 http://127.0.0.1:5010 的api接口服务:\n\n| api | method | Description | params|\n| ----| ---- | ---- | ----|\n| / | GET | api介绍 | None |\n| /get | GET | 随机获取一个代理| 可选参数: `?type=https` 过滤支持https的代理|\n| /pop | GET | 获取并删除一个代理| 可选参数: `?type=https` 过滤支持https的代理|\n| /all | GET | 获取所有代理 |可选参数: `?type=https` 过滤支持https的代理|\n| /count | GET | 查看代理数量 |None|\n| /delete | GET | 删除代理  |`?proxy=host:ip`|\n\n\n* 爬虫使用\n\n　　如果要在爬虫代码中使用的话， 可以将此api封装成函数直接使用，例如：\n\n```python\nimport requests\n\ndef get_proxy():\n    return requests.get(\"http://127.0.0.1:5010/get/\").json()\n\ndef delete_proxy(proxy):\n    requests.get(\"http://127.0.0.1:5010/delete/?proxy={}\".format(proxy))\n\n# your spider code\n\ndef getHtml():\n    # ....\n    retry_count = 5\n    proxy = get_proxy().get(\"proxy\")\n    while retry_count \u003e 0:\n        try:\n            html = requests.get('http://www.example.com', proxies={\"http\": \"http://{}\".format(proxy)})\n            # 使用代理访问\n            return html\n        except Exception:\n            retry_count -= 1\n    # 删除代理池中代理\n    delete_proxy(proxy)\n    return None\n```\n\n### 扩展代理\n\n　　项目默认包含几个免费的代理获取源，但是免费的毕竟质量有限，所以如果直接运行可能拿到的代理质量不理想。所以，提供了代理获取的扩展方法。\n\n　　添加一个新的代理源方法如下:\n\n* 1、首先在[ProxyFetcher](https://github.com/jhao104/proxy_pool/blob/1a3666283806a22ef287fba1a8efab7b94e94bac/fetcher/proxyFetcher.py#L21)类中添加自定义的获取代理的静态方法，\n该方法需要以生成器(yield)形式返回`host:ip`格式的代理，例如:\n\n```python\n\nclass ProxyFetcher(object):\n    # ....\n\n    # 自定义代理源获取方法\n    @staticmethod\n    def freeProxyCustom1():  # 命名不和已有重复即可\n\n        # 通过某网站或者某接口或某数据库获取代理\n        # 假设你已经拿到了一个代理列表\n        proxies = [\"x.x.x.x:3128\", \"x.x.x.x:80\"]\n        for proxy in proxies:\n            yield proxy\n        # 确保每个proxy都是 host:ip正确的格式返回\n```\n\n* 2、添加好方法后，修改[setting.py](https://github.com/jhao104/proxy_pool/blob/1a3666283806a22ef287fba1a8efab7b94e94bac/setting.py#L47)文件中的`PROXY_FETCHER`项：\n\n　　在`PROXY_FETCHER`下添加自定义方法的名字:\n\n```python\nPROXY_FETCHER = [\n    \"freeProxy01\",    \n    \"freeProxy02\",\n    # ....\n    \"freeProxyCustom1\"  #  # 确保名字和你添加方法名字一致\n]\n```\n\n\n　　`schedule` 进程会每隔一段时间抓取一次代理，下次抓取时会自动识别调用你定义的方法。\n\n### 免费代理源\n\n   目前实现的采集免费代理网站有(排名不分先后, 下面仅是对其发布的免费代理情况, 付费代理测评可以参考[这里](https://zhuanlan.zhihu.com/p/33576641)): \n   \n  | 代理名称          |  状态  |  更新速度 |  可用率  |  地址 | 代码                                             |\n  |---------------|  ---- | --------  | ------  | ----- |------------------------------------------------|\n  | 站大爷           |  ✔    |     ★     |   **     | [地址](https://www.zdaye.com/)    | [`freeProxy01`](/fetcher/proxyFetcher.py#L28)  |\n  | 66代理          |  ✔    |     ★     |   *     | [地址](http://www.66ip.cn/)         | [`freeProxy02`](/fetcher/proxyFetcher.py#L50)  |\n  | 开心代理          |   ✔   |     ★     |   *     | [地址](http://www.kxdaili.com/)     | [`freeProxy03`](/fetcher/proxyFetcher.py#L63)  |\n  | FreeProxyList |   ✔  |    ★     |   *    | [地址](https://www.freeproxylists.net/zh/) | [`freeProxy04`](/fetcher/proxyFetcher.py#L74)  |\n  | 快代理           |  ✔    |     ★     |   *     | [地址](https://www.kuaidaili.com/)  | [`freeProxy05`](/fetcher/proxyFetcher.py#L92)  |\n  | 冰凌代理          |  ✔    |    ★★★    |   *     | [地址](https://www.binglx.cn/) | [`freeProxy06`](/fetcher/proxyFetcher.py#L111) |\n  | 云代理           |  ✔    |    ★     |   *     | [地址](http://www.ip3366.net/)      | [`freeProxy07`](/fetcher/proxyFetcher.py#L123) |\n  | 小幻代理          |  ✔    |    ★★    |    *    | [地址](https://ip.ihuan.me/)        | [`freeProxy08`](/fetcher/proxyFetcher.py#L133) |\n  | 免费代理库         |  ✔    |     ☆     |    *    | [地址](http://ip.jiangxianli.com/)   | [`freeProxy09`](/fetcher/proxyFetcher.py#L143) |\n  | 89代理          |  ✔    |     ☆     |   *     | [地址](https://www.89ip.cn/)         | [`freeProxy10`](/fetcher/proxyFetcher.py#L154) |\n  | 稻壳代理          |  ✔    |     ★★    |   ***   | [地址](https://www.docip.ne)         | [`freeProxy11`](/fetcher/proxyFetcher.py#L164) |\n\n  \n  如果还有其他好的免费代理网站, 可以在提交在[issues](https://github.com/jhao104/proxy_pool/issues/71), 下次更新时会考虑在项目中支持。\n\n### 问题反馈\n\n　　任何问题欢迎在[Issues](https://github.com/jhao104/proxy_pool/issues) 中反馈，同时也可以到我的[博客](http://www.spiderpy.cn/blog/message)中留言。\n\n　　你的反馈会让此项目变得更加完美。\n\n### 贡献代码\n\n　　本项目仅作为基本的通用的代理池架构，不接收特有功能(当然,不限于特别好的idea)。\n\n　　本项目依然不够完善，如果发现bug或有新的功能添加，请在[Issues](https://github.com/jhao104/proxy_pool/issues)中提交bug(或新功能)描述，我会尽力改进，使她更加完美。\n\n　　这里感谢以下contributor的无私奉献：\n\n　　[@kangnwh](https://github.com/kangnwh) | [@bobobo80](https://github.com/bobobo80) | [@halleywj](https://github.com/halleywj) | [@newlyedward](https://github.com/newlyedward) | [@wang-ye](https://github.com/wang-ye) | [@gladmo](https://github.com/gladmo) | [@bernieyangmh](https://github.com/bernieyangmh) | [@PythonYXY](https://github.com/PythonYXY) | [@zuijiawoniu](https://github.com/zuijiawoniu) | [@netAir](https://github.com/netAir) | [@scil](https://github.com/scil) | [@tangrela](https://github.com/tangrela) | [@highroom](https://github.com/highroom) | [@luocaodan](https://github.com/luocaodan) | [@vc5](https://github.com/vc5) | [@1again](https://github.com/1again) | [@obaiyan](https://github.com/obaiyan) | [@zsbh](https://github.com/zsbh) | [@jiannanya](https://github.com/jiannanya) | [@Jerry12228](https://github.com/Jerry12228)\n\n\n### Release Notes\n\n   [changelog](https://github.com/jhao104/proxy_pool/blob/master/docs/changelog.rst)\n\n\u003ca href=\"https://hellogithub.com/repository/92a066e658d147cc8bd8397a1cb88183\" target=\"_blank\"\u003e\u003cimg src=\"https://api.hellogithub.com/v1/widgets/recommend.svg?rid=92a066e658d147cc8bd8397a1cb88183\u0026claim_uid=DR60NequsjP54Lc\" alt=\"Featured｜HelloGitHub\" style=\"width: 250px; height: 54px;\" width=\"250\" height=\"54\" /\u003e\u003c/a\u003e\n","funding_links":[],"categories":["Python","\u003ca id=\"d03d494700077f6a65092985c06bf8e8\"\u003e\u003c/a\u003e工具","HarmonyOS","Python (1887)"],"sub_categories":["\u003ca id=\"b2241c68725526c88e69f1d71405c6b2\"\u003e\u003c/a\u003e代理爬取\u0026\u0026代理池","Windows Manager"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhao104%2Fproxy_pool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjhao104%2Fproxy_pool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhao104%2Fproxy_pool/lists"}