{"id":13539424,"url":"https://github.com/fate0/getproxy","last_synced_at":"2025-04-02T06:30:56.903Z","repository":{"id":21226659,"uuid":"91929475","full_name":"fate0/getproxy","owner":"fate0","description":"getproxy 是一个抓取发放代理网站，获取 http/https 代理的程序","archived":true,"fork":false,"pushed_at":"2022-08-02T08:36:49.000Z","size":6031,"stargazers_count":844,"open_issues_count":4,"forks_count":160,"subscribers_count":23,"default_branch":"master","last_synced_at":"2024-04-24T12:19:31.347Z","etag":null,"topics":["getproxy","proxy","proxy-checker","web-proxy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fate0.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-05-21T02:57:49.000Z","updated_at":"2024-04-24T04:33:10.000Z","dependencies_parsed_at":"2022-08-07T09:16:38.600Z","dependency_job_id":null,"html_url":"https://github.com/fate0/getproxy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fate0%2Fgetproxy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fate0%2Fgetproxy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fate0%2Fgetproxy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fate0%2Fgetproxy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fate0","download_url":"https://codeload.github.com/fate0/getproxy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246
767694,"owners_count":20830533,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["getproxy","proxy","proxy-checker","web-proxy"],"created_at":"2024-08-01T09:01:25.681Z","updated_at":"2025-04-02T06:30:54.121Z","avatar_url":"https://github.com/fate0.png","language":"Python","readme":"# getproxy\n\n[![Build Status](https://travis-ci.org/fate0/getproxy.svg?branch=master)](https://travis-ci.org/fate0/getproxy)\n[![Updates](https://pyup.io/repos/github/fate0/getproxy/shield.svg)](https://pyup.io/repos/github/fate0/getproxy/)\n[![PyPI](https://img.shields.io/pypi/v/getproxy.svg)](https://pypi.python.org/pypi/getproxy)\n[![PyPI](https://img.shields.io/pypi/pyversions/getproxy.svg)](https://pypi.python.org/pypi/getproxy)\n\ngetproxy is a program that crawls proxy-listing websites and collects http/https proxies.\nIt pushes updated data to [fate0/proxylist](https://github.com/fate0/proxylist) every 15 min.\n\n\n## 1. Installation\n\n```\npip install -U getproxy\n```\n\n## 2. 
Usage\n\n### Help\n```\n➜  ~ getproxy --help\nUsage: getproxy [OPTIONS]\n\nOptions:\n  --in-proxy TEXT   Input proxy file\n  --out-proxy TEXT  Output proxy file\n  --help            Show this message and exit.\n```\n\n* `--in-proxy`: optional; a file listing the proxies to validate\n* `--out-proxy`: optional; the file that validated proxies are written to. If omitted, results are printed to the terminal\n\nThe `--in-proxy` file uses the same format as the `--out-proxy` file.\n\n### Example\n\n```\n(test2.7) ➜  ~ getproxy\nINFO:getproxy.getproxy:[*] Init\nINFO:getproxy.getproxy:[*] Current Ip Address: 1.1.1.1\nINFO:getproxy.getproxy:[*] Load input proxies\nINFO:getproxy.getproxy:[*] Validate input proxies\nINFO:getproxy.getproxy:[*] Load plugins\nINFO:getproxy.getproxy:[*] Grab proxies\nINFO:getproxy.getproxy:[*] Validate web proxies\nINFO:getproxy.getproxy:[*] Check 6666 proxies, Got 666 valid proxies\n\n...\n```\n\n\n## 3. Input/Output Format\n\nEach line of the result is a JSON string in the following format:\n```json\n{\n    \"type\": \"http\",\n    \"host\": \"1.1.1.1\",\n    \"port\": 8080,\n    \"anonymity\": \"transparent\",\n    \"country\": \"CN\",\n    \"response_time\": 3.14,\n    \"from\": \"txt\"\n}\n```\n\n| Field         | Type   | Description     | Allowed values |\n|---------------|--------|-----------------|----------------|\n| type          | str    | proxy type      | `http`, `https` |\n| host          | str    | proxy address   |                |\n| port          | int    | port            |                |\n| anonymity     | str    | anonymity level | `transparent`, `anonymous`, `high_anonymous` |\n| country       | str    | proxy country   |                |\n| response_time | float  | response time   |                |\n| from          | str    | source          |                |\n\n\n## 4. 
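Plugins\n\n### Plugin code format\n\n``` python\n\nclass Proxy(object):\n    def __init__(self):\n        # proxies grabbed by this plugin\n        self.result = []\n        # already validated proxies, usable if the target site bans you\n        self.proxies = []\n\n    def start(self):\n        # fetch the target site here and append results to self.result\n        pass\n```\n\n### Plugin result format\n\n```\n{\n    \"host\": \"1.1.1.1\",\n    \"port\": 8080,\n    \"from\": \"plugin name\"\n}\n```\n\n### Plugin tips\n\n* Do not use multithreading, gevent, or similar concurrency inside a plugin\n* If the target site is paginated, add your own delay after fetching each page\n* If the target site is paginated, append each page's results to `self.result` as soon as you have them\n* If the target site bans you, you can make use of the already validated proxies (that is, `self.proxies`)\n\n## 5. Calling from Other Programs\n\nRunning `getproxy` directly is equivalent to executing the following program:\n\n``` python\n#! /usr/bin/env python\n# -*- coding: utf-8 -*-\n\nfrom getproxy import GetProxy\n\ng = GetProxy()\n\n# 1. Initialize (required)\ng.init()\n\n# 2. Load the input proxies list\ng.load_input_proxies()\n\n# 3. Validate the input proxies list\ng.validate_input_proxies()\n\n# 4. Load plugins\ng.load_plugins()\n\n# 5. Grab the web proxies list\ng.grab_web_proxies()\n\n# 6. Validate the web proxies list\ng.validate_web_proxies()\n\n# 7. Save all currently validated proxies\ng.save_proxies()\n\n```\n\nIf you only want to validate a list of proxies and do not need to grab proxies from other sites:\n\n``` python\nfrom getproxy import GetProxy\n\ng = GetProxy()\n\ng.init()\ng.load_input_proxies()\ng.validate_input_proxies()\n\nprint(g.valid_proxies)\n```\n\nIf your program does not need to write the proxies list out and instead uses it directly:\n\n``` python\nfrom getproxy import GetProxy\n\ng = GetProxy()\n\ng.init()\ng.load_plugins()\ng.grab_web_proxies()\ng.validate_web_proxies()\n\nprint(g.valid_proxies)\n```\n\n## 6. 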
Q \u0026 A\n\n* Why not use database xxx?\n\nThe data set is small: even reading the whole text file into memory takes very little memory. If you really need to store it in a database, a few extra lines of your own code will do it.\nAnother benefit of the text format is that it makes the [fate0/proxylist](https://github.com/fate0/proxylist) project possible.\n\n* How is this different from xxx?\n\nIt is simple, convenient, and fast; apart from a Python environment, nothing else needs to be set up.\n\n* I got an error. What should I do?\n\nRead the error message carefully: is it raised by some of the plugins, and are the errors all network-related?\nIf so, the sites those plugins access may have been blocked, for well-known reasons.\nIf not, please open an Issue right away.\n\n* Will new plugins keep being added?\n\nThat mainly depends on the number of entries in `proxy.list` in the [fate0/proxylist](https://github.com/fate0/proxylist) project.\nOnce `proxy.list` approaches 5000 lines, no new plugins will be added, so that the travis job still finishes within 15 min.","funding_links":[],"categories":["\u003ca id=\"1a9934198e37d6d06b881705b863afc8\"\u003e\u003c/a\u003e通信\u0026\u0026代理\u0026\u0026反向代理\u0026\u0026隧道","\u003ca id=\"d03d494700077f6a65092985c06bf8e8\"\u003e\u003c/a\u003e工具","Python","Python (1887)"],"sub_categories":["\u003ca id=\"56acb7c49c828d4715dce57410d490d1\"\u003e\u003c/a\u003e未分类-Proxy","\u003ca id=\"b2241c68725526c88e69f1d71405c6b2\"\u003e\u003c/a\u003e代理爬取\u0026\u0026代理池"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffate0%2Fgetproxy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffate0%2Fgetproxy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffate0%2Fgetproxy/lists"}