{"id":13539410,"url":"https://github.com/henson/proxypool","last_synced_at":"2025-05-15T16:05:08.641Z","repository":{"id":49850417,"uuid":"86663265","full_name":"henson/proxypool","owner":"henson","description":"IP proxy pool implemented in Golang","archived":false,"fork":false,"pushed_at":"2023-09-04T15:16:14.000Z","size":309,"stargazers_count":1657,"open_issues_count":38,"forks_count":342,"subscribers_count":52,"default_branch":"master","last_synced_at":"2025-04-07T21:12:58.265Z","etag":null,"topics":["go","ip","proxypool"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/henson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-03-30T05:39:43.000Z","updated_at":"2025-03-31T04:28:49.000Z","dependencies_parsed_at":"2024-01-07T22:48:26.808Z","dependency_job_id":"96f8996e-df21-4195-a6e8-c8270b092ca2","html_url":"https://github.com/henson/proxypool","commit_stats":{"total_commits":70,"total_committers":6,"mean_commits":"11.666666666666666","dds":0.4,"last_synced_commit":"faa63c3af5c5f6d3c2fbeb1c490181f642c0dbb8"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henson%2Fproxypool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henson%2Fproxypool/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henson%2Fproxypool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henson%2Fproxypool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/henson","download_url":"https://codeload.github.com/henson/proxypool/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254374410,"owners_count":22060610,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","ip","proxypool"],"created_at":"2024-08-01T09:01:25.333Z","updated_at":"2025-05-15T16:05:08.621Z","avatar_url":"https://github.com/henson.png","language":"Go","funding_links":[],"categories":["开源类库","\u003ca id=\"1a9934198e37d6d06b881705b863afc8\"\u003e\u003c/a\u003e通信\u0026\u0026代理\u0026\u0026反向代理\u0026\u0026隧道","Repositories","Go","Open source library","\u003ca id=\"d03d494700077f6a65092985c06bf8e8\"\u003e\u003c/a\u003e工具"],"sub_categories":["网络","\u003ca id=\"56acb7c49c828d4715dce57410d490d1\"\u003e\u003c/a\u003e未分类-Proxy","The Internet","\u003ca id=\"b2241c68725526c88e69f1d71405c6b2\"\u003e\u003c/a\u003e代理爬取\u0026\u0026代理池"],"readme":"# IP Proxy Pool in Golang\n\n\u003e Collects free proxy resources to provide working IP proxies for crawlers\n\n[![Travis Status for henson/proxypool](https://travis-ci.org/henson/proxypool.svg?branch=master)](https://travis-ci.org/henson/proxypool) [![Go Report Card](https://goreportcard.com/badge/github.com/henson/proxypool)](https://goreportcard.com/report/github.com/henson/proxypool)\n\n## Changelog\n- 2019-12-18 v2.4, thanks to [@sndnvaps](https://github.com/sndnvaps)\n  - Added two proxy sites: ip3306 and plp-ssl\n  - Updated the database schema to add creation and update timestamps\n  - Updated Update() in ip.go: the x.Id() method is being replaced by x.ID(), so the code now calls x.ID() directly\n  - Updated inserts to first check whether a record already exists in the database; existing records are updated and new ones are inserted\n  - Updated the parameters of the HTTPS proxy API\n- 2019-03-28 v2.3, thanks to [@sndnvaps](https://github.com/sndnvaps)\n  - Fixed a query error that occurred when the database contained no HTTPS proxies. Resolves [issue #31](https://github.com/henson/proxypool/issues/31)\n- 2019-02-02 v2.2, thanks to [@sndnvaps](https://github.com/sndnvaps)\n  - Added support for the macOS (Darwin) platform\n  - Added full sqlite3 support\n  - Added new proxy sources (feiyi, 89ip)\n  - Fixed a bug where data could not be written when the database was empty\n- 2018-08-17 v2.1, thanks to [@harrybi](https://github.com/harrybi)\n  - Added a speed field to proxy validation that records each proxy's response time (in milliseconds)\n  - Slow proxies (\u003e=1s) are filtered out automatically when proxies are fetched through the API\n- 2018-07-17 v2.0, thanks to [@sndnvaps](https://github.com/sndnvaps)\n  - Switched to xorm for database access, supporting mysql, mssql, postgres and sqlite3\n  - Updated the corresponding crawlers\n  - Added logging\n- 2017-03-30 v1.0\n  - Uses MongoDB for data persistence\n  - Simple structure that is easy to build on\n\n### 1. Proxy pool design\n\nThe proxy pool consists of four parts:\n\n- Getter:\n\nThe proxy-fetching interface. There are currently **9** free proxy sources. Each call scrapes the latest 100 proxies from these sites and puts them into the Channel. You can also [add your own proxy getters](#4-adding-a-custom-proxy-getter).\n\n- Channel:\n\nTemporarily holds the collected proxies. Each proxy is validated by using it to access a stable website, and working proxies are saved to the database.\n\n- Schedule:\n\nA scheduled task periodically checks whether the proxy IPs in the database still work and deletes the ones that do not. It also actively fetches the latest proxies through the Getter.\n\n- Api:\n\nThe access interface of the proxy pool. It provides a `get` endpoint that outputs JSON, so crawlers can use it directly.\n\n### 2. Code structure\n\n- Api:\n\nCode for the API, which provides the `get` endpoint and outputs JSON.\n\n- Getter:\n\nThe proxy-fetching interface. It currently scrapes free proxies from the nine sites below, and you can extend it with your own getters:\n\n1. ~~[快代理](http://www.kuaidaili.com)~~\n2. [代理66](http://www.66ip.cn)\n3. [IP181](http://www.ip181.com)\n4. ~~[有代理](http://www.youdaili.net/Daili/http/)~~\n5. ~~[西刺代理](http://www.xicidaili.com/nn/)~~\n6. ~~[goubanjia](http://www.goubanjia.com/free/gngn/index)~~\n7. ~~[讯代理](http://www.xdaili.cn/freeproxy.html)~~\n8. ~~[无忧代理](http://www.data5u.com/free/index.shtml)~~\n9. [Proxylist+](https://list.proxylistplus.com)\n\n- Pkg:\n\nShared modules, methods and functions.\n\n- Other:\n\nThe configuration file conf/app.ini holds the database, logging and proxy getter settings:\n\n```ini\n; App name\nAPP_NAME = ProxyPool\n\n[server]\nHTTP_ADDR = 0.0.0.0\nHTTP_PORT = 3000\n; Session expiration time\nSESSION_EXPIRES =\n\n[database]\n; Either \"mysql\", \"postgres\" or \"sqlite3\"; you can connect to TiDB via the MySQL protocol\nDB_TYPE = postgres\nHOST = 127.0.0.1:5432\nNAME = ProxyPool\nUSER = postgres\nPASSWD =\n; For \"postgres\" only, either \"disable\", \"require\" or \"verify-full\"\nSSL_MODE = disable\n; For \"sqlite3\" and \"tidb\", use an absolute path when you start as a service\nPATH = data/ProxyPool.db\n\n[log]\n; Can be \"console\" or \"file\", default is \"console\"\n; Use commas to separate multiple modes, e.g. \"console, file\"\nMODE       = file\n; Buffer length of the channel; keep the default if you are unsure\nBUFFER_LEN = 100\n; Either \"Trace\", \"Info\", \"Warn\", \"Error\", \"Fatal\", default is \"Trace\"\nLEVEL      = Info\n; Root path of log files, filled in automatically if left empty\nROOT_PATH  =  \n\n; For \"console\" mode only\n[log.console]\n; leave empty to inherit\nLEVEL = Trace\n\n; For \"file\" mode only\n[log.file]\n; leave empty to inherit\nLEVEL          = Info\n; Enables automatic log rotation (master switch for the options below)\nLOG_ROTATE     = true\n; Segment logs daily\nDAILY_ROTATE   = true\n; Max size shift of a single file, default is 28 which means 1 \u003c\u003c 28, 256MB\nMAX_SIZE_SHIFT = 28\n; Max number of lines in a single file\nMAX_LINES      = 1000000\n; Days before a log file expires (deleted after MAX_DAYS)\nMAX_DAYS       = 7\n\n[log.xorm]\n; Enable file rotation\nROTATE = true\n; Rotate every day\nROTATE_DAILY = true\n; Rotate once the file size exceeds x MB\nMAX_SIZE = 100\n; Maximum days to keep log files\nMAX_DAYS = 3\n\n[security]\nINSTALL_LOCK = false\n```\n\n### 3. Installation and usage\n\nSome proxy sites use anti-scraping techniques such as encrypted pages and obfuscated code. Collecting their proxy data correctly therefore requires [PhantomJS](http://phantomjs.org/), which must be installed beforehand.\n\nThe project also depends on the following libraries:\n```\nclog \"unknwon.dev/clog/v2\"\ngithub.com/go-ini/ini\ngithub.com/go-xorm/xorm\ngithub.com/go-xorm/core\ngithub.com/go-sql-driver/mysql\ngithub.com/lib/pq\ngithub.com/Aiicy/htmlquery\ngithub.com/PuerkitoBio/goquery\ngithub.com/parnurzeal/gorequest\ngithub.com/nladuo/go-phantomjs-fetcher\n```\n\nDownload the project:\n```\ngo get -u github.com/henson/proxypool\n```\n\nThen configure app.ini and start the service:\n```\ngo build\n./ProxyPool\n```\n\nGet a random working proxy:\n```\nGET http://localhost:8080/v2/ip\n```\n![HTTP](pics/http.png)\n\nGet a random HTTPS proxy:\n```\nGET http://localhost:8080/v2/https\n```\n![HTTPS](pics/https.png)\n\n### 4. Adding a custom proxy getter\n\nThis is simple. Add a new collection function to the getter package, such as Data5u() in the example. You do not even need a new Go file; a separate file such as 5u.go in the example just keeps the collection functions organized.\n\n```golang\n// 5u.go\n// Data5u gets IPs from data5u.com\nfunc Data5u() (result []*models.IP) {\n    // scraping logic\n    ...\n    log.Println(\"Data5u done.\")\n    return\n}\n```\n\nThen add, remove or comment out the call to that collection function in the run function in main.go.\n\n```golang\nfunc run(ipChan chan\u003c- *models.IP) {\n    var wg sync.WaitGroup\n    funs := []func() []*models.IP{\n        getter.Data5u,\n        getter.IP66,\n        getter.KDL,\n        getter.GBJ,\n        getter.Xici,\n        getter.XDL,\n        //getter.IP181,\n        //getter.YDL,\n        getter.PLP,\n    }\n    ...\n}\n```\n\n### 5. Error recovery\n\nIn the past, some users occasionally reported that the program failed. Each time, the cause turned out to be a change on a proxy site: either a page had been modified or the site had shut down. That change broke the crawler originally written for the site and caused an error. To address this, I added a fault-tolerance mechanism so that a crawler error no longer affects the main program. A failing collector is ignored by the main program, and the other collectors keep working normally.\n\n### 6. Acknowledgements\n\n- Thank you for using this project. If you find it useful and it helps you solve real problems, please consider giving it a star to encourage further work. Thanks!\n- If you have any suggestions or feedback, feel free to open an issue.\n- And if you would like to contribute code and help improve the project, even better.\n\n## Stargazers over time\n\n[![Stargazers over time](https://starchart.cc/henson/proxypool.svg)](https://starchart.cc/henson/proxypool)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenson%2Fproxypool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhenson%2Fproxypool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenson%2Fproxypool/lists"}