{"id":22049281,"url":"https://github.com/ning1875/dynamic-sharding","last_synced_at":"2025-06-12T07:35:33.734Z","repository":{"id":44896800,"uuid":"271232703","full_name":"ning1875/dynamic-sharding","owner":"ning1875","description":"Dynamic sharding to solve the Pushgateway single point of failure and high availability (HA) problem","archived":false,"fork":false,"pushed_at":"2023-05-05T02:28:41.000Z","size":459,"stargazers_count":47,"open_issues_count":9,"forks_count":24,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-08T23:08:17.942Z","etag":null,"topics":["consistent-hashing","consul","golang","prometheus","pushgateway","watch"],"latest_commit_sha":null,"homepage":"https://ning1875.ke.qq.com/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ning1875.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-06-10T09:17:42.000Z","updated_at":"2024-09-01T17:28:53.000Z","dependencies_parsed_at":"2025-05-08T23:08:19.627Z","dependency_job_id":"9c8ac60a-4160-4b86-a5a5-37a77efd28fb","html_url":"https://github.com/ning1875/dynamic-sharding","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/ning1875/dynamic-sharding","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ning1875%2Fdynamic-sharding","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ning1875%2Fdynamic-sharding/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ning1875%2Fdynamic-sharding/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ning1875%2Fdynamic-sharding/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ning1875","download_url":"https://codeload.github.com/ning1875/dynamic-sharding/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ning1875%2Fdynamic-sharding/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259420905,"owners_count":22854691,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["consistent-hashing","consul","golang","prometheus","pushgateway","watch"],"created_at":"2024-11-30T14:14:47.772Z","updated_at":"2025-06-12T07:35:33.687Z","avatar_url":"https://github.com/ning1875.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# k8s operations course for absolute beginners\n- [k8s operations for absolute beginners: compute, storage, networking, and common cluster operations](https://ke.qq.com/course/5829699)\n\n# k8s source code walkthrough courses (3 courses combined into one large course)\n- [k8s internals and source code explained: essentials](https://ke.qq.com/course/4093533)\n- [k8s internals and source code explained: advanced](https://ke.qq.com/course/4236389)\n- [k8s source code walkthrough: become a k8s expert](https://ke.qq.com/course/4697341)\n\n\n# Advanced k8s operations and tuning course\n- [k8s operations master course](https://ke.qq.com/course/5586848)\n\n# Hands-on k8s management and operations platform\n- [Hands-on k8s management and operations platform with a Vue frontend and a Golang backend](https://ke.qq.com/course/5856444)\n\n\n# k8s extension development courses\n- [k8s extension development: a scheduler based on real load](https://ke.qq.com/course/5814034)\n- [Hands-on k8s operator and CRD development: become a k8s expert](https://ke.qq.com/course/5458555)\n\n# CI/CD course\n- [Hands-on Tekton pipelines, with a source code walkthrough of how pipelines run](https://ke.qq.com/course/5467720)\n\n\n# Courses covering all Prometheus components\n- [01_Prometheus for absolute beginners: Grafana basics and configuring mainstream exporters](https://ke.qq.com/course/5826832)\n- 
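[02_All Prometheus components: configuration, internals, and hands-on high availability](https://ke.qq.com/course/3549215)\n- [03_Using Prometheus Thanos, with a source code walkthrough](https://ke.qq.com/course/3883439)\n- [04_Hands-on kube-prometheus and prometheus-operator, and how they work](https://ke.qq.com/course/3912017)\n- [05_Prometheus source code explained, and extension development](https://ke.qq.com/course/4236995)\n- [06_Monitoring k8s with Prometheus: hands-on configuration and how it works, plus writing a Go project that exposes business metrics](https://ke.qq.com/course/5837369)\n\n# Go courses\n- [Golang fundamentals](https://ke.qq.com/course/4334898)\n- [Hands-on Golang: build a client/server task execution system in one day](https://ke.qq.com/course/3550865)\n- [Golang ops development project: hands-on k8s network probing](https://ke.qq.com/course/5860635)\n- [Hands-on Golang ops platform: service tree, log monitoring, task execution, distributed probing](https://ke.qq.com/course/4334675)\n- [Hands-on Golang ops development: a k8s inspection platform](https://ke.qq.com/course/5818923)\n\n# Live Q\u0026A and SRE career planning\n- [Q\u0026A for the k8s and Prometheus courses, and ops development career planning](https://ke.qq.com/course/5506477)\n\n\n# k8s: from absolute beginner to expert to operations master\n\n| Topic | Intro video | Course link | Notes |\n|-------|-------------|-------------|-------|\n| 01_Hands-on k8s for absolute beginners | [link](https://www.bilibili.com/video/BV1Mt4y1P7bL/) | [link](https://ke.qq.com/course/5829699) | |\n| 02_k8s source code walkthrough: become a k8s expert | [link](https://www.bilibili.com/video/BV1or4y1877p/) | [link](https://ke.qq.com/course/4697341) | |\n| 03_k8s internals and source code explained: essentials | [link](https://www.bilibili.com/video/BV1T34y127gU/) | [link](https://ke.qq.com/course/4093533) | |\n| 04_k8s internals and source code explained: advanced | [link](https://www.bilibili.com/video/BV1si4y1f7Xo/) | [link](https://ke.qq.com/course/4236389) | |\n\n\n# 01 Beginner courses\n- New to k8s? Take this course; Linux basics are all you need. It covers the monitoring, logging, dashboard, and other components commonly used in k8s clusters: https://ke.qq.com/course/5829699\n- Hands-on Prometheus monitoring in k8s, and how it works under the hood: https://ke.qq.com/course/5837369\n# 02 Source code walkthrough courses\n- For k8s source code walkthroughs, take these 3 courses; together they form one large course and can be taken in any order\n- https://ke.qq.com/course/4697341\n- https://ke.qq.com/course/4093533\n- https://ke.qq.com/course/4236389\n# 03 Advanced and extension development courses\n- For k8s extension development, hands-on k8s operator and CRD development: https://ke.qq.com/course/5458555\n- The k8s operations master course covers tuning real high-concurrency k8s clusters, troubleshooting hard problems, and building some tools: https://ke.qq.com/course/5586848\n- k8s extension development, a scheduler based on real load: 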
https://ke.qq.com/course/5814034\n\n# k8s development\n\n\n| Topic | Intro video | Course link | Notes |\n|-------|-------------|-------------|-------|\n| 01_k8s operations master course | [link](https://www.bilibili.com/video/BV11B4y1k7LB/) | [link](https://ke.qq.com/course/5586848) | |\n| 02_Hands-on k8s operator and CRD development: become a k8s expert | [link](https://www.bilibili.com/video/BV1cv4y1371X/) | [link](https://ke.qq.com/course/5458555) | |\n| 03_k8s extension development: a scheduler based on real load | [link](https://www.bilibili.com/video/BV1qB4y1G7Kf/) | [link](https://ke.qq.com/course/5814034) | |\n\n\n\n# Prometheus monitoring: from beginner to expert\n\n| Topic | Intro video | Course link | Notes |\n|-------|-------------|-------------|-------|\n| 01_Prometheus for absolute beginners: Grafana basics and configuring mainstream exporters | [link](https://www.bilibili.com/video/BV1814y1e73y/) | [link](https://ke.qq.com/course/5826832) | |\n| 02_All Prometheus components: configuration, internals, and hands-on high availability | [link](https://www.bilibili.com/video/BV1oZ4y1f7au/) | [link](https://ke.qq.com/course/3549215) | |\n| 03_Hands-on kube-prometheus and prometheus-operator, and how they work | [link](https://www.bilibili.com/video/BV1LR4y1L7jV/) | [link](https://ke.qq.com/course/3912017) | |\n| 04_Using Prometheus Thanos, with a source code walkthrough | [link](https://www.bilibili.com/video/BV1814y1e73y/) | [link](https://ke.qq.com/course/3883439) | |\n| 05_Prometheus source code explained, and extension development | [link](https://www.bilibili.com/video/BV1hS4y1m73Q/) | [link](https://ke.qq.com/course/4236995) | |\n\n\n# Golang ops development: from zero to an ops platform\n\n| Topic | Intro video | Course link | Notes |\n|-------|-------------|-------------|-------|\n| 01_Golang fundamentals | [link](https://www.bilibili.com/video/BV1WT411M7Gh/) | [link](https://ke.qq.com/course/4334898) | |\n| 02_Hands-on Golang ops platform: service tree, log monitoring, task execution, distributed probing | 
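[link](https://www.bilibili.com/video/BV14T4y1k7oo) | [link](https://ke.qq.com/course/4334675) | |\n| 03_Hands-on Golang ops development: a k8s inspection platform | [link](https://www.bilibili.com/video/BV1Ad4y1r7C4/) | [link](https://ke.qq.com/course/5818923) | |\n\n\n# Hands-on CI/CD\n\n| Topic | Intro video | Course link | Notes |\n|-------|-------------|-------------|-------|\n| 01_Hands-on Tekton pipelines, with a source code walkthrough of how pipelines run | [link](https://www.bilibili.com/video/BV13P4y1Z7Xv/) | [link](https://ke.qq.com/course/5467720) | |\n\n\n# On free vs. paid content\n- Using free material is of course fine; I have already contributed many articles and open-source projects, as well as free videos\n- Objectively speaking, if you are highly skilled you can rely on free material indefinitely: you can read the source code and solve any problem yourself\n- There seems to be plenty of free material, but most of it is scraps; the core material is not free, and no expert will answer your questions for you\n- Thanos and kube-prometheus are not hard once you have a good grasp of the Prometheus source code plus some k8s knowledge\n\n# Architecture diagram\n![dynamic-sharding architecture diagram](./images/dynamic-sharding架构图.jpg)\n# What is pgw (Pushgateway)\n[Project introduction](https://github.com/prometheus/pushgateway)\n\n## How metrics are pushed to pgw\n\n- Without grouping labels, the push URI is:\n```\nhttp://pushgateway_addr/metrics/job/\u003cJOB_NAME\u003e\n```\n- With grouping labels, the push URI is:\n```\nhttp://pushgateway_addr/metrics/job/\u003cJOB_NAME\u003e/\u003cLABEL_NAME\u003e/\u003cLABEL_VALUE\u003e\n```\n- PUT and POST differ in what they replace: PUT replaces all metrics in the group identified by the URI (the job plus any grouping labels), while POST replaces only the metrics in that group that have the same names as the newly pushed ones\n# The pgw single point of failure problem\n## Problems with simply putting pgw behind a load balancer\n- Round-robin behind an LB: if pushes land uncontrolled on random Pushgateway instances, Prometheus scrapes all of them indiscriminately and the data becomes inconsistent, as shown below:\n\n![image](./images/pgw_miss.png)\n![image](./images/pgw_miss2.png)\n- The root cause: at time t1 the metric's value is 10, and at t2 it is 20\n- Round-robin sends the t1 push to pgw-a and the t2 push to pgw-b\n- Prometheus scrapes both instances, so a value that should keep rising shows a sawtooth pattern instead\n## Static consistent hashing on the URI plus static pgw targets in Prometheus\n- Suppose there are 3 pgw instances, and the LB in front of them applies consistent hashing to request_uri\n- Prometheus scrapes the 3 pgw instances through a static configuration:\n```yaml\n  - job_name: pushgateway\n    honor_labels: true\n    honor_timestamps: true\n    scrape_interval: 5s\n    scrape_timeout: 4s\n    metrics_path: /metrics\n    scheme: http\n    static_configs:\n    - targets:\n      - pgw-a:9091\n      - pgw-b:9091\n      - pgw-c:9091\n```\n- This does split traffic by hash, but if one pgw instance goes down, every request hashed to that instance fails\n\n## 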
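The solution: dynamic consistent hashing + consul service checks\n![image](./images/log.jpg)\n- On startup, the dynamic-sharding service registers the pgw services listed in its config file with consul\n- consul periodically runs HTTP health checks against the pgw servers\n- Push requests are distributed by consistent hashing on the request path, e.g.:\n```\n# only the job differs\n- http://pushgateway_addr/metrics/job/job_a\n- http://pushgateway_addr/metrics/job/job_b\n- http://pushgateway_addr/metrics/job/job_c\n# only the label value differs\n- http://pushgateway_addr/metrics/job/job_a/tag_a/value_a\n- http://pushgateway_addr/metrics/job/job_a/tag_a/value_b\n```\n- When a pgw instance is OOM-killed or restarts abnormally, the consul service check marks the unhealthy instance as down\n- ~~dynamic-sharding polls for changes in the number of instances~~\n- dynamic-sharding uses `Watch` to track changes in the number of pgw nodes\n- It then rebuilds the hash ring and rehashes jobs across the available instances\n- Prometheus gets the pgw instance list from consul service discovery, so no manual changes are needed\n- dynamic-sharding replies with a redirect instead of handling the request itself, which keeps it simple and efficient\n- dynamic-sharding is stateless, so multiple instances can run as the traffic entry layer in front of the pgw servers\n- When scaling out, all existing pgw instances must also be restarted\n- Limitation: this does not solve the single point of failure or sharding problems of Prometheus itself\nProject URL: [https://github.com/ning1875/dynamic-sharding](https://github.com/ning1875/dynamic-sharding)\n\n## Usage guide\n\n\u003e Build or download\n```shell script\n# build from source\n$ git clone https://github.com/ning1875/dynamic-sharding.git\n$ cd dynamic-sharding \u0026\u0026 make\n# or download a tagged package directly from releases\n# e.g. https://github.com/ning1875/dynamic-sharding/releases/download/v2.0/dynamic-sharding-2.0.linux-amd64.tar.gz\n```\n\n\u003e Edit the configuration\n```shell script\n# edit the configuration file\n# fill in the settings in dynamic-sharding.yml\n```\n\n\u003e Start the dynamic-sharding service\n\n```shell script\n./dynamic-sharding --config.file=dynamic-sharding.yml\n```\n\n\u003e Integrate with Prometheus:\n\u003e add the following to the scrape_configs section of your prometheus.yml\n```yaml\nscrape_configs:\n  - job_name: pushgateway\n    consul_sd_configs:\n      - server: $consul_api\n        services:\n          - pushgateway\n    relabel_configs:\n    - source_labels: [\"__meta_consul_dc\"]\n      target_label: \"dc\"\n```\n\u003e Clients then simply push to the dynamic-sharding endpoint, e.g. http://localhost:9292/\n\n## Operations guide\n\n### pgw node failure (no action needed)\n\u003e e.g. with 4 pgw instances running, if one goes down, traffic is spread over 3 instances instead of 4, and so on\n\n\n### pgw node recovery\n\u003e e.g. with 4 pgw instances running, one goes down and recovers a while later; it is then deregistered from consul.\n\u003e This avoids the same problem as with scaling out: jobs rehashed away from that pgw would keep being scraped by Prometheus from it, with values that never update\n\n\n\n### Scaling out\n\u003e Update the pgw servers in the yml config file to the scaled-out set, then restart the dynamic-sharding service.\n\u003e Note: you must also restart all existing pgw instances; otherwise rehashed jobs keep being scraped by Prometheus from their original pgw, with values that never update\n\n\n\n### Scaling in\n\n```shell script\n# Option 1\n## deregister the node through the consul API\ncurl -vvv --request PUT \"http://${consul_api}/v1/agent/service/deregister/${pgw_addr}_${pgw_port}\"\n# e.g.\ncurl -vvv --request PUT 'http://localhost:8500/v1/agent/service/deregister/1.1.1.1_9091'\n\n## update the pgw servers in the yml config file to the scaled-in set, so the removed node is not registered again when the service restarts\n\n# Option 2\n## stop the pgw service on the node being removed; consul evicts it, and it is then deregistered\n```\n\n\n\n\n### `urllib2.HTTPError: HTTP Error 307: Temporary Redirect` when using the Python SDK\n#### Cause\n- Reading the code shows that the Python SDK pushes through a default handler that does not follow the 307 redirect returned by dynamic-sharding\n\n```python\ndef push_to_gateway(gateway, job, registry, grouping_key=None, timeout=30, handler=default_handler):\n```\n\n#### Solution\n\n- Write a custom handler with the requests library and pass it in when pushing\n\n```python\nimport requests\n\n\ndef custom_handle(url, method, timeout, headers, data):\n    # prometheus_client calls this factory, then invokes the returned function\n    def handle():\n        # headers arrive as a list of (name, value) tuples\n        h = dict(headers)\n        if method == 'PUT':\n            resp = requests.put(url, data=data, headers=h, timeout=timeout)\n        elif method == 'POST':\n            resp = requests.post(url, data=data, headers=h, timeout=timeout)\n        elif method == 'DELETE':\n            resp = requests.delete(url, data=data, headers=h, timeout=timeout)\n        else:\n            raise ValueError(\"unsupported method: {0}\".format(method))\n        # requests follows redirects (including 307, keeping the method and body) by default\n        if resp.status_code \u003e= 400:\n            raise IOError(\"error talking to pushgateway: {0} {1}\".format(resp.status_code, resp.text))\n    return handle\n\n\n# push_to_gateway(push_addr, job='some_job', registry=r1, handler=custom_handle)\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fning1875%2Fdynamic-sharding","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fning1875%2Fdynamic-sharding","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fning1875%2Fdynamic-sharding/lists"}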