{"id":13560795,"url":"https://github.com/AliyunContainerService/gpushare-scheduler-extender","last_synced_at":"2025-04-03T16:31:10.588Z","repository":{"id":38291393,"uuid":"171210225","full_name":"AliyunContainerService/gpushare-scheduler-extender","owner":"AliyunContainerService","description":"GPU Sharing Scheduler for Kubernetes Cluster","archived":false,"fork":false,"pushed_at":"2023-12-29T07:04:43.000Z","size":5786,"stargazers_count":1458,"open_issues_count":110,"forks_count":312,"subscribers_count":38,"default_branch":"master","last_synced_at":"2025-03-31T16:13:04.998Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AliyunContainerService.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-02-18T03:44:41.000Z","updated_at":"2025-03-26T04:14:06.000Z","dependencies_parsed_at":"2024-06-18T19:47:06.102Z","dependency_job_id":"5629a48b-3343-443e-8225-52588f382c21","html_url":"https://github.com/AliyunContainerService/gpushare-scheduler-extender","commit_stats":{"total_commits":73,"total_committers":34,"mean_commits":"2.1470588235294117","dds":0.589041095890411,"last_synced_commit":"c0b61411cddabd1b19500878ab355afeeae5589e"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AliyunContainerService%2Fgpushare-scheduler-extender","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AliyunContainerService%2Fgpushare-scheduler-ext
ender/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AliyunContainerService%2Fgpushare-scheduler-extender/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AliyunContainerService%2Fgpushare-scheduler-extender/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AliyunContainerService","download_url":"https://codeload.github.com/AliyunContainerService/gpushare-scheduler-extender/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247036963,"owners_count":20873061,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:00:49.568Z","updated_at":"2025-04-03T16:31:10.565Z","avatar_url":"https://github.com/AliyunContainerService.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"# GPU Sharing Scheduler Extender in Kubernetes \n\n[![CircleCI](https://circleci.com/gh/AliyunContainerService/gpushare-scheduler-extender.svg?style=svg)](https://circleci.com/gh/AliyunContainerService/gpushare-scheduler-extender)\n[![Build Status](https://travis-ci.org/AliyunContainerService/gpushare-scheduler-extender.svg?branch=master)](https://travis-ci.org/AliyunContainerService/gpushare-scheduler-extender) \n[![Go Report Card](https://goreportcard.com/badge/github.com/AliyunContainerService/gpushare-scheduler-extender)](https://goreportcard.com/report/github.com/AliyunContainerService/gpushare-scheduler-extender)\n\n\n## Overview\n\nMore and more data scientists run their Nvidia GPU 
based inference tasks on Kubernetes. Some of these tasks can run on the same Nvidia GPU device to increase GPU utilization, so one important challenge is how to share GPUs between pods. The community is also very interested in this [topic](https://github.com/kubernetes/kubernetes/issues/52757).\n\nThere is now a GPU sharing solution for native Kubernetes: it is based on the scheduler extender and device plugin mechanisms, so you can easily reuse it in your own Kubernetes cluster. \n\n## Prerequisites\n\n- Kubernetes 1.11+\n- golang 1.19+\n- NVIDIA drivers ~= 361.93\n- Nvidia-docker version \u003e 2.0 (see how to [install](https://github.com/NVIDIA/nvidia-docker) and its [prerequisites](https://github.com/nvidia/nvidia-docker/wiki/Installation-\\(version-2.0\\)#prerequisites))\n- Docker configured with Nvidia as the [default runtime](https://github.com/NVIDIA/nvidia-docker/wiki/Advanced-topics#default-runtime).\n\n## Design\n\nFor more details about the design of this project, please read this [Design document](docs/designs/designs.md).\n\n## Setup\n\nYou can follow this [Installation Guide](docs/install.md). 
If you are using [Alibaba Cloud Kubernetes](https://cn.aliyun.com/product/kubernetes), please follow this [doc](deployer/README.md) to install with Helm Charts.\n\n## User Guide\n\nYou can check this [User Guide](docs/userguide.md).\n\n## Developing\n\n### Scheduler Extender\n\n```bash\ngit clone https://github.com/AliyunContainerService/gpushare-scheduler-extender.git \u0026\u0026 cd gpushare-scheduler-extender\nmake build-image\n```\n\n### Device Plugin\n\n```bash\ngit clone https://github.com/AliyunContainerService/gpushare-device-plugin.git \u0026\u0026 cd gpushare-device-plugin\ndocker build -t cheyang/gpushare-device-plugin .\n```\n\n### Kubectl Extension\n\n- golang \u003e 1.10\n\n```bash\nmkdir -p $GOPATH/src/github.com/AliyunContainerService\ncd $GOPATH/src/github.com/AliyunContainerService\ngit clone https://github.com/AliyunContainerService/gpushare-device-plugin.git\ncd gpushare-device-plugin\ngo build -o $GOPATH/bin/kubectl-inspect-gpushare-v2 cmd/inspect/*.go\n```\n\n## Demo\n\n### - Demo 1: Deploy multiple GPU Shared Pods and schedule them on the same GPU device in a binpack way\n\n[![](demo1.jpg)](http://cloud.video.taobao.com//play/u/2987821887/p/2/e/6/t/1/214292079721.mp4)\n\n### - Demo 2: Avoid GPU memory requests that fit at the node level, but not at the GPU device level\n\n[![](demo2.jpg)](http://cloud.video.taobao.com//play/u/2987821887/p/2/e/6/t/1/214235285109.mp4)\n\n## Related Project\n\n- [gpushare device plugin](https://github.com/AliyunContainerService/gpushare-device-plugin.git)\n\n## Roadmap\n\n- Integrate Nvidia MPS as an option for isolation\n- Automated Deployment for the Kubernetes cluster which is deployed by kubeadm\n- Scheduler Extender High Availability\n- Generic Solution for GPU, RDMA and other devices\n\n## Adopters\n\nIf you are interested in GPUShare and would like to share your experiences with others, you are warmly welcome to add your information to the [ADOPTERS.md](docs/ADOPTERS.md) page. 
We will continuously discuss new requirements and feature design with you in advance.\n\n\n## Acknowledgments\n\n- The GPU sharing solution is based on [Nvidia Docker2](https://github.com/NVIDIA/nvidia-docker), and their [gpu sharing design](https://docs.google.com/document/d/1ZgKH_K4SEfdiE_OfxQ836s4yQWxZfSjS288Tq9YIWCA/edit#heading=h.r88v2xgacqr) is our reference. The Nvidia community is very supportive, and we are very grateful.\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAliyunContainerService%2Fgpushare-scheduler-extender","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAliyunContainerService%2Fgpushare-scheduler-extender","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAliyunContainerService%2Fgpushare-scheduler-extender/lists"}