{"id":13707159,"url":"https://github.com/volcano-sh/devices","last_synced_at":"2025-06-29T06:32:09.574Z","repository":{"id":39871916,"uuid":"247379463","full_name":"volcano-sh/devices","owner":"volcano-sh","description":"Device plugins for Volcano, e.g. GPU","archived":false,"fork":false,"pushed_at":"2024-09-14T07:56:16.000Z","size":67786,"stargazers_count":115,"open_issues_count":25,"forks_count":43,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-04T20:01:53.260Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/volcano-sh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-15T01:10:26.000Z","updated_at":"2025-02-27T07:04:29.000Z","dependencies_parsed_at":"2023-12-26T07:23:50.339Z","dependency_job_id":"23e6bd6f-45cb-4e68-bd28-0e18b5dd218a","html_url":"https://github.com/volcano-sh/devices","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/volcano-sh/devices","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcano-sh%2Fdevices","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcano-sh%2Fdevices/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcano-sh%2Fdevices/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcano-sh%2Fdevices/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/volcano-sh","download_u
rl":"https://codeload.github.com/volcano-sh/devices/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/volcano-sh%2Fdevices/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262546987,"owners_count":23327076,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T22:01:22.876Z","updated_at":"2025-06-29T06:32:09.550Z","avatar_url":"https://github.com/volcano-sh.png","language":"Go","readme":"# Volcano device plugin for Kubernetes\n\n**Note**:\nThis is based on the [NVIDIA Device Plugin](https://github.com/NVIDIA/k8s-device-plugin) and adds soft isolation of GPU cards.\nUsed together with Volcano, it enables GPU sharing.\n\n## Table of Contents\n\n- [About](#about)\n- [Prerequisites](#prerequisites)\n- [Quick Start](#quick-start)\n  - [Preparing your GPU Nodes](#preparing-your-gpu-nodes)\n  - [Enabling GPU Support in Kubernetes](#enabling-gpu-support-in-kubernetes)\n  - [Running GPU Sharing Jobs](#running-gpu-sharing-jobs)\n  - [Running GPU Number Jobs](#running-gpu-number-jobs)\n- [Docs](#docs)\n- [Issues and Contributing](#issues-and-contributing)\n\n\n## About\n\nThe Volcano device plugin for Kubernetes is a DaemonSet that allows you to automatically:\n- Expose the number of GPUs on each node of your cluster\n- Keep track of the health of your GPUs\n- Run GPU-enabled containers in your Kubernetes cluster.\n\nThis repository contains Volcano's official implementation of the [Kubernetes device 
plugin](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md).\n\n## Prerequisites\n\nThe prerequisites for running the Volcano device plugin are:\n* NVIDIA drivers ~= 384.81\n* nvidia-docker version \u003e 2.0 (see how to [install](https://github.com/NVIDIA/nvidia-docker) and its [prerequisites](https://github.com/nvidia/nvidia-docker/wiki/Installation-\\(version-2.0\\)#prerequisites))\n* Docker configured with nvidia as the [default runtime](https://github.com/NVIDIA/nvidia-docker/wiki/Advanced-topics#default-runtime).\n* Kubernetes version \u003e= 1.10\n\n## Quick Start\n\n### Preparing your GPU Nodes\n\nThe following steps need to be executed on all your GPU nodes.\nThis README assumes that the NVIDIA drivers and nvidia-docker have been installed.\n\nNote that you need to install the nvidia-docker2 package and not the nvidia-container-toolkit.\nThis is because the new `--gpus` option hasn't reached Kubernetes yet. Example:\n```bash\n# Add the package repositories\n$ distribution=$(. 
/etc/os-release;echo $ID$VERSION_ID)\n$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -\n$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list\n\n$ sudo apt-get update \u0026\u0026 sudo apt-get install -y nvidia-docker2\n$ sudo systemctl restart docker\n```\n\nYou will need to enable the nvidia runtime as the default runtime on your node.\nTo do this, edit the Docker daemon config file, which is usually located at `/etc/docker/daemon.json`:\n```json\n{\n    \"default-runtime\": \"nvidia\",\n    \"runtimes\": {\n        \"nvidia\": {\n            \"path\": \"/usr/bin/nvidia-container-runtime\",\n            \"runtimeArgs\": []\n        }\n    }\n}\n```\n\u003e *if `runtimes` is not already present, head to the install page of [nvidia-docker](https://github.com/NVIDIA/nvidia-docker)*\n\n### Enabling GPU Support in Kubernetes\n\nOnce you have enabled this option on *all* the GPU nodes you wish to use,\nyou can then enable GPU support in your cluster by deploying the following DaemonSet:\n\n```shell\n$ kubectl create -f volcano-device-plugin.yml\n```\n\n**Note** that the Volcano device plugin can be configured. For example, you can set the GPU strategy by adding `args: [\"--gpu-strategy=number\"]` under `image: volcanosh/volcano-device-plugin` in the YAML file. More configuration options can be found at [volcano device plugin configuration](https://github.com/volcano-sh/devices/blob/master/doc/config.md).\n\n### Running GPU Sharing Jobs (Without memory isolation)\n\nNVIDIA GPUs can now be shared via container-level resource requirements using the resource name volcano.sh/gpu-memory.\n\nThe node resource capacity and allocatable metadata will show volcano.sh/gpu-number, but users **cannot** specify this resource name at the container level. 
This is because the device plugin patches volcano.sh/gpu-number to show the total number of GPUs, which the Volcano scheduler uses only to calculate the memory of each GPU. In this mode, the GPU number is not registered with the kubelet and is not health-checked.\n\n```yaml\napiVersion: v1\nkind: Pod\nmetadata:\n  name: gpu-pod1\nspec:\n  schedulerName: volcano\n  containers:\n    - name: cuda-container\n      image: nvidia/cuda:9.0-devel\n      resources:\n        limits:\n          volcano.sh/gpu-memory: 1024 # requesting 1024MB GPU memory\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: gpu-pod2\nspec:\n  schedulerName: volcano\n  containers:\n    - name: cuda-container\n      image: nvidia/cuda:9.0-devel\n      resources:\n        limits:\n          volcano.sh/gpu-memory: 1024 # requesting 1024MB GPU memory\n```\n\n\u003e **WARNING:** *if you don't request GPUs when using the device plugin with NVIDIA images, all\n\u003e the GPUs on the machine will be exposed inside your container.*\n\n### Running GPU Number Jobs (Without number isolation)\n\nNVIDIA GPUs can now be requested via container-level resource requirements using the resource name volcano.sh/gpu-number:\n\n```shell script\n$ cat \u003c\u003cEOF | kubectl apply -f -\napiVersion: v1\nkind: Pod\nmetadata:\n  name: gpu-pod1\nspec:\n  containers:\n    - name: cuda-container\n      image: nvidia/cuda:9.0-devel\n      command: [\"sleep\"]\n      args: [\"100000\"]\n      resources:\n        limits:\n          volcano.sh/gpu-number: 1 # requesting 1 GPU card\nEOF\n```\n\n## Docs\n\nPlease note that:\n- the device plugin feature is beta as of Kubernetes v1.11.\n- the Volcano device plugin is alpha and is missing\n    - More comprehensive GPU health checking features\n    - GPU cleanup features\n    - GPU hard isolation\n    - ...\n\nThe next sections are focused on building the device plugin and running it.\n\n### With Docker\n\n#### Build\n```shell\n$ make ubuntu20.04.\n```\n\n#### Run 
locally\n```shell\n$ docker run --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:{version}\n```\n\n#### Deploy as DaemonSet:\n```shell\n$ kubectl create -f nvidia-device-plugin.yml\n```\n\n## Issues and Contributing\n[Check out the Contributing document!](CONTRIBUTING.md)\n\n* You can report a bug by [filing a new issue](https://github.com/volcano-sh/devices)\n* You can contribute by opening a [pull request](https://help.github.com/articles/using-pull-requests/)\n\n## Versioning\n\nThe version matches that of [Volcano](https://github.com/volcano-sh/volcano) exactly.\n\n## Upgrading Kubernetes with the device plugin\n\nUpgrading Kubernetes when you have a device plugin deployed doesn't require you to make any\nparticular changes to your workflow.\nThe API is versioned and is fairly stable (though it is not guaranteed to be non-breaking).\nUpgrading Kubernetes won't require you to deploy a different version of the device plugin, and you will\nsee GPUs re-registering themselves after your node comes back online.\n\n\nUpgrading the device plugin is a more complex task. It is recommended to drain GPU tasks, as\nwe cannot guarantee that GPU tasks will survive a rolling upgrade.\nHowever, we make a best effort to preserve GPU tasks during an upgrade.\n","funding_links":[],"categories":["Go"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvolcano-sh%2Fdevices","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvolcano-sh%2Fdevices","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvolcano-sh%2Fdevices/lists"}