{"id":17604292,"url":"https://github.com/rocm/k8s-device-plugin","last_synced_at":"2026-05-18T11:01:35.796Z","repository":{"id":31597139,"uuid":"127923995","full_name":"ROCm/k8s-device-plugin","owner":"ROCm","description":"Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster","archived":false,"fork":false,"pushed_at":"2025-04-07T12:31:43.000Z","size":15947,"stargazers_count":318,"open_issues_count":17,"forks_count":60,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-04-08T12:09:08.829Z","etag":null,"topics":["k8s","kubernetes","kubernetes-device-plugins","rocm"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ROCm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-03T14:49:37.000Z","updated_at":"2025-04-07T12:31:40.000Z","dependencies_parsed_at":"2023-01-14T19:30:18.425Z","dependency_job_id":"17c1c90a-160a-439d-b868-cbefc8cb459f","html_url":"https://github.com/ROCm/k8s-device-plugin","commit_stats":{"total_commits":119,"total_committers":22,"mean_commits":5.409090909090909,"dds":0.5714285714285714,"last_synced_commit":"4b7fda41a7b619fd80896118d6c81a90f2a469e5"},"previous_names":["rocm/k8s-device-plugin","radeonopencompute/k8s-device-plugin"],"tags_count":32,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2Fk8s-device-plugin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2Fk8s-device-plugin/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2Fk8s-device-plugin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ROCm%2Fk8s-device-plugin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ROCm","download_url":"https://codeload.github.com/ROCm/k8s-device-plugin/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247838444,"owners_count":21004580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["k8s","kubernetes","kubernetes-device-plugins","rocm"],"created_at":"2024-10-22T14:08:36.550Z","updated_at":"2026-05-18T11:01:28.741Z","avatar_url":"https://github.com/ROCm.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AMD GPU device plugin for Kubernetes\n\n[![Go Report Card](https://goreportcard.com/badge/github.com/ROCm/k8s-device-plugin)](https://goreportcard.com/report/github.com/ROCm/k8s-device-plugin)\n\n## Introduction\n\nThis is a [Kubernetes][k8s] [device plugin][dp] implementation that enables the registration of AMD GPU in a container cluster for compute workload.  
With the appropriate hardware and this plugin deployed in your Kubernetes cluster, you will be able to run jobs that require AMD GPU.\n\nMore information about [ROCm][rocm].\n\n## Prerequisites\n\n* [ROCm capable machines][sysreq]\n* [kubeadm capable machines][kubeadm] (if you are using kubeadm to deploy your k8s cluster)\n* [ROCm kernel][rock] ([Installation guide][rocminstall]) or latest AMD GPU Linux driver ([Installation guide][amdgpuinstall])\n* A [Kubernetes deployment][k8sinstall]\n* If device health checks are enabled, the pods must be allowed to run in privileged mode (for example the `--allow-privileged=true` flag for kube-apiserver), in order to access `/dev/kfd`\n\n## Limitations\n\n* This plugin targets Kubernetes v1.18+.\n\n## Deployment\n\nThe device plugin needs to be run on all the nodes that are equipped with AMD GPU.  The simplest way of doing so is to create a Kubernetes [DaemonSet][ds], which runs a copy of a pod on all (or some) Nodes in the cluster.  We have a pre-built Docker image on [DockerHub][dhk8samdgpudp] that you can use for your DaemonSet.  This repository also has a pre-defined yaml file named `k8s-ds-amdgpu-dp.yaml`.  You can create a DaemonSet in your Kubernetes cluster by running this command:\n\n```\nkubectl create -f k8s-ds-amdgpu-dp.yaml\n```\n\nor directly pull from the web using\n\n```\nkubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml\n```\n\nIf you want to enable the experimental device health check, please use `k8s-ds-amdgpu-dp-health.yaml` **after** `--allow-privileged=true` is set for kube-apiserver.\n\n### Helm Chart\n\nIf you want to deploy this device plugin using Helm, a [Helm Chart][helmamdgpu] is available via [Artifact Hub][artifacthub].\n\n## Example workload\n\nYou can restrict workloads to a node with a GPU by adding `resources.limits` to the pod definition.  An example pod definition is provided in `example/pod/alexnet-gpu.yaml`.  
This pod runs the timing benchmark for AlexNet on AMD GPU and then goes to sleep. You can create the pod by running:\n\n```\nkubectl create -f alexnet-gpu.yaml\n```\n\nor\n\n```\nkubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/example/pod/alexnet-gpu.yaml\n```\n\nand then check the pod status by running\n\n```\nkubectl describe pods\n```\n\nAfter the pod is created and running, you can see the benchmark result by running:\n\n```\nkubectl logs alexnet-tf-gpu-pod alexnet-tf-gpu-container\n```\n\nFor comparison, an example pod definition that runs the same benchmark on CPU is provided in `example/pod/alexnet-cpu.yaml`.\n\n## Labelling node with additional GPU properties\n\nPlease see [AMD GPU Kubernetes Node Labeller](cmd/k8s-node-labeller/README.md) for details.  An example configuration is in [k8s-ds-amdgpu-labeller.yaml](k8s-ds-amdgpu-labeller.yaml):\n\n```\nkubectl create -f k8s-ds-amdgpu-labeller.yaml\n```\n\nor\n\n```\nkubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/k8s-ds-amdgpu-labeller.yaml\n```\n\n## Health per GPU\n\n* Provides more granular, per-GPU health detection by using the exporter health\n  service over a gRPC socket mounted on `/var/lib/amd-metrics-exporter/`\n\n## Notes\n\n* This plugin uses [`go modules`][gm] for dependency management\n* Please consult the `Dockerfile` to see how to build and use this plugin independently of a Docker image\n\n## TODOs\n\n* Add a proper GPU health check (one that does not require `/dev/kfd` access)\n\n[artifacthub]: https://artifacthub.io/packages/helm/amd-gpu-helm/amd-gpu\n[ds]: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/\n[dp]: https://kubernetes.io/docs/concepts/cluster-administration/device-plugins/\n[helmamdgpu]: https://artifacthub.io/packages/helm/amd-gpu-helm/amd-gpu\n[rocm]: https://rocm.docs.amd.com/en/latest/what-is-rocm.html\n[rock]: https://github.com/ROCm/ROCK-Kernel-Driver\n[rocminstall]: 
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html\n[amdgpuinstall]: https://amdgpu-install.readthedocs.io/en/latest/\n[sysreq]: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html\n[gm]: https://blog.golang.org/using-go-modules\n[kubeadm]: https://kubernetes.io/docs/setup/independent/install-kubeadm/#before-you-begin\n[k8sinstall]: https://kubernetes.io/docs/setup/independent/install-kubeadm\n[k8s]: https://kubernetes.io\n[dhk8samdgpudp]: https://hub.docker.com/r/rocm/k8s-device-plugin/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frocm%2Fk8s-device-plugin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frocm%2Fk8s-device-plugin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frocm%2Fk8s-device-plugin/lists"}