Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/curt-park/k8s-gpu-sharing-pods
Tricks for GPU sharing pods on Kubernetes without any use of middleware like HAMi or DRA
https://github.com/curt-park/k8s-gpu-sharing-pods
argo-workflows gpu gpu-sharing helm kubernetes minikube
Last synced: 10 days ago
JSON representation
Tricks for GPU sharing pods on Kubernetes without any use of middleware like HAMi or DRA
- Host: GitHub
- URL: https://github.com/curt-park/k8s-gpu-sharing-pods
- Owner: Curt-Park
- License: mit
- Created: 2024-08-11T22:39:48.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-24T06:50:26.000Z (5 months ago)
- Last Synced: 2024-11-15T19:35:19.898Z (2 months ago)
- Topics: argo-workflows, gpu, gpu-sharing, helm, kubernetes, minikube
- Language: Mustache
- Homepage:
- Size: 93.8 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# GPU Sharing Pods on Kubernetes
By default, Kubernetes doesn't allow GPU sharing cases as follows:
- A pod with multiple containers that share a single GPU.
- Multiple pods that share a single GPU.In this repository, I introduce some tricks for GPU sharing pods on Kubernetes with only use of NVIDIA device plugin.
## Prerequisites
- Install [Kubectl](https://kubernetes.io/docs/tasks/tools/)
- Install [Minikube](https://minikube.sigs.k8s.io/docs/start)
- Install [HELM](https://helm.sh/docs/intro/install/)
- Install [Argo Workflows CLI](https://github.com/argoproj/argo-workflows/releases/tag/v3.5.10)## K8s Cluster Creation
```bash
make cluster
kubectl get pods --all-namespaces
# Check `nvidia-device-plugin-daemonset` is running.
```## Argo Workflow Installation
```bash
kubectl create namespace argo
helm install argo-workflows charts/argo-workflows -n argo
# Wait for argo-workflows ready...
make port-forward
```Open http://localhost:2746/
Login with the token:
```bash
kubectl apply -f secret.yaml
kubectl get secret # Check `argo-workflows-admin.service-account-token` created.
make token
# Paste all strings including Bearer.
```Execute a simple workflow for testing:
```bash
argo submit --watch workflows/hello-world.yaml
```## Examples
Create a workflow template that have parallel jobs sharing GPU(s).
```bash
kubectl apply -f workflows/templates/gpu-sharing-workflowtemplate.yaml
```Trigger the gpu allocation and gpu-sharing workflow execution.
```bash
# time slicing with 1 GPU
argo submit --watch workflows/submit-gpu-sharing-workflow.yaml
# time slicing with 2 GPUs
argo submit --watch workflows/submit-gpu-sharing-workflow.yaml -p gpus=2 # 2 gpus
# MPS with 1 GPU
argo submit --watch workflows/submit-gpu-sharing-workflow.yaml -p mps=enabled
```## References
- https://argo-workflows.readthedocs.io/en/latest/walk-through/argo-cli/
- https://argo-workflows.readthedocs.io/en/latest/fields/#workflow
- https://gist.github.com/Curt-Park/bb20f76ba2b052b03b2e1ea9834517a6
- https://docs.nvidia.com/deploy/mps/#starting-and-stopping-mps-on-linux