https://github.com/squat/kubeconeu2018
KubeCon EU 2018 talk on automating GPU infrastructure for Kubernetes on Container Linux
- Host: GitHub
- URL: https://github.com/squat/kubeconeu2018
- Owner: squat
- License: MIT
- Created: 2018-05-02T09:37:26.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-05-15T14:41:55.000Z (over 7 years ago)
- Last Synced: 2025-02-08T16:44:33.969Z (about 1 year ago)
- Topics: container-linux, gpu, kubecon, kubenetes, nvidia, terraform
- Language: HCL
- Size: 20.5 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# KubeCon EU 2018
This repository contains the demo code for my KubeCon EU 2018 talk about automating GPU infrastructure for Kubernetes on Container Linux.
[Watch the talk on YouTube](https://www.youtube.com/watch?v=i6V4KPh_D5g)
[View the demo recording on asciinema](https://asciinema.org/a/DE7RVqDsHSPjackcPmQwFElaX)
## Prerequisites
You will need a Google Cloud account with available quota for NVIDIA GPUs.
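One way to check your quota (assuming the `gcloud` CLI is installed and authenticated; the region below is only an example) is to inspect a region's quota list for NVIDIA GPU metrics:
```sh
# List the quotas for a region and look for NVIDIA GPU metrics
# (e.g. NVIDIA_K80_GPUS); a limit of 0 means you must request a quota increase.
gcloud compute regions describe europe-west1 --format="yaml(quotas)"
```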
## Getting Started
Edit the `require.tf` Terraform file, uncommenting and filling in the details for your Google Cloud project:
```sh
$EDITOR require.tf
```
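The details in question are typically the Google Cloud project ID, region, and a path to service-account credentials; the exact variable names are whatever `require.tf` defines. To locate the commented-out section to fill in (assuming the file configures the Google provider), something like:
```sh
# Show the Google provider configuration with a few lines of context;
# 'provider "google"' is an assumption about the file's contents.
grep -n -A 5 'provider "google"' require.tf
```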
Modify the provided `terraform.tfvars` file to suit your project:
```sh
$EDITOR terraform.tfvars
```
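If this is a fresh clone, Terraform also needs to download the providers and modules referenced by the configuration before anything can be created; an optional plan lets you review the changes first (standard Terraform workflow, not specific to this repo):
```sh
terraform init   # fetch providers and modules (first run only)
terraform plan   # optional: preview the resources that will be created
```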
## Running
1. create cluster:
```sh
terraform apply --auto-approve
```
2. get nodes:
```sh
export KUBECONFIG="$(pwd)"/assets/auth/kubeconfig
watch -n 1 kubectl get nodes
```
3. create GPU manifests:
```sh
kubectl apply -f manifests
```
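Among the objects created from `manifests` are the NVIDIA driver installer and the GPU device plugin used in the next two steps; assuming they are deployed as DaemonSets in `kube-system`, you can confirm they were created with:
```sh
# The names below should match the grep patterns used in steps 4 and 5.
kubectl get daemonsets -n kube-system | grep nvidia
```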
4. check status of driver installer:
```sh
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-driver-installer | awk '{print $1}') -c modulus -n kube-system -f
```
5. check status of device plugin:
```sh
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-gpu-device-plugin | awk '{print $1}' | head -n 1) -n kube-system -f
```
6. verify worker node has allocatable GPUs:
```sh
kubectl describe node $(kubectl get nodes | grep worker | awk '{print $1}')
```
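What to look for in the output is a non-zero `nvidia.com/gpu` entry under `Capacity`/`Allocatable`, the extended resource name advertised by the NVIDIA device plugin. As a more targeted sketch:
```sh
# Print each node's allocatable nvidia.com/gpu count (<none> for nodes without GPUs).
kubectl get nodes -o=custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```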
7. let's inspect the GPU workload:
```sh
less manifests/darkapi.yaml
```
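The reason the pod lands on the GPU node is that its container spec requests the extended resource exposed by the device plugin, i.e. a `nvidia.com/gpu` entry under `resources.limits`; you can spot it with:
```sh
# Show where the workload requests a GPU (assumes the manifest uses the
# nvidia.com/gpu resource name).
grep -n -B 2 'nvidia.com/gpu' manifests/darkapi.yaml
```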
8. let's see if the GPU workload has been scheduled:
```sh
watch -n 2 kubectl get pods
kubectl logs $(kubectl get pods | grep darkapi | awk '{print $1}') -f
```
9. for fun, let's test the GPU workload:
```sh
export INGRESS=$(terraform output | grep ingress_static_ip | awk '{print $3}')
~/code/darkapi/client http://$INGRESS/api/yolo
```
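The `darkapi` client used above is a separate binary from the sample-workload repo; if you don't have it built, a plain HTTP request can at least confirm the service is reachable through the ingress (the endpoint is the same one the client calls):
```sh
# Expect an HTTP response from the darkapi service; the exact reply depends on
# the API, which the client normally feeds an image for detection.
curl -i "http://$INGRESS/api/yolo"
```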
10. finally, let's clean up:
```sh
terraform destroy --auto-approve
```
## Projects Leveraged In This Demo
| Component | URL |
|:------------------------:|:------------------------------------------------------------------------------------------------------------:|
| Kubernetes installer | https://github.com/poseidon/typhoon |
| GPU driver installer | https://github.com/squat/modulus |
| Kubernetes device plugin | https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml |
| sample workload | https://github.com/squat/darkapi |