https://github.com/squat/kubeconeu2018
KubeCon EU 2018 talk on automating GPU infrastructure for Kubernetes on Container Linux
container-linux gpu kubecon kubenetes nvidia terraform
- Host: GitHub
- URL: https://github.com/squat/kubeconeu2018
- Owner: squat
- License: mit
- Created: 2018-05-02T09:37:26.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-05-15T14:41:55.000Z (over 6 years ago)
- Last Synced: 2024-10-28T15:50:47.559Z (3 months ago)
- Topics: container-linux, gpu, kubecon, kubenetes, nvidia, terraform
- Language: HCL
- Size: 20.5 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# KubeCon EU 2018
This repository contains the demo code for my KubeCon EU 2018 talk about automating GPU infrastructure for Kubernetes on Container Linux.
[![youtube](https://img.youtube.com/vi/i6V4KPh_D5g/0.jpg)](https://www.youtube.com/watch?v=i6V4KPh_D5g)
[![asciicast](https://asciinema.org/a/DE7RVqDsHSPjackcPmQwFElaX.png)](https://asciinema.org/a/DE7RVqDsHSPjackcPmQwFElaX)

## Prerequisites
You will need a Google Cloud account with available quota for NVIDIA GPUs.
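Before running the demo, it can help to confirm that GPU quota is actually available. A minimal sketch: `gcloud compute regions describe <region>` lists per-region quotas, and you want an NVIDIA GPU metric with a non-zero limit. The region, metric name, and sample output below are illustrative assumptions, not taken from this repository:

```sh
# Check for a non-zero NVIDIA GPU quota; shown against illustrative
# sample output rather than a live project, e.g. from:
#   gcloud compute regions describe europe-west1 --format="value(quotas)"
sample="NVIDIA_K80_GPUS limit: 1.0 usage: 0.0"
echo "$sample" | grep -q 'NVIDIA.*limit: [1-9]' && echo "GPU quota available"
```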
## Getting Started
Edit the `require.tf` Terraform file, uncommenting and filling in the details for your Google Cloud project:
```sh
$EDITOR require.tf
```
Modify the provided `terraform.tfvars` file to suit your project:
```sh
$EDITOR terraform.tfvars
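# Hypothetical sketch of the kind of values terraform.tfvars carries; the
# variable names below are illustrative assumptions, not the repo's actual ones:
#   project_id         = "my-gcp-project"
#   region             = "europe-west1"
#   ssh_authorized_key = "ssh-rsa AAAA..."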
```

## Running
1. create cluster:
```sh
terraform apply --auto-approve
```
2. get nodes:
```sh
export KUBECONFIG="$(pwd)"/assets/auth/kubeconfig
watch -n 1 kubectl get nodes
```
3. create GPU manifests:
```sh
kubectl apply -f manifests
```
4. check status of the driver installer:
```sh
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-driver-installer | awk '{print $1}') -c modulus -n kube-system -f
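# Offline sketch of the name extraction above: grep picks the installer pod's
# row from `kubectl get pods` output and awk prints field 1, the pod name.
# The row below is illustrative sample data, not a live cluster:
printf '%s\n' 'nvidia-driver-installer-x7k2q   2/2   Running   0   3m' | awk '{print $1}'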
```
5. check status of the device plugin:
```sh
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-gpu-device-plugin | awk '{print $1}' | head -n1) -n kube-system -f
```
6. verify the worker node has allocatable GPUs:
```sh
kubectl describe node $(kubectl get nodes | grep worker | awk '{print $1}')
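# What to look for in the describe output: the Allocatable section should
# list the nvidia.com/gpu resource. Sketched here against illustrative
# sample output rather than a live node:
printf 'Allocatable:\n nvidia.com/gpu: 1\n' | grep 'nvidia.com/gpu'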
```
7. let's inspect the GPU workload:
```sh
less manifests/darkapi.yaml
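# The part that makes this a GPU workload is a resource limit on the device
# plugin's resource name, nvidia.com/gpu. A minimal sketch of that section
# (values illustrative, not copied from darkapi.yaml):
printf 'resources:\n  limits:\n    nvidia.com/gpu: 1\n'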
```
8. let's see if the GPU workload has been scheduled:
```sh
watch -n 2 kubectl get pods
kubectl logs $(kubectl get pods | grep darkapi | awk '{print $1}') -f
```
9. for fun, let's test the GPU workload:
```sh
export INGRESS=$(terraform output | grep ingress_static_ip | awk '{print $3}')
~/code/darkapi/client http://$INGRESS/api/yolo
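# Offline sketch of the INGRESS extraction above: terraform output prints
# lines like `ingress_static_ip = 203.0.113.10`, so awk field 3 is the
# address (sample data, not a live terraform run):
printf 'ingress_static_ip = 203.0.113.10\n' | awk '{print $3}'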
```
10. finally, let's clean up:
```sh
terraform destroy --auto-approve
```

## Projects Leveraged In This Demo
| Component | URL |
|:------------------------:|:------------------------------------------------------------------------------------------------------------:|
| Kubernetes installer | https://github.com/poseidon/typhoon |
| GPU driver installer | https://github.com/squat/modulus |
| Kubernetes device plugin | https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml |
| sample workload | https://github.com/squat/darkapi |