https://github.com/squat/kubeconeu2018

KubeCon EU 2018 talk on automating GPU infrastructure for Kubernetes on Container Linux

container-linux gpu kubecon kubernetes nvidia terraform


# KubeCon EU 2018

This repository contains the demo code for my KubeCon EU 2018 talk about automating GPU infrastructure for Kubernetes on Container Linux.

[![youtube](https://img.youtube.com/vi/i6V4KPh_D5g/0.jpg)](https://www.youtube.com/watch?v=i6V4KPh_D5g)
[![asciicast](https://asciinema.org/a/DE7RVqDsHSPjackcPmQwFElaX.png)](https://asciinema.org/a/DE7RVqDsHSPjackcPmQwFElaX)

## Prerequisites

You will need a Google Cloud account with available quota for NVIDIA GPUs.

## Getting Started

Edit the `require.tf` Terraform file: uncomment the relevant lines and fill in the details for your Google Cloud project:

```sh
$EDITOR require.tf
```

Modify the provided `terraform.tfvars` file to suit your project:

```sh
$EDITOR terraform.tfvars
```
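The exact variable names are whatever the repository's Terraform configuration declares; as an illustration only (these names and values are hypothetical), a filled-in `terraform.tfvars` might look roughly like:

```hcl
# Hypothetical values -- match the variable names actually declared in this repo
cluster_name       = "kubecon-demo"
region             = "us-central1"
zone               = "us-central1-a"
ssh_authorized_key = "ssh-rsa AAAA..."
```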

## Running

1. create cluster:

```sh
terraform apply --auto-approve
```

2. get nodes:

```sh
export KUBECONFIG="$(pwd)"/assets/auth/kubeconfig
watch -n 1 kubectl get nodes
```

3. create GPU manifests:

```sh
kubectl apply -f manifests
```
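The detail that makes GPU scheduling work in these manifests is a resource limit on the extended resource `nvidia.com/gpu`, which the device plugin registers on each GPU node. A minimal sketch (the pod name and image are illustrative, not the repo's actual manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test               # illustrative name
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:9.0-base   # illustrative image
    resources:
      limits:
        nvidia.com/gpu: 1      # schedules the pod onto a node with a free GPU
```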

4. check status of driver installer:

```sh
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-driver-installer | awk '{print $1}') -c modulus -n kube-system -f
```
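The `grep | awk` pipeline above just recovers the pod's generated name from the listing. Demonstrated on a fabricated sample of `kubectl get pods` output (the `x7k2p` suffix is made up):

```shell
# Assumed shape of "kubectl get pods -n kube-system" output:
sample='NAME                           READY  STATUS   RESTARTS  AGE
nvidia-driver-installer-x7k2p  1/1    Running  0         3m
kube-dns-86f4d74b45-abcde      3/3    Running  0         10m'

# The pipeline from step 4 selects the matching row and prints
# its first column -- the pod name:
pod=$(printf '%s\n' "$sample" | grep nvidia-driver-installer | awk '{print $1}')
echo "$pod"   # nvidia-driver-installer-x7k2p
```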

5. check status of device plugin:

```sh
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-gpu-device-plugin | awk '{print $1}' | head -n1) -n kube-system -f
```

6. verify worker node has allocatable GPUs:

```sh
kubectl describe node $(kubectl get nodes | grep worker | awk '{print $1}')
```
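What you are looking for in the `describe` output is the `nvidia.com/gpu` extended resource under `Capacity` and `Allocatable`. Shown here against a fabricated excerpt of `kubectl describe node` output:

```shell
# Assumed excerpt of "kubectl describe node <worker>" once the
# NVIDIA device plugin has registered its extended resource:
sample='Capacity:
 cpu:             4
 nvidia.com/gpu:  1
Allocatable:
 cpu:             4
 nvidia.com/gpu:  1'

# A non-empty result under Allocatable means the scheduler can
# place GPU pods on this node:
printf '%s\n' "$sample" | grep 'nvidia.com/gpu'
```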

7. let's inspect the GPU workload:

```sh
less manifests/darkapi.yaml
```

8. let's see if the GPU workload has been scheduled:

```sh
watch -n 2 kubectl get pods
kubectl logs $(kubectl get pods | grep darkapi | awk '{print $1}') -f
```

9. for fun, let's test the GPU workload:

```sh
export INGRESS=$(terraform output | grep ingress_static_ip | awk '{print $3}')
~/code/darkapi/client http://$INGRESS/api/yolo
```
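The `awk '{print $3}'` in step 9 relies on `terraform output` printing lines of the form `name = value`, so field 3 is the value. Demonstrated on a fabricated output line (the IP address is illustrative):

```shell
# Assumed shape of one "terraform output" line:
line='ingress_static_ip = 35.192.0.10'

# Field 3 of "name = value" is the value itself:
ip=$(echo "$line" | awk '{print $3}')
echo "$ip"   # 35.192.0.10
```

You can also skip the parsing: `terraform output ingress_static_ip` prints just that one value (quoted in newer Terraform versions, where `terraform output -raw` drops the quotes).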

10. finally, let's clean up:

```sh
terraform destroy --auto-approve
```

## Projects Leveraged In This Demo

| Component | URL |
|:------------------------:|:------------------------------------------------------------------------------------------------------------:|
| Kubernetes installer | https://github.com/poseidon/typhoon |
| GPU driver installer | https://github.com/squat/modulus |
| Kubernetes device plugin | https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml |
| sample workload | https://github.com/squat/darkapi |