https://github.com/squat/kubeconeu2018

KubeCon EU 2018 talk on automating GPU infrastructure for Kubernetes on Container Linux

container-linux gpu kubecon kubernetes nvidia terraform


# KubeCon EU 2018

This repository contains the demo code for my KubeCon EU 2018 talk about automating GPU infrastructure for Kubernetes on Container Linux.

[![youtube](https://img.youtube.com/vi/i6V4KPh_D5g/0.jpg)](https://www.youtube.com/watch?v=i6V4KPh_D5g)
[![asciicast](https://asciinema.org/a/DE7RVqDsHSPjackcPmQwFElaX.png)](https://asciinema.org/a/DE7RVqDsHSPjackcPmQwFElaX)

## Prerequisites

You will need a Google Cloud account with available quota for NVIDIA GPUs.

## Getting Started

Edit the `require.tf` Terraform file: uncomment the relevant lines and fill in the details for your Google Cloud project:

```sh
$EDITOR require.tf
```

Modify the provided `terraform.tfvars` file to suit your project:

```sh
$EDITOR terraform.tfvars
```
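The exact variable names are whatever the repository's Terraform configuration declares; as an illustration only (these names and values are hypothetical), a filled-in `terraform.tfvars` might look roughly like:

```hcl
# Hypothetical values -- match the variable names actually declared in this repo
cluster_name       = "kubecon-demo"
region             = "us-central1"
zone               = "us-central1-a"
ssh_authorized_key = "ssh-rsa AAAA..."
```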

## Running

1. create cluster:

```sh
terraform apply --auto-approve
```

2. get nodes:

```sh
export KUBECONFIG="$(pwd)"/assets/auth/kubeconfig
watch -n 1 kubectl get nodes
```

3. create GPU manifests:

```sh
kubectl apply -f manifests
```
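The detail that makes GPU scheduling work in these manifests is a resource limit on the extended resource `nvidia.com/gpu`, which the device plugin registers on each GPU node. A minimal sketch (the pod name and image are illustrative, not the repo's actual manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test               # illustrative name
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:9.0-base   # illustrative image
    resources:
      limits:
        nvidia.com/gpu: 1      # schedules the pod onto a node with a free GPU
```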

4. check status of driver installer:

```sh
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-driver-installer | awk '{print $1}') -c modulus -n kube-system -f
```
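The `grep | awk` pipeline above just recovers the pod's generated name from the listing. Demonstrated on a fabricated sample of `kubectl get pods` output (the `x7k2p` suffix is made up):

```shell
# Assumed shape of "kubectl get pods -n kube-system" output:
sample='NAME                           READY  STATUS   RESTARTS  AGE
nvidia-driver-installer-x7k2p  1/1    Running  0         3m
kube-dns-86f4d74b45-abcde      3/3    Running  0         10m'

# The pipeline from step 4 selects the matching row and prints
# its first column -- the pod name:
pod=$(printf '%s\n' "$sample" | grep nvidia-driver-installer | awk '{print $1}')
echo "$pod"   # nvidia-driver-installer-x7k2p
```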

5. check status of device plugin:

```sh
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-gpu-device-plugin | awk '{print $1}' | head -n1) -n kube-system -f
```

6. verify worker node has allocatable GPUs:

```sh
kubectl describe node $(kubectl get nodes | grep worker | awk '{print $1}')
```
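What you are looking for in the `describe` output is the `nvidia.com/gpu` extended resource under `Capacity` and `Allocatable`. Shown here against a fabricated excerpt of `kubectl describe node` output:

```shell
# Assumed excerpt of "kubectl describe node <worker>" once the
# NVIDIA device plugin has registered its extended resource:
sample='Capacity:
 cpu:             4
 nvidia.com/gpu:  1
Allocatable:
 cpu:             4
 nvidia.com/gpu:  1'

# A non-empty result under Allocatable means the scheduler can
# place GPU pods on this node:
printf '%s\n' "$sample" | grep 'nvidia.com/gpu'
```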

7. let's inspect the GPU workload:

```sh
less manifests/darkapi.yaml
```

8. let's see if the GPU workload has been scheduled:

```sh
watch -n 2 kubectl get pods
kubectl logs $(kubectl get pods | grep darkapi | awk '{print $1}') -f
```

9. for fun, let's test the GPU workload:

```sh
export INGRESS=$(terraform output | grep ingress_static_ip | awk '{print $3}')
~/code/darkapi/client http://$INGRESS/api/yolo
```
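The `awk '{print $3}'` in step 9 relies on `terraform output` printing lines of the form `name = value`, so field 3 is the value. Demonstrated on a fabricated output line (the IP address is illustrative):

```shell
# Assumed shape of one "terraform output" line:
line='ingress_static_ip = 35.192.0.10'

# Field 3 of "name = value" is the value itself:
ip=$(echo "$line" | awk '{print $3}')
echo "$ip"   # 35.192.0.10
```

You can also skip the parsing: `terraform output ingress_static_ip` prints just that one value (quoted in newer Terraform versions, where `terraform output -raw` drops the quotes).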

10. finally, let's clean up:

```sh
terraform destroy --auto-approve
```

## Projects Leveraged In This Demo

| Component | URL |
|:------------------------:|:------------------------------------------------------------------------------------------------------------:|
| Kubernetes installer | https://github.com/poseidon/typhoon |
| GPU driver installer | https://github.com/squat/modulus |
| Kubernetes device plugin | https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml |
| sample workload | https://github.com/squat/darkapi |