https://github.com/Deep-Spark/ix-device-plugin

The IX device plugin is a Kubernetes DaemonSet that exposes Iluvatar GPUs in a Kubernetes cluster.

# IX device plugin for Kubernetes

## Table of Contents
- [About](#about)
- [Prerequisites](#prerequisites)
- [Building the IX device plugin](#building-the-ix-device-plugin)
- [Configuring the IX device plugin](#configuring-the-ix-device-plugin)
- [Enabling GPU Support in Kubernetes](#enabling-gpu-support-in-kubernetes)
- [Running GPU Jobs](#running-gpu-jobs)
- [Split GPU Board to Multiple GPU Devices](#split-gpu-board-to-multiple-gpu-devices)
- [Shared Access to GPUs](#shared-access-to-gpus)

## About

The IX device plugin for Kubernetes is a DaemonSet that lets you automatically:
- Expose the number of GPUs on each node of your cluster
- Keep track of the health of your GPUs
- Run GPU-enabled containers in your Kubernetes cluster

## Prerequisites

Running the IX device plugin requires:
* Iluvatar driver and software stack >= v1.1.0
* Kubernetes version >= 1.10

## Building the IX device plugin

```shell
make all
```
This builds the `ix-device-plugin` binary and the `ix-device-plugin` container image; see the build output for details.

## Configuring the IX device plugin

The IX device plugin has several configurable options. They are read from a config file when the device plugin starts; in Kubernetes, this file is provided by the `ix-config` ConfigMap below, which the DaemonSet mounts at `/ixconfig`. The table after the example explains each option.
```yaml
# ix-config.yaml
apiVersion: v1
kind: ConfigMap
data:
  ix-config: |-
    version: "4.2.0"
    flags:
      splitboard: false
    sharing:
      timeSlicing:
        replicas: 4

metadata:
  name: ix-config
  namespace: kube-system
```
```shell
kubectl create -f ix-config.yaml
```
| Field | Type | Description |
|-------|------|-------------|
| `flags.splitboard` | boolean | If `true`, split every GPU board (e.g. BI-V150) into its individual GPU devices |
| `sharing.timeSlicing.replicas` | integer | Number of time-slicing replicas advertised per GPU for shared access |
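For example, enabling board splitting only requires setting `flags.splitboard` to `true` in the `ix-config` data of the same ConfigMap. The sketch below shows just that data fragment, with the other values kept from the example above. This README does not say whether the plugin reloads an edited ConfigMap on its own, so the sketch assumes the plugin pods are restarted afterwards (for instance with `kubectl -n kube-system rollout restart daemonset/iluvatar-device-plugin`).

```yaml
# ix-config.yaml (data fragment only; metadata unchanged)
data:
  ix-config: |-
    version: "4.2.0"
    flags:
      splitboard: true    # expose each GPU on a board as its own device
    sharing:
      timeSlicing:
        replicas: 4
```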

## Enabling GPU Support in Kubernetes

Once you have created the ConfigMap above, you can enable GPU support on the GPU nodes of your
cluster by deploying the following DaemonSet:
```yaml
# ix-device-plugin.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: iluvatar-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: iluvatar-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: iluvatar-device-plugin
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        app.kubernetes.io/name: iluvatar-device-plugin
    spec:
      priorityClassName: "system-node-critical"
      securityContext: null
      containers:
      - name: iluvatar-device-plugin
        securityContext:
          capabilities:
            drop:
            - ALL
          privileged: true
        image: "ix-device-plugin:4.2.0"
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - ls
            - /var/lib/kubelet/device-plugins/iluvatar-gpu.sock
          periodSeconds: 5
        startupProbe:
          exec:
            command:
            - ls
            - /var/lib/kubelet/device-plugins/iluvatar-gpu.sock
          periodSeconds: 5
        resources: {}
        volumeMounts:
        - mountPath: /var/lib/kubelet/device-plugins
          name: device-plugin
        - mountPath: /run/udev
          name: udev-ctl
          readOnly: true
        - mountPath: /sys
          name: sys
          readOnly: true
        - mountPath: /dev
          name: dev
        - name: ixc
          mountPath: /ixconfig
      volumes:
      - hostPath:
          path: /var/lib/kubelet/device-plugins
        name: device-plugin
      - hostPath:
          path: /run/udev
        name: udev-ctl
      - hostPath:
          path: /sys
        name: sys
      - hostPath:
          path: /etc/udev/
        name: udev-etc
      - hostPath:
          path: /dev
        name: dev
      - name: ixc
        configMap:
          name: ix-config
```
```shell
kubectl create -f ix-device-plugin.yaml
```

## Running GPU Jobs

To expose GPUs to a pod, request the `iluvatar.com/gpu` resource in the container's `resources.limits`; the value sets how many GPUs the container is allocated.

```yaml
$ cat <
...
```
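A minimal pod that requests one GPU could look like the sketch below. The pod name `gpu-pod`, the container name, the `ubuntu:22.04` image, and the `sleep` command are placeholder assumptions, not values from this project; a real workload needs an image that includes the Iluvatar software stack. Kubernetes does not allow extended resources such as `iluvatar.com/gpu` to be overcommitted, so if a `requests` value is also given it must equal the limit.

```yaml
# gpu-pod.yaml (hypothetical example)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod                # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: gpu-container        # placeholder name
    image: ubuntu:22.04        # placeholder; use an image with the Iluvatar stack
    command: ["sleep", "infinity"]
    resources:
      limits:
        iluvatar.com/gpu: 1    # ask the IX device plugin for one GPU
```

It can be created with `kubectl create -f gpu-pod.yaml`.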

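## Shared Access to GPUs

The IX device plugin supports time-slicing GPUs through the `sharing.timeSlicing.replicas` field of the config file. This field specifies the number of replicas created for each GPU, that is, the number of shared accesses granted to a single GPU.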

For example:

```yaml
version: "4.2.0"
flags:
  splitboard: false
sharing:
  timeSlicing:
    replicas: 4
```

If this configuration were applied to a node with 2 GPUs on it, the plugin
would now advertise 8 `iluvatar.com/gpu` resources to Kubernetes instead of 2.

```
$ kubectl describe node
...
Capacity:
  iluvatar.com/gpu: 8
...
```
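As a sketch of how these shares are consumed, the hypothetical Deployment below runs 8 pods that each request one `iluvatar.com/gpu`. On the 2-GPU node above (2 GPUs × 4 replicas = 8 advertised resources), and assuming no other pods already hold `iluvatar.com/gpu` there, all 8 pods can be scheduled, with up to four sharing each physical GPU. The Deployment name, labels, image, and command are placeholders.

```yaml
# time-sliced-demo.yaml (hypothetical example)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: time-sliced-demo         # placeholder name
spec:
  replicas: 8                    # one pod per advertised time-sliced share
  selector:
    matchLabels:
      app: time-sliced-demo
  template:
    metadata:
      labels:
        app: time-sliced-demo
    spec:
      containers:
      - name: worker
        image: ubuntu:22.04      # placeholder; use an image with the Iluvatar stack
        command: ["sleep", "infinity"]
        resources:
          limits:
            iluvatar.com/gpu: 1  # one time-sliced share, not a dedicated GPU
```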