{"id":13560703,"url":"https://github.com/Deep-Spark/ix-device-plugin","last_synced_at":"2025-04-03T16:31:03.867Z","repository":{"id":233723427,"uuid":"787737662","full_name":"Deep-Spark/ix-device-plugin","owner":"Deep-Spark","description":"The IX device plugin is a DaemonSet for Kubernetes, which can help to expose the Iluvatar GPU in the Kubernetes cluster.","archived":false,"fork":false,"pushed_at":"2025-01-07T16:38:55.000Z","size":2678,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-07T17:53:34.608Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Deep-Spark.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-17T04:55:53.000Z","updated_at":"2025-01-07T16:38:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"ab873305-fc01-47c8-8771-55bd2db9fc63","html_url":"https://github.com/Deep-Spark/ix-device-plugin","commit_stats":null,"previous_names":["deep-spark/ix-device-plugin"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deep-Spark%2Fix-device-plugin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deep-Spark%2Fix-device-plugin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deep-Spark%2Fix-device-plugin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deep-Spark%2Fix-dev
ice-plugin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Deep-Spark","download_url":"https://codeload.github.com/Deep-Spark/ix-device-plugin/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247036936,"owners_count":20873053,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:00:48.838Z","updated_at":"2025-04-03T16:31:03.860Z","avatar_url":"https://github.com/Deep-Spark.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"# IX device plugin for Kubernetes\n\n## Table of Contents\n- [About](#about)\n- [Prerequisites](#prerequisites)\n- [Building the IX device plugin](#building-the-ix-device-plugin)\n- [Configuring the IX device plugin](#configuring-the-ix-device-plugin)\n- [Enabling GPU Support in Kubernetes](#enabling-gpu-support-in-kubernetes)\n- [Running GPU Jobs](#running-gpu-jobs)\n- [Split GPU Board to Multiple GPU Devices](#split-gpu-board-to-multiple-gpu-devices)\n- [Shared Access to GPUs](#shared-access-to-gpus)\n\n## About\n\nThe IX device plugin for Kubernetes is a DaemonSet that allows you to automatically:\n- Expose the number of GPUs on each node of your cluster\n- Keep track of the health of your GPUs\n- Run GPU-enabled containers in your Kubernetes cluster\n\n## Prerequisites\n\nThe prerequisites for running the IX device plugin are:\n* Iluvatar driver and software stack \u003e= v1.1.0\n* Kubernetes version \u003e= 1.10\n\n## Building the IX device plugin\n\n```shell\nmake 
all\n```\nThis builds the ix-device-plugin binary and the ix-device-plugin image; see the build log output for more details.\n\n## Configuring the IX device plugin\n\nThe IX device plugin has a number of configurable options.\nThese options are set through a config file that is read when the device plugin launches. This section explains what\neach option does and how to set it in a ConfigMap.\n```yaml\n# ix-config.yaml\napiVersion: v1\nkind: ConfigMap\ndata:\n  ix-config: |-\n    version: \"4.2.0\"\n    flags:\n      splitboard: false\n    sharing:\n      timeSlicing:\n          replicas: 4\n\nmetadata:\n  name: ix-config\n  namespace: kube-system\n```\n```shell\nkubectl create -f ix-config.yaml\n```\n| Field | Type | Description |\n|-------|------|-------------|\n| `flags.splitboard` | boolean | If `true`, split every GPU board (e.g. BI-V150) into its individual GPU devices |\n| `sharing.timeSlicing.replicas` | integer | Number of time-slicing replicas per GPU for shared access |\n\n## Enabling GPU Support in Kubernetes\n\nOnce you have configured the options above on all the GPU nodes in your\ncluster, you can enable GPU support by deploying the following DaemonSet:\n```yaml\n# ix-device-plugin.yaml\napiVersion: apps/v1\nkind: DaemonSet\nmetadata:\n  name: iluvatar-device-plugin\n  namespace: kube-system\n  labels:\n    app.kubernetes.io/name: iluvatar-device-plugin\nspec:\n  selector:\n    matchLabels:\n      app.kubernetes.io/name: iluvatar-device-plugin\n  template:\n    metadata:\n      annotations:\n        scheduler.alpha.kubernetes.io/critical-pod: \"\"\n      labels:\n        app.kubernetes.io/name: iluvatar-device-plugin\n    spec:\n      priorityClassName: \"system-node-critical\"\n      securityContext:\n        null\n      containers:\n        - name: iluvatar-device-plugin\n          securityContext:\n            capabilities:\n              drop:\n              - ALL\n   
         privileged: true\n          image: \"ix-device-plugin:4.2.0\"\n          imagePullPolicy: IfNotPresent\n          livenessProbe:\n            exec:\n              command:\n              - ls\n              - /var/lib/kubelet/device-plugins/iluvatar-gpu.sock\n            periodSeconds: 5\n          startupProbe:\n            exec:\n              command:\n              - ls\n              - /var/lib/kubelet/device-plugins/iluvatar-gpu.sock\n            periodSeconds: 5\n          resources:\n            {}\n          volumeMounts:\n            - mountPath: /var/lib/kubelet/device-plugins\n              name: device-plugin\n            - mountPath: /run/udev\n              name: udev-ctl\n              readOnly: true\n            - mountPath: /sys\n              name: sys\n              readOnly: true\n            - mountPath: /dev\n              name: dev\n            - name: ixc\n              mountPath: /ixconfig\n      volumes:\n        - hostPath:\n            path: /var/lib/kubelet/device-plugins\n          name: device-plugin\n        - hostPath:\n            path: /run/udev\n          name: udev-ctl\n        - hostPath:\n            path: /sys\n          name: sys\n        - hostPath:\n            path: /etc/udev/\n          name: udev-etc\n        - hostPath:\n            path: /dev\n          name: dev\n        - name: ixc\n          configMap:\n              name: ix-config\n```\n```shell\nkubectl create -f ix-device-plugin.yaml\n```\n\n## Running GPU Jobs\n\nA GPU is exposed to a pod by requesting the `iluvatar.com/gpu` resource under `resources.limits` in the pod definition; the value limits how many GPUs the pod receives.\n\n```shell\ncat \u003c\u003cEOF | kubectl apply -f -\napiVersion: v1\nkind: Pod\nmetadata:\n  name: corex-example\nspec:\n  containers:\n  - name: corex-example\n    image: corex:4.0.0\n    command: [\"/usr/local/corex/bin/ixsmi\"]\n    args: [\"-l\"]\n    resources:\n      limits:\n        iluvatar.com/gpu: 1 # requesting 
1 GPU\nEOF\n```\n\n```shell\nkubectl logs corex-example\n+-----------------------------------------------------------------------------+\n|  IX-ML: 4.0.0       Driver Version: 4.1.0       CUDA Version: N/A           |\n|-------------------------------+----------------------+----------------------|\n| GPU  Name                     | Bus-Id               | Clock-SM  Clock-Mem  |\n| Fan  Temp  Perf  Pwr:Usage/Cap|      Memory-Usage    | GPU-Util  Compute M. |\n|===============================+======================+======================|\n| 0    Iluvatar BI-V150S        | 00000000:8A:00.0     | 500MHz    1600MHz    |\n| 0%   33C   P0    N/A / N/A    | 114MiB / 32768MiB    | 0%        Default    |\n+-------------------------------+----------------------+----------------------+\n\n+-----------------------------------------------------------------------------+\n| Processes:                                                       GPU Memory |\n|  GPU        PID      Process name                                Usage(MiB) |\n|=============================================================================|\n|  No running processes found                                                 |\n+-----------------------------------------------------------------------------+\n```\n\n## Split GPU Board to Multiple GPU Devices\n\nThe IX device plugin can split one GPU board into multiple GPU devices through a set of\nextended options in its configuration file.\n\n### With SplitBoard\n\nThe extended option for splitting a board is shown below:\n\n```yaml\nversion: \"4.2.0\"\nflags:\n    splitboard: false\n```\n\nThat is, the boolean flag `flags.splitboard` can now be specified. If this flag is set to `true`, the plugin splits each GPU board into multiple GPUs and\nthe kubelet advertises multiple `iluvatar.com/gpu` resources to Kubernetes instead of 1 per GPU board. 
Otherwise, the plugin advertises only 1 `iluvatar.com/gpu` resource per GPU board.\n\nFor example:\n\n```yaml\nversion: \"4.2.0\"\nflags:\n    splitboard: true\n```\n\nIf this configuration were applied to a node with 1 GPU board (e.g. BI-V150, which has 2 GPU chips), the plugin\nwould now advertise 2 `iluvatar.com/gpu` resources to Kubernetes instead of 1.\n\n```\n$ kubectl describe node\n...\nCapacity:\n  iluvatar.com/gpu: 2\n...\n```\n\n## Shared Access to GPUs\n\nThe IX device plugin allows oversubscription of GPUs through a set of\nextended options in its configuration file.\n\n### With Time-Slicing\n\nThe extended options for sharing with time-slicing are shown below:\n\n```yaml\nversion: \"4.2.0\"\nsharing:\n    timeSlicing:\n        replicas: \u003cnum-replicas\u003e\n    ...\n```\n\nThat is, a number of replicas can now be specified in `sharing.timeSlicing.replicas`. This value is the number of shared accesses granted for each GPU.\n\nFor example:\n\n```yaml\nversion: \"4.2.0\"\nflags:\n    splitboard: false\nsharing:\n    timeSlicing:\n        replicas: 4\n```\n\nIf this configuration were applied to a node with 2 GPUs on it, the plugin\nwould now advertise 8 `iluvatar.com/gpu` resources to Kubernetes instead of 2.\n\n```\n$ kubectl describe node\n...\nCapacity:\n  iluvatar.com/gpu: 8\n...\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDeep-Spark%2Fix-device-plugin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDeep-Spark%2Fix-device-plugin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDeep-Spark%2Fix-device-plugin/lists"}