{"id":19888248,"url":"https://github.com/project-codeflare/instaslice","last_synced_at":"2025-05-02T17:31:57.682Z","repository":{"id":228667682,"uuid":"770968640","full_name":"project-codeflare/instaslice","owner":"project-codeflare","description":"InstaSlice facilitates the use of Dynamic Resource Allocation (DRA) on Kubernetes clusters for GPU sharing","archived":false,"fork":false,"pushed_at":"2024-08-02T17:06:57.000Z","size":140,"stargazers_count":17,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-08-02T19:47:30.857Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/project-codeflare.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-12T13:09:35.000Z","updated_at":"2024-08-02T17:07:05.000Z","dependencies_parsed_at":"2024-03-19T22:44:01.120Z","dependency_job_id":null,"html_url":"https://github.com/project-codeflare/instaslice","commit_stats":null,"previous_names":["project-codeflare/instaslice"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Finstaslice","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Finstaslice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Finstaslice/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Finstaslice/manifests","owner_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/owners/project-codeflare","download_url":"https://codeload.github.com/project-codeflare/instaslice/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224324498,"owners_count":17292521,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T18:06:44.602Z","updated_at":"2025-05-02T17:31:57.674Z","avatar_url":"https://github.com/project-codeflare.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Note - we have moved to https://github.com/openshift/instaslice-operator\n\n# Note - Kubecon EU 2024 code (DRA code) is now available in the legacy branch\n\n# InstaSlice\n\nExperimental InstaSlice works with GPU operator to create mig slices on demand.\n\n## Getting Started\n\n### Prerequisites\n- [Go](https://go.dev/doc/install) v1.22.0+\n- [Docker](https://docs.docker.com/get-docker/) v17.03+\n- [Docker buildx plugin](https://github.com/docker/buildx) for building cross-platform images.\n- [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) v1.11.3+.\n- Access to a [KinD](https://kind.sigs.k8s.io/docs/user/quick-start/) cluster.\n\n### Install KinD cluster with GPU operator\n\n- Make sure the GPUs on the host have MIG enabled\n\n```sh\n+-----------------------------------------------------------------------------------------+\n| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |\n|-----------------------------------------+------------------------+----------------------+\n| GPU  Name                
 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |\n| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |\n|                                         |                        |               MIG M. |\n|=========================================+========================+======================|\n|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:0E:00.0 Off |                   On |\n| N/A   36C    P0             33W /  250W |       0MiB /  40960MiB |     N/A      Default |\n|                                         |                        |              Enabled |\n+-----------------------------------------+------------------------+----------------------+\n|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:0F:00.0 Off |                   On |\n| N/A   40C    P0             32W /  250W |       0MiB /  40960MiB |     N/A      Default |\n|                                         |                        |              Enabled |\n+-----------------------------------------+------------------------+----------------------+\n\n+-----------------------------------------------------------------------------------------+\n| MIG devices:                                                                            |\n+------------------+----------------------------------+-----------+-----------------------+\n| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |\n|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |\n|                  |                                  |        ECC|                       |\n|==================+==================================+===========+=======================|\n|  No MIG devices found                                                                   
|\n+-----------------------------------------------------------------------------------------+\n\n+-----------------------------------------------------------------------------------------+\n| Processes:                                                                              |\n|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |\n|        ID   ID                                                               Usage      |\n|=========================================================================================|\n|  No running processes found                                                             |\n```\n\n- Run the below script\n```sh\nsh ./deploy/setup.sh\n```\nNOTE: Please check if all the pods in GPU operator are completed or Running before moving to the next step.\n\n```sh\n(base) openstack@netsres62:~/asmalvan/instaslice2$ kubectl get pods -n gpu-operator\nNAME                                                              READY   STATUS      RESTARTS   AGE\ngpu-feature-discovery-578q8                                       1/1     Running     0          102s\ngpu-operator-1714053627-node-feature-discovery-gc-9b857c99phlnn   1/1     Running     0          7m21s\ngpu-operator-1714053627-node-feature-discovery-master-6df78zgsz   1/1     Running     0          7m21s\ngpu-operator-1714053627-node-feature-discovery-worker-47tpx       1/1     Running     0          7m19s\ngpu-operator-54b8bfbfd8-rmzbd                                     1/1     Running     0          7m21s\nnvidia-container-toolkit-daemonset-wkc5h                          1/1     Running     0          6m21s\nnvidia-cuda-validator-cn8lg                                       0/1     Completed   0          88s\nnvidia-dcgm-exporter-h75xg                                        1/1     Running     0          102s\nnvidia-device-plugin-daemonset-452dk                              1/1     Running     0          101s\nnvidia-mig-manager-htt7z                   
                       1/1     Running     0          2m21s\nnvidia-operator-validator-kh6jf                                   1/1     Running     0          102s\n```\n\n- After all the pods are Running/Completed, run nvidia-smi on the host and check if MIG slices appear on the all the GPUs of the host.\n\n```sh\n(base) openstack@netsres62:~/asmalvan/instaslice2$ nvidia-smi\nThu Apr 25 10:08:24 2024\n+-----------------------------------------------------------------------------------------+\n| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |\n|-----------------------------------------+------------------------+----------------------+\n| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |\n| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |\n|                                         |                        |               MIG M. |\n|=========================================+========================+======================|\n|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:0E:00.0 Off |                   On |\n| N/A   45C    P0             71W /  250W |      87MiB /  40960MiB |     N/A      Default |\n|                                         |                        |              Enabled |\n+-----------------------------------------+------------------------+----------------------+\n|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:0F:00.0 Off |                   On |\n| N/A   49C    P0             69W /  250W |      87MiB /  40960MiB |     N/A      Default |\n|                                         |                        |              Enabled |\n+-----------------------------------------+------------------------+----------------------+\n\n+-----------------------------------------------------------------------------------------+\n| MIG devices:                                                                            
|\n+------------------+----------------------------------+-----------+-----------------------+\n| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |\n|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |\n|                  |                                  |        ECC|                       |\n|==================+==================================+===========+=======================|\n|  0    2   0   0  |              37MiB / 19968MiB    | 42      0 |  3   0    2    0    0 |\n|                  |                 0MiB / 32767MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n|  0    3   0   1  |              25MiB /  9856MiB    | 28      0 |  2   0    1    0    0 |\n|                  |                 0MiB / 16383MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n|  0    9   0   2  |              12MiB /  4864MiB    | 14      0 |  1   0    0    0    0 |\n|                  |                 0MiB /  8191MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n|  0   10   0   3  |              12MiB /  4864MiB    | 14      0 |  1   0    0    0    0 |\n|                  |                 0MiB /  8191MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n|  1    2   0   0  |              37MiB / 19968MiB    | 42      0 |  3   0    2    0    0 |\n|                  |                 0MiB / 32767MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n|  1    3   0   1  |              25MiB /  9856MiB    | 28      0 |  2   0    1    0    0 |\n|                  |                 0MiB / 
16383MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n|  1    9   0   2  |              12MiB /  4864MiB    | 14      0 |  1   0    0    0    0 |\n|                  |                 0MiB /  8191MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n|  1   10   0   3  |              12MiB /  4864MiB    | 14      0 |  1   0    0    0    0 |\n|                  |                 0MiB /  8191MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n\n+-----------------------------------------------------------------------------------------+\n| Processes:                                                                              |\n|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |\n|        ID   ID                                                               Usage      |\n|=========================================================================================|\n|  No running processes found                                                             |\n+-----------------------------------------------------------------------------------------+\n(base) openstack@netsres62:~/asmalvan/instaslice2$\n```\n\n\n- Delete the MIG slices using the command\n\n```sh\nsudo nvidia-smi mig -dci \u0026\u0026 sudo nvidia-smi mig -dgi\n\nSuccessfully destroyed compute instance ID  0 from GPU  0 GPU instance ID  9\nSuccessfully destroyed compute instance ID  0 from GPU  0 GPU instance ID 10\nSuccessfully destroyed compute instance ID  0 from GPU  0 GPU instance ID  3\nSuccessfully destroyed compute instance ID  0 from GPU  0 GPU instance ID  2\nSuccessfully destroyed compute instance ID  0 from GPU  1 GPU instance ID  9\nSuccessfully destroyed compute instance ID  0 from GPU  1 GPU instance ID 
10\nSuccessfully destroyed compute instance ID  0 from GPU  1 GPU instance ID  3\nSuccessfully destroyed compute instance ID  0 from GPU  1 GPU instance ID  2\nSuccessfully destroyed GPU instance ID  9 from GPU  0\nSuccessfully destroyed GPU instance ID 10 from GPU  0\nSuccessfully destroyed GPU instance ID  3 from GPU  0\nSuccessfully destroyed GPU instance ID  2 from GPU  0\nSuccessfully destroyed GPU instance ID  9 from GPU  1\nSuccessfully destroyed GPU instance ID 10 from GPU  1\nSuccessfully destroyed GPU instance ID  3 from GPU  1\nSuccessfully destroyed GPU instance ID  2 from GPU  1\n```\n\n- Create placeholder slice to make k8s-device-plugin happy using the command\n\n```sh\nsudo nvidia-smi mig -cgi 3g.20gb -C\nSuccessfully created GPU instance ID  2 on GPU  0 using profile MIG 3g.20gb (ID  9)\nSuccessfully created compute instance ID  0 on GPU  0 GPU instance ID  2 using profile MIG 3g.20gb (ID  2)\nSuccessfully created GPU instance ID  2 on GPU  1 using profile MIG 3g.20gb (ID  9)\nSuccessfully created compute instance ID  0 on GPU  1 GPU instance ID  2 using profile MIG 3g.20gb (ID  2)\n```\n\n- Run the below command to patch device plugin with configmap created by the setup script. 
For OpenShift, replace `clusterpolicies.nvidia.com/cluster-policy` with `clusterpolicies.nvidia.com/gpu-cluster-policy` and the namespace with `nvidia-gpu-operator`.\n\n```sh\n(base) openstack@netsres62:~/asmalvan/instaslice2$ kubectl patch clusterpolicies.nvidia.com/cluster-policy     -n gpu-operator --type merge     -p '{\"spec\": {\"devicePlugin\": {\"config\": {\"name\": \"test\"}}}}'\n```\n\nYou are now all set to dynamically create slices on the cluster using InstaSlice.\n\n### Running the controller\n\n- Refer to the section `To Deploy on the cluster`\n\n### Submitting the workload\n\n- Submit a sample workload using the command\n\n```sh\nkubectl apply -f ./samples/test-pod.yaml\npod/cuda-vectoradd-5 created\n```\n\n- Check the status of the workload using the commands\n\n```sh\nkubectl get pods\nNAME               READY   STATUS    RESTARTS   AGE\ncuda-vectoradd-5   1/1     Running   0          15s\nkubectl logs cuda-vectoradd-5\nGPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-31cfe05c-ed13-cd17-d7aa-c63db5108c24)\n  MIG 1g.5gb      Device  0: (UUID: MIG-c5720b34-e550-5278-90e6-d99a979aafd1)\n[Vector addition of 50000 elements]\nCopy input data from the host memory to the CUDA device\nCUDA kernel launch with 196 blocks of 256 threads\nCopy output data from the CUDA device to the host memory\nTest PASSED\nDone\n\n+-----------------------------------------------------------------------------------------+\n| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |\n|-----------------------------------------+------------------------+----------------------+\n| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |\n| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |\n|                                         |                        |               MIG M. 
|\n|=========================================+========================+======================|\n|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:0E:00.0 Off |                   On |\n| N/A   52C    P0             75W /  250W |      50MiB /  40960MiB |     N/A      Default |\n|                                         |                        |              Enabled |\n+-----------------------------------------+------------------------+----------------------+\n|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:0F:00.0 Off |                   On |\n| N/A   60C    P0             75W /  250W |      37MiB /  40960MiB |     N/A      Default |\n|                                         |                        |              Enabled |\n+-----------------------------------------+------------------------+----------------------+\n\n+-----------------------------------------------------------------------------------------+\n| MIG devices:                                                                            |\n+------------------+----------------------------------+-----------+-----------------------+\n| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |\n|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |\n|                  |                                  |        ECC|                       |\n|==================+==================================+===========+=======================|\n|  0    2   0   0  |              37MiB / 19968MiB    | 42      0 |  3   0    2    0    0 |\n|                  |                 0MiB / 32767MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n|  0   10   0   1  |              12MiB /  4864MiB    | 14      0 |  1   0    0    0    0 |\n|                  |                 0MiB /  8191MiB  |           |                       
|\n+------------------+----------------------------------+-----------+-----------------------+\n|  1    2   0   0  |              37MiB / 19968MiB    | 42      0 |  3   0    2    0    0 |\n|                  |                 0MiB / 32767MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n\n+-----------------------------------------------------------------------------------------+\n| Processes:                                                                              |\n|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |\n|        ID   ID                                                               Usage      |\n|=========================================================================================|\n|  No running processes found                                                             |\n+-----------------------------------------------------------------------------------------+\n\n```\n### Deleting the workload\n\n- Delete the pod and see the newly created MIG slice deleted\n\n```sh\nkubectl delete pod cuda-vectoradd-5\n\n+-----------------------------------------------------------------------------------------+\n| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |\n|-----------------------------------------+------------------------+----------------------+\n| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |\n| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |\n|                                         |                        |               MIG M. 
|\n|=========================================+========================+======================|\n|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:0E:00.0 Off |                   On |\n| N/A   53C    P0             75W /  250W |      37MiB /  40960MiB |     N/A      Default |\n|                                         |                        |              Enabled |\n+-----------------------------------------+------------------------+----------------------+\n|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:0F:00.0 Off |                   On |\n| N/A   60C    P0             75W /  250W |      37MiB /  40960MiB |     N/A      Default |\n|                                         |                        |              Enabled |\n+-----------------------------------------+------------------------+----------------------+\n\n+-----------------------------------------------------------------------------------------+\n| MIG devices:                                                                            |\n+------------------+----------------------------------+-----------+-----------------------+\n| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |\n|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |\n|                  |                                  |        ECC|                       |\n|==================+==================================+===========+=======================|\n|  0    2   0   0  |              37MiB / 19968MiB    | 42      0 |  3   0    2    0    0 |\n|                  |                 0MiB / 32767MiB  |           |                       |\n+------------------+----------------------------------+-----------+-----------------------+\n|  1    2   0   0  |              37MiB / 19968MiB    | 42      0 |  3   0    2    0    0 |\n|                  |                 0MiB / 32767MiB  |           |                       
|\n+------------------+----------------------------------+-----------+-----------------------+\n\n+-----------------------------------------------------------------------------------------+\n| Processes:                                                                              |\n|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |\n|        ID   ID                                                               Usage      |\n|=========================================================================================|\n|  No running processes found                                                             |\n+-----------------------------------------------------------------------------------------+\n\n```\n\n### To Deploy on the cluster\n\n**All in one command**\n\nmake docker-build \u0026\u0026 make docker-push \u0026\u0026 make deploy\n\nCross-platform or multi-arch images can be built and pushed using\n`make docker-buildx`. When using Docker as your container tool, make\nsure to create a builder instance. 
Refer to\n[Multi-platform images](https://docs.docker.com/build/building/multi-platform/)\nfor documentation on building multi-platform images with Docker.\n\nYou can change the destination platform(s) by\nsetting `PLATFORMS`, e.g.\n\n```sh\nPLATFORMS=linux/arm64,linux/amd64 make docker-buildx\n```\n\n**Build and push your image to the location specified by `IMG`:**\n\n```sh\nmake docker-build docker-push IMG=\u003csome-registry\u003e/instaslice:tag\n```\n\n**NOTE:** The image must be published to the registry you specified,\nand the working environment must be able to pull it.\nIf the commands above fail, check that you have the required permissions for the registry.\n\n**Install the CRDs into the cluster:**\n\n```sh\nmake install\n```\n\n**Deploy the Manager to the cluster with the image specified by `IMG`:**\n\n```sh\nmake deploy IMG=\u003csome-registry\u003e/instaslice:tag\n```\n\n\u003e **NOTE**: If you encounter RBAC errors, you may need to grant yourself cluster-admin\nprivileges or be logged in as admin.\n\n**Create instances of your solution**\nYou can apply the samples (examples) from `config/samples`:\n\n```sh\nkubectl apply -k config/samples/\n```\n\n\u003e**NOTE**: Ensure that the samples have default values so you can test them.\n\n### To Uninstall\n**Delete the instances (CRs) from the cluster:**\n\n```sh\nkubectl delete -k config/samples/\n```\n\n**Delete the APIs (CRDs) from the cluster:**\n\n```sh\nmake uninstall\n```\n\n**Undeploy the controller from the cluster:**\n\n```sh\nmake undeploy\n```\n\n## Project Distribution\n\nFollow these steps to build the installer and distribute this project to users.\n\n1. Build the installer for the image built and published in the registry:\n\n```sh\nmake build-installer IMG=\u003csome-registry\u003e/instaslice:tag\n```\n\nNOTE: The makefile target mentioned above generates an 'install.yaml'\nfile in the dist directory. 
This file contains all the resources built\nwith Kustomize, which are necessary to install this project without\nits dependencies.\n\n2. Using the installer\n\nUsers can just run kubectl apply -f \u003cURL for YAML BUNDLE\u003e to install the project, i.e.:\n\n```sh\nkubectl apply -f https://raw.githubusercontent.com/\u003corg\u003e/instaslice/\u003ctag or branch\u003e/dist/install.yaml\n```\n\n## Contributing\n// TODO(user): Add detailed information on how you would like others to contribute to this project\n\n**NOTE:** Run `make help` for more information on all potential `make` targets\n\nMore information can be found via the [Kubebuilder Documentation](https://book.kubebuilder.io/introduction.html)\n\n## License\n\nCopyright 2024.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproject-codeflare%2Finstaslice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fproject-codeflare%2Finstaslice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproject-codeflare%2Finstaslice/lists"}