https://github.com/j3soon/docker-mig-manager
Unofficial minimal instructions for managing NVIDIA Multi-Instance GPU (MIG) in a docker container
- Host: GitHub
- URL: https://github.com/j3soon/docker-mig-manager
- Owner: j3soon
- Created: 2024-09-24T17:13:45.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-01-16T18:06:35.000Z (12 months ago)
- Last Synced: 2025-01-16T19:27:42.495Z (12 months ago)
- Topics: docker, multi-instance-gpu, nvidia-container-toolkit, nvidia-docker
- Homepage:
- Size: 1000 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Docker MIG Manager
Unofficial minimal instructions for managing NVIDIA Multi-Instance GPU (MIG) from within a Docker container.
Prerequisites:
- [NVIDIA Driver](https://ubuntu.com/server/docs/nvidia-drivers-installation)
- [Docker](https://docs.docker.com/engine/install/ubuntu/)
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- [Enable MIG Mode](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#enable-mig-mode)
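Before running the container, you can confirm MIG mode is enabled on the host. A minimal sketch, assuming GPU index `0` (enabling MIG may require stopping GPU clients and, on some systems, a GPU reset or reboot):

```sh
# Query the current MIG mode of each GPU; "Enabled" is required for the steps below
nvidia-smi --query-gpu=index,mig.mode.current --format=csv

# Enable MIG mode on GPU 0 if it is still disabled
sudo nvidia-smi -i 0 -mig 1
```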
Taking the A100 as an example, run:
```sh
docker run --rm -it --gpus all \
  --cap-add=SYS_ADMIN \
  -e NVIDIA_MIG_CONFIG_DEVICES=all \
  ubuntu
# in the container
# Create two `3g.20gb` GPU instances (GIs) and their corresponding compute
# instances (CIs); `9` is the profile ID for `3g.20gb` on the A100
nvidia-smi mig -cgi 9,3g.20gb -C
# List the available CIs and GIs
nvidia-smi mig -lgi; nvidia-smi mig -lci;
# Destroy all the CIs and GIs
nvidia-smi mig -dci; nvidia-smi mig -dgi;
```
This should also work on A30/H100/H200 by substituting the MIG profile with [a supported one](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-mig-profiles).
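To find out which profiles your particular GPU supports, you can list them from inside the same container before creating any instances:

```sh
# List the GPU instance profiles supported by this GPU,
# including how many instances of each can coexist and their memory sizes
nvidia-smi mig -lgip

# List the possible placements for each GPU instance profile
nvidia-smi mig -lgipp
```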
Note: `--runtime=nvidia`, `-e NVIDIA_VISIBLE_DEVICES=all`, and `-e NVIDIA_DRIVER_CAPABILITIES=all` may be required depending on your environment and use cases.
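Once the instances are created, a separate container can be pinned to a single MIG device via its UUID. A sketch, assuming the placeholder UUID below is replaced with one reported by `nvidia-smi -L` on your machine:

```sh
# List GPUs and their MIG devices with UUIDs (run on the host,
# or in a container that can see all GPUs)
nvidia-smi -L

# Run a container restricted to a single MIG device;
# replace the placeholder with a MIG UUID from the output above
docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  ubuntu nvidia-smi
```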
## References
Some references I found useful during the investigation.
- [MIG User Guide](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/)
- [9.4. Creating GPU Instances](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#creating-gpu-instances)
- [9.6. Destroying GPU Instances](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#destroying-gpu-instances)
- [10. Device Nodes and Capabilities](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#device-nodes-and-capabilities)
- [The NVIDIA Container Runtime](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#nvidia_mig_config_devices)
- [`NVIDIA_MIG_CONFIG_DEVICES`](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime#nvidia_mig_config_devices)
- [NVIDIA MIG Manager For Kubernetes](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/cloud-native/containers/k8s-mig-manager/layers)
## Acknowledgement
Thanks to [@Irene-Ting](https://github.com/Irene-Ting) for the discussions.