{"id":17366222,"url":"https://github.com/letmutx/nomad-nvidia-vgpu-plugin","last_synced_at":"2025-07-30T10:37:11.604Z","repository":{"id":65198166,"uuid":"494766052","full_name":"letmutx/nomad-nvidia-vgpu-plugin","owner":"letmutx","description":"Nomad plugin for sharing Nvidia GPU across multiple jobs","archived":false,"fork":false,"pushed_at":"2024-12-22T02:32:32.000Z","size":121,"stargazers_count":5,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-01T12:17:15.350Z","etag":null,"topics":["gpu","nomad","nvidia"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/letmutx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-21T11:38:50.000Z","updated_at":"2025-02-14T20:34:09.000Z","dependencies_parsed_at":"2025-01-13T02:22:14.196Z","dependency_job_id":null,"html_url":"https://github.com/letmutx/nomad-nvidia-vgpu-plugin","commit_stats":{"total_commits":12,"total_committers":3,"mean_commits":4.0,"dds":0.5,"last_synced_commit":"10e9e55080b47768bdf8896707e0489ded744599"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/letmutx%2Fnomad-nvidia-vgpu-plugin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/letmutx%2Fnomad-nvidia-vgpu-plugin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/letmutx%2Fnomad-nvidia-vgpu-plugin/releases","manifests_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/letmutx%2Fnomad-nvidia-vgpu-plugin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/letmutx","download_url":"https://codeload.github.com/letmutx/nomad-nvidia-vgpu-plugin/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251871614,"owners_count":21657479,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gpu","nomad","nvidia"],"created_at":"2024-10-15T21:34:52.920Z","updated_at":"2025-05-01T12:17:21.655Z","avatar_url":"https://github.com/letmutx.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"Nomad Nvidia Virtual Device Plugin\n==================\n\nThis repo contains a device plugin for [Nomad](https://www.nomadproject.io/) to support exposing a number of virtual GPUs for each physical GPU present on the machine. This enables running workloads which don't consume the whole GPU.\n\nInstallation requirements\n-----------------------\n\nThis plugin needs the following dependencies to function:\n\n* [Nomad](https://www.nomadproject.io/downloads.html) 0.9+\n* GNU/Linux x86_64 with kernel version \u003e 3.10\n* NVIDIA GPU with Architecture \u003e Fermi (2.1)\n* NVIDIA drivers \u003e= 340.29 with binary nvidia-smi\n* Docker v19.03+\n\nCopy the plugin binary to the [plugins directory](https://www.nomadproject.io/docs/configuration/index.html#plugin_dir) and [configure the plugin](https://www.nomadproject.io/docs/configuration/plugin.html) in the client config. 
Also, see the requirements for the official [nvidia-plugin](https://www.nomadproject.io/plugins/devices/nvidia#installation-requirements).\n\n```hcl\nplugin \"nvidia-vgpu\" {\n  config {\n    ignored_gpu_ids    = [\"uuid1\", \"uuid2\"]\n    fingerprint_period = \"5s\"\n    vgpus = 16\n  }\n}\n```\n\nUsage\n--------------\n\nUse the [device stanza](https://www.nomadproject.io/docs/job-specification/device.html) in the job file to schedule with device support.\n\n```hcl\njob \"gpu-test\" {\n  datacenters = [\"dc1\"]\n  type = \"batch\"\n\n  group \"smi\" {\n    task \"smi\" {\n      driver = \"docker\"\n\n      config {\n        image = \"nvidia/cuda:11.0-base\"\n        command = \"nvidia-smi\"\n      }\n\n      resources {\n        device \"letmutx/gpu\" {\n          count = 1\n\n          # Add an affinity for a particular model\n          affinity {\n            attribute = \"${device.model}\"\n            value     = \"Tesla K80\"\n            weight    = 50\n          }\n        }\n      }\n    }\n  }\n}\n```\n\nNotes\n-------\n\n* GPU memory allocation/usage is handled in a cooperative manner. This means that one bad GPU process using more memory than assigned can cause starvation for other processes.\n* Managing memory isolation per task is left to the user. It depends on a lot of factors like [MPS](https://docs.nvidia.com/deploy/mps/index.html#topic_3_3_3), GPU architecture etc. 
[This paper](https://drops.dagstuhl.de/opus/volltexte/2018/8984/pdf/LIPIcs-ECRTS-2018-20.pdf) has more background on these factors.\n\nTesting\n---------\nTo test the plugin, run it on a target machine with an NVIDIA GPU through Nomad's [plugin launcher](https://github.com/hashicorp/nomad/blob/main/plugins/shared/cmd/launcher/README.md):\n\n```shell\nmake eval\n```\n\nInspired by\n--------------\n\n* https://github.com/awslabs/aws-virtual-gpu-device-plugin\n* https://github.com/kubernetes/kubernetes/issues/52757#issuecomment-402772200\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fletmutx%2Fnomad-nvidia-vgpu-plugin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fletmutx%2Fnomad-nvidia-vgpu-plugin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fletmutx%2Fnomad-nvidia-vgpu-plugin/lists"}