{"id":28254714,"url":"https://github.com/nexusgpu/vgpu.rs","last_synced_at":"2026-02-25T07:19:26.326Z","repository":{"id":287312225,"uuid":"906011038","full_name":"NexusGPU/vgpu.rs","owner":"NexusGPU","description":"vgpu.rs is the fractional GPU \u0026 vgpu-hypervisor implementation written in Rust","archived":false,"fork":false,"pushed_at":"2026-01-30T07:14:26.000Z","size":1492,"stargazers_count":30,"open_issues_count":2,"forks_count":9,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-01-31T00:41:00.007Z","etag":null,"topics":["ai","ai-infra","fractional-gpu","gpu-utilization","nvidia","vgpu","vgpu-hypervisor","virtual-gpu"],"latest_commit_sha":null,"homepage":"https://tensor-fusion.ai","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NexusGPU.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2024-12-20T01:16:41.000Z","updated_at":"2026-01-30T07:14:02.000Z","dependencies_parsed_at":"2025-05-06T07:28:38.242Z","dependency_job_id":"c634e5ca-5a05-45db-9941-7be9d43e00c3","html_url":"https://github.com/NexusGPU/vgpu.rs","commit_stats":null,"previous_names":["nexusgpu/vgpu.rs"],"tags_count":519,"template":false,"template_full_name":null,"purl":"pkg:github/NexusGPU/vgpu.rs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexusGPU%2Fvgpu.rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexusGPU%2Fvgpu.rs/tags","releases_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexusGPU%2Fvgpu.rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexusGPU%2Fvgpu.rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NexusGPU","download_url":"https://codeload.github.com/NexusGPU/vgpu.rs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexusGPU%2Fvgpu.rs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29007372,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-02T06:37:10.400Z","status":"ssl_error","status_checked_at":"2026-02-02T06:37:09.383Z","response_time":58,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-infra","fractional-gpu","gpu-utilization","nvidia","vgpu","vgpu-hypervisor","virtual-gpu"],"created_at":"2025-05-19T20:15:19.640Z","updated_at":"2026-02-02T07:58:19.511Z","avatar_url":"https://github.com/NexusGPU.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# vgpu.rs\n\n[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2FNexusGPU%2Fvgpu.rs.svg?type=shield\u0026issueType=license)](https://app.fossa.com/projects/git%2Bgithub.com%2FNexusGPU%2Fvgpu.rs?ref=badge_shield\u0026issueType=license)\n[![FOSSA 
Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2FNexusGPU%2Fvgpu.rs.svg?type=shield\u0026issueType=security)](https://app.fossa.com/projects/git%2Bgithub.com%2FNexusGPU%2Fvgpu.rs?ref=badge_shield\u0026issueType=security)\n[![Release](https://github.com/NexusGPU/vgpu.rs/actions/workflows/release.yml/badge.svg)](https://github.com/NexusGPU/vgpu.rs/actions/workflows/release.yml)\n[![Lint](https://github.com/NexusGPU/vgpu.rs/actions/workflows/lint.yml/badge.svg)](https://github.com/NexusGPU/vgpu.rs/actions/workflows/lint.yml)\n[![Test](https://github.com/NexusGPU/vgpu.rs/actions/workflows/test.yml/badge.svg)](https://github.com/NexusGPU/vgpu.rs/actions/workflows/test.yml)\n\nvgpu.rs is a fractional GPU \u0026 vgpu-hypervisor implementation written in Rust.\n\n## Installation\n\nYou can download the latest release binaries from the\n[GitHub Releases page](https://github.com/NexusGPU/vgpu.rs/releases). Use the\nfollowing command to automatically download the appropriate version:\n\n```bash\n# Download and extract the latest release\nARCH=$(uname -m | sed 's/x86_64/x64/' | sed 's/aarch64/arm64/')\n\n# Download the libraries\nwget \"https://github.com/NexusGPU/vgpu.rs/releases/latest/download/libadd_path-${ARCH}.tar.gz\"\nwget \"https://github.com/NexusGPU/vgpu.rs/releases/latest/download/libcuda_limiter-${ARCH}.tar.gz\"\n\n# Extract the archives\ntar -xzf libadd_path-${ARCH}.tar.gz\ntar -xzf libcuda_limiter-${ARCH}.tar.gz\n\n# Optional: Remove the archives after extraction\nrm libadd_path-${ARCH}.tar.gz libcuda_limiter-${ARCH}.tar.gz\n```\n\n## Usage\n\n### Using cuda-limiter\n\nThe `cuda-limiter` library intercepts CUDA API calls to enforce resource limits. 
After downloading and extracting the library, you can\nuse it as follows:\n\n```bash\n\n# First, get your GPU UUIDs or device indices\n# Run this command to list all available GPUs and their UUIDs\nnvidia-smi -L\n# Example output:\nGPU 0: NVIDIA GeForce RTX 4060 Ti (UUID: GPU-3430f778-7a25-704c-9090-8b0bb2478114)\n\n# Set environment variables to configure limits\n# You can use either GPU UUIDs (case-insensitive), device indices, or both as keys\n# 1. Use only GPU UUID as key\nexport TENSOR_FUSION_CUDA_UP_LIMIT='{\"gpu-3430f778-7a25-704c-9090-8b0bb2478114\": 10}'\nexport TENSOR_FUSION_CUDA_MEM_LIMIT='{\"gpu-3430f778-7a25-704c-9090-8b0bb2478114\": 1073741824}'\n\n# 2. Use only device index as key\nexport TENSOR_FUSION_CUDA_UP_LIMIT='{\"0\": 20}'\nexport TENSOR_FUSION_CUDA_MEM_LIMIT='{\"0\": 2147483648}'\n\n# Preload the cuda-limiter library and run an application\nLD_PRELOAD=/path/to/libcuda_limiter.so your_cuda_application\n\n# To verify the limiter is working, check nvidia-smi output\nLD_PRELOAD=/path/to/libcuda_limiter.so nvidia-smi\n# Nvidia-smi output:\n+-----------------------------------------------------------------------------------------+\n| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |\n|-----------------------------------------+------------------------+----------------------+\n| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |\n| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |\n|                                         |                        |               MIG M. 
|\n|=========================================+========================+======================|\n|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:01:00.0 Off |                  N/A |\n|  0%   37C    P8             11W /  160W |       0MiB /   1024MiB |      0%      Default |\n|                                         |                        |                  N/A |\n+-----------------------------------------+------------------------+----------------------+\n\n+-----------------------------------------------------------------------------------------+\n| Processes:                                                                              |\n|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |\n|        ID   ID                                                               Usage      |\n|=========================================================================================|\n|                                                                                         |\n+-----------------------------------------------------------------------------------------+\n\n\n```\n\n### Using add-path\n\nThe `add-path` library modifies environment variables at runtime to ensure\nproper library loading and execution. 
After downloading and extracting the\nlibrary, you need to load it using LD_PRELOAD:\n\n```bash\n# First, set LD_PRELOAD to use the add-path library\nexport LD_PRELOAD=/path/to/libadd_path.so\n\n# Basic usage: Add a path to the PATH environment variable\nsh -c \"TF_PATH=/usr/custom/bin env | grep PATH\"\n# Example output: PATH=/home/user/bin:/usr/local/sbin:/bin:/usr/custom/bin\n\n# Add a path to LD_PRELOAD\nsh -c \"TF_LD_PRELOAD=/path/to/custom.so env | grep LD_PRELOAD\"\n# Example output: LD_PRELOAD=/path/to/libadd_path.so:/path/to/custom.so\n\n# Add a path to LD_LIBRARY_PATH\nsh -c \"TF_LD_LIBRARY_PATH=/usr/local/cuda/lib64 env | grep LD_LIBRARY_PATH\"\n# Example output: LD_LIBRARY_PATH=/usr/lib64:/lib64:/usr/local/cuda/lib64\n\n# Prepend a path to PATH (higher priority)\nsh -c \"TF_PREPEND_PATH=/opt/custom/bin env | grep PATH\"\n# Example output: PATH=/opt/custom/bin:/home/user/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n```\n\n**Note**: The key difference between using `TF_PATH` and `TF_PREPEND_PATH` is the position where the path is added. Prepending puts the path at the beginning, giving it higher priority when the system searches for executables or libraries.\n\n## Project Structure\n\nThis project is organized as a Cargo workspace containing multiple crates, each\nwith specific responsibilities:\n\n### Crates\n\n- [**hypervisor**](crates/hypervisor): The hypervisor implementation that\n  monitors and manages GPU resources. It leverages NVML (NVIDIA Management\n  Library) to track GPU utilization and optimize CUDA workload scheduling.\n\n- [**cuda-limiter**](crates/cuda-limiter): A dynamic library that intercepts\n  CUDA API calls to enforce resource limits. 
Built as a\n  `cdylib` that can be preloaded into CUDA applications to control their\n  resource usage.\n\n  \u003e **Implementation Reference**: The cuda-limiter module's design and implementation are based on research from\n  \u003e [GaiaGPU: Sharing GPUs in Container Clouds](https://ieeexplore.ieee.org/document/8672318). This paper introduces\n  \u003e innovative techniques for GPU resource management and isolation in container environments.\n\n- [**add-path**](crates/add-path): A utility library that modifies environment\n  variables like `PATH`, `LD_PRELOAD`, and `LD_LIBRARY_PATH` to ensure proper\n  library loading and execution. Built as a `cdylib` for runtime loading.\n\n  This library supports both appending and prepending values to environment\n  variables:\n  - By default, when an environment variable such as `TF_PATH`, `TF_LD_PRELOAD`,\n    or `TF_LD_LIBRARY_PATH` is set, its value will be appended to the\n    corresponding variable (e.g., `PATH`).\n  - If you want to prepend a value instead (i.e., place it at the beginning),\n    use an environment variable prefixed with `TF_PREPEND_`, such as\n    `TF_PREPEND_PATH`. This will insert the value at the front, ensuring it\n    takes precedence during library or binary lookup.\n\n  This flexible mechanism allows fine-grained control over environment variable\n  ordering, which is critical for correct library loading and runtime behavior\n  in complex CUDA or GPU environments.\n\n- [**tf-macro**](crates/tf-macro): Contains procedural macros that simplify common\n  patterns used throughout the codebase, improving code readability and reducing\n  boilerplate.\n\n- [**utils**](crates/utils): A collection of common utilities and helper\n  functions shared across the project. Includes tracing, logging, and other\n  infrastructure components.\n\n## System Architecture\n\n### 1. Overview\n\nThis project provides a Kubernetes-based solution for GPU virtualization and resource limiting. 
Its primary goal is to enable multiple pods to share physical GPU resources in a multi-tenant environment while precisely controlling the GPU usage of each pod.\n\nThe system consists of two core components:\n\n*   **Hypervisor (Daemon)**: A daemon process running on each GPU node, responsible for managing and scheduling all pods that require GPU resources on that node.\n*   **Cuda-Limiter (SO Library)**: A dynamic library injected into the user application's process space via the `LD_PRELOAD` mechanism. It intercepts CUDA API calls, communicates with the Hypervisor, and enforces GPU operation limits based on the Hypervisor's scheduling decisions.\n\nThe entire system is designed to run in a Kubernetes cluster, using Pods as the basic unit for resource isolation and limitation. This means all processes (workers/limiters) within the same pod share the same GPU quota.\n\n### 2. Core Components\n\n#### 2.1. Hypervisor\n\nThe Hypervisor is the \"brain\" of the system, deployed as a DaemonSet on each Kubernetes node equipped with a GPU. Its main responsibilities include:\n\n*   **Worker Management**: Tracks and manages all connected `cuda-limiter` instances (i.e., workers).\n*   **Resource Monitoring**:\n    *   Monitors physical GPU metrics (e.g., utilization, memory usage) via NVML (NVIDIA Management Library).\n    *   Aggregates GPU usage data reported by all workers.\n*   **Scheduling Policy**:\n    *   Dynamically decides which pod's process is allowed to perform GPU computations based on a predefined scheduling algorithm (e.g., weighted round-robin) and each pod's resource quota.\n    *   Scheduling decisions are dispatched to the corresponding `cuda-limiter` instances.\n*   **Shared Memory Communication**:\n    *   Creates a shared memory region to efficiently broadcast global GPU utilization data to all `cuda-limiter` instances on the node. 
This avoids the overhead of each worker querying NVML individually.\n*   **Kubernetes Integration**:\n    *   Interacts with the Kubernetes API Server to watch for pod creation and deletion events.\n    *   Retrieves pod metadata (like pod name, resource requests) and associates this information with workers to achieve pod-level resource isolation.\n*   **API Service**:\n    *   Provides an HTTP API for `cuda-limiter` registration, command polling, status reporting, etc.\n    *   Uses the `http-bidir-comm` crate to implement bidirectional communication with `cuda-limiter`.\n\n#### 2.2. Cuda-Limiter\n\nCuda-Limiter is a dynamic library (`.so` file) injected into every user process that needs to use the GPU. Its core function is to act as the Hypervisor's agent within the user process.\n\n*   **API Interception (Hooking)**:\n    *   Uses techniques like `frida-gum` to intercept critical CUDA API calls (e.g., `cuLaunchKernel`) and NVML API calls at runtime.\n    *   This is the key to enforcing GPU limits, as all GPU-related operations must first be inspected by `cuda-limiter`.\n*   **Communication with Hypervisor**:\n    *   On initialization, it obtains the Hypervisor's address from environment variables and registers itself.\n    *   Uses the `http-bidir-comm` crate to establish a long-lived connection (based on HTTP long-polling or SSE) with the Hypervisor to receive scheduling commands (e.g., \"execute,\" \"wait\").\n*   **Execution Control (Trap \u0026 Wait)**:\n    *   When an intercepted CUDA function is called, `cuda-limiter` pauses the thread's execution.\n    *   It sends a \"request to execute\" signal to the Hypervisor, including information about the current process and pod.\n    *   It then blocks and waits for the Hypervisor's response. 
Only after receiving an \"allow execute\" command does it call the original CUDA function, allowing the GPU operation to proceed.\n*   **Shared Memory Access**:\n    *   By attaching to the shared memory region created by the Hypervisor, `cuda-limiter` can access the current node-wide GPU status with near-zero overhead for local decision-making or data reporting.\n*   **Environment Variable Configuration**:\n    *   Relies on environment variables (e.g., `HYPERVISOR_IP`, `HYPERVISOR_PORT`, `POD_NAME`) to obtain the necessary runtime context.\n\n#### 2.3. http-bidir-comm\n\nThis is a generic, HTTP-based bidirectional communication library used by both the Hypervisor and Cuda-Limiter. It abstracts the common pattern of client-server task requests and result reporting.\n\n*   **Client (Cuda-Limiter side)**:\n    *   Implements a `BlockingHttpClient`, suitable for use in injected, potentially non-async code environments.\n    *   Pulls tasks/commands from the server via long-polling or Server-Sent Events (SSE).\n*   **Server (Hypervisor side)**:\n    *   Implemented using the `Poem` web framework to provide an asynchronous HTTP service.\n    *   Maintains task queues for each client (worker) and handles result submissions from clients.\n\n#### 2.4. Utils\n\nA common utility library providing shared functionality across multiple crates:\n\n*   **Logging**: Standardized logging facilities.\n*   **Hooking**: Low-level wrappers for API interception.\n*   **Shared Memory**: Wrappers for creating and accessing shared memory.\n*   **Build Info**: Embeds version and build information at compile time.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnexusgpu%2Fvgpu.rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnexusgpu%2Fvgpu.rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnexusgpu%2Fvgpu.rs/lists"}