{"id":49298229,"url":"https://github.com/scitrera/cuda-containers","last_synced_at":"2026-04-26T05:02:11.654Z","repository":{"id":336526964,"uuid":"1147583725","full_name":"scitrera/cuda-containers","owner":"scitrera","description":"Scitrera builds of various CUDA containers for version consistency, starting primarily with NVIDIA DGX Spark Containers","archived":false,"fork":false,"pushed_at":"2026-04-24T08:21:06.000Z","size":200,"stargazers_count":27,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-24T10:32:46.955Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scitrera.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-02T00:39:56.000Z","updated_at":"2026-04-24T08:21:10.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/scitrera/cuda-containers","commit_stats":null,"previous_names":["scitrera/cuda-containers"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/scitrera/cuda-containers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scitrera%2Fcuda-containers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scitrera%2Fcuda-containers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scitrera%2Fcuda-containers/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scitrera%2Fcuda-containers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scitrera","download_url":"https://codeload.github.com/scitrera/cuda-containers/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scitrera%2Fcuda-containers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32286271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T18:29:39.964Z","status":"online","status_checked_at":"2026-04-26T02:00:05.962Z","response_time":129,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-26T05:02:09.817Z","updated_at":"2026-04-26T05:02:11.648Z","avatar_url":"https://github.com/scitrera.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CUDA Containers for NVIDIA DGX Spark\n\nhttps://github.com/scitrera/cuda-containers\n\nThis repository contains Dockerfiles and build recipes for CUDA-based containers optimized for **NVIDIA DGX Spark**\nsystems, with a focus on **vLLM**, **sglang**, **llama.cpp**, **PyTorch**, and multi-node inference workloads.\n\nThe primary goal of this project is to provide **stable, well-versioned, prebuilt images** that work out-of-the-box on\nDGX Spark (Blackwell-ready), while still being suitable as **base images** for custom builds.\n\n---\n\n## Why 
This Repo Exists\n\nThe official NVIDIA images tend to run too far behind the latest releases. Other community images prioritize bleeding\nedge over versioning and stability.\n\nThe goal of this repo is to provide **stable, well-versioned, prebuilt images** that work out-of-the-box on DGX\nSpark (Blackwell-ready).\n\nThe main architectural difference from other builds (e.g. eugr's repo (link below) -- which is pretty much the community\nstandard) is:\n\n- **NCCL and PyTorch are built first**, in a dedicated base image\n- vLLM and related tooling are layered on top\n- Versioning follows **vLLM releases** as the primary axis\n\nIf you need the *absolute latest vLLM features from git right now*, I still strongly recommend:\nhttps://github.com/eugr/spark-vllm-docker\n\nFor sglang, the officially provided container is not continuously updated. I assume that might change\nin the near future as sglang gets better SM121 support -- but in the meantime, Scitrera will, on a best-effort basis,\nmaintain sglang images similar to our vLLM images.\n\n---\n\n## Available Images\n\n### vLLM Images\n\nAll vLLM images:\n\n- Are optimized for DGX Spark\n- Include **Ray** for multi-node / cluster deployments\n- Rebuild PyTorch, Triton, and vLLM against updated NCCL\n- Support tensor parallelism (`-tp`) and multi-node inference\n- Are hosted on Docker\n  Hub: [https://hub.docker.com/r/scitrera/dgx-spark-vllm](https://hub.docker.com/r/scitrera/dgx-spark-vllm)\n\n#### Latest Releases\n\n##### vLLM 0.16.0\n\n- `scitrera/dgx-spark-vllm:0.16.0-t4`\n    - vLLM 0.16.0\n    - PyTorch 2.10.0 (with torchvision + torchaudio)\n    - CUDA 13.1.1\n    - Transformers 4.57.6\n    - Triton 3.6.0\n    - NCCL 2.29.3-1\n    - FlashInfer 0.6.3\n\n- `scitrera/dgx-spark-vllm:0.16.0-t5`\n    - Same as above, but with **Transformers 5.2.0**\n\n##### vLLM 0.15.1\n\n- `scitrera/dgx-spark-vllm:0.15.1-t4`\n    - vLLM 0.15.1\n    - PyTorch 2.10.0 (with torchvision + torchaudio)\n    - CUDA 13.1.0\n    - 
Transformers 4.57.6\n    - Triton 3.5.1 *(3.6.0 not yet compatible)*\n    - NCCL 2.29.2-1\n    - FlashInfer 0.6.2\n\n- `scitrera/dgx-spark-vllm:0.15.1-t5`\n    - Same as above, but with **Transformers 5.0.0**\n\n##### Earlier Builds\n\n- `scitrera/dgx-spark-vllm:0.15.0-t4`\n- `scitrera/dgx-spark-vllm:0.15.0-t5`\n- `scitrera/dgx-spark-vllm:0.14.1-t4`\n- `scitrera/dgx-spark-vllm:0.14.1-t5`\n- `scitrera/dgx-spark-vllm:0.14.0-t4`\n- `scitrera/dgx-spark-vllm:0.14.0-t5`\n    - Includes a patch to `is_deepseek_mla()` for **GLM-4.7-Flash**\n    - Tested successfully with Ray and `-tp4` on a 4-node DGX Spark cluster\n\n- `scitrera/dgx-spark-vllm:0.13.0-t4`\n\n---\n\n### SGLang Images\n\nSGLang images are also optimized for DGX Spark and provide an alternative high-performance inference runtime.\n\n- are hosted on Docker\n  Hub: [https://hub.docker.com/r/scitrera/dgx-spark-sglang](https://hub.docker.com/r/scitrera/dgx-spark-sglang)\n\n#### Latest Releases\n\n##### SGLang 0.5.8\n\n- `scitrera/dgx-spark-sglang:0.5.8-t4`\n    - SGLang 0.5.8 (with build fixes post-release)\n    - PyTorch 2.10.0 (with torchvision + torchaudio)\n    - CUDA 13.1.1\n    - Transformers 4.57.6\n    - Triton 3.6.0\n    - NCCL 2.29.3-1\n    - FlashInfer 0.6.3\n\n- `scitrera/dgx-spark-sglang:0.5.8-t5`\n    - Same as above, but with **Transformers 5.2.0**\n\n---\n\n### llama.cpp Images\n\nllama.cpp images provide a lightweight, self-contained C++ inference runtime for GGUF models on DGX Spark — no\nPython or PyTorch required. 
Built directly from source with CUDA support.\n\n- Are hosted on Docker\n  Hub: [https://hub.docker.com/r/scitrera/dgx-spark-llama-cpp](https://hub.docker.com/r/scitrera/dgx-spark-llama-cpp)\n\n#### Latest Releases\n\n##### llama.cpp b8076\n\n- `scitrera/dgx-spark-llama-cpp:b8076-cu131`\n    - llama.cpp build 8076\n    - CUDA 13.1.1\n    - Built on `nvidia/cuda:13.1.1-devel-ubuntu24.04`\n    - Includes llama-server, llama-cli, llama-quantize, and all standard tools\n    - GGML CUDA and RPC backends enabled\n\n---\n\n### PyTorch Development Base Image\n\nIf you want to build your own inference stack:\n\n- **`scitrera/dgx-spark-pytorch-dev:2.10.0-v2-cu131`**\n    - PyTorch 2.10.0\n    - CUDA 13.1.1\n    - NCCL 2.29.3-1\n    - Built on `nvidia/cuda:13.1.1-devel-ubuntu24.04`\n    - Includes standard build tooling\n\n- **`scitrera/dgx-spark-pytorch-dev:2.10.0-cu131`**\n    - PyTorch 2.10.0\n    - CUDA 13.1.0\n    - NCCL 2.29.2-1\n    - Built on `nvidia/cuda:13.1.0-devel-ubuntu24.04`\n    - Includes standard build tooling\n\nThis is the recommended base image if you want to:\n\n- Build vLLM/sglang/other tools yourself\n- Add custom kernels or extensions\n- Experiment with alternative runtimes\n\n---\n\n## Tag Semantics\n\nTags follow this pattern for vLLM and SGLang containers:\n\n```\n\u003cversion\u003e-t\u003ctransformers-major\u003e\n```\n\nExamples:\n\n- `0.13.0-t4` → vLLM 0.13.0 + Transformers 4.x\n- `0.5.8-t5` → SGLang 0.5.8 + Transformers 5.x\n\nFor llama.cpp containers:\n\n```\nb\u003cbuild-number\u003e-cu\u003ccuda-short\u003e\n```\n\nExamples:\n\n- `b8076-cu131` → llama.cpp build 8076 + CUDA 13.1.1\n\n---\n\n## Example Usage (vLLM)\n\n```bash\ndocker run \\\n  --privileged \\\n  --gpus all \\\n  -it --rm \\\n  --network host --ipc=host \\\n  -v ~/.cache/huggingface:/root/.cache/huggingface \\\n  scitrera/dgx-spark-vllm:0.16.0-t4 \\\n  vllm serve \\\n    Qwen/Qwen2.5-7B-Instruct \\\n    --gpu-memory-utilization 0.4\n````\n\n---\n\n## Example Usage 
(SGLang)\n\n```bash\ndocker run \\\n  --privileged \\\n  --gpus all \\\n  -it --rm \\\n  --network host --ipc=host \\\n  -v ~/.cache/huggingface:/root/.cache/huggingface \\\n  scitrera/dgx-spark-sglang:0.5.8-t4 \\\n  sglang serve \\\n    --model-path Qwen/Qwen2.5-7B-Instruct \\\n    --mem-fraction-static 0.4\n````\n\n---\n\n## Example Usage (llama.cpp)\n\n```bash\ndocker run \\\n  --privileged \\\n  --gpus all \\\n  -it --rm \\\n  --network host --ipc=host \\\n  -v ~/models:/models \\\n  scitrera/dgx-spark-llama-cpp:b8076-cu131 \\\n  --model /models/my-model.gguf \\\n  --host 0.0.0.0 --port 8080\n```\n\nTo use the CLI instead of the server:\n\n```bash\ndocker run \\\n  --privileged \\\n  --gpus all \\\n  -it --rm \\\n  --entrypoint llama-cli \\\n  -v ~/models:/models \\\n  scitrera/dgx-spark-llama-cpp:b8076-cu131 \\\n  -m /models/my-model.gguf \\\n  -p \"Hello, world!\" -n 128\n```\n\n---\n\n## Inspecting Component Versions\n\nMajor component versions are embedded as Docker labels.\n\n```bash\ndocker inspect scitrera/dgx-spark-vllm:0.14.0rc2-t4 \\\n  --format '{{json .Config.Labels}}' | jq\n```\n\nExample output:\n\n```json\n{\n  \"dev.scitrera.cuda_version\": \"13.1.0\",\n  \"dev.scitrera.flashinfer_version\": \"0.6.1\",\n  \"dev.scitrera.nccl_version\": \"2.28.9-1\",\n  \"dev.scitrera.torch_version\": \"2.10.0-rc6\",\n  \"dev.scitrera.transformers_version\": \"4.57.5\",\n  \"dev.scitrera.triton_version\": \"3.5.1\",\n  \"dev.scitrera.vllm_version\": \"0.14.0rc2\"\n}\n```\n\n---\n\n## Notes \u0026 Caveats\n\n* NCCL is upgraded relative to upstream PyTorch builds\n* PyTorch, Triton, and vLLM/sglang are rebuilt accordingly\n* Image sizes could still be optimized further\n* Version combinations are chosen to be as new as possible but limited by **stability** (not guaranteed to have the\n  latest features if they might break things)\n\n---\n\n## Roadmap (Loose)\n\n* Better size optimization\n* More documentation/support for DGX Spark newcomers\n\n---\n\n## 
Acknowledgements\n\nThis work is inspired by and complementary to:\n\n* @eugr’s DGX Spark vLLM images\n  [https://github.com/eugr/spark-vllm-docker](https://github.com/eugr/spark-vllm-docker)\n* Everyone else who contributed to the NVIDIA DGX Spark forums, especially in the first two months after the DGX Spark's\n  release. Getting things to work was really a mess!\n\nThis project is not affiliated with NVIDIA. This project is sponsored and maintained\nby [scitrera.ai](https://scitrera.ai/).\n\nIf you need the very latest vLLM feature added four hours ago, start with eugr's repo.\n\nIf you want stable, prebuilt images with predictable versioning, use the Docker images built from this repo.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscitrera%2Fcuda-containers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscitrera%2Fcuda-containers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscitrera%2Fcuda-containers/lists"}