{"id":51118669,"url":"https://github.com/defilantech/llmkube-runtimes","last_synced_at":"2026-06-25T00:01:14.861Z","repository":{"id":365191169,"uuid":"1270967443","full_name":"defilantech/llmkube-runtimes","owner":"defilantech","description":"LLMKube inference runtime images (AMD/Vulkan first). Build-from-source, hardware-gated CI.","archived":false,"fork":false,"pushed_at":"2026-06-23T16:17:08.000Z","size":60,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-23T18:16:04.020Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/defilantech/LLMKube","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/defilantech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-16T07:58:00.000Z","updated_at":"2026-06-21T23:55:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/defilantech/llmkube-runtimes","commit_stats":null,"previous_names":["defilantech/llmkube-runtimes"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/defilantech/llmkube-runtimes","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defilantech%2Fllmkube-runtimes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defilantech%2Fllmkube-runtimes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defilantech%2Fllmkube-runtimes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defilantech%2Fllmkube-runtimes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/defilantech","download_url":"https://codeload.github.com/defilantech/llmkube-runtimes/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defilantech%2Fllmkube-runtimes/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34753781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-24T02:00:07.484Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-25T00:01:13.811Z","updated_at":"2026-06-25T00:01:14.849Z","avatar_url":"https://github.com/defilantech.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# llmkube-runtimes\n\nInference runtime container images for [LLMKube](https://github.com/defilantech/LLMKube), built from source and gated on real hardware.\n\nToday this repo builds the **AMD/Vulkan** llama.cpp runtime as two images from one build: a minimal **server** image (what the operator runs) and a **tools** image (`llama-bench` + `llama-cli`, for hardware benchmarking and diagnostics). The layout (`vulkan/`) is set up so other backends (CUDA, Intel, CPU) can be added as sibling directories later without restructuring.\n\n## Why this repo exists\n\nLLMKube previously inherited its entire serving runtime from upstream floating image tags. That made the load-bearing part of the product an uncontrolled supply chain: when upstream's `:server-vulkan` tag shipped a `libggml-vulkan.so` with an undefined shader symbol, the backend silently failed to load and fell back to CPU, and we could neither fix nor detect it without a hand-run on a GPU (see [defilantech/LLMKube#725](https://github.com/defilantech/LLMKube/issues/725)).\n\nBuilding from source here means we own the Vulkan shader-gen step, the base image, and dependency/CVE patching, and we gate every build on hardware before anything trusts it.\n\nDesign reference: [`docs/proposals/697-amd-vulkan-runtime-image.md`](https://github.com/defilantech/LLMKube/blob/main/docs/proposals/697-amd-vulkan-runtime-image.md) in the LLMKube repo.\n\n## Images\n\nBoth images come from the same `vulkan/Dockerfile` build stage, so they carry the identical llama.cpp commit and Vulkan backends.\n\n`ghcr.io/defilantech/llmkube-llama-vulkan` — the server runtime.\n\n- Ubuntu 26.04 base (Mesa new enough for `gfx1151` / Strix Halo RADV), pinned by digest.\n- `cmake -DGGML_VULKAN=ON -DGGML_BACKEND_DL=ON` with `GGML_NATIVE=OFF` (a single generic x86-64 CPU backend, not `GGML_CPU_ALL_VARIANTS`), llama.cpp pinned by tag + commit SHA.\n- Runs the OpenAI-compatible `llama-server`. No ROCm.\n\n`ghcr.io/defilantech/llmkube-llama-vulkan-tools` — benchmarking + diagnostics.\n\n- Same backends and commit as the server image, plus `llama-bench` and `llama-cli` (it also carries `llama-server`). Default entrypoint is `llama-bench`.\n- Run off-cluster to benchmark hardware (e.g. Strix Halo `gfx1151`) with numbers directly comparable to the server runtime. The operator never consumes this image.\n\nEither pod consumes the GPU by mounting `/dev/dri` device nodes (both `renderD128` and `card1`) via a generic device-plugin resource; it requests no `nvidia.com/gpu`. Non-root: the deployment grants the host render group via `securityContext.supplementalGroups`.\n\n## The two-tier gate\n\nA built image is a **candidate**. Only an image a real GPU host has verified and signed is promoted to a tag the operator consumes.\n\n1. **Tier 1, in CI (this repo, free runners, no GPU).** Build, then run `llama-server --list-devices` under the image's software Vulkan (lavapipe). The Vulkan backend must dlopen and register; a #725-class undefined-symbol break fails here before the image ever leaves CI. On pass, push `:candidate-\u003csha\u003e` with an SBOM and build provenance.\n2. **Tier 2, out-of-band on a self-hosted `gfx1151` host.** A promoter verifies the candidate's build provenance, runs a sandboxed offline GPU smoke (real device + layer offload + a throughput floor), then promotes to `:stable` / `:b\u003cupstream\u003e-llmkube\u003cN\u003e` and applies a smoke-passed signature. The host is never a CI runner, so fork-PR code never touches it.\n\nTier 2 (the promoter) lands in a follow-up; this bootstrap is Tier 1.\n\n## Build locally\n\n```bash\n# server (default final stage)\ndocker build -t llmkube-llama-vulkan:dev vulkan/\n./scripts/tier1-gate.sh llmkube-llama-vulkan:dev\n\n# tools (llama-bench + llama-cli)\ndocker build --target tools -t llmkube-llama-vulkan-tools:dev vulkan/\n./scripts/tier1-gate.sh llmkube-llama-vulkan-tools:dev\n```\n\nBump the pinned llama.cpp ref by editing `LLAMACPP_REF` + `LLAMACPP_SHA` in `vulkan/Dockerfile` (the SHA check fails the build if they disagree); both images move together.\n\n## Tags\n\nBoth images use the same tag scheme:\n\n- `:candidate-\u003cgitsha\u003e` — built + Tier-1 passed, not yet GPU-verified. Do not run in production.\n- `:b\u003cupstream-build\u003e-llmkube\u003cN\u003e` — immutable, GPU-smoke-passed.\n- `:stable` — moving, advanced by the promoter.\n\nThe operator pins an explicit immutable tag or digest of the server image, never `:stable`. The tools image is run by hand for benchmarking; pin a `:candidate-\u003cgitsha\u003e` for a reproducible benchmark.\n\n## Contributing\n\nCommits must be signed off ([DCO](https://developercertificate.org/)): `git commit -s`. Licensed under [Apache-2.0](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdefilantech%2Fllmkube-runtimes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdefilantech%2Fllmkube-runtimes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdefilantech%2Fllmkube-runtimes/lists"}