https://github.com/sozercan/dynamo-actions

Host: GitHub
URL: https://github.com/sozercan/dynamo-actions
Owner: sozercan
Created: 2025-09-10T19:37:20.000Z (26 days ago)
Default Branch: main
Last Pushed: 2025-09-26T21:10:49.000Z (10 days ago)
Last Synced: 2025-09-26T21:26:18.789Z (10 days ago)
Size: 27.3 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Dynamo Actions

GitHub Actions workflows for building container images and deploying [NVIDIA Dynamo](https://github.com/ai-dynamo/dynamo) components to Kubernetes.

## Workflows overview

| Workflow          | Purpose                                                                                                              | Trigger             | Key inputs                                                                             | Outputs                               |
| ----------------- | -------------------------------------------------------------------------------------------------------------------- | ------------------- | -------------------------------------------------------------------------------------- | ------------------------------------- |
| `build-vllm`      | Build the VLLM container via the Dynamo repository and push to NGC.                                                  | `workflow_dispatch` | `dynamo_ref` (optional)                                                                | Image pushed to `nvcr.io`.            |
| `deploy-model`    | Render a deployment template under `deploy/`, substitute env variables, override replicas, and apply with `kubectl`. | `workflow_dispatch` | `runtime`, `deployment_type`, `main_container_image`, `model_name`, JSON config inputs | Applied manifest + uploaded artifact. |
| `deploy-operator` | Install / upgrade the Dynamo platform (CRDs + operator) via Helm.                                                    | `workflow_dispatch` | `dynamo_version` (optional)                                                            | Operator installed in target cluster. |

---

## Workflow details

### `build-vllm`

Builds and optionally publishes a container using Dynamo's `./container/build.sh` helper.

- Clones `ai-dynamo/dynamo`, checks out the chosen ref (defaults to `v0.5.0`).

- Logs into `nvcr.io` with the provided credentials.

- Runs `./container/build.sh --framework VLLM --platform linux/arm64`.

#### Inputs (build-vllm)

- `dynamo_ref` (optional) – Tag or branch from the upstream repository; defaults to `v0.5.0`.

### `deploy-model`

Applies one of the templates under `deploy/` to a Kubernetes cluster with your chosen runtime, image, model, and configuration. The workflow now uses **compact JSON inputs** (to stay within GitHub's 10 input limit) that are parsed into environment variables before rendering.

Key capabilities:

- `envsubst` replaces `MAIN_CONTAINER_IMAGE`, `MODEL`, replica placeholders and SGLang-specific values.

- A Python post-processing step coerces replica counts to integers to satisfy the CRD schema.

- The rendered manifest is saved as an artifact for later inspection / rollback.
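
The replica coercion step can be pictured with a minimal sketch. It assumes the problem is a quoted value such as `replicas: "2"` surviving in the rendered manifest; the function name `coerce_replicas` and the regex approach are illustrative, not taken from the workflow itself, which may parse the YAML instead.

```python
import re

# Assumption: a rendered line looks like `replicas: "2"` (quoted digits),
# which a CRD schema expecting an integer would reject.
_QUOTED_REPLICAS = re.compile(
    r"""^([ \t]*replicas:[ \t]*)(["'])(\d+)\2[ \t]*$""",
    re.MULTILINE,
)


def coerce_replicas(rendered: str) -> str:
    """Rewrite quoted integer replica counts as bare YAML integers."""
    return _QUOTED_REPLICAS.sub(
        lambda m: f"{m.group(1)}{int(m.group(3))}",
        rendered,
    )


if __name__ == "__main__":
    manifest = 'Frontend:\n  replicas: "2"\nWorker:\n  replicas: 1\n'
    print(coerce_replicas(manifest))
```

Lines that already hold a bare integer are left untouched, so the step is safe to run on any rendered template.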

#### Inputs (deploy-model)

Required:

- `runtime` – `vllm` or `sglang`.

- `deployment_type` – Template filename (without `.yaml`) under `deploy/<runtime>/`.

- `main_container_image` – Image used for all `mainContainer` entries.

- `model_name` – Model identifier passed to the runtime entrypoint.

JSON configuration inputs (all optional; each has defaults if omitted):

| Input name       | Applies to | JSON Keys (examples)                                                                                                                       | Purpose                            |
| ---------------- | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------- |
| `replicas`       | both       | `frontend`, `decode`, `prefill`, `planner`, `prometheus`                                                                                   | Override service replica counts.   |
| `sglang_scaling` | sglang     | `page_size`, `tp_size`, `dp_size`, `ep_size`, `decode_gpus`, `prefill_gpus`, `multinode_decode_node_count`, `multinode_prefill_node_count` | Parallelism & GPU sizing.          |
| `sglang_flags`   | sglang     | `enable_dp_attention`, `trust_remote_code`, `skip_tokenizer_init`, `mem_fraction_static`                                                   | Boolean / misc tuning flags.       |
| `sglang_disagg`  | sglang     | `disagg_transfer_backend`, `disagg_bootstrap_port`                                                                                         | Disaggregation transport settings. |
| `command_extras` | sglang     | `worker`, `frontend`, `planner`, `prometheus`                                                                                              | Append extra CLI flags per role.   |

##### Defaults

```jsonc
replicas:        { "frontend":1, "decode":1, "prefill":1, "planner":1, "prometheus":1 }
sglang_scaling:  { "page_size":16, "tp_size":1, "dp_size":1, "ep_size":1, "decode_gpus":1, "prefill_gpus":1, "multinode_decode_node_count":1, "multinode_prefill_node_count":1 }
sglang_flags:    { "enable_dp_attention":false, "trust_remote_code":true, "skip_tokenizer_init":true, "mem_fraction_static":"" }
sglang_disagg:   { "disagg_transfer_backend":"nixl", "disagg_bootstrap_port":30001 }
command_extras:  { "worker":"", "frontend":"", "planner":"", "prometheus":"" }
```

You can override only what you need; leaving an input blank (or omitting via CLI) keeps the defaults. For non‑SGLang runtimes the SGLang JSON inputs are ignored.
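
To make the override behaviour concrete, here is a minimal sketch of how a JSON input could be merged with its defaults and flattened into environment variables. The helper name `json_input_to_env` and the exact string formatting of values are assumptions for illustration; only the default values and the resulting variable names (for example `DECODE_REPLICAS`) come from this README.

```python
import json

# Defaults copied from the "Defaults" block above.
DEFAULT_REPLICAS = {"frontend": 1, "decode": 1, "prefill": 1, "planner": 1, "prometheus": 1}


def json_input_to_env(raw: str, defaults: dict, suffix: str = "") -> dict:
    """Merge a (possibly blank) JSON input over its defaults.

    Keys are upper-cased and given an optional suffix, so the `decode` key
    of `replicas` becomes DECODE_REPLICAS. Booleans are written as
    lowercase `true` / `false` (an assumption about the workflow's format).
    """
    overrides = json.loads(raw) if raw and raw.strip() else {}
    merged = {**defaults, **overrides}
    return {
        f"{key.upper()}{suffix}": json.dumps(value) if isinstance(value, bool) else str(value)
        for key, value in merged.items()
    }


if __name__ == "__main__":
    # Only `decode` is overridden; every other replica count keeps its default.
    print(json_input_to_env('{"decode": 3}', DEFAULT_REPLICAS, suffix="_REPLICAS"))
```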

#### Prerequisites (deploy-model)

- Self-hosted runner with `kubectl`, `envsubst`, and `python3` available.

- Target cluster must already contain the `hf-token-secret` referenced by the templates (or adjust the manifest before applying).

### `deploy-operator`

Installs/updates the Dynamo CRDs and platform Helm charts.

- Verifies `kubectl`, `helm`, and `docker` are present and prints their versions before proceeding.

- Resolves the release version from the `dynamo_version` input; if blank, fetches the latest GitHub release tag automatically.

- Installs CRDs into the `default` namespace and the platform chart into `dynamo-kubernetes` (creating it if needed).

- Creates/updates the `hf-token-secret` using the provided Hugging Face token.
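
The version-resolution rule above can be sketched as follows. This is an illustration under assumptions: it queries the public GitHub REST endpoint for the latest release of `ai-dynamo/dynamo`, and the function names are hypothetical; the workflow itself may resolve the tag differently (for example with `curl` or `gh`).

```python
import json
import urllib.request

# Assumption: "latest upstream release" means the latest release of ai-dynamo/dynamo.
LATEST_URL = "https://api.github.com/repos/ai-dynamo/dynamo/releases/latest"


def fetch_latest_tag(url: str = LATEST_URL) -> str:
    """Return the `tag_name` of the latest GitHub release (network call)."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["tag_name"]


def resolve_version(dynamo_version: str, fetch=fetch_latest_tag) -> str:
    """Use the explicit input when given; otherwise fall back to the latest tag."""
    requested = (dynamo_version or "").strip()
    return requested or fetch()


if __name__ == "__main__":
    # An explicit tag short-circuits the lookup, so no network access is needed here.
    print(resolve_version("v0.5.0"))
```

Passing the fetcher as a parameter keeps the fallback testable without network access.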

#### Inputs (deploy-operator)

- `dynamo_version` (optional) – Specific release tag to install. Leave empty to auto-detect the latest upstream release.

#### Required environment / secrets (deploy-operator)

- `HF_TOKEN` – Expose this value on the runner (for example by mapping a repository secret to an environment variable) so the workflow can create the Kubernetes secret. If it is missing, the workflow logs a warning and skips the secret creation step.

---

## Deployment templates (`deploy/`)

Templates are now organized by runtime:

```text
deploy/
  vllm/      # vLLM runtime templates (agg, agg_router, disagg, disagg_router, disagg_planner)
  sglang/    # SGLang runtime templates (agg, agg_logging, agg_router, disagg, disagg_multinode, disagg_planner)
```

When you run the `deploy-model` workflow you must provide:

- `runtime` (currently `vllm` or `sglang`)

- `deployment_type` (template filename without extension in the chosen runtime folder)

The YAML files support the following common environment placeholders:

| Placeholder              | Description                                                     | Defaults (commented in templates)             |
| ------------------------ | --------------------------------------------------------------- | --------------------------------------------- |
| `$MAIN_CONTAINER_IMAGE`  | Runtime container image applied to all `mainContainer` entries. | `nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.0` |
| `$MODEL`                 | Model identifier passed to `python3 -m dynamo.vllm`.            | `Qwen/Qwen3-0.6B`                             |
| `${FRONTEND_REPLICAS}`   | Replica count for `Frontend` services.                          | `1`                                           |
| `${DECODE_REPLICAS}`     | Replica count for `VllmDecodeWorker` services.                  | `1` (or `2` in planner/router templates)      |
| `${PREFILL_REPLICAS}`    | Replica count for `VllmPrefillWorker` services.                 | `1` (or `2` in planner template)              |
| `${PLANNER_REPLICAS}`    | Replica count for `Planner` service (planner template only).    | `1`                                           |
| `${PROMETHEUS_REPLICAS}` | Replica count for `Prometheus` service (planner template only). | `1`                                           |
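
As a rough illustration of how these placeholders are filled, the sketch below uses Python's `string.Template`, which understands the same `$NAME` and `${NAME}` forms as `envsubst`. The YAML fragment and the `--model` argument are hypothetical stand-ins, not copies of the real templates, and the workflow itself uses `envsubst` rather than Python for this step. One difference to keep in mind: `envsubst` replaces unset variables with an empty string, whereas `substitute` raises `KeyError`.

```python
from string import Template

# Hypothetical template fragment; the real files live under deploy/<runtime>/.
TEMPLATE = """\
services:
  Frontend:
    replicas: ${FRONTEND_REPLICAS}
  VllmDecodeWorker:
    replicas: ${DECODE_REPLICAS}
    mainContainer:
      image: $MAIN_CONTAINER_IMAGE
      args: ["python3", "-m", "dynamo.vllm", "--model", "$MODEL"]
"""


def render(template_text: str, values: dict) -> str:
    """Substitute $NAME / ${NAME} placeholders; fail loudly on a missing value."""
    return Template(template_text).substitute(values)


if __name__ == "__main__":
    print(render(TEMPLATE, {
        "FRONTEND_REPLICAS": "1",
        "DECODE_REPLICAS": "1",
        "MAIN_CONTAINER_IMAGE": "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.0",
        "MODEL": "Qwen/Qwen3-0.6B",
    }))
```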

### SGLang-Specific JSON Keys / Placeholders

The SGLang JSON inputs map 1:1 to environment variables consumed by templates and the command builder step:

| JSON Input       | Key                            | Placeholder / Env                 | Notes                                  |
| ---------------- | ------------------------------ | --------------------------------- | -------------------------------------- |
| `sglang_scaling` | `page_size`                    | `${PAGE_SIZE}`                    | Token page size.                       |
|                  | `tp_size`                      | `${TP_SIZE}`                      | Tensor parallel.                       |
|                  | `dp_size`                      | `${DP_SIZE}`                      | Data parallel (omitted in CLI when 1). |
|                  | `ep_size`                      | `${EP_SIZE}`                      | Expert parallel (omitted when 1).      |
|                  | `decode_gpus`                  | `${DECODE_GPUS}`                  | GPUs per decode worker pod.            |
|                  | `prefill_gpus`                 | `${PREFILL_GPUS}`                 | GPUs per prefill worker pod.           |
|                  | `multinode_decode_node_count`  | `${MULTINODE_DECODE_NODE_COUNT}`  | Multi-node template only.              |
|                  | `multinode_prefill_node_count` | `${MULTINODE_PREFILL_NODE_COUNT}` | Multi-node template only.              |
| `sglang_flags`   | `enable_dp_attention`          | `${ENABLE_DP_ATTENTION}`          | Adds flag if true.                     |
|                  | `trust_remote_code`            | `${TRUST_REMOTE_CODE}`            | Adds flag if true.                     |
|                  | `skip_tokenizer_init`          | `${SKIP_TOKENIZER_INIT}`          | Adds flag if true.                     |
|                  | `mem_fraction_static`          | `${MEM_FRACTION_STATIC}`          | Adds flag when non-empty.              |
| `sglang_disagg`  | `disagg_transfer_backend`      | `${DISAGG_TRANSFER_BACKEND}`      | Disaggregation mode templates.         |
|                  | `disagg_bootstrap_port`        | `${DISAGG_BOOTSTRAP_PORT}`        | Port for bootstrap.                    |
| `command_extras` | `worker`                       | `$WORKER_COMMAND_EXTRA`           | Appended to decode/prefill commands.   |
|                  | `frontend`                     | `$FRONTEND_COMMAND_EXTRA`         | Frontend command extension.            |
|                  | `planner`                      | `$PLANNER_COMMAND_EXTRA`          | Planner template only.                 |
|                  | `prometheus`                   | `$PROMETHEUS_COMMAND_EXTRA`       | Planner template only.                 |

Replica environment variables originate from `replicas` JSON keys: `FRONTEND_REPLICAS`, `DECODE_REPLICAS`, `PREFILL_REPLICAS`, `PLANNER_REPLICAS`, `PROMETHEUS_REPLICAS`.

All placeholders are resolved via `envsubst` and replicas are integer-normalized post-render.
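
The conditional rules in the Notes column (omit when 1, add if true, add when non-empty) can be illustrated with a small sketch of a command builder. The function name and the flag spellings (`--dp-size`, `--enable-dp-attention`, and so on) are assumptions modelled on SGLang's server arguments; the actual command builder step in the workflow may differ.

```python
import shlex


def build_sglang_flags(env: dict) -> list:
    """Turn SGLang environment values into a list of CLI arguments."""
    flags = [
        "--page-size", env.get("PAGE_SIZE", "16"),
        "--tp-size", env.get("TP_SIZE", "1"),
    ]
    # Data / expert parallel sizes are only passed when they differ from 1.
    for var, flag in (("DP_SIZE", "--dp-size"), ("EP_SIZE", "--ep-size")):
        if env.get(var, "1") != "1":
            flags += [flag, env[var]]
    # Boolean switches are added only when the value is "true".
    for var, flag in (
        ("ENABLE_DP_ATTENTION", "--enable-dp-attention"),
        ("TRUST_REMOTE_CODE", "--trust-remote-code"),
        ("SKIP_TOKENIZER_INIT", "--skip-tokenizer-init"),
    ):
        if env.get(var, "").lower() == "true":
            flags.append(flag)
    # The memory fraction flag is skipped entirely when the value is empty.
    if env.get("MEM_FRACTION_STATIC", ""):
        flags += ["--mem-fraction-static", env["MEM_FRACTION_STATIC"]]
    # Extra worker flags are appended verbatim, split with shell quoting rules.
    flags += shlex.split(env.get("WORKER_COMMAND_EXTRA", ""))
    return flags


if __name__ == "__main__":
    print(build_sglang_flags({"TP_SIZE": "8", "DP_SIZE": "8", "ENABLE_DP_ATTENTION": "true"}))
```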

---

## Simple Qwen (SGLang, Aggregate) Example

Minimal aggregate deployment (single frontend + single decode worker) with defaults. Because all defaults already match what we need, only the required inputs are necessary.

### Prefilled Dispatch URL (Defaults Only)

```text
https://github.com/sozercan/dynamo-actions/actions/workflows/deploy-model.yaml/dispatches/new?ref=main&inputs[runtime]=sglang&inputs[deployment_type]=agg&inputs[main_container_image]=my-registry/sglang-runtime:latest&inputs[model_name]=Qwen/Qwen3-0.6B
```

### CLI (Explicit Replicas JSON – optional)

```bash
gh workflow run deploy-model.yaml \
  --ref main \
  -f runtime=sglang \
  -f deployment_type=agg \
  -f main_container_image=my-registry/sglang-runtime:latest \
  -f model_name=Qwen/Qwen3-0.6B \
  -f replicas='{"frontend":1,"decode":1,"prefill":1,"planner":1,"prometheus":1}'
```

---

## Simple Qwen (vLLM, Aggregate) Example

Defaults also suffice for vLLM aggregate; only required inputs are needed.

```bash
gh workflow run deploy-model.yaml \
  --ref main \
  -f runtime=vllm \
  -f deployment_type=agg \
  -f main_container_image=my-registry/vllm-runtime:latest \
  -f model_name=Qwen/Qwen3-0.6B
```

---

## DeepSeek (SGLang, Disaggregated) Example

Reference configuration approximating the DeepSeek R1 disaggregated setup (single pod per role with 8 GPUs each, 8‑way TP/DP/EP). Adjust the image as needed.

```bash
gh workflow run deploy-model.yaml \
  --ref main \
  -f runtime=sglang \
  -f deployment_type=disagg \
  -f main_container_image=my-registry/sglang-wideep-runtime:latest \
  -f model_name=deepseek-ai/DeepSeek-R1 \
  -f replicas='{"frontend":1,"decode":1,"prefill":1,"planner":1,"prometheus":1}' \
  -f sglang_scaling='{"page_size":16,"tp_size":8,"dp_size":8,"ep_size":8,"decode_gpus":8,"prefill_gpus":8,"multinode_decode_node_count":1,"multinode_prefill_node_count":1}' \
  -f sglang_flags='{"enable_dp_attention":true,"trust_remote_code":true,"skip_tokenizer_init":true,"mem_fraction_static":"0.82"}' \
  -f sglang_disagg='{"disagg_transfer_backend":"nixl","disagg_bootstrap_port":30001}'
```

URL form (JSON must be URL‑encoded, example shown partially encoded):

```text
https://github.com/sozercan/dynamo-actions/actions/workflows/deploy-model.yaml/dispatches/new?ref=main&inputs[runtime]=sglang&inputs[deployment_type]=disagg&inputs[main_container_image]=my-registry/sglang-wideep-runtime:latest&inputs[model_name]=deepseek-ai/DeepSeek-R1&inputs[sglang_scaling]=%7B%22page_size%22:16,%22tp_size%22:8,%22dp_size%22:8,%22ep_size%22:8,%22decode_gpus%22:8,%22prefill_gpus%22:8,%22multinode_decode_node_count%22:1,%22multinode_prefill_node_count%22:1%7D&inputs[sglang_flags]=%7B%22enable_dp_attention%22:true,%22trust_remote_code%22:true,%22skip_tokenizer_init%22:true,%22mem_fraction_static%22:%220.82%22%7D&inputs[sglang_disagg]=%7B%22disagg_transfer_backend%22:%22nixl%22,%22disagg_bootstrap_port%22:30001%7D
```
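
Hand-encoding JSON for the URL form is error-prone, so a small helper can generate it. The sketch below is an illustration only: the function name `dispatch_url` is hypothetical, and it percent-encodes every reserved character (including `:`, `,` and `/`), so its output is more heavily encoded than the partially encoded example above.

```python
import json
from urllib.parse import quote

BASE = "https://github.com/sozercan/dynamo-actions/actions/workflows/deploy-model.yaml/dispatches/new"


def dispatch_url(inputs: dict, ref: str = "main") -> str:
    """Build a prefilled dispatch URL; non-string values are sent as compact JSON."""
    parts = [f"ref={quote(ref, safe='')}"]
    for name, value in inputs.items():
        if not isinstance(value, str):
            value = json.dumps(value, separators=(",", ":"))
        parts.append(f"inputs[{name}]={quote(value, safe='')}")
    return BASE + "?" + "&".join(parts)


if __name__ == "__main__":
    print(dispatch_url({
        "runtime": "sglang",
        "deployment_type": "disagg",
        "sglang_disagg": {"disagg_transfer_backend": "nixl", "disagg_bootstrap_port": 30001},
    }))
```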

### Notes

- Increase `multinode_*_node_count` and reduce `*_gpus` if you want to distribute the total GPU count across multiple nodes instead of concentrating them in a single pod.

- Omit `mem_fraction_static` (leave blank) if unsure; the flag will not be added.

- For planner / Prometheus roles, use `deployment_type=disagg_planner` and set the `planner` and `prometheus` keys of the `replicas` input to the replica counts you need.

- Ensure the cluster has the required NVIDIA GPU resources and the `hf-token-secret` in the target namespace.
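
For the first note, the trade-off between node count and GPUs per node is simple arithmetic. The sketch below assumes that the total GPU count for a role equals `multinode_*_node_count` multiplied by `*_gpus`, and that the split must be even; both are assumptions about how the multi-node template interprets these values, and the helper name is hypothetical.

```python
def per_node_gpus(total_gpus: int, node_count: int) -> int:
    """Return the `*_gpus` value to use when spreading `total_gpus` over `node_count` nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if total_gpus % node_count != 0:
        raise ValueError("total_gpus must divide evenly across nodes")
    return total_gpus // node_count


if __name__ == "__main__":
    # 8 GPUs in a single pod, or the same 8 GPUs as 2 nodes with 4 GPUs each.
    print(per_node_gpus(8, 1), per_node_gpus(8, 2))
```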

---

## Runner expectations

- Self-hosted runners require NVIDIA drivers (`nvidia-smi`), Docker, `kubectl`, `helm`, and GNU gettext (which provides `envsubst`).

- Ensure the runner has network access to `nvcr.io`, `helm.ngc.nvidia.com`, and any target Kubernetes API server.

- Provide `KUBECONFIG` or use in-cluster configuration so `kubectl` commands succeed.

---

## Secrets summary

| Secret / variable | Used by           | Purpose                                        |
| ----------------- | ----------------- | ---------------------------------------------- |
| `HF_TOKEN`        | `deploy-operator` | Populate the `hf-token-secret` in the cluster. |