{"id":47593068,"url":"https://github.com/openclaw-rocks/openclaw-operator","last_synced_at":"2026-04-18T09:04:17.424Z","repository":{"id":336915286,"uuid":"1151230494","full_name":"openclaw-rocks/openclaw-operator","owner":"openclaw-rocks","description":"Kubernetes operator for deploying and managing OpenClaw AI agent instances with production-grade security, observability, and lifecycle management.","archived":false,"fork":false,"pushed_at":"2026-04-17T14:47:48.000Z","size":1217,"stargazers_count":319,"open_issues_count":2,"forks_count":51,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-04-17T16:24:34.422Z","etag":null,"topics":["agents","ai","golang","helm","kubernetes","openclaw","operator"],"latest_commit_sha":null,"homepage":"https://openclaw.rocks","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openclaw-rocks.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-06T07:56:06.000Z","updated_at":"2026-04-17T16:09:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"e76d0a47-ac83-42e2-af32-7b02be2cd41b","html_url":"https://github.com/openclaw-rocks/openclaw-operator","commit_stats":null,"previous_names":["openclaw-rocks/k8s-operator","openclaw-rocks/openclaw-operator"],"tags_count":103,"template":false,"template_full_name":null,"purl":"pkg:github/openclaw-rocks/openclaw-operator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw-rocks%2Fopenclaw-operator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw-rocks%2Fopenclaw-operator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw-rocks%2Fopenclaw-operator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw-rocks%2Fopenclaw-operator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openclaw-rocks","download_url":"https://codeload.github.com/openclaw-rocks/openclaw-operator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw-rocks%2Fopenclaw-operator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31962892,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T00:39:45.007Z","status":"online","status_checked_at":"2026-04-18T02:00:07.018Z","response_time":103,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners"}},"keywords":["agents","ai","golang","helm","kubernetes","openclaw","operator"],"created_at":"2026-04-01T17:44:03.295Z","updated_at":"2026-04-18T09:04:17.402Z","avatar_url":"https://github.com/openclaw-rocks.png","language":"Go","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/images/banner.svg\" alt=\"OpenClaw Kubernetes Operator — OpenClaws sailing the Kubernetes seas\" width=\"100%\"\u003e\n\u003c/p\u003e\n\n# OpenClaw Kubernetes Operator\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Go Report Card](https://goreportcard.com/badge/github.com/OpenClaw-rocks/openclaw-operator)](https://goreportcard.com/report/github.com/OpenClaw-rocks/openclaw-operator)\n[![CI](https://github.com/OpenClaw-rocks/openclaw-operator/actions/workflows/ci.yaml/badge.svg)](https://github.com/OpenClaw-rocks/openclaw-operator/actions/workflows/ci.yaml)\n[![Kubernetes](https://img.shields.io/badge/Kubernetes-1.28%2B-326CE5?logo=kubernetes\u0026logoColor=white)](https://kubernetes.io)\n[![Go](https://img.shields.io/badge/Go-1.24-00ADD8?logo=go\u0026logoColor=white)](https://go.dev)\n\n**Self-host [OpenClaw](https://openclaw.ai) AI agents on Kubernetes with production-grade security, observability, and lifecycle management.**\n\nOpenClaw is an AI agent platform that acts on your behalf across Telegram, Discord, WhatsApp, and Signal. It manages your inbox, calendar, smart home, and more through 50+ integrations. While [OpenClaw.rocks](https://openclaw.rocks) offers fully managed hosting, this operator lets you run OpenClaw on your own infrastructure with the same operational rigor.\n\n---\n\n## Why an Operator?\n\nDeploying AI agents to Kubernetes involves more than a Deployment and a Service. You need network isolation, secret management, persistent storage, health monitoring, optional browser automation, and config rollouts, all wired correctly. This operator encodes those concerns into a single `OpenClawInstance` custom resource so you can go from zero to production in minutes:\n\n```yaml\napiVersion: openclaw.rocks/v1alpha1\nkind: OpenClawInstance\nmetadata:\n  name: my-agent\nspec:\n  envFrom:\n    - secretRef:\n        name: openclaw-api-keys\n  storage:\n    persistence:\n      enabled: true\n      size: 10Gi\n```\n\nThe operator reconciles this into a fully managed stack of 9+ Kubernetes resources: secured, monitored, and self-healing.\n\n## Agents That Adapt Themselves\n\nAgents can autonomously install skills, patch their config, add environment variables, and seed workspace files - all through the Kubernetes API, validated by the operator on every request.\n\n```yaml\n# 1. Enable self-configure on the instance\nspec:\n  selfConfigure:\n    enabled: true\n    allowedActions: [skills, config, envVars, workspaceFiles]\n```\n\n```yaml\n# 2. The agent creates this to install a skill at runtime\napiVersion: openclaw.rocks/v1alpha1\nkind: OpenClawSelfConfig\nmetadata:\n  name: add-fetch-skill\nspec:\n  instanceRef: my-agent\n  addSkills:\n    - \"@anthropic/mcp-server-fetch\"\n```\n\nEvery request is validated against the instance's allowlist policy. Protected config keys cannot be overwritten, and denied requests are logged with a reason. See [Self-configure](#self-configure) for details.\n\n\u003e **Note:** Without `selfConfigure` enabled, config or skill changes made by the agent inside the container won't trigger a pod restart. You'll need to restart the pod manually (e.g. 
> **Note:** Without `selfConfigure` enabled, config or skill changes made by the agent inside the container won't trigger a pod restart. You'll need to restart the pod manually (e.g. `kubectl delete pod <pod-name>`) for changes to take effect.

## Features

| Category | Feature | Details |
|---|---|---|
| **Declarative** | Single CRD | One resource defines the entire stack: StatefulSet, Service, RBAC, NetworkPolicy, PVC, PDB, Ingress, and more |
| **Adaptive** | Agent self-configure | Agents autonomously install skills, patch config, and adapt their environment via the K8s API - every change validated against an allowlist policy |
| **Secure** | Hardened by default | Non-root (UID 1000), read-only root filesystem, all capabilities dropped, seccomp RuntimeDefault, default-deny NetworkPolicy, validating webhook |
| **Observable** | Built-in metrics | Prometheus metrics, ServiceMonitor integration, structured JSON logging, Kubernetes events |
| **Flexible** | Provider-agnostic config | Use any AI provider (Anthropic, OpenAI, or others) via environment variables and inline or external config |
| **Config Modes** | Merge or overwrite | `overwrite` replaces config on restart; `merge` deep-merges with PVC config, preserving runtime changes. Config is restored on every container restart via init container. |
| **Skills** | Declarative install | Install ClawHub skills, npm packages, or GitHub-hosted skill packs via `spec.skills` - supports `npm:` and `pack:` prefixes |
| **Plugins** | Declarative install | Install OpenClaw plugins via `spec.plugins` - npm packages installed in a secure init container |
| **Runtime Deps** | pnpm & Python/uv | Built-in init containers install pnpm (via corepack) or Python 3.12 + uv for MCP servers and skills |
| **Auto-Update** | OCI registry polling | Opt-in version tracking: checks the registry for new semver releases, backs up first, rolls out, and auto-rolls back if the new version fails health checks |
| **Scalable** | Auto-scaling | HPA integration with CPU and memory metrics, min/max replica bounds, automatic StatefulSet replica management |
| **Operational** | Instance suspension | Scale to zero with `spec.suspended: true` - all non-runtime resources remain managed, resume instantly with `false` |
| **Resilient** | Self-healing lifecycle | PodDisruptionBudgets, health probes, automatic config rollouts via content hashing, 5-minute drift detection |
| **Backup/Restore** | S3-backed snapshots | Automatic backup to S3-compatible storage on deletion, pre-update, and on a cron schedule; restore into a new instance from any snapshot |
| **Workspace Seeding** | Initial files & dirs | Pre-populate the workspace with files and directories before the agent starts; reference an external ConfigMap for GitOps workflows |
| **Gateway Auth** | Auto-generated tokens | Automatic gateway token Secret per instance, bypassing mDNS pairing (unusable in k8s) |
| **Tailscale** | Tailnet access | Expose via Tailscale Serve or Funnel with SSO auth - no Ingress needed |
| **Extensible** | Sidecars & init containers | Chromium for browser automation, Ollama for local LLMs, Tailscale for tailnet access, plus custom init containers and sidecars |
| **Cloud Native** | SA annotations & CA bundles | AWS IRSA / GCP Workload Identity via ServiceAccount annotations; CA bundle injection for corporate proxies |
## Architecture

```
+-----------------------------------------------------------------+
|  OpenClawInstance CR          OpenClawSelfConfig CR              |
|  (your declarative config)   (agent self-modification requests) |
+---------------+-------------------------------------------------+
                | watch
                v
+-----------------------------------------------------------------+
|  OpenClaw Operator                                              |
|  +-----------+  +-------------+  +----------------------------+ |
|  | Reconciler|  |   Webhooks  |  |   Prometheus Metrics       | |
|  |           |  |  (validate  |  |  (reconcile count,         | |
|  |  creates ->  |   & default)|  |   duration, phases)        | |
|  +-----------+  +-------------+  +----------------------------+ |
+---------------+-------------------------------------------------+
                | manages
                v
+-----------------------------------------------------------------+
|  Managed Resources (per instance)                               |
|                                                                 |
|  ServiceAccount -> Role -> RoleBinding    NetworkPolicy         |
|  ConfigMap        PVC      PDB            ServiceMonitor        |
|  GatewayToken Secret                                            |
|                                                                 |
|  StatefulSet                                                    |
|  +------------------------------------------------------------+ |
|  | Init: config -> pnpm* -> python* -> skills* -> custom      | |
|  |                                        (* = opt-in)        | |
|  +------------------------------------------------------------+ |
|  | OpenClaw Container  Gateway Proxy (nginx)                  | |
|  |                     Chromium (opt) / Ollama (opt)          | |
|  |                     Tailscale (opt) + custom sidecars      | |
|  +------------------------------------------------------------+ |
|                                                                 |
|  Service (default: 18789, 18793 or custom) -> Ingress (opt)     |
+-----------------------------------------------------------------+
```

## Quick Start

### Prerequisites

- Kubernetes 1.28+
- Helm 3

### 1. Install the operator

```bash
helm install openclaw-operator \
  oci://ghcr.io/openclaw-rocks/charts/openclaw-operator \
  --namespace openclaw-operator-system \
  --create-namespace
```

<details>
<summary>Alternative: install with Kustomize</summary>

```bash
# Install CRDs
make install

# Deploy the operator
make deploy IMG=ghcr.io/openclaw-rocks/openclaw-operator:latest
```

</details>

### 2. Create a secret with your API keys

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openclaw-api-keys
type: Opaque
stringData:
  ANTHROPIC_API_KEY: "sk-ant-..."
```

### 3. Deploy an OpenClaw instance

```yaml
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  storage:
    persistence:
      enabled: true
      size: 10Gi
```

```bash
kubectl apply -f secret.yaml -f openclawinstance.yaml
```
### 4. Verify

```bash
kubectl get openclawinstances
# NAME       PHASE     AGE
# my-agent   Running   2m

kubectl get pods
# NAME         READY   STATUS    AGE
# my-agent-0   1/1     Running   2m
```

## Configuration

### Inline config (openclaw.json)

```yaml
spec:
  config:
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"
          sandbox: true
      session:
        scope: "per-sender"
```

### External ConfigMap reference

```yaml
spec:
  config:
    configMapRef:
      name: my-openclaw-config
      key: openclaw.json
```

Config changes are detected via SHA-256 hashing and automatically trigger a rolling update. No manual restart needed.

### Gateway proxy

By default, each pod includes an nginx reverse proxy sidecar that forwards traffic to the OpenClaw gateway on loopback. Set `spec.gateway.enabled: false` to disable it; when the proxy is disabled:

- Health probes and Service ports target the gateway directly on port 18789
- `gateway.bind` is set to `0.0.0.0` instead of loopback
- The `gateway-proxy` container and its tmp volume are omitted from the pod
- To replace the built-in proxy with your own (e.g., Envoy, a signing proxy), disable it and add your proxy via `spec.sidecars`, as shown in the sketch after this list
- **Warning:** Do not set `gateway.bind: loopback` in your config JSON when the proxy is disabled - the gateway will only listen on `127.0.0.1` with nothing forwarding external traffic, making the pod unreachable. The operator emits a `GatewayBindConflict` warning event if this misconfiguration is detected.
- **TLS:** When the proxy is disabled, the gateway serves plaintext `ws://` on `0.0.0.0`. Ensure your replacement proxy or Ingress handles TLS termination to avoid exposing unencrypted WebSocket traffic (CWE-319).
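Putting those pieces together, a minimal sketch of swapping in your own proxy - the Envoy image tag, listener port, and `my-envoy-config` ConfigMap are placeholders, not operator-provided names:

```yaml
spec:
  gateway:
    enabled: false                       # omit the built-in nginx proxy
  sidecars:
    - name: custom-proxy
      image: envoyproxy/envoy:v1.30.1    # placeholder tag - pin your own
      ports:
        - containerPort: 8443            # TLS listener; terminate TLS here (see warning above)
      volumeMounts:
        - name: envoy-config
          mountPath: /etc/envoy          # envoy.yaml should forward to the gateway on 127.0.0.1:18789
  sidecarVolumes:
    - name: envoy-config
      configMap:
        name: my-envoy-config            # hypothetical ConfigMap holding your envoy.yaml
```

If your proxy listens on a non-default port, you will likely also want `spec.networking.service.ports` to expose it (see [Custom service ports](#custom-service-ports)).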
### Gateway authentication

The operator automatically generates a gateway token Secret for each instance and injects it into both the config JSON (`gateway.auth.mode: token`) and the `OPENCLAW_GATEWAY_TOKEN` env var. This bypasses Bonjour/mDNS pairing, which is unusable in Kubernetes.

- The token is generated once and never overwritten - rotate it by editing the Secret directly
- If you set `gateway.auth.token` in your config or `OPENCLAW_GATEWAY_TOKEN` in `spec.env`, your value takes precedence
- To bring your own token Secret, set `spec.gateway.existingSecret` - the operator will use it instead of auto-generating one (the Secret must have a key named `token`; see the sketch after this list)
- The operator automatically sets `gateway.controlUi.dangerouslyDisableDeviceAuth: true` - device pairing is incompatible with Kubernetes (users cannot approve pairing from inside a container, connections are always proxied, and mDNS is unavailable)
- **Do not set `gateway.mode: local`** in your config - this mode is for desktop installs and enforces device identity checks that cannot work behind a reverse proxy in Kubernetes
- When connecting to the Control UI through an Ingress, pass the gateway token in the URL fragment: `https://openclaw.example.com/#token=<your-token>`
- Since v2026.2.24, OpenClaw restricts `gateway.allowedOrigins` to same-origin by default - if accessing via a non-default hostname (e.g. Ingress), set `gateway.allowedOrigins: ["*"]` in your config
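A minimal bring-your-own-token sketch - the Secret name and token value are examples; only the `token` key is required:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-gateway-token
type: Opaque
stringData:
  token: "replace-with-a-long-random-string"   # example value - generate your own
---
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent
spec:
  gateway:
    existingSecret: my-gateway-token   # used instead of the auto-generated Secret
```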
### Control UI allowed origins

The operator auto-injects `gateway.controlUi.allowedOrigins` so the Control UI works through reverse proxies without CORS errors. Origins are derived from:

- **Localhost** (always): `http://localhost:18789`, `http://127.0.0.1:18789` for port-forwarding
- **Ingress hosts**: scheme determined from TLS config (`https://` if TLS, `http://` otherwise)
- **Explicit extras**: `spec.gateway.controlUiOrigins` for custom proxy URLs

If you set `gateway.controlUi.allowedOrigins` directly in your config JSON, the operator will not override it.

### Chromium sidecar

Enable headless browser automation for web scraping, screenshots, and browser-based integrations:

```yaml
spec:
  chromium:
    enabled: true
    image:
      repository: chromedp/headless-shell  # default
      tag: "stable"
    resources:
      requests:
        cpu: "250m"
        memory: "512Mi"
      limits:
        cpu: "1000m"
        memory: "2Gi"
    # Pass extra flags to the Chromium process (appended to built-in anti-bot defaults)
    extraArgs:
      - "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    # Inject extra environment variables into the sidecar
    extraEnv:
      - name: DISPLAY
        value: ":99"
```

When enabled, the operator automatically:
- Injects a `CHROMIUM_URL` environment variable into the main container
- Configures browser profiles in the OpenClaw config - both `"default"` and `"chrome"` profiles are set to point at the sidecar's CDP endpoint, so browser tool calls work regardless of which profile name the LLM passes
- Sets up shared memory, security contexts, and health probes for the sidecar
- Applies anti-bot-detection flags by default (`--disable-blink-features=AutomationControlled`, `--disable-features=AutomationControlled`, `--no-first-run`)

#### Persistent browser profiles

By default, all browser state (cookies, localStorage, session tokens) is lost on pod restart. Enable persistence to retain browser profiles across restarts:

```yaml
spec:
  chromium:
    enabled: true
    persistence:
      enabled: true           # default: false
      storageClass: ""        # optional - uses cluster default if empty
      size: "1Gi"             # default: 1Gi
      existingClaim: ""       # optional - use a pre-existing PVC
```

When persistence is enabled, the operator creates a dedicated PVC and passes `--user-data-dir=/chromium-data` to Chrome so that cookies, localStorage, IndexedDB, cached credentials, and session tokens survive pod restarts. This is useful for authenticated browser automation, MFA-protected services, and long-running browser workflows.

**Security note:** Persistent browser profiles contain sensitive session tokens. The PVC has the same security posture as other instance volumes. Ensure your StorageClass supports encryption at rest for sensitive workloads.

### Ollama sidecar

Run local LLMs alongside your agent for private, low-latency inference without external API calls:

```yaml
spec:
  ollama:
    enabled: true
    models:
      - llama3.2
      - nomic-embed-text
    gpu: 1
    storage:
      sizeLimit: 30Gi
    resources:
      requests:
        cpu: "1"
        memory: "4Gi"
      limits:
        cpu: "4"
        memory: "16Gi"
```

When enabled, the operator:
- Injects an `OLLAMA_HOST` environment variable into the main container
- Pre-pulls specified models via an init container before the agent starts
- Configures GPU resource limits when `gpu` is set (`nvidia.com/gpu`)
- Mounts a model cache volume (emptyDir by default, or an existing PVC via `storage.existingClaim`)

See [Custom AI Providers](docs/custom-providers.md) for configuring OpenClaw to use Ollama models via environment variables.

### Web terminal sidecar

Provide browser-based shell access to running instances for debugging and inspection without requiring `kubectl exec`:

```yaml
spec:
  webTerminal:
    enabled: true
    readOnly: false
    credential:
      secretRef:
        name: my-terminal-creds
    resources:
      requests:
        cpu: "50m"
        memory: "64Mi"
      limits:
        cpu: "200m"
        memory: "128Mi"
```

When enabled, the operator:
- Injects a [ttyd](https://github.com/tsl0922/ttyd) sidecar container on port 7681
- Mounts the instance data volume at `/home/openclaw/.openclaw` so you can inspect config, logs, and data files
- Adds the web terminal port to the Service and NetworkPolicy for external access
- Supports basic auth via a Secret with `username` and `password` keys (see the sketch after this list)
- Supports read-only mode (`readOnly: true`) for production environments where shell input should be disabled
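A matching credentials Secret for the example above - a minimal sketch; the `username` and `password` keys are what the sidecar expects, the values are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-terminal-creds   # matches spec.webTerminal.credential.secretRef.name
type: Opaque
stringData:
  username: "admin"         # placeholder - choose your own
  password: "change-me"     # placeholder - choose your own
```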
### Tailscale integration

Expose your instance via [Tailscale](https://tailscale.com) Serve (tailnet-only) or Funnel (public internet) - no Ingress or LoadBalancer needed:

```yaml
spec:
  tailscale:
    enabled: true
    mode: serve          # "serve" (tailnet only) or "funnel" (public internet)
    authKeySecretRef:
      name: tailscale-auth
    authSSO: true        # allow passwordless login for tailnet members
    hostname: my-agent   # defaults to instance name
    image:
      repository: ghcr.io/tailscale/tailscale  # default
      tag: latest
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi
```

When enabled, the operator runs a **Tailscale sidecar** (`tailscaled`) that handles serve/funnel declaratively via `TS_SERVE_CONFIG`. An **init container** copies the `tailscale` CLI binary to a shared volume so the main container can call `tailscale whois` for SSO authentication. The sidecar runs in userspace mode (`TS_USERSPACE=true`) - no `NET_ADMIN` capability needed.

**State persistence:** Tailscale node identity and TLS certificates are automatically persisted to a Kubernetes Secret (`<instance>-ts-state`) via `TS_KUBE_SECRET`. This prevents hostname incrementing (device-1, device-2, ...) and Let's Encrypt certificate re-issuance across pod restarts. The operator pre-creates the state Secret, grants the pod's ServiceAccount `get/update/patch` access to it, and mounts the SA token automatically.

Use ephemeral+reusable auth keys from the [Tailscale admin console](https://login.tailscale.com/admin/settings/keys). When `authSSO` is enabled, tailnet members can authenticate without a gateway token.

### Config merge mode

By default, the operator overwrites the config file on every pod restart. Set `mergeMode: merge` to deep-merge operator config with existing PVC config, preserving runtime changes made by the agent:

```yaml
spec:
  config:
    mergeMode: merge
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"
```

**Caveat:** In merge mode, removing a key from the CR does not remove it from the PVC config - the old value persists because deep-merge only adds or updates keys. If you need to remove stale config keys (e.g., after removing `gateway.mode: local`), temporarily switch to `mergeMode: overwrite`, apply, wait for the pod to restart, then switch back to `merge`.

### Skill installation

Install skills declaratively. The operator runs an init container that fetches each skill before the agent starts. Entries use ClawHub by default, or prefix with `npm:` to install from npmjs.com. ClawHub installs are idempotent - if a skill is already installed (e.g., when using persistent storage), it is skipped rather than failing:

```yaml
spec:
  skills:
    - "@anthropic/mcp-server-fetch"       # ClawHub (default)
    - "npm:@openclaw/matrix"              # npm package from npmjs.com
```

npm lifecycle scripts are disabled globally on the init container (`NPM_CONFIG_IGNORE_SCRIPTS=true`) to mitigate supply chain attacks.

### Skill packs

Skill packs bundle multiple files (SKILL.md, scripts, config) into a single installable unit hosted on GitHub. Use the `pack:` prefix with `owner/repo/path` format:

```yaml
spec:
  skills:
    - "pack:openclaw-rocks/skills/image-gen"            # latest from default branch
    - "pack:openclaw-rocks/skills/image-gen@v1.0.0"     # pinned to tag
    - "pack:myorg/private-skills/custom-tool@main"      # private repo (requires GITHUB_TOKEN)
```

Packs are resolved in one of two modes:

**1. Manifest mode** (explicit) -- the pack path contains a `skillpack.json` describing which files to seed and where:

```json
{
  "files": {
    "skills/image-gen/SKILL.md": "SKILL.md",
    "skills/image-gen/scripts/generate.py": "scripts/generate.py"
  },
  "directories": ["skills/image-gen/scripts"],
  "config": {
    "image-gen": {"enabled": true}
  }
}
```

**2. Raw-repo mode** (autodiscovery) -- when no `skillpack.json` is present and the pack path contains a `SKILL.md`, the operator installs the entire directory verbatim into `skills/<basename>/` in the workspace. This is useful for multi-skill repositories like [fluxcd/agent-skills](https://github.com/fluxcd/agent-skills) that follow a conventional `skills/<name>/SKILL.md` layout without per-skill manifests:

```yaml
spec:
  skills:
    - "pack:fluxcd/agent-skills/skills/gitops-repo-audit@main"
    # installs every file under skills/gitops-repo-audit/ into the workspace
    # at skills/gitops-repo-audit/ (including nested assets, schemas, etc.)
```

Raw mode does not inject config entries into `config.raw.skills.entries` -- use manifest mode if you need that.
The operator refuses to install if GitHub truncates the tree response for very large repositories (add a `skillpack.json` manifest in that case).

The operator resolves packs via the GitHub Contents + Git Trees APIs (cached for 5 minutes), seeds files into the workspace via the init container, and (in manifest mode) injects config entries into `config.raw.skills.entries` with user overrides taking precedence. Set `GITHUB_TOKEN` on the operator deployment for private repo access.
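One way to set that token, shown as a hedged sketch: a partial Deployment you could apply as a kustomize patch. The Deployment name, container name, and `github-token` Secret are assumptions - check your actual install for the real names:

```yaml
# Illustrative kustomize patch: inject GITHUB_TOKEN into the operator
# Deployment for private skill-pack repos.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-operator              # assumed Deployment name - verify in your cluster
  namespace: openclaw-operator-system
spec:
  template:
    spec:
      containers:
        - name: manager                # assumed container name - verify in your cluster
          env:
            - name: GITHUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: github-token   # hypothetical Secret holding a GitHub PAT
                  key: token
```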
### Plugin installation

Install plugins declaratively. The operator runs a dedicated init container that installs each plugin via `npm install` before the agent starts:

```yaml
spec:
  plugins:
    - "@martian-engineering/lossless-claw"
    - "some-other-plugin"
```

npm lifecycle scripts are disabled globally on the init container (`NPM_CONFIG_IGNORE_SCRIPTS=true`) to mitigate supply chain attacks. Plugins are installed into the PVC-backed `~/.openclaw/node_modules` directory and persist across pod restarts.

### Workspace seeding

Pre-populate the agent workspace with files and directories before the agent starts. Files can be provided inline or referenced from an external ConfigMap -- ideal for GitOps workflows where workspace content is managed alongside your manifests.

**Inline files:**

```yaml
spec:
  workspace:
    initialDirectories:
      - tools/scripts
    initialFiles:
      README.md: |
        # My Workspace
        This workspace is managed by OpenClaw.
```

**External ConfigMap reference:**

```yaml
spec:
  workspace:
    configMapRef:
      name: my-workspace-files       # all keys become workspace files
    initialFiles:                    # inline files (override configMapRef)
      EXTRA.md: "additional content"
```

All keys in the referenced ConfigMap are written as files into the workspace directory. When both `configMapRef` and `initialFiles` are specified, inline files take precedence over ConfigMap entries with the same filename.

**Merge priority** (highest wins): operator-injected files > inline `initialFiles` > external `configMapRef` > skill packs.

The operator sets a `WorkspaceReady` status condition to `False` when the referenced ConfigMap is missing or contains invalid filenames, and `True` once workspace files are seeded successfully. The controller watches external ConfigMaps for changes and re-reconciles automatically.

**How it works:** Workspace files are seeded once via an init container. The init container copies files from a read-only ConfigMap volume to the PVC. The main container only sees the PVC (writable), so agents can modify their workspace files and changes persist across pod restarts. ConfigMaps are never mounted directly on the main container.

**GitOps example with Kustomize:**

```yaml
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: my-namespace              # must match the instance namespace

generatorOptions:
  disableNameSuffixHash: true        # required - operator looks up by exact name

configMapGenerator:
  - name: my-workspace-files
    files:
      - workspace/SOUL.md
      - workspace/AGENT.md
```

> **Important:** Two kustomize settings are required when using `configMapGenerator` with `configMapRef`:
> - **`disableNameSuffixHash: true`** -- The operator looks up ConfigMaps by exact name. Kustomize's default hash suffix (e.g. `-57k7g4dthc`) would cause a `ConfigMapNotFound` error.
> - **`namespace`** -- Generated ConfigMaps must be in the same namespace as the instance. Without this, kustomize creates them in the `default` namespace.

**Additional workspaces (multi-agent):**

When running multiple agents with isolated workspaces, use `additionalWorkspaces` to seed files for each agent. Each entry seeds to `~/.openclaw/workspace-<name>/` -- set matching paths in `spec.config.raw.agents.list[].workspace`.

```yaml
spec:
  workspace:
    configMapRef:
      name: main-agent-workspace
    additionalWorkspaces:
      - name: scheduler
        configMapRef:
          name: scheduler-workspace
        initialFiles:
          SOUL.md: "I am the scheduler agent"
        initialDirectories:
          - tools
  config:
    raw:
      agents:
        list:
          - id: main
            name: "Main Agent"
          - id: scheduler
            name: "Scheduler Agent"
      bindings:
        - agentId: scheduler
          match:
            channel: discord
            peer:
              kind: channel
              id: "123456789"        # bind to a specific channel
```

Each additional workspace supports the same `configMapRef`, `initialFiles`, and `initialDirectories` as the default workspace. Operator-injected `ENVIRONMENT.md` is included; `BOOTSTRAP.md` is not (only the default agent runs onboarding). Max 10 additional workspaces.

> **Seed-once behavior:** Workspace files (both default and additional) are only written on first boot when they don't already exist on the PVC. If an agent modifies its own SOUL.md or AGENT.md at runtime, those changes persist across pod restarts and are never overwritten by the ConfigMap content. To re-seed a file, delete it from the PVC first.

**Full GitOps example with multiple agents:**

```yaml
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: my-namespace

generatorOptions:
  disableNameSuffixHash: true

resources:
  - instance.yaml

configMapGenerator:
  - name: main-agent-workspace
    files:
      - agents/main/SOUL.md
      - agents/main/AGENT.md
  - name: scheduler-workspace
    files:
      - agents/scheduler/SOUL.md
      - agents/scheduler/TOOLS.md
```
### Self-configure

Allow agents to modify their own configuration by creating `OpenClawSelfConfig` resources via the K8s API. The operator validates each request against the instance's `allowedActions` policy before applying changes:

```yaml
spec:
  selfConfigure:
    enabled: true
    allowedActions:
      - skills         # add/remove skills
      - config         # patch openclaw.json
      - workspaceFiles # add/remove workspace files
      - envVars        # add/remove environment variables
```

When enabled, the operator:
- Grants the instance's ServiceAccount RBAC permissions to read its own CRD and create `OpenClawSelfConfig` resources
- Enables SA token automounting so the agent can authenticate with the K8s API
- Injects a `SELFCONFIG.md` skill file and `selfconfig.sh` helper script into the workspace
- Opens port 6443 egress in the NetworkPolicy for K8s API access

The agent creates a request like:

```yaml
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawSelfConfig
metadata:
  name: add-fetch-skill
spec:
  instanceRef: my-agent
  addSkills:
    - "@anthropic/mcp-server-fetch"
```

The operator validates the request, applies it to the parent `OpenClawInstance`, and sets the request's status to `Applied`, `Denied`, or `Failed`. Terminal requests are auto-deleted after 1 hour.

#### GitOps Coexistence

SelfConfig uses Kubernetes Server-Side Apply (SSA) with the field manager name `openclaw-selfconfig`. This enables safe coexistence with GitOps controllers (FluxCD, ArgoCD, etc.) that manage the same `OpenClawInstance` resource:

- **Per-item ownership** -- Skills (set items), env vars (map items by name), and workspace files (map fields) are tracked individually. A SelfConfig can add or remove only the items it owns without conflicting with items managed by other controllers.
- **Atomic ownership** -- The `config.raw` field is owned atomically. If a GitOps controller also manages `config.raw`, `ForceOwnership` transfers ownership to the SelfConfig field manager on apply.
- **Removal safety** -- When a SelfConfig attempts to remove an item owned by another field manager, the operator emits a `Warning` / `SelfConfigSkippedRemoval` event identifying the owning manager and includes the warning in the status message.
- **Non-SSA users are unaffected** -- If you do not use `selfConfigure`, no SSA field managers are created and existing workflows remain unchanged.

See the [API reference](docs/api-reference.md) for the full `OpenClawSelfConfig` CRD spec and `spec.selfConfigure` fields.

### Persistent storage

By default the operator creates a 10Gi PVC and retains it when the CR is deleted (orphan behavior). Override size, storage class, or retention:

```yaml
spec:
  storage:
    persistence:
      size: 20Gi
      storageClass: fast-ssd
      orphan: true     # default -- PVC is RETAINED when the CR is deleted
      # orphan: false  -- PVC is deleted with the CR (garbage collected)
```

To reuse an existing PVC (e.g., after restoring from a backup):

```yaml
spec:
  storage:
    persistence:
      existingClaim: my-agent-data
```

> **Retention is stateful data protection.** Because agent workspaces contain irreplaceable data such as memory, notebooks, and conversation history, the default is `orphan: true`. To re-attach a retained PVC to a new instance, set `existingClaim` to its name.
### Runtime dependencies

Enable built-in init containers that install pnpm or Python/uv to the data PVC for MCP servers and skills:

```yaml
spec:
  runtimeDeps:
    pnpm: true    # Installs pnpm via corepack
    python: true  # Installs Python 3.12 + uv
```

### Custom init containers and sidecars

Add custom init containers (run after operator-managed ones) and sidecar containers:

```yaml
spec:
  initContainers:
    - name: fetch-models
      image: curlimages/curl:8.5.0
      command: ["sh", "-c", "curl -o /data/model.bin https://..."]
      volumeMounts:
        - name: data
          mountPath: /data
  sidecars:
    - name: cloud-sql-proxy
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.14.3
      args: ["--structured-logs", "my-project:us-central1:my-db"]
      ports:
        - containerPort: 5432
  sidecarVolumes:
    - name: proxy-creds
      secret:
        secretName: cloud-sql-proxy-sa
```

Reserved init container names (`init-config`, `init-pnpm`, `init-python`, `init-skills`, `init-ollama`) are rejected by the webhook. If your sidecar replaces the built-in gateway proxy, set `spec.gateway.enabled: false` to avoid running both.

### Extra volumes and mounts

Mount additional ConfigMaps, Secrets, or CSI volumes into the main container:

```yaml
spec:
  extraVolumes:
    - name: shared-data
      persistentVolumeClaim:
        claimName: shared-pvc
  extraVolumeMounts:
    - name: shared-data
      mountPath: /shared
```

### Ingress Basic Auth

Add HTTP Basic Authentication to the Ingress. The operator auto-generates a random password and stores it in a managed Secret:

```yaml
spec:
  networking:
    ingress:
      enabled: true
      className: nginx
      hosts:
        - host: my-agent.example.com
      security:
        basicAuth:
          enabled: true
          username: admin          # default: "openclaw"
          realm: "My Agent"        # default: "OpenClaw"
```

The generated Secret is named `<name>-basic-auth` and contains three keys: `auth` (htpasswd format for ingress controllers), `username`, and `password` (plaintext, for retrieving the auto-generated credentials). It is tracked in `status.managedResources.basicAuthSecret`. To use your own credentials, provide a pre-formatted htpasswd Secret:

```yaml
spec:
  networking:
    ingress:
      security:
        basicAuth:
          enabled: true
          existingSecret: my-htpasswd-secret  # must contain key "auth"
```

For Traefik ingress, a `Middleware` CRD resource is created automatically (requires Traefik CRDs installed).

### Custom service ports

By default the operator creates a Service with the gateway (18789) and canvas (18793) ports. To expose custom ports instead (e.g., for a non-default application), set `spec.networking.service.ports`:

```yaml
spec:
  networking:
    service:
      type: ClusterIP
      ports:
        - name: http
          port: 3978
          targetPort: 3978
```

When `ports` is set, it fully replaces the default ports -- including the Chromium port if the sidecar is enabled. To keep the defaults alongside custom ports, include them explicitly (see the sketch below). If `targetPort` is omitted it defaults to `port`.
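A sketch of keeping the default ports while adding a custom one - the port numbers come from the defaults above, but the port *names* here are illustrative:

```yaml
spec:
  networking:
    service:
      ports:
        - name: gateway    # default gateway port, kept explicitly
          port: 18789
        - name: canvas     # default canvas port, kept explicitly
          port: 18793
        - name: http       # custom application port
          port: 3978       # targetPort defaults to port when omitted
```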
See the [API reference](docs/api-reference.md#specnetworkingservice) for all fields.

### CA bundle injection

Inject a custom CA certificate bundle for environments with TLS-intercepting proxies or private CAs:

```yaml
spec:
  security:
    caBundle:
      configMapName: corporate-ca-bundle  # or secretName
      key: ca-bundle.crt                  # default key name
```

The bundle is mounted into all containers and the `SSL_CERT_FILE` / `NODE_EXTRA_CA_CERTS` environment variables are set automatically.

### ServiceAccount annotations

Add annotations to the managed ServiceAccount for cloud provider integrations:

```yaml
spec:
  security:
    rbac:
      serviceAccountAnnotations:
        # AWS IRSA
        eks.amazonaws.com/role-arn: "arn:aws:iam::123456789:role/openclaw"
        # GCP Workload Identity
        # iam.gke.io/gcp-service-account: "openclaw@project.iam.gserviceaccount.com"
```

### Auto-update

Opt into automatic version tracking so the operator detects new releases and rolls them out without manual intervention:

```yaml
spec:
  autoUpdate:
    enabled: true
    checkInterval: "24h"         # how often to poll the registry (1h-168h)
    backupBeforeUpdate: true     # back up the PVC before applying an update
    rollbackOnFailure: true      # auto-rollback if the new version fails health checks
    healthCheckTimeout: "10m"    # how long to wait for the pod to become ready (2m-30m)
```

When enabled, the operator resolves `latest` to the highest stable semver tag on creation, then polls for newer versions on each `checkInterval`. Before updating, it optionally runs an S3 backup, then patches the image tag and monitors the rollout. If the pod fails to become ready within `healthCheckTimeout`, it reverts the image tag and (optionally) restores the PVC from the pre-update snapshot.

Safety mechanisms include failed-version tracking (skips versions that failed health checks), a circuit breaker (pauses after 3 consecutive rollbacks), and full data restore when `backupBeforeUpdate` is enabled. Auto-update is a no-op for digest-pinned images (`spec.image.digest`).

See `status.autoUpdate` for update progress: `kubectl get openclawinstance my-agent -o jsonpath='{.status.autoUpdate}'`

### Backup and restore

The operator uses [rclone](https://rclone.org/) to back up and restore PVC data to/from S3-compatible storage. All backup operations require a Secret named `s3-backup-credentials` in the **operator namespace**:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-backup-credentials
  namespace: openclaw-operator-system
stringData:
  S3_ENDPOINT: "https://s3.us-east-1.amazonaws.com"
  S3_BUCKET: "my-openclaw-backups"
  S3_ACCESS_KEY_ID: "<key-id>"            # optional - omit for workload identity
  S3_SECRET_ACCESS_KEY: "<secret-key>"    # optional - omit for workload identity
  # S3_PROVIDER: "Other"    # optional - set to "AWS", "GCS", etc. for native credential chains
  # S3_REGION: "us-east-1"  # optional - needed for MinIO or providers with custom regions
```

Compatible with AWS S3, Backblaze B2, Cloudflare R2, MinIO, Wasabi, and any S3-compatible API.

**Cloud workload identity:** Omit `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY` and set `S3_PROVIDER` (e.g., `AWS`, `GCS`) to use the provider's native credential chain.
Set `spec.backup.serviceAccountName` to a workload identity-enabled ServiceAccount (IRSA, GKE Workload Identity, AKS Workload Identity) so backup Jobs inherit the cloud IAM role. See the [Workload Identity section](docs/api-reference.md#workload-identity-cloud-native-auth) in the API reference for a full example.

**When backups run automatically:**

- **On delete** - the operator backs up the PVC before removing any resources. Subject to `spec.backup.timeout` (default: 30m) - if the backup does not complete in time, it is skipped automatically. Add the `openclaw.rocks/skip-backup: "true"` annotation to skip the backup immediately.
- **Before auto-update** - when `spec.autoUpdate.backupBeforeUpdate: true` (the default).
- **On a schedule** - when `spec.backup.schedule` is set (cron expression).

If the Secret does not exist, backups are silently skipped and operations proceed normally.

**Periodic scheduled backups:**

```yaml
spec:
  backup:
    schedule: "0 2 * * *"    # Daily at 2 AM UTC
    retentionDays: 7         # Keep 7 days of daily snapshots (default)
    historyLimit: 3          # Successful job runs to retain (default: 3)
    failedHistoryLimit: 1    # Failed job runs to retain (default: 1)
    timeout: "30m"           # Max time for pre-delete backup (default: 30m, min: 5m, max: 24h)
    serviceAccountName: ""   # Optional: IRSA/Pod Identity SA for backup Jobs
```

The operator creates a Kubernetes CronJob that runs rclone to sync PVC data to S3. The CronJob uses pod affinity to co-locate on the same node as the StatefulSet pod (required for RWO PVCs). Backups use an incremental sync strategy: data is synced to a fixed `latest` path (only changed files uploaded), a daily snapshot is taken, and snapshots older than `retentionDays` are automatically pruned.

**Restoring from backup:**

```yaml
spec:
  # Path recorded in status.lastBackupPath of the source instance
  restoreFrom: "backups/my-tenant/my-agent/2026-01-15T10:30:00Z"
```

The operator runs a restore job to populate the PVC before starting the StatefulSet, then clears `restoreFrom` automatically. Backup paths follow the format `backups/<tenantId>/<instanceName>/<timestamp>`.

**Clone / migrate an instance:** `restoreFrom` works on both existing and brand-new instances. To clone an instance across namespaces, create a new `OpenClawInstance` with `spec.restoreFrom` pointing to the source's backup path (see the sketch below) - the operator creates the PVC, runs the restore Job, then starts the StatefulSet. The new instance gets a fresh gateway token; the source is unaffected. The restore Job uses `spec.backup.serviceAccountName` when set, so workload identity (IRSA/Pod Identity) works for cross-namespace clones. For ArgoCD users, add `spec.restoreFrom` to `ignoreDifferences` since the operator auto-clears it after restore.
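A minimal clone sketch - the names, namespace, and backup path are examples; the real path comes from the source instance's `status.lastBackupPath`:

```yaml
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent-clone
  namespace: staging               # example target namespace
spec:
  restoreFrom: "backups/my-tenant/my-agent/2026-01-15T10:30:00Z"
  storage:
    persistence:
      enabled: true
      size: 10Gi
```

For ArgoCD, the corresponding `ignoreDifferences` entry on the Application would look like:

```yaml
# Fragment of an ArgoCD Application spec
ignoreDifferences:
  - group: openclaw.rocks
    kind: OpenClawInstance
    jsonPointers:
      - /spec/restoreFrom
```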
For full details see the [Backup and Restore section](docs/api-reference.md#backup-and-restore) in the API reference.

### What the operator manages automatically

These behaviors are always applied - no configuration needed:

| Behavior | Details |
|----------|---------|
| `gateway.bind` | When the gateway proxy sidecar is enabled (default), binds to loopback and an nginx reverse proxy handles external access. When disabled (`spec.gateway.enabled: false`), binds to `0.0.0.0` so the gateway is reachable directly. |
| Gateway auth token | Auto-generated Secret per instance; injected into config and env |
| Control UI origins | `gateway.controlUi.allowedOrigins` auto-injected from localhost + ingress hosts + `spec.gateway.controlUiOrigins` |
| `OPENCLAW_GATEWAY_HANDSHAKE_TIMEOUT_MS` | `10000` (10s) to work around upstream timeout regression in v2026.3.12 ([#46892](https://github.com/openclaw/openclaw/issues/46892)) |
| `OPENCLAW_DISABLE_BONJOUR=1` | Always set (mDNS does not work in Kubernetes) |
| Browser profiles | When Chromium is enabled, `"default"` and `"chrome"` profiles are auto-configured with the sidecar's CDP endpoint |
| Tailscale serve config | When Tailscale is enabled, a `tailscale-serve.json` key is added to the ConfigMap for the sidecar's `TS_SERVE_CONFIG` |
| Tailscale state persistence | When Tailscale is enabled, node identity and TLS certs are persisted to a `<instance>-ts-state` Secret via `TS_KUBE_SECRET` |
| Config hash rollouts | Config changes trigger rolling updates via SHA-256 hash annotation |
| Config restoration | The init container restores config on every pod restart (overwrite or merge mode) |

For the full list of configuration options, see the [API reference](docs/api-reference.md) and the [full sample YAML](config/samples/openclaw_v1alpha1_openclawinstance_full.yaml).

## Security

The operator follows a **secure-by-default** philosophy. Every instance ships with hardened settings out of the box, with no extra configuration needed.

### Defaults

- **Non-root execution**: containers run as UID 1000; root (UID 0) is blocked by the validating webhook (exception: the Ollama sidecar requires root per the official image)
- **Read-only root filesystem**: enabled by default for the main container and the Chromium sidecar; the PVC at `~/.openclaw/` provides a writable home, and a `/tmp` emptyDir handles temp files
- **All capabilities dropped**: no ambient Linux capabilities
- **Seccomp RuntimeDefault**: syscall filtering enabled
- **Default-deny NetworkPolicy**: only DNS (53) and HTTPS (443) egress allowed; ingress limited to same namespace
- **Minimal RBAC**: each instance gets its own ServiceAccount with read-only access to its own ConfigMap; the operator can create/update Secrets only for operator-managed gateway tokens
- **No automatic token mounting**: `automountServiceAccountToken: false` on both ServiceAccounts and pod specs (enabled only when `selfConfigure` is active)
- **Secret validation**: the operator checks that all referenced Secrets exist and sets a `SecretsReady` condition
- **Security context propagation**: when `podSecurityContext.runAsNonRoot` is set to `false`, the operator propagates this to init containers and applicable sidecars (tailscale, web terminal) so there is no contradiction between pod-level and container-level settings. Self-consistent sidecars (gateway-proxy, chromium, ollama) retain their own security contexts. The `containerSecurityContext.runAsNonRoot` and `containerSecurityContext.runAsUser` fields allow granular control over the main container independently of the pod level.

### Validating webhook

| Check | Severity | Behavior |
|-------|----------|----------|
| `runAsUser: 0` | Error | Blocked: root execution not allowed |
| Reserved init container name | Error | `init-config`, `init-pnpm`, `init-python`, `init-skills`, `init-ollama` are reserved |
| Invalid skill name | Error | Only alphanumeric, `-`, `_`, `/`, `.`, `@` allowed (max 128 chars). `npm:` prefix for npm packages, `pack:` prefix for skill packs; bare `npm:` or `pack:` is rejected |
| Invalid CA bundle config | Error | Exactly one of `configMapName` or `secretName` must be set |
| JSON5 with inline raw config | Error | JSON5 requires `configMapRef` (inline must be valid JSON) |
| JSON5 with merge mode | Error | JSON5 is not compatible with `mergeMode: merge` |
| Invalid `checkInterval` | Error | Must be a valid Go duration between 1h and 168h |
| Invalid `healthCheckTimeout` | Error | Must be a valid Go duration between 2m and 30m |

<details>
<summary>Warning-level checks (deployment proceeds with a warning)</summary>

| Check | Behavior |
|-------|----------|
| NetworkPolicy disabled | Deployment proceeds with a warning |
| Ingress without TLS | Deployment proceeds with a warning |
| Chromium without digest pinning | Deployment proceeds with a warning |
| Ollama without digest pinning | Deployment proceeds with a warning |
| Web terminal without digest pinning | Deployment proceeds with a warning |
| Ollama runs as root | Required by official image; informational |
| Auto-update with digest pin | Digest overrides auto-update; updates won't apply |
| `readOnlyRootFilesystem` disabled | Proceeds with a security recommendation |
| No AI provider keys detected | Scans `env`/`envFrom` for known provider env vars |
| Unknown config keys | Warns on unrecognized top-level keys in `spec.config.raw` |

</details>

## Observability

### Prometheus metrics

| Metric | Type | Description |
|--------|------|-------------|
| `openclaw_reconcile_total` | Counter | Reconciliations by result (success/error) |
| `openclaw_reconcile_duration_seconds` | Histogram | Reconciliation latency |
| `openclaw_instance_phase` | Gauge | Current phase per instance |
| `openclaw_instance_info` | Gauge | Instance metadata for PromQL joins (always 1) |
| `openclaw_instance_ready` | Gauge | Whether the instance pod is ready (1/0) |
| `openclaw_managed_instances` | Gauge | Total number of managed instances |
| `openclaw_resource_creation_failures_total` | Counter | Resource creation failures |
| `openclaw_autoupdate_checks_total` | Counter | Auto-update version checks by result |
| `openclaw_autoupdate_applied_total` | Counter | Successful auto-updates applied |
| `openclaw_autoupdate_rollbacks_total` | Counter | Auto-update rollbacks triggered |

When `metrics.enabled: true` (the default), the operator automatically configures a full metrics pipeline: it injects `diagnostics.otel` config into OpenClaw to push OTLP metrics to a lightweight OTel Collector sidecar (`otel/opentelemetry-collector`), which exposes a Prometheus scrape endpoint on the configured port (default 9090). No manual OpenClaw configuration is needed. If you already set `diagnostics.otel` in your instance config, the operator preserves your settings.
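These are plain Prometheus gauges and counters, so you can also write your own rules against them, instead of or alongside the auto-provisioned PrometheusRule described below. A sketch - the rule name, `for` window, and severity label are arbitrary examples:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-extra-rules               # example name
spec:
  groups:
    - name: openclaw-extra
      rules:
        - alert: OpenClawInstanceNotReady
          expr: openclaw_instance_ready == 0   # gauge from the table above
          for: 10m                             # example threshold
          labels:
            severity: warning
```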
### ServiceMonitor

```yaml
spec:
  observability:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
        interval: 15s
        labels:
          release: prometheus
```

### OTLP metrics export (operator)

The operator can push its own metrics (reconciliation counters, workqueue stats, client latencies, etc.) to any OTLP-compatible backend via gRPC. This bridges all Prometheus metrics to OpenTelemetry, running alongside the existing Prometheus scrape endpoint.

```yaml
# values.yaml
otlp:
  enabled: true
  endpoint: "otel-collector.observability.svc:4317"
  insecure: true  # set to false for TLS
```

The endpoint can also be configured via the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable. Metrics are pushed every 30 seconds. If the OTLP endpoint is unreachable, the operator logs a warning and continues operating normally.

### PrometheusRule (alerts)

Auto-provisions a PrometheusRule with 7 alerts including runbook URLs:

```yaml
spec:
  observability:
    metrics:
      prometheusRule:
        enabled: true
        labels:
          release: kube-prometheus-stack  # must match Prometheus ruleSelector
        runbookBaseURL: https://openclaw.rocks/docs/runbooks  # default
```

Alerts: `OpenClawReconcileErrors`, `OpenClawInstanceDegraded`, `OpenClawSlowReconciliation`, `OpenClawPodCrashLooping`, `OpenClawPodOOMKilled`, `OpenClawPVCNearlyFull`, `OpenClawAutoUpdateRollback`

### Grafana dashboards

Auto-provisions two Grafana dashboard ConfigMaps (discovered via the `grafana_dashboard: "1"` label):

```yaml
spec:
  observability:
    metrics:
      grafanaDashboard:
        enabled: true
        folder: OpenClaw  # Grafana folder (default)
        labels:
          grafana_dashboard_instance: my-grafana  # optional extra labels
```

Dashboards:
- **OpenClaw Operator** - fleet overview with reconciliation metrics, instance table, workqueue, and auto-update panels
- **OpenClaw Instance** - per-instance detail with CPU, memory, storage, network, and pod health panels

### Auto-Scaling (HPA)

Enable horizontal pod auto-scaling to automatically adjust the number of replicas based on CPU and memory utilization:

```yaml
spec:
  availability:
    autoScaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 10
      targetCPUUtilization: 80
      targetMemoryUtilization: 70  # optional
```

When enabled, the operator creates a `HorizontalPodAutoscaler` targeting the StatefulSet and sets the StatefulSet's replica count to nil so the HPA manages scaling. The HPA is deleted when auto-scaling is disabled.

When auto-scaling is combined with persistent storage:

- Each replica gets its own PVC via StatefulSet `VolumeClaimTemplates` (named `data-<instance>-<ordinal>`)
- PVCs inherit `size`, `storageClass`, and `accessModes` from `spec.storage.persistence`
- Retention policy is `Retain` for both scale-down and deletion -- data is preserved
- If auto-scaling is later disabled, per-replica PVCs become orphaned and must be cleaned up manually

### Instance Suspension

Temporarily scale an instance to zero replicas without deleting it:

```yaml
spec:
  suspended: true
```

When suspended:

- The StatefulSet scales to 0 replicas (pods terminate)
- All non-runtime resources (Service, ConfigMap, RBAC, NetworkPolicy, PVC) remain fully managed
- Phase becomes `Suspended`, the Ready condition becomes `False`
- Auto-updates are paused until the instance is resumed
- The `openclaw_instance_ready` metric reports `0`

Resume by setting `spec.suspended: false`. The instance returns to the `Running` phase through the normal startup lifecycle.

> **Note:** `spec.suspended` and `spec.availability.autoScaling.enabled` are mutually exclusive. Disable auto-scaling before suspending.
### Topology Spread Constraints

Spread pods across topology domains (zones, nodes) for improved availability:

```yaml
spec:
  availability:
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/instance: my-instance
```

### Runtime Class

Schedule pods on alternative container runtimes (Kata Containers, gVisor, etc.) for VM-level isolation or security hardening:

```yaml
spec:
  availability:
    runtimeClassName: kata-fc
```

A matching `RuntimeClass` resource must exist in the cluster. If unset, the default container runtime is used.
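For reference, a matching `RuntimeClass` might look like the following sketch - the `handler` value is cluster-specific and must match a runtime configured in your CRI, so `kata-fc` here is an assumption:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc     # referenced by spec.availability.runtimeClassName
handler: kata-fc    # assumed CRI handler name - check your runtime setup
```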
### Pod Annotations

Merge extra annotations into the StatefulSet pod template. Operator-managed keys (`openclaw.rocks/config-hash`, `openclaw.rocks/secret-hash`) always take precedence and cannot be overridden.

Useful for cloud-provider hints, such as preventing GKE Autopilot from evicting long-running agent pods:

```yaml
spec:
  podAnnotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

Instance lifecycle phases: `Pending` -> `Restoring` -> `Provisioning` -> `Running` | `Updating` | `BackingUp` | `Degraded` | `Failed` | `Terminating`

## Deployment Guides

Platform-specific deployment guides are available for:

- [AWS EKS](docs/deployment.md#aws-eks)
- [Google GKE](docs/deployment.md#google-gke)
- [Azure AKS](docs/deployment.md#azure-aks)
- [Kind (local development)](docs/deployment.md#kind)

## Development

```bash
# Clone and set up
git clone https://github.com/OpenClaw-rocks/openclaw-operator.git
cd openclaw-operator
go mod download

# Generate code and manifests
make generate manifests

# Run tests
make test

# Run linter
make lint

# Run locally against a Kind cluster
kind create cluster
make install
make run
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for the full development guide.

## Roadmap

- **v1.0.0**: API graduation to `v1`, conformance test suite, semver constraints for auto-update, HPA integration, cert-manager integration, multi-cluster support

See the full [roadmap](ROADMAP.md) for details.

## Don't Want to Self-Host?

[OpenClaw.rocks](https://openclaw.rocks) offers fully managed hosting starting at **EUR 15/mo**. No Kubernetes cluster required. Setup, updates, and 24/7 uptime handled for you.

## Contributing

Contributions are welcome. Please open an issue to discuss significant changes before submitting a PR. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## Disclaimer: AI-Assisted Development

This repository is developed and maintained collaboratively by a human and [Claude Code](https://claude.ai/claude-code). This includes writing code, reviewing and commenting on issues, triaging bugs, and merging pull requests. The human reads everything and acts as the final guard, but Claude does the heavy lifting - from diagnosis to implementation to CI.

In the future, this repo may be fully autonomously operated, whether we humans like that or not.

## License

Apache License 2.0, the same license used by Kubernetes, Prometheus, and most CNCF projects. See [LICENSE](LICENSE) for details.