https://github.com/bigg01/claude-mate-agent

🤖 Enterprise Claude Code agent for Kubernetes ☸️: static service or on-demand CI/CD job 🐳. Helm chart for K8s / OpenShift / AKS with personas 🎭, multi-provider LLM gateway, audit + OTEL 📊, sandboxes 📦. GitLab CI + GitHub Actions ready.
ai-agents anthropic argocd claude claude-code container devops fluxcd github-actions gitlab-ci gitops helm helm-chart kubernetes llm llm-gateway mcp openshift opentelemetry python

Claude Mate Agent

Enterprise-grade Claude Code agent platform for Kubernetes and Red Hat OpenShift.


---

Claude Mate Agent packages the [Claude Code CLI](https://claude.ai/code) as a production-grade Kubernetes workload with defense-in-depth security, multi-provider LLM routing, full DORA-metric telemetry, and an SDLC quality-gate pipeline. The runtime image is built from `ubi9-minimal` with no package managers, no Python interpreter, and no build tools in the final layer.

## Key capabilities

| Pillar | What you get |
|---|---|
| **Execution** | Static long-running Deployment · on-demand CI/CD Job · isolated [sandbox](docs/sandbox.md) (one-shot K8s Job with gVisor / Kata / experimental NVIDIA OpenShell, ephemeral workspace, TTL cleanup) |
| **Connectivity** | Direct Anthropic · Kong AI Gateway · LiteLLM · OpenRouter · Azure AI Foundry · Vertex AI · NVIDIA NIM · **local Ollama / vLLM / LM Studio** (air-gapped, no API key); switch with one Helm value, no image rebuild ([details](docs/llm-gateway.md)) |
| **Personas** | Architect · Security · DevOps · SRE, each with a curated system prompt and Claude CLI tool allow-list (security persona is read-only) |
| **Guardrails** | Five opt-in runtime controls: [cost cap](docs/guardrails.md) · input/output content scrubbing (api-keys / credentials / PII / RFC1918) · `.claudeignore` workspace allowlist · per-persona intent denylist. Each is independent; zero overhead when disabled. |
| **Routing** | Kubernetes Ingress · OpenShift Route · Gateway API HTTPRoute: same chart, capability-gated templates |
| **GitOps** | ArgoCD `Application` and FluxCD `HelmRelease` examples with automated sync, pruning, and self-heal |
| **Observability** | Always-on Prometheus `/metrics` · opt-in OTEL OTLP · Grafana **agent** + **DORA** dashboards auto-provisioned · structured JSON audit logs |
| **Quality gates** | Trivy CVE + IaC scan · Bandit + Semgrep SAST · Gitleaks · CycloneDX SBOM · pytest coverage with `--cov-fail-under` floor · Renovate for deps |
| **DORA telemetry** | Deployment Frequency, Lead Time, Change Failure Rate, MTTR, emitted from every CI deploy job ([details](docs/dora-metrics.md)) |
| **Enterprise infra** | Artifactory mirrors for Docker/PyPI/npm/Helm · NVIDIA Container Runtime for GPU · Vault Agent Injector + Secrets Operator · cert-manager integration |
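
To make the cost-cap guardrail more concrete, here is a minimal Python sketch of a per-task plus rolling-hour budget check. It is an illustration only, not the project's implementation: the `CostCap` class, its parameter names, and the dollar limits are all assumptions. The real settings are documented in [docs/guardrails.md](docs/guardrails.md).

```python
import time
from collections import deque


class CostCap:
    """Hypothetical per-task + rolling-hour spend limiter (amounts in USD)."""

    def __init__(self, per_task_usd, hourly_usd, clock=time.monotonic):
        self.per_task_usd = per_task_usd
        self.hourly_usd = hourly_usd
        self._clock = clock      # injectable so the window can be tested
        self._spend = deque()    # (timestamp, cost) pairs from the last hour

    def _hourly_total(self):
        # Drop entries older than one hour, then sum what remains.
        cutoff = self._clock() - 3600
        while self._spend and self._spend[0][0] <= cutoff:
            self._spend.popleft()
        return sum(cost for _, cost in self._spend)

    def allow(self, task_cost_usd):
        """Record the spend and return True if both caps hold, else False."""
        if task_cost_usd > self.per_task_usd:
            return False
        if self._hourly_total() + task_cost_usd > self.hourly_usd:
            return False
        self._spend.append((self._clock(), task_cost_usd))
        return True


cap = CostCap(per_task_usd=0.50, hourly_usd=1.00)
print(cap.allow(0.40))  # True: under both caps
print(cap.allow(0.75))  # False: exceeds the per-task cap
print(cap.allow(0.40))  # True: hourly total is now 0.80
print(cap.allow(0.40))  # False: would push the hourly total past 1.00
```

A rejected task is not recorded, so a single oversized request does not consume the hourly budget.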

## Defense-in-depth protection

Seven independent security layers, each useful even if every other layer is breached:

| # | Layer | Controls |
|---|---|---|
| 1 | **Image** | `ubi9-minimal` base · no pip/npm/dnf/python in runtime · PyInstaller-compiled single binary · Renovate-tracked base/dep versions |
| 2 | **Container** | `readOnlyRootFilesystem: true` · `runAsNonRoot` + arbitrary UID for OpenShift SCC · `capabilities.drop: ALL` · seccomp `RuntimeDefault` · pinned Claude Code CLI version |
| 3 | **Network** | NetworkPolicy enabled by default · operator-defined egress allow-list · sandbox NetworkPolicy blocks all ingress · RFC 1918 excluded from default sandbox egress |
| 4 | **Sandbox** | One-shot K8s Job · `automountServiceAccountToken: false` · optional gVisor / Kata / experimental [NVIDIA OpenShell](docs/sandbox.md#nvidia-openshell-experimental) `runtimeClassName` (the last with inference-routing policy) · `activeDeadlineSeconds` hard cap · `ttlSecondsAfterFinished` auto-cleanup · ephemeral `/workspace` volume |
| 5 | **Identity** | API key from K8s Secret (never image-baked) · persona-bound Claude tool allow-list (`security` is read-only) · passive SIEM audit annotations on every pod · Vault Agent Injector option |
| 6 | **Content / DLP** | Runtime [guardrails](docs/guardrails.md): per-task + hourly cost cap · pre-flight input scrubbing · post-task output scrubbing (redact or block on api-keys, PEM, SSN, CC, RFC1918) · `.claudeignore` workspace allowlist · per-persona intent denylist. All opt-in via Helm. |
| 7 | **Supply chain** | Trivy `image` + `fs` + `config` (fixed CRITICAL/HIGH blocks merge) · Bandit + Semgrep SAST (SARIF → Code Scanning) · Gitleaks secret scan · Syft CycloneDX SBOM (90-day retention) · `.trivyignore` + `.gitleaks.toml` allowlists with rationale |

See [Security & Compliance](docs/security.md) and [Security Scanning](docs/security-scanning.md) for the full controls catalogue.
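
As a rough illustration of the Content / DLP layer, the sketch below redacts a few of the categories named in the table (API keys, PEM blocks, SSNs, RFC 1918 addresses) from a string. The regular expressions, the `[REDACTED:...]` placeholder format, and the `scrub` function name are assumptions made for this example. The project's actual rules and its "block" mode are described in [docs/guardrails.md](docs/guardrails.md) and are not reproduced here.

```python
import ipaddress
import re

# Hypothetical patterns; a production scrubber would cover many more formats.
API_KEY_RE = re.compile(r"\bsk-ant-[A-Za-z0-9_-]{8,}")
PEM_RE = re.compile(
    r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----", re.DOTALL
)
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def _redact_private_ip(match):
    """Replace the matched address only if it falls in an RFC 1918 range."""
    try:
        addr = ipaddress.ip_address(match.group(0))
    except ValueError:
        return match.group(0)  # not a valid IPv4 address; leave untouched
    if any(addr in net for net in RFC1918_NETS):
        return "[REDACTED:rfc1918]"
    return match.group(0)


def scrub(text):
    """Return text with sensitive-looking spans replaced by placeholders."""
    text = PEM_RE.sub("[REDACTED:pem]", text)
    text = API_KEY_RE.sub("[REDACTED:api-key]", text)
    text = SSN_RE.sub("[REDACTED:ssn]", text)
    return IPV4_RE.sub(_redact_private_ip, text)


print(scrub("connect to 10.1.2.3 using sk-ant-EXAMPLEKEY123"))
# connect to [REDACTED:rfc1918] using [REDACTED:api-key]
```

Public addresses such as `8.8.8.8` pass through unchanged, because only the three RFC 1918 ranges are checked.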

## Quick start

### Fully local: no API key, no cloud (recommended first run)

> **Runtime:** every command below works with either `docker compose` or `podman compose` (or via `make compose-up-*` which auto-detects). The compose files are tool-agnostic.

```bash
# Boot agent + Ollama + LiteLLM together; LiteLLM bridges Anthropic ↔ OpenAI
make compose-up-local-llm # auto-detects podman or docker

# One-time: pull a model into Ollama
docker compose -f docker-compose.yml -f docker-compose.local-llm.yml \
exec ollama ollama pull llama3.1:8b
# (or: podman compose ...)

# Run a one-shot task against the local model; no ANTHROPIC_API_KEY needed
CLAUDE_TASK="say hello in exactly three words" \
docker compose -f docker-compose.yml -f docker-compose.local-llm.yml \
run --rm agent --once
```

### Against the real Anthropic API

```bash
# Build the image (auto-detects podman or docker)
make build

# Run the static server (health + metrics on :8080)
make run

# Run an on-demand Claude task locally
export ANTHROPIC_API_KEY=sk-ant-...
export CLAUDE_TASK="summarise the open issues in this repo"
make run-once

# Spin up the full observability stack (agent + Prometheus + Grafana + Pushgateway)
docker compose up
```

Grafana starts with the **Claude Mate Agent** and **DORA Metrics** dashboards pre-loaded.

## Local quality gates

```bash
make test # pytest + coverage (50% floor)
make sast # Bandit Python SAST
make scan # Trivy filesystem + IaC + image
make secrets # Gitleaks
make sbom # Syft → sbom.cyclonedx.json
make security # all of the above, sequentially
```

## What's inside

| Component | Description |
|---|---|
| `container/app.py` | Python wrapper: health/readiness/metrics server, persona-aware Claude subprocess runner, cost-tracking + audit |
| `container/tests/` | pytest unit tests + coverage config (50% floor) |
| `Dockerfile` | 3-stage multi-stage build: `python-builder` (uv + PyInstaller) → `node-builder` (npm + claude CLI) → `ubi9-minimal` runtime |
| `charts/claude-mate-agent` | Helm chart: Ingress · Route · Gateway API HTTPRoute · sandbox Job · NetworkPolicy · cert-manager · Vault · NVIDIA GPU |
| `examples/` | Deployment overlays: static-kubernetes · static-openshift · gateway-api · monitoring · on-demand-gitlab · on-demand-github · argocd · fluxcd · personas · sandbox · nvidia-gpu · **llm-gateway** (10 providers including Ollama / vLLM / LM Studio) · **mcp-deploy** (drive `kubernetes-mcp-server` from Claude Code) |
| `docker-compose.*.yml` | Opt-in local overlays: `local-llm` (Ollama + LiteLLM) · `opensearch` (audit-log sink test) · `nvidia` (GPU passthrough) · `artifactory` (corporate mirror) |
| `grafana/dashboards/` | `claude-mate-agent.json` + `dora-metrics.json`, auto-provisioned |
| `prometheus/` | Scrape config + `dora_rules.yml` (recording + alerting) |
| `scripts/dora-emit.sh` | Canonical DORA event emitter (deploy / failure / restore) |
| `.github/workflows/` | `ci.yml` (test + build + push) · `security.yml` (Trivy + Bandit + Semgrep + Gitleaks + SBOM → SARIF) · `deploy.yml` · `sandbox.yml` · `on-demand.yml` |
| `.gitlab-ci.yml` | `validate → test → build → scan → package → deploy → on-demand` with full quality-gate gating |
| `.github/renovate.json` | Renovate config for Python, Node, Dockerfile, Helm, Compose, Actions |

## Operating modes

| Mode | Lifecycle | When to use | How it runs |
|---|---|---|---|
| **Static** | Long-running Deployment | Always-on service with continuous metrics/health endpoints | `make run` / `helm upgrade --install` |
| **On-demand** | Short-lived CI job | Manual or scheduled tasks triggered from CI/CD | GitHub Actions `on-demand.yml` / GitLab `run:on-demand-agent` |
| **Sandbox** | One-shot K8s Job | Untrusted prompts, contractor work, per-request isolation | `helm template ... \| kubectl create -f -` ([details](docs/sandbox.md)) |
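
For orientation, the sketch below builds a Kubernetes `batch/v1` Job manifest in Python that carries the sandbox and container controls listed in the defense-in-depth table. In the project the manifest is rendered by the Helm chart, so this is not its template: the `sandbox_job` helper, the Job name, the image reference, the `gvisor` RuntimeClass name, the default timeouts, and `backoffLimit: 0` are all assumptions chosen for the example.

```python
import json


def sandbox_job(name, image, task, runtime_class=None,
                deadline_seconds=900, ttl_seconds=300):
    """Return a one-shot Job manifest (dict) with sandbox hardening applied."""
    pod_spec = {
        "restartPolicy": "Never",
        "automountServiceAccountToken": False,  # no API token in the pod
        "containers": [{
            "name": "agent",
            "image": image,
            "args": ["--once"],
            "env": [{"name": "CLAUDE_TASK", "value": task}],
            "securityContext": {
                "readOnlyRootFilesystem": True,
                "runAsNonRoot": True,
                "capabilities": {"drop": ["ALL"]},
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "volumeMounts": [{"name": "workspace", "mountPath": "/workspace"}],
        }],
        # emptyDir gives an ephemeral workspace that is deleted with the pod.
        "volumes": [{"name": "workspace", "emptyDir": {}}],
    }
    if runtime_class:
        pod_spec["runtimeClassName"] = runtime_class  # e.g. gVisor or Kata
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,                       # assumption: never retry
            "activeDeadlineSeconds": deadline_seconds,  # hard runtime cap
            "ttlSecondsAfterFinished": ttl_seconds,     # automatic cleanup
            "template": {"spec": pod_spec},
        },
    }


job = sandbox_job(
    name="sandbox-example",                       # hypothetical name
    image="registry.example.com/agent:latest",    # hypothetical image
    task="say hello in exactly three words",
    runtime_class="gvisor",                       # cluster-specific name
)
print(json.dumps(job, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed manifest could be piped to `kubectl create -f -` in the same way as the `helm template` output.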

## Observability

The platform emits three classes of telemetry:

1. **Service metrics**: `claude_mate_agent_*` on `/metrics` (always on) and OTLP (opt-in via `OTEL_ENABLED=true`)
2. **Cost + audit**: structured JSON with `task_cost_summary`, role, CI system, commit SHA, pod identifiers
3. **DORA**: `dora_deployments_total`, `dora_lead_time_seconds`, `dora_change_failures_total`, `dora_restore_seconds` emitted via Pushgateway, surfaced on the Grafana DORA dashboard
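
The DORA metrics in item 3 reach Prometheus through a Pushgateway, which accepts samples in the Prometheus text exposition format at `/metrics/job/<job>`. The Python sketch below shows roughly what such a push could look like. It is not a port of `scripts/dora-emit.sh`: the `environment` label, the metric types, the job name, and the helper names are assumptions. Only the two metric names come from the list above.

```python
import urllib.parse
import urllib.request


def exposition_line(name, value, labels=None):
    """Render one sample in the Prometheus text exposition format."""
    if not labels:
        return "{} {}".format(name, value)
    rendered = ",".join(
        '{}="{}"'.format(
            key,
            str(val).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n"),
        )
        for key, val in sorted(labels.items())
    )
    return "{}{{{}}} {}".format(name, rendered, value)


def dora_deploy_payload(deployments_total, lead_time_seconds, environment):
    """Build the request body for one deploy event (label name is assumed)."""
    labels = {"environment": environment}
    lines = [
        "# TYPE dora_deployments_total counter",
        exposition_line("dora_deployments_total", deployments_total, labels),
        "# TYPE dora_lead_time_seconds gauge",  # type is an assumption
        exposition_line("dora_lead_time_seconds", lead_time_seconds, labels),
    ]
    return "\n".join(lines) + "\n"  # the body must end with a newline


def push(gateway_url, job, payload):
    """POST the payload to a Pushgateway; returns the HTTP status code."""
    url = "{}/metrics/job/{}".format(
        gateway_url.rstrip("/"), urllib.parse.quote(job, safe="")
    )
    request = urllib.request.Request(
        url, data=payload.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


print(dora_deploy_payload(42, 330.5, "prod"), end="")
```

Calling `push("http://localhost:9091", "deploy", payload)` would send the payload to a locally running Pushgateway. The address and job name here are placeholders.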

DORA failure definition is codified in CI: rollout timeout, probe failure, or explicit `dora-emit.sh failure` within 24 h of deploy. Targets and alerting rules are in [`docs/dora-metrics.md`](docs/dora-metrics.md).
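
The failure definition above can be sketched as a small classifier. This Python example assumes the 24-hour window applies to every failure signal and that each deploy carries a list of `(kind, timestamp)` events. The event kind strings and function names are hypothetical, and the authoritative rules live in the CI configuration and [`docs/dora-metrics.md`](docs/dora-metrics.md).

```python
from datetime import datetime, timedelta

FAILURE_WINDOW = timedelta(hours=24)
# Hypothetical event names for the three signals in the failure definition.
FAILURE_KINDS = {"rollout_timeout", "probe_failure", "explicit_failure"}


def is_change_failure(deployed_at, events):
    """True if any failure signal falls within 24 h after the deploy."""
    return any(
        kind in FAILURE_KINDS
        and deployed_at <= ts <= deployed_at + FAILURE_WINDOW
        for kind, ts in events
    )


def change_failure_rate(deploys):
    """deploys: list of (deployed_at, events). Returns failed / total."""
    if not deploys:
        return 0.0
    failed = sum(1 for at, events in deploys if is_change_failure(at, events))
    return failed / len(deploys)


t0 = datetime(2024, 1, 1, 12, 0)
history = [
    (t0, [("probe_failure", t0 + timedelta(minutes=5))]),     # counts
    (t0, [("explicit_failure", t0 + timedelta(hours=30))]),   # outside window
    (t0, []),                                                 # clean deploy
    (t0, [("rollout_timeout", t0)]),                          # counts
]
print(change_failure_rate(history))  # 0.5
```

Failures reported after the window closes are deliberately ignored here, which keeps the metric tied to the change that most plausibly caused them.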

## Documentation

Full docs in [`docs/`](docs/), served with MkDocs Material:

```bash
make docs-serve # live preview at http://localhost:8000
make docs-build # build static site to site/
```

| Page | Purpose |
|---|---|
| [Getting Started](docs/getting-started.md) | Build, run, first task |
| [Local Development](docs/local-dev.md) | Compose overlays · fully-local Ollama stack · GPU passthrough |
| [Solution Architecture](docs/solution-architecture.md) | End-to-end reference architecture |
| [Container Internals](docs/architecture.md) | Two-layer design (agent + claude CLI), graceful shutdown |
| [Container Build](docs/container.md) | Multi-stage Dockerfile, PyInstaller, OTEL bundling |
| [Helm Chart](docs/helm-chart.md) | Values reference, routing, secrets |
| [GitLab CI/CD](docs/gitlab-ci.md) | Pipeline jobs and required variables |
| [GitHub Actions](docs/github-actions.md) | Workflows and required secrets |
| [Deploy via MCP](docs/mcp-deploy.md) | Drive `kubernetes-mcp-server` from Claude Code for interactive deploys |
| [Personas](docs/personas.md) | Architect / Security / DevOps / SRE roles |
| [LLM Gateway](docs/llm-gateway.md) | Provider routing: Anthropic, Kong, LiteLLM, OpenRouter, Azure, Vertex AI, NVIDIA NIM, **Ollama / vLLM / LM Studio** |
| [Sandboxes](docs/sandbox.md) | Ephemeral one-shot Job execution · gVisor / Kata / **NVIDIA OpenShell** runtimes |
| [Guardrails](docs/guardrails.md) | Cost cap · input/output scrubbing · workspace allowlist · intent denylist |
| [Monitoring](docs/monitoring.md) | Metrics reference, OTEL setup, ServiceMonitor |
| [Security & Compliance](docs/security.md) | RBAC, SCC, NetworkPolicy, audit |
| [Security Scanning](docs/security-scanning.md) | Trivy, Bandit, Semgrep, Gitleaks, SBOM |
| [Quality Gates](docs/quality-gates.md) | SDLC stage → gate matrix, pipeline DAG |
| [DORA Metrics](docs/dora-metrics.md) | Definitions, targets, dashboard, alerting |
| [Versioning](docs/versioning.md) | SemVer scheme, release tags, version-bump helper |

## Requirements

See [`requirement.md`](requirement.md) for the full enterprise requirements catalogue covering Kubernetes/OpenShift support, container hardening, monitoring, logging, OpenShell protection, audit trail, remote log sync, team-mate roles, LLM gateways, GPU support, Artifactory mirrors, Claude sandboxes, security scanning, SAST, code coverage, SDLC quality gates, and DORA metrics.