https://github.com/amcheste/kagents
A Kubernetes-native platform for orchestrating AI knowledge work teams. Define teams declaratively, run them on your cluster, deliver results automatically.
https://github.com/amcheste/kagents
agent-orchestration ai-agents claude claude-code cloud-native knowledge-work kubernetes kubernetes-operator
Last synced: 1 day ago
JSON representation
A Kubernetes-native platform for orchestrating AI knowledge work teams. Define teams declaratively, run them on your cluster, deliver results automatically.
- Host: GitHub
- URL: https://github.com/amcheste/kagents
- Owner: amcheste
- License: other
- Created: 2026-04-03T21:27:33.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-06-08T01:55:33.000Z (17 days ago)
- Last Synced: 2026-06-08T03:21:21.277Z (17 days ago)
- Topics: agent-orchestration, ai-agents, claude, claude-code, cloud-native, knowledge-work, kubernetes, kubernetes-operator
- Language: Go
- Size: 2.16 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
- Agents: AGENTS.md
Awesome Lists containing this project
README

# kagents
**Run Claude Code Agent Teams as a Kubernetes operator.**
[](https://github.com/amcheste/claude-teams-operator/actions/workflows/validate.yml)
[](https://github.com/amcheste/claude-teams-operator/releases)
[](LICENSE)
[](go.mod)
---
> **kagents** is the project brand. The implementation lives in the [`claude-teams-operator`](https://github.com/amcheste/claude-teams-operator) repository and ships under the `claude.amcheste.io/v1alpha1` API group. Documentation site: [kagents.dev](https://kagents.dev) (under construction. See [v0.7.0 milestone](https://github.com/amcheste/claude-teams-operator/milestone/8)).
Claude Code [Agent Teams](https://docs.anthropic.com/en/docs/claude-code/agent-teams) let multiple Claude Code instances collaborate. A lead coordinates work via a shared task list while teammates communicate through peer-to-peer mailboxes. Natively this runs on a single machine using tmux. This operator lifts that pattern into Kubernetes so you can run large-scale agent teams on your cluster.
## Modes
The operator supports two distinct use cases controlled by a single field in the `AgentTeam` spec:
| Mode | Use when | Key field |
|------|----------|-----------|
| **Coding** | Agents work on a git repository | `spec.repository` |
| **Cowork** | Agents produce documents, reports, emails, analysis | `spec.workspace` |
Both modes share the same coordination protocol (shared PVCs, mailboxes, task lists) and all Cowork extensions (Skills, MCP servers, approval gates).
## Features
- **Native Agent Teams protocol**. Preserves Anthropic's file-based mailbox and task list format over ReadWriteMany PVCs; no protocol translation
- **Per-teammate git worktrees**. Each coding agent works on an isolated branch to prevent merge conflicts
- **Cowork mode**. Mount ConfigMap/PVC inputs and collect outputs without requiring a git repo
- **Skills as CRD fields**. Mount Claude Code skills from ConfigMaps into each agent's `.claude/skills/`
- **MCP servers per agent**. Configure Model Context Protocol connections per teammate
- **Approval gates**. Pause spawning specific teammates until a human applies an annotation
- **Budget enforcement**. Terminate the team if estimated API cost exceeds a configured limit
- **Timeout enforcement**. Terminate the team after a configurable wall-clock duration
- **`dependsOn` ordering**. Spawn teammates only after their declared dependencies complete
- **Reusable templates**. Define team patterns with `AgentTeamTemplate`, instantiate with `AgentTeamRun`
## Quick Start
### Prerequisites
- Kubernetes 1.28+
- ReadWriteMany PVC support (NFS, EFS, or a compatible CSI driver. See [ARCHITECTURE.md § Storage Requirements](ARCHITECTURE.md#storage-requirements) for options)
- Claude Code CLI access (Max subscription or API key)
- Opus 4.6 model access (required for Agent Teams)
### Install on an existing cluster (Helm)
```bash
# 1. Install the operator (CRDs + controller + RBAC)
helm install claude-teams-operator \
oci://ghcr.io/amcheste/charts/claude-teams-operator \
--namespace claude-teams-system --create-namespace
# 2. Create an API key secret in the namespace where your teams will run
kubectl create namespace dev-agents
kubectl create secret generic anthropic-api-key \
--namespace dev-agents \
--from-literal=ANTHROPIC_API_KEY=sk-ant-...
# 3. Apply a sample team
kubectl apply -n dev-agents -f \
https://raw.githubusercontent.com/amcheste/claude-teams-operator/main/config/samples/auth-refactor-team.yaml
# 4. Watch the team progress
kubectl get agentteams -n dev-agents -w
kubectl describe agentteam auth-refactor -n dev-agents
```
### Local development with Kind
For contributors and anyone who wants to run the full stack from source:
```bash
# 1. Create a Kind cluster with NFS provisioner
make kind-create
# 2. Build and load images into Kind
make docker-build docker-build-runner kind-load
# 3. Install CRDs and deploy the operator
make install deploy
# 4. Create your API key secret
kubectl create secret generic anthropic-api-key \
--namespace dev-agents \
--from-literal=ANTHROPIC_API_KEY=sk-ant-...
# 5. Apply a sample team
kubectl apply -f config/samples/auth-refactor-team.yaml
```
See [CONTRIBUTING.md](CONTRIBUTING.md) for the full dev loop (testing, linting, manifest regeneration).
## Example: Coding Team
```yaml
apiVersion: claude.amcheste.io/v1alpha1
kind: AgentTeam
metadata:
name: auth-refactor
namespace: dev-agents
spec:
repository:
url: "git@github.com:acme/backend.git"
branch: "main"
credentialsSecret: "git-credentials"
auth:
apiKeySecret: "anthropic-api-key"
lead:
model: "opus"
prompt: |
Coordinate the migration from JWT to OAuth2.
Assign backend-api, frontend-auth, and test-coverage to their tracks.
Validate integration when all tracks complete.
teammates:
- name: "backend-api"
model: "sonnet"
prompt: "Implement OAuth2 endpoints. Remove JWT middleware."
scope:
includePaths: ["src/api/auth/", "src/middleware/"]
- name: "test-coverage"
model: "sonnet"
prompt: "Write comprehensive tests for the OAuth2 migration."
dependsOn: ["backend-api"]
lifecycle:
timeout: "2h"
budgetLimit: "30.00"
onComplete: "create-pr"
pullRequest:
targetBranch: "main"
titleTemplate: "feat(auth): migrate from JWT to OAuth2"
```
## Example: Cowork Team
```yaml
apiVersion: claude.amcheste.io/v1alpha1
kind: AgentTeam
metadata:
name: q3-report
namespace: cowork-agents
spec:
workspace:
inputs:
- configMap: "quarterly-data"
mountPath: "/workspace/data"
output:
mountPath: "/workspace/output"
size: "5Gi"
auth:
apiKeySecret: "anthropic-api-key"
lead:
model: "opus"
prompt: "Coordinate the Q3 business report. Assign research, writing, and design."
teammates:
- name: "researcher"
model: "sonnet"
prompt: "Analyse the data in /workspace/data and produce a findings summary."
skills:
- name: "data-analysis"
source:
configMap: "data-analysis-skill"
- name: "email-drafter"
model: "sonnet"
prompt: "Draft follow-up emails for all Q3 prospects."
mcpServers:
- name: "gmail"
url: "https://gmail.mcp.example.com/mcp"
lifecycle:
timeout: "3h"
budgetLimit: "15.00"
approvalGates:
- event: "spawn-email-drafter"
channel: "webhook"
webhookUrl: "https://hooks.example.com/approvals"
```
Grant approval after reviewing the researcher's output:
```bash
kubectl annotate agentteam q3-report \
"approved.claude.amcheste.io/spawn-email-drafter=true" \
-n cowork-agents
```
## CRD Reference
### AgentTeam
The primary resource. Defines the full team, its workspace, lifecycle, and observability config.
| Field | Type | Description |
|-------|------|-------------|
| `spec.repository` | `RepositorySpec` | Git repo config (coding mode). Optional when `spec.workspace` is set. |
| `spec.workspace` | `WorkspaceSpec` | Input/output volumes (Cowork mode). Optional when `spec.repository` is set. |
| `spec.auth` | `AuthSpec` | API key or OAuth secret reference. |
| `spec.lead` | `LeadSpec` | Lead agent model, prompt, skills, and MCP servers. |
| `spec.teammates` | `[]TeammateSpec` | Worker agents with optional `dependsOn`, `scope`, `skills`, `mcpServers`. |
| `spec.lifecycle.timeout` | `string` | Max duration, e.g. `"4h"`. Defaults to `"4h"`. |
| `spec.lifecycle.budgetLimit` | `string` | Max USD spend, e.g. `"10.00"`. No limit if unset. |
| `spec.lifecycle.onComplete` | `string` | `create-pr` \| `push-branch` \| `notify` \| `none` |
| `spec.lifecycle.approvalGates` | `[]ApprovalGateSpec` | Human-in-the-loop gates before spawning a teammate. |
### AgentTeamTemplate
A reusable team pattern. Does not run on its own. Instantiate with `AgentTeamRun`.
### AgentTeamRun
Instantiates an `AgentTeamTemplate` against a specific repo or workspace.
```yaml
apiVersion: claude.amcheste.io/v1alpha1
kind: AgentTeamRun
metadata:
name: q4-security-review
spec:
templateRef:
name: fullstack-review
repository:
url: "git@github.com:acme/platform.git"
branch: "release/4.0"
credentialsSecret: "git-credentials"
auth:
apiKeySecret: "anthropic-api-key"
lead:
model: "opus"
prompt: "Run a full security, performance, and test quality review."
```
## Status
Watch team progress. The `Ready` column reports `running+completed/total` teammates, so `2/3` means two of three workers are up (or have finished) while one is still spawning or blocked on a dependency:
```bash
kubectl get agentteams -A
# NAME PHASE READY TASKS DONE COST AGE
# auth-refactor Running 2/3 7 $1.42 14m
# q3-report Completed 2/2 12 $3.80 2h
```
Inspect details, including operator events emitted at every phase transition:
```bash
kubectl describe agentteam auth-refactor -n dev-agents
# Status:
# Phase: Running
# Ready: 2/3
# Estimated Cost: 1.42
# Lead:
# Pod Name: auth-refactor-lead
# Phase: Running
# Teammates:
# - Name: backend-api, Phase: Running
# - Name: test-coverage, Phase: Waiting (dependsOn: backend-api)
# Events:
# Normal Initializing 5m agentteam-controller Provisioned PVCs and launched init Job
# Normal Running 4m agentteam-controller All agent pods started
```
## Approval Gates
Approval gates block a teammate from being spawned until a human applies an annotation.
```bash
# Inspect which teammates are waiting
kubectl get agentteam my-team -o jsonpath='{.status.teammates[*].pendingApproval}'
# Grant approval
kubectl annotate agentteam my-team \
"approved.claude.amcheste.io/spawn-email-drafter=true"
```
If `channel: webhook` is set, the operator POSTs a JSON payload to `webhookUrl` when the gate is triggered, allowing an external system to present the approval to a human and then apply the annotation.
## Documentation
This README is the entry point. For deeper dives, every topic lives in a dedicated in-repo document:
| Document | Read when you want to… |
|----------|-----------------------|
| [ARCHITECTURE.md](ARCHITECTURE.md) | Understand how the operator models Agent Teams. Phase state machine, PVC layout, RWX storage backends, coordination protocol, key design tradeoffs. |
| [TESTING.md](TESTING.md) | See the test strategy (unit / integration / acceptance / E2E), how to run each suite, and what each one actually verifies. |
| [CONTRIBUTING.md](CONTRIBUTING.md) | Set up a dev environment, run the full build/test loop, follow the branch + PR workflow, and walk through "How to add a new reconciler feature." |
| [docs/helm-values.md](docs/helm-values.md) | Tune the Helm chart. Every value documented with defaults and production override recipes. |
| [SECURITY.md](SECURITY.md) | Report a vulnerability or review the project's security policy. |
| [KUBECON.md](KUBECON.md) | See the talk framing and "interesting problems" log. Useful context for why specific architectural choices were made. |
## Development
Common Makefile targets (full loop in [CONTRIBUTING.md](CONTRIBUTING.md)):
```bash
make build # Build operator binary
make test # Run all tests
make lint # Run golangci-lint
make manifests # Regenerate CRD manifests
make generate # Regenerate deepcopy methods
```
## License
Apache 2.0