https://github.com/kelos-dev/kelos

The Kubernetes-native framework for orchestrating autonomous AI coding agents.

Host: GitHub
URL: https://github.com/kelos-dev/kelos
Owner: kelos-dev
License: apache-2.0
Created: 2026-02-01T04:03:17.000Z
Default Branch: main
Last Pushed: 2026-02-27T17:24:45.000Z
Last Synced: 2026-02-27T18:44:01.138Z
Topics: agentic-ai, agentic-coding, ai, ai-agents, ci-cd, claude, claude-code, codex, gemini, kubernetes, kubernetes-operator, opencode
Language: Go
Homepage:
Size: 968 KB
Stars: 48
Watchers: 1
Forks: 8
Open Issues: 65
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md


# Kelos

Orchestrate autonomous AI coding agents on Kubernetes.

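[Quick Start](#quick-start) · [Kelos Skill](#kelos-skill) · [Kelos Developing Kelos](#kelos-developing-kelos) · [Examples](#examples) · [Integration](#integration) · [Reference](#reference) · [YAML Manifests](examples/)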

Kelos lets you **define your development workflow as Kubernetes resources** and run it continuously. Declare what triggers agents, what they do, and how they hand off — Kelos handles the rest.

Kelos develops Kelos through seven TaskSpawners running 24/7: triaging issues, planning implementations, fixing bugs, responding to PR feedback, testing DX, brainstorming improvements, and tuning their own prompts. [See the full pipeline below.](#kelos-developing-kelos)

Supports **Claude Code**, **OpenAI Codex**, **Google Gemini**, **OpenCode**, **Cursor**, and [custom agent images](docs/agent-image-interface.md).

## How It Works

Kelos orchestrates the flow from external events to autonomous execution:

*[Diagram: Kelos resources]*

You define what needs to be done, and Kelos handles the "how" — from cloning the right repo and injecting credentials to running the agent and capturing its outputs (branch names, commit SHAs, PR URLs, and token usage).

### Core Primitives

Kelos is built on four resources:

1. **Tasks** — Ephemeral units of work that wrap an AI agent run.
2. **Workspaces** — Persistent or ephemeral environments (git repos) where agents operate.
3. **AgentConfigs** — Reusable bundles of agent instructions (`AGENTS.md`, `CLAUDE.md`), plugins (skills and agents), and MCP servers.
4. **TaskSpawners** — Orchestration engines that react to external triggers (GitHub, Cron) to automatically manage agent lifecycles.

**TaskSpawner: Automatic Task Creation from External Sources**

TaskSpawner watches external sources (e.g., GitHub Issues) and automatically creates Tasks for each discovered item.

```
            polls new issues
 TaskSpawner ─────────────▶ GitHub Issues
      │      ◀─────────────
      │
      ├──creates──▶ Task: fix-bugs-1
      └──creates──▶ Task: fix-bugs-2
```
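The poll-and-create cycle in the diagram can be modeled as an idempotent loop: each poll creates a Task only for items that don't already have one. A minimal in-memory sketch; the `Issue` type, `spawnTasks`, and the `<spawner>-<issue number>` naming scheme are hypothetical, chosen to match the `fix-bugs-1`/`fix-bugs-2` names above rather than taken from Kelos's controller.

```go
package main

import "fmt"

// Issue is a hypothetical, minimal view of a GitHub issue returned by one poll.
type Issue struct {
	Number int
	Title  string
}

// spawnTasks returns the names of the Tasks to create for one poll's results.
// existing records Task names created on earlier polls, so polling the same
// issues again never creates duplicates.
func spawnTasks(spawner string, polled []Issue, existing map[string]bool) []string {
	var created []string
	for _, issue := range polled {
		name := fmt.Sprintf("%s-%d", spawner, issue.Number)
		if existing[name] {
			continue // a Task already exists for this item
		}
		existing[name] = true
		created = append(created, name)
	}
	return created
}

func main() {
	existing := map[string]bool{}
	polled := []Issue{{Number: 1, Title: "Crash on start"}, {Number: 2, Title: "Typo in docs"}}

	fmt.Println(spawnTasks("fix-bugs", polled, existing))      // [fix-bugs-1 fix-bugs-2]
	fmt.Println(len(spawnTasks("fix-bugs", polled, existing))) // 0: nothing new on the next poll
}
```

Keying on a deterministic name is what makes repeated polling safe: the second poll sees the same issues but creates nothing.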

## Kelos Developing Kelos

Kelos develops itself. Seven TaskSpawners run 24/7, each handling a different part of the development lifecycle — fully autonomous.

*[Diagram: Kelos self-development pipeline]*

| TaskSpawner | Trigger | Model | Description |
|---|---|---|---|
| **kelos-workers** | GitHub Issues (`actor/kelos`) | Opus | Picks up issues, creates or updates PRs, self-reviews, and ensures CI passes |
| **kelos-pr-responder** | GitHub Pull Requests (`generated-by-kelos`, `changes_requested`) | Opus | Re-engages on PR review feedback and updates the existing branch incrementally |
| **kelos-planner** | GitHub Issues (`/kelos plan` comment) | Opus | Investigates an issue and posts a structured implementation plan — advisory only, no code changes |
| **kelos-triage** | GitHub Issues (`needs-actor`) | Opus | Classifies issues by kind/priority, detects duplicates, and recommends an actor |
| **kelos-fake-user** | Cron (daily 09:00 UTC) | Sonnet | Tests DX as a new user — follows docs, tries CLI workflows, files issues for problems found |
| **kelos-fake-strategist** | Cron (every 12 hours) | Opus | Explores new use cases, workflow improvements, and integration opportunities |
| **kelos-self-update** | Cron (daily 06:00 UTC) | Opus | Reviews and tunes prompts, configs, and workflow files — the pipeline improves itself |

Here's a trimmed snippet of `kelos-workers.yaml` — enough to show the pattern:

```yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: kelos-workers
spec:
  when:
    githubIssues:
      labels: [actor/kelos]
      excludeLabels: [kelos/needs-input]
      priorityLabels:
        - priority/critical-urgent
        - priority/important-soon
  pollInterval: 1m
  maxConcurrency: 3
  taskTemplate:
    model: opus
    type: claude-code
    branch: "kelos-task-{{.Number}}"
    promptTemplate: |
      You are a coding agent. You either
      - create a PR to fix the issue
      - update an existing PR to fix the issue
      - comment on the issue or the PR if you cannot fix it
      ...
```

The key pattern is `excludeLabels: [kelos/needs-input]` — this creates a feedback loop where the agent works autonomously until it needs human input, then pauses. Removing the label re-queues the issue on the next poll.

See the full manifest at [`self-development/kelos-workers.yaml`](self-development/kelos-workers.yaml) and the [`self-development/` README](self-development/README.md) for setup instructions.

## Why Kelos?

AI coding agents are evolving from interactive CLI tools into autonomous background workers — managed like infrastructure, not invoked like commands. Kelos provides the framework to manage this transition at scale.

- **Workflow as YAML** — Define your development workflow declaratively: what triggers agents, what they do, and how they hand off. Version-control it, review it in PRs, and GitOps it like any other infrastructure.
- **Orchestration, not just execution** — Don't just run an agent; manage its entire lifecycle. Chain tasks with `dependsOn` and pass results (branch names, PR URLs, token usage) between pipeline stages. Use `TaskSpawner` to build event-driven workers that react to GitHub issues, PRs, or schedules.
- **Host-isolated autonomy** — Each task runs in an isolated, ephemeral Pod with a freshly cloned git workspace. Agents have no access to your host machine — use [scoped tokens and branch protection](#security-considerations) to control repository access.
- **Standardized interface** — Plug in any agent (Claude, Codex, Gemini, OpenCode, Cursor, or your own) using a simple [container interface](docs/agent-image-interface.md). Kelos handles credential injection, workspace management, and Kubernetes plumbing.
- **Scalable parallelism** — Fan out agents across multiple repositories. Kubernetes handles scheduling, resource management, and queueing — scale is limited by your cluster capacity and API provider quotas.
- **Observable & CI-native** — Every agent run is a first-class Kubernetes resource with deterministic outputs (branch names, PR URLs, commit SHAs, token usage) captured into status. Monitor via `kubectl`, manage via the `kelos` CLI or declarative YAML (GitOps-ready), and integrate with ArgoCD or GitHub Actions.

## Quick Start

Get running in 5 minutes (most of the time is gathering credentials).

### Prerequisites

- Kubernetes cluster (1.28+)

Don't have a cluster? Create one locally with kind

1. [Install kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation) (requires Docker)
2. Create a cluster:
   ```bash
   kind create cluster
   ```

This creates a single-node cluster and configures your kubeconfig automatically.

### 1. Install the CLI

```bash
curl -fsSL https://raw.githubusercontent.com/kelos-dev/kelos/main/hack/install.sh | bash
```

Alternative: install from source

```bash
go install github.com/kelos-dev/kelos/cmd/kelos@latest
```

### 2. Install Kelos

```bash
kelos install
```

This installs the Kelos controller and CRDs into the `kelos-system` namespace.

Verify the installation:

```bash
kubectl get pods -n kelos-system
kubectl get crds | grep kelos.dev
```

### Helm Install

Kelos also publishes a Helm chart as an OCI artifact in GHCR.

To install Kelos with Helm:

```bash
helm upgrade --install kelos oci://ghcr.io/kelos-dev/charts/kelos \
  -n kelos-system \
  --create-namespace \
  --version <version>
```

This installs the controller and, by default, the Kelos CRDs.

For CRD migration, adopting existing CRDs into Helm ownership, and advanced chart usage, see [the Helm chart README](internal/manifests/charts/kelos/README.md).

### 3. Initialize Your Config

```bash
kelos init
```

Edit `~/.kelos/config.yaml`:

```yaml
oauthToken: <your-oauth-token>
workspace:
  repo: https://github.com/your-org/your-repo.git
  ref: main
  token: <your-github-token>  # optional, for private repos and pushing changes
```

How to get your credentials

**Claude OAuth token** (recommended for Claude Code):
Run `claude setup-token` locally and follow the prompts. This generates a long-lived token (valid for ~1 year). Copy the token from `~/.claude/credentials.json`.

**Anthropic API key** (alternative for Claude Code):
Create one at [console.anthropic.com](https://console.anthropic.com). Set `apiKey` instead of `oauthToken` in your config.

**Codex OAuth credentials** (for OpenAI Codex):
Run `codex auth login` locally, then reference the auth file in your config:
```yaml
oauthToken: "@~/.codex/auth.json"
type: codex
```
Or set `apiKey` with an OpenAI API key instead.

**GitHub token** (for pushing branches and creating PRs):
Create a [Personal Access Token](https://github.com/settings/tokens) with `repo` scope (and `workflow` if your repo uses GitHub Actions).

**GitHub App** (recommended for production/org use):
For organizations, [GitHub Apps](https://docs.github.com/en/apps) are preferred over PATs — they offer fine-grained permissions, higher rate limits, and don't depend on a specific user account. Use `githubApp` instead of `token` in your workspace config:
```yaml
workspace:
  repo: https://github.com/your-org/repo.git
  ref: main
  githubApp:
    appID: "12345"
    installationID: "67890"
    privateKeyPath: ~/.config/my-app.private-key.pem
```
See the [Workspace reference](docs/reference.md#workspace) for details.

> **Warning:** Without a workspace, the agent runs in an ephemeral pod — any files it creates are lost when the pod terminates. Always set up a workspace to get persistent results.

### 4. Run Your First Task

```bash
$ kelos run -p "Add a hello world program in Python"
task/task-r8x2q created

$ kelos logs task-r8x2q -f
```

The task name (e.g. `task-r8x2q`) is auto-generated. Use `--name` to set a custom name, or `-w` to watch task status after creation. To stream agent logs, run `kelos logs <task-name> -f`.

The agent clones your repo, makes changes, and can push a branch or open a PR.

> **Tip:** If something goes wrong, check the controller logs with
> `kubectl logs deployment/kelos-controller-manager -n kelos-system`.

Using kubectl and YAML instead of the CLI

Create a `Workspace` resource to define a git repository:

```yaml
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: my-workspace
spec:
  repo: https://github.com/your-org/your-repo.git
  ref: main
```

Then reference it from a `Task`:

```yaml
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: hello-world
spec:
  type: claude-code
  prompt: "Create a hello world program in Python"
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: my-workspace
```

```bash
kubectl apply -f workspace.yaml
kubectl apply -f task.yaml
kubectl get tasks -w
```

Using an API key instead of OAuth

Set `apiKey` instead of `oauthToken` in `~/.kelos/config.yaml`:

```yaml
apiKey: <your-api-key>
```

Or pass `--secret` to `kelos run` with a pre-created secret (api-key is the default credential type), or set `spec.credentials.type: api-key` in YAML.

## Kelos Skill

The [Kelos skill](skills/kelos/) teaches AI coding agents how to author and operate Kelos resources. Install it via [skills.sh](https://skills.sh):

```bash
npx skills add kelos-dev/kelos
```

Then ask your agent:

```
Using the /kelos skill, set up a TaskSpawner that watches GitHub issues
labeled "bug" and auto-creates Tasks to fix them.
```

The agent will generate the correct manifests, apply them, and troubleshoot any issues on your behalf.

## Examples

### Auto-fix GitHub issues with TaskSpawner

Create a TaskSpawner to automatically turn GitHub issues into agent tasks:

```yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: fix-bugs
spec:
  when:
    githubIssues:
      labels: [bug]
      state: open
  pollInterval: 5m
  taskTemplate:
    type: claude-code
    workspaceRef:
      name: my-workspace
    credentials:
      type: oauth
      secretRef:
        name: claude-oauth-token
    promptTemplate: "Fix: {{.Title}}\n{{.Body}}"
```

```bash
kubectl apply -f taskspawner.yaml
```

TaskSpawner polls for new issues matching your filters and creates a Task for each one.
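The `{{.Title}}` placeholders match Go's `text/template` syntax, so each spawned Task's prompt (and, in the `kelos-workers` example, its branch name) can be pictured as one template execution per discovered item. A sketch under that assumption; `IssueData` and `render` are hypothetical, and the field names simply mirror the placeholders used in this README.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// IssueData is a hypothetical stand-in for the per-item data a TaskSpawner
// exposes to its templates; the field names mirror the placeholders above.
type IssueData struct {
	Number int
	Title  string
	Body   string
}

// render executes a Go text/template against one discovered item.
func render(tmpl string, data IssueData) (string, error) {
	t, err := template.New("task").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	item := IssueData{Number: 42, Title: "Crash on empty config", Body: "Steps to reproduce: ..."}

	// In Go (as in a double-quoted YAML string), "\n" is a real newline.
	prompt, err := render("Fix: {{.Title}}\n{{.Body}}", item)
	if err != nil {
		panic(err)
	}
	branch, err := render("kelos-task-{{.Number}}", item)
	if err != nil {
		panic(err)
	}
	fmt.Println(prompt)
	fmt.Println(branch) // kelos-task-42
}
```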

### Chain tasks into pipelines

Use `dependsOn` to chain tasks into pipelines. A task in the `Waiting` phase stays paused until all of its dependencies succeed:

```bash
kelos run -p "Scaffold a new user service" --name scaffold --branch feature/user-service
kelos run -p "Write tests for the user service" --depends-on scaffold --branch feature/user-service
```

Tasks sharing the same `branch` are serialized automatically — only one runs at a time.

YAML equivalent

```yaml
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: scaffold
spec:
  type: claude-code
  prompt: "Scaffold a new user service with CRUD endpoints"
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: my-workspace
  branch: feature/user-service
---
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: write-tests
spec:
  type: claude-code
  prompt: "Write comprehensive tests for the user service"
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: my-workspace
  branch: feature/user-service
  dependsOn: [scaffold]
```
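The two gating rules described above (wait until every dependency has succeeded, and never run two tasks on the same branch at once) can be sketched as a single readiness check. This is an illustrative model, not Kelos's controller logic: the `Task` struct, `canStart`, and every phase name other than `Waiting` are assumptions.

```go
package main

import "fmt"

// Task is a hypothetical, simplified view of a Kelos Task for scheduling.
type Task struct {
	Name      string
	Branch    string
	DependsOn []string
	Phase     string // "Waiting", "Running", "Succeeded", "Failed" (names assumed)
}

// canStart reports whether a waiting task may start: every dependency must
// have succeeded, and no other task on the same branch may be running.
func canStart(t Task, all map[string]Task) bool {
	for _, dep := range t.DependsOn {
		if d, ok := all[dep]; !ok || d.Phase != "Succeeded" {
			return false
		}
	}
	for name, other := range all {
		if name != t.Name && other.Branch != "" && other.Branch == t.Branch && other.Phase == "Running" {
			return false
		}
	}
	return true
}

func main() {
	all := map[string]Task{
		"scaffold":    {Name: "scaffold", Branch: "feature/user-service", Phase: "Running"},
		"write-tests": {Name: "write-tests", Branch: "feature/user-service", DependsOn: []string{"scaffold"}, Phase: "Waiting"},
	}
	fmt.Println(canStart(all["write-tests"], all)) // false: scaffold has not succeeded yet

	s := all["scaffold"]
	s.Phase = "Succeeded"
	all["scaffold"] = s
	fmt.Println(canStart(all["write-tests"], all)) // true
}
```

The branch check matters even without `dependsOn`: two independent tasks targeting `feature/user-service` would otherwise race to push to the same branch.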

Downstream tasks can reference upstream results in their prompt using `{{.Deps}}`:

```yaml
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: open-pr
spec:
  type: claude-code
  prompt: |
    Open a PR for branch {{index .Deps "write-tests" "Results" "branch"}}.
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: my-workspace
  branch: feature/user-service
  dependsOn: [write-tests]
```

The `.Deps` map is keyed by dependency Task name. Each entry has `Results` (key-value map with branch, commit, pr, etc.) and `Outputs` (raw output lines). See [examples/07-task-pipeline](examples/07-task-pipeline/) for a full three-stage pipeline.
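The `{{index ...}}` expression matches Go's `text/template`, where `index` walks nested maps one key at a time. A minimal sketch, assuming `.Deps` is a map of maps shaped as the paragraph above describes; `renderWithDeps` and the sample branch and commit values are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderWithDeps executes a prompt template against a hypothetical .Deps map
// keyed by dependency Task name. Each entry holds "Results" (a key-value map)
// and "Outputs" (raw output lines), mirroring the structure described above.
func renderWithDeps(prompt string, deps map[string]map[string]any) (string, error) {
	t, err := template.New("prompt").Parse(prompt)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, map[string]any{"Deps": deps})
	return sb.String(), err
}

func main() {
	deps := map[string]map[string]any{
		"write-tests": {
			"Results": map[string]string{"branch": "feature/user-service", "commit": "abc1234"},
			"Outputs": []string{"branch: feature/user-service"},
		},
	}
	out, err := renderWithDeps(`Open a PR for branch {{index .Deps "write-tests" "Results" "branch"}}.`, deps)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // Open a PR for branch feature/user-service.
}
```

Note that `index` on a missing map key yields the zero value rather than an error, so a misspelled dependency name renders as an empty string (or `<no value>`) instead of failing loudly.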

### Create PRs automatically

Add a `token` to your workspace config:

```yaml
workspace:
  repo: https://github.com/your-org/repo.git
  ref: main
  token: <your-github-token>
```

```bash
kelos run -p "Fix the bug described in issue #42 and open a PR with the fix"
```

The `gh` CLI and `GITHUB_TOKEN` are available inside the agent container, so the agent can push branches and create PRs autonomously.
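Inside the container the agent would normally run `gh pr create`; underneath, opening a PR is one GitHub REST call, `POST /repos/{owner}/{repo}/pulls`, authorized with the injected token. A sketch that builds (without sending) that request; `newPullRequest` is a hypothetical helper, and the owner, repo, and branch values are placeholders.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// newPullRequest builds (but does not send) a GitHub REST API request that
// opens a pull request from head into base: POST /repos/{owner}/{repo}/pulls.
func newPullRequest(token, owner, repo, title, head, base string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{
		"title": title,
		"head":  head,
		"base":  base,
	})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("https://api.github.com/repos/%s/%s/pulls", owner, repo)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Inside the agent container, the token is available as GITHUB_TOKEN.
	req, err := newPullRequest(os.Getenv("GITHUB_TOKEN"), "your-org", "repo",
		"Fix the bug described in issue #42", "kelos-task-42", "main")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL) // POST https://api.github.com/repos/your-org/repo/pulls
	// To actually open the PR: resp, err := http.DefaultClient.Do(req)
}
```

Whatever the token permits, the agent can do through this API, which is why the token's scope defines the blast radius discussed under Security Considerations.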

### Inject agent instructions and MCP servers

Use `AgentConfig` to bundle project-wide instructions, plugins, and MCP servers:

```yaml
apiVersion: kelos.dev/v1alpha1
kind: AgentConfig
metadata:
  name: my-config
spec:
  agentsMD: |
    # Project Rules
    Follow TDD. Always write tests first.
  mcpServers:
    - name: github
      type: http
      url: https://api.githubcopilot.com/mcp/
      headers:
        Authorization: "Bearer <your-github-token>"
```

```bash
kelos run -p "Fix the bug" --agent-config my-config
```

- `agentsMD` is written to `~/.claude/CLAUDE.md` (user-level, additive with the repo's own instructions).
- `plugins` are mounted as plugin directories and passed via `--plugin-dir`.
- `mcpServers` are written to the agent's native MCP configuration. Supports `stdio`, `http`, and `sse` transport types.

See the [full AgentConfig spec](docs/reference.md#agentconfig) for plugins, skills, and agents configuration.

> Browse all ready-to-apply YAML manifests in the [`examples/`](examples/) directory.

## Integration

Kelos integrates with external systems in two ways:

**TaskSpawner** — Kelos natively watches external sources and automatically creates Tasks. Supports GitHub Issues, GitHub Pull Requests, Jira, and Cron schedules. No glue code needed.

```yaml
spec:
  when:
    githubIssues:
      labels: [bug]
      state: open
```

**Direct Task creation** — Create Task resources from your own workflows for full control. Any system that can run `kubectl apply` or call the Kubernetes API can trigger agent runs — GitHub Actions, CI/CD pipelines, scripts, Slack bots, or custom automation.

```bash
kelos run -p "Fix the flaky test in ci_test.go" --workspace my-workspace
```

See the [Integration guide](docs/integration.md) for examples of both approaches, including GitHub Actions workflows, Jira setup, and programmatic Task creation.

## Orchestration Patterns

- **Autonomous Self-Development** — Build a feedback loop where agents pick up issues, write code, self-review, and fix CI flakes until the task is complete. See the [self-development pipeline](#kelos-developing-kelos).
- **Event-Driven Bug Fixing** — Automatically spawn agents to investigate and fix bugs as soon as they are labeled in GitHub. See [Auto-fix GitHub issues](#auto-fix-github-issues-with-taskspawner).
- **Fleet-Wide Refactoring** — Orchestrate a "fan-out" where dozens of agents apply the same refactoring pattern across a fleet of microservices in parallel.
- **Hands-Free CI/CD** — Embed agents as first-class steps in your deployment pipelines to generate documentation or perform automated migrations.
- **AI Worker Pools** — Maintain a pool of specialized agents (e.g., "The Security Fixer") that developers can trigger via simple Kubernetes resources.

## Reference

| Resource | Key Fields | Full Spec |
|----------|-----------|-----------|
| **Task** | `type`, `prompt`, `credentials`, `workspaceRef`, `dependsOn`, `branch` | [Reference](docs/reference.md#task) |
| **Workspace** | `repo`, `ref`, `secretRef` (PAT or GitHub App), `files` | [Reference](docs/reference.md#workspace) |
| **AgentConfig** | `agentsMD`, `plugins`, `mcpServers` | [Reference](docs/reference.md#agentconfig) |
| **TaskSpawner** | `when`, `taskTemplate`, `pollInterval`, `maxConcurrency` | [Reference](docs/reference.md#taskspawner) |

CLI Reference

| Command | Description |
|---------|-------------|
| `kelos install` | Install Kelos CRDs and controller into the cluster |
| `kelos uninstall` | Uninstall Kelos from the cluster |
| `kelos init` | Initialize `~/.kelos/config.yaml` |
| `kelos run` | Create and run a new Task |
| `kelos get <resource> [name]` | List resources or view a specific resource (`tasks`, `taskspawners`, `workspaces`) |
| `kelos delete <resource> <name>` | Delete a resource |
| `kelos logs <task-name> [-f]` | View or stream logs from a task |
| `kelos suspend taskspawner <name>` | Pause a TaskSpawner |
| `kelos resume taskspawner <name>` | Resume a paused TaskSpawner |

See [full CLI reference](docs/reference.md#cli-reference) for all flags and options.

## Security Considerations

Kelos runs agents in isolated, ephemeral Pods with no access to your host machine, SSH keys, or other processes. The risk surface is limited to what the injected credentials allow.

**What agents CAN do:** Push branches, create PRs, and call the GitHub API using the injected `GITHUB_TOKEN`.

**What agents CANNOT do:** Access your host, read other pods, reach repositories your token does not grant access to, or access any credentials beyond what you explicitly inject.

Best practices:

- **Scope your GitHub tokens.** Use [fine-grained Personal Access Tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#fine-grained-personal-access-tokens) restricted to specific repositories instead of broad `repo`-scoped classic tokens.
- **Enable branch protection.** Require PR reviews before merging to `main`. Agents can push branches and open PRs, but protected branches prevent direct pushes to your default branch.
- **Use `maxConcurrency` and `maxTotalTasks`.** Limit how many tasks a TaskSpawner can create to prevent runaway agent activity.
- **Use `podOverrides.activeDeadlineSeconds`.** Set a timeout to prevent tasks from running indefinitely.
- **Audit via Kubernetes.** Every agent run is a first-class Kubernetes resource — use `kubectl get tasks` and cluster audit logs to track what was created and by whom.

> **About `--dangerously-skip-permissions`:** Claude Code uses this flag for non-interactive operation. Despite the name, the actual risk is minimal — agents run inside ephemeral containers with no host access. The flag simply disables interactive approval prompts, which is necessary for autonomous execution.

Kelos uses standard Kubernetes RBAC — use namespace isolation to separate teams. Each TaskSpawner automatically creates a scoped ServiceAccount and RoleBinding.

## Cost and Limits

Running AI agents costs real money. Here's how to stay in control:

**Model costs vary significantly.** Opus is the most capable but most expensive model. Use `spec.model` (or `model` in config) to choose cheaper models like Sonnet for routine tasks and reserve Opus for complex work. Check the [API pricing](https://docs.anthropic.com/en/docs/about-claude/pricing) page for current rates.

**Use `maxConcurrency` to cap spend.** Without it, a TaskSpawner can create unlimited concurrent tasks. If 100 issues match your filter on first poll, that's 100 simultaneous agent runs. Always set a limit:

```yaml
spec:
  maxConcurrency: 3   # max 3 tasks running at once
  maxTotalTasks: 50   # stop after 50 total tasks
```
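The arithmetic behind these two caps, as a sketch: on each poll a spawner can create at most the smaller of its free concurrency slots and its remaining total budget. `slotsAvailable` is a hypothetical helper, and treating `0` as "unlimited" is an assumption of this sketch rather than documented Kelos behavior.

```go
package main

import "fmt"

// slotsAvailable returns how many new Tasks a spawner may create right now,
// given how many are running and how many it has created in total.
// A cap of 0 means "no limit" in this sketch (an assumption, not Kelos's API).
func slotsAvailable(matching, running, createdTotal, maxConcurrency, maxTotalTasks int) int {
	n := matching
	if maxConcurrency > 0 {
		free := maxConcurrency - running
		if free < 0 {
			free = 0
		}
		if free < n {
			n = free
		}
	}
	if maxTotalTasks > 0 {
		left := maxTotalTasks - createdTotal
		if left < 0 {
			left = 0
		}
		if left < n {
			n = left
		}
	}
	return n
}

func main() {
	// 100 issues match on the first poll; nothing is running yet.
	fmt.Println(slotsAvailable(100, 0, 0, 3, 50)) // 3
	// Without caps, every match becomes a concurrent run.
	fmt.Println(slotsAvailable(100, 0, 0, 0, 0)) // 100
	// Near the total limit: 49 created so far, so only 1 more is allowed.
	fmt.Println(slotsAvailable(100, 0, 49, 3, 50)) // 1
}
```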

**Use `podOverrides.activeDeadlineSeconds` to limit runtime.** Set a timeout per task to prevent agents from running indefinitely:

```yaml
spec:
  podOverrides:
    activeDeadlineSeconds: 3600   # kill after 1 hour
```

Or via the CLI:

```bash
kelos run -p "Fix the bug" --timeout 30m
```
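The two forms express a timeout in different units: `activeDeadlineSeconds` is a whole number of seconds, while `--timeout` takes a duration string. Assuming the CLI flag is translated into `activeDeadlineSeconds` (this README does not say so explicitly), the conversion looks like this; `deadlineSeconds` is a hypothetical helper built on Go's `time.ParseDuration`.

```go
package main

import (
	"fmt"
	"time"
)

// deadlineSeconds converts a CLI-style duration such as "30m" into whole
// seconds, the unit of activeDeadlineSeconds. It rounds up so a timeout is
// never shortened, and rejects non-positive values.
func deadlineSeconds(timeout string) (int64, error) {
	d, err := time.ParseDuration(timeout)
	if err != nil {
		return 0, err
	}
	if d <= 0 {
		return 0, fmt.Errorf("timeout must be positive, got %s", timeout)
	}
	secs := int64(d / time.Second)
	if d%time.Second != 0 {
		secs++
	}
	return secs, nil
}

func main() {
	for _, t := range []string{"30m", "1h", "90s"} {
		s, err := deadlineSeconds(t)
		if err != nil {
			panic(err)
		}
		fmt.Printf("--timeout %s -> activeDeadlineSeconds: %d\n", t, s)
	}
}
```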

**Use `suspend` for emergencies.** If costs are spiraling, pause a spawner immediately:

```bash
kelos suspend taskspawner my-spawner
# ... investigate ...
kelos resume taskspawner my-spawner
```

**Rate limits.** API providers enforce concurrency and token limits. If a task hits a rate limit mid-execution, it will likely fail. Use `maxConcurrency` to stay within your provider's limits.

## FAQ

**What agents does Kelos support?**

Kelos supports **Claude Code**, **OpenAI Codex**, **Google Gemini**, **OpenCode**, and **Cursor** out of the box. You can also bring your own agent image using the [container interface](docs/agent-image-interface.md).

**Can I use Kelos without Kubernetes?**

No. Kelos is built on Kubernetes Custom Resources and requires a Kubernetes cluster. For local development, use [kind](https://kind.sigs.k8s.io/) (`kind create cluster`) to create a single-node cluster on your machine.

**Is it safe to give agents repo access?**

Agents run in isolated, ephemeral Pods with no host access. Their capabilities are limited to what you inject — typically a scoped GitHub token. Use fine-grained PATs, branch protection, and `maxConcurrency` to control the blast radius. See [Security Considerations](#security-considerations).

**How much does it cost to run?**

Costs depend on the model and task complexity. Check the [API pricing](https://docs.anthropic.com/en/docs/about-claude/pricing) page for current rates. Use `maxConcurrency`, timeouts, and model selection to stay in budget. See [Cost and Limits](#cost-and-limits).

## Uninstall

```bash
kelos uninstall
```

## Development

Build, test, and iterate with `make`:

```bash
make update # generate code, CRDs, fmt, tidy
make verify # generate + vet + tidy-diff check
make test # unit tests
make test-integration # integration tests (envtest)
make test-e2e # e2e tests (requires cluster)
make build # build binary
make image # build docker image
```

## Contributing

1. Fork the repo and create a feature branch.
2. Make your changes and run `make verify` to ensure everything passes.
3. Open a pull request with a clear description of the change.

For significant changes, please open an issue first to discuss the approach.

We welcome contributions of all kinds — see [good first issues](https://github.com/kelos-dev/kelos/labels/good%20first%20issue) for places to start.

## License

[Apache License 2.0](LICENSE)
