https://github.com/ikamensh/kodo

Orchestrator for AI coding (claude code, cursor, codex, gemini)

Host: GitHub
URL: https://github.com/ikamensh/kodo
Owner: ikamensh
License: mit
Created: 2026-02-18T09:40:03.000Z (3 months ago)
Default Branch: dev
Last Pushed: 2026-03-21T22:05:42.000Z (2 months ago)
Last Synced: 2026-03-21T22:34:14.828Z (2 months ago)
Language: Python
Homepage:
Size: 4.37 MB
Stars: 39
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md

Awesome Lists containing this project

awesome-agent-orchestrators - kodo - Autonomous multi-agent coding orchestrator that directs Claude Code, Codex, and Gemini CLI agents through work cycles with independent verification. (Multi-Agent Swarms)
awesome-cli-coding-agents - kodo - agent coding orchestrator that directs Claude Code, Codex, and Gemini CLI through work cycles with independent architect and tester verification. SWE-bench verified. (Harnesses & orchestration / Orchestrators & autonomous loops)

README

Building while you sleep.

---

# 🦉 kodo

Autonomous multi-agent coding that runs overnight on your Claude Code Max subscription. An orchestrator directs Claude Code agents through work cycles with independent verification — so you wake up to tested, reviewed code instead of a stale terminal.

### [SWE-bench Verified: Kodo 57% vs Cursor 46%](https://kodo-bench-h2h-430011644943.europe-west1.run.app/)

On a 100-task head-to-head using the same underlying model (Cursor `composer-1.5`), adding Kodo's orchestration layer solves 24% more real-world GitHub issues. Same model, same prompt, same conditions — the difference is orchestration. [Full methodology and interactive results →](https://kodo-bench-h2h-430011644943.europe-west1.run.app/)
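
The headline numbers can be checked with a few lines of arithmetic. On the same 100 tasks, 57 vs. 46 resolved is an 11-point absolute gain and roughly a 24% relative gain (11 / 46). The helper name below is illustrative and not part of kodo.

```python
def relative_gain(new_solved: int, base_solved: int) -> float:
    """Fractional improvement of new_solved over base_solved."""
    return (new_solved - base_solved) / base_solved


kodo_solved, cursor_solved, total_tasks = 57, 46, 100

# Percentage-point difference on the shared 100-task set.
absolute_points = (kodo_solved - cursor_solved) / total_tasks * 100
print(f"absolute gain: {absolute_points:.0f} points")
print(f"relative gain: {relative_gain(kodo_solved, cursor_solved):.1%}")
```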

## Overview

*Figure: Kodo modes overview (Goal, Improve, and Test).*

See [detailed mode diagrams](docs/modes_diagram.md) for the full pipeline of each mode.

## 🎬 How it works in practice

Real run from [blackopt](https://github.com/ikamen/blackopt) — building an auto-solving meta-optimizer with 4 new algorithms, adaptive scheduling, and 73 tests. **3 hours unattended, 2 cycles, succeeded.**

```
🔍 [00:00] orchestrator → architect
"Survey the codebase — Solver interface, existing algorithms,
where to add new ones."
📋 [03:04] architect reports back
Full architecture survey, found 3 bugs in existing code

🔧 [03:14] orchestrator → worker_smart
"Fix structural bugs identified by architect"
✅ [11:29] worker_smart: 82 turns of editing. All bugs fixed, tests pass.

⚡ [12:36] orchestrator → architect: "Analyze how to implement DE and PSO"
[15:22] orchestrator → worker_fast: "Implement TabuSearch and EDA"
[16:01] orchestrator → worker_smart: "Build autosolve() — concurrent
portfolio, adaptive scheduling"

🏁 [35:20] orchestrator → done("autosolve complete, 4 new algorithms")
→ tester: runs tests ✅
→ tester_browser: runs tests ✅
→ architect: "ProcessPool is never closed — resource leak" ❌
REJECTED

🔧 [45:37] orchestrator → worker_smart: "Fix the resource leak"
→ done() → architect: "class-variable contamination" ❌
REJECTED

... 7 more verification rounds ...
architect catches: time-slice state mutation, exponential
offspring, crossover edge case — each progressively more subtle

🎉 [2:59:50] → done() → tester ✅ → tester_browser ✅ → architect ✅
ACCEPTED — "4 new algorithms, autosolve() API, 73 tests pass"
```

The architect verifier caught **9 rounds of bugs** that the worker agent was blind to — resource leaks, class variable contamination, state mutation — each subtler than the last. A single Claude Code session would likely have shipped with several of these.
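
The accept/reject pattern in the log above can be sketched as a loop: the worker claims `done()`, every verifier reviews the result independently, and any rejection goes back to the worker as the next instruction. This is a hypothetical illustration of the pattern, not kodo's implementation. `Verdict`, `verify_until_accepted`, and the fake agents are invented names, and in kodo the agents are LLM sessions rather than Python functions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    accepted: bool
    feedback: str = ""


def verify_until_accepted(
    do_work: Callable[[str], str],
    verifiers: dict[str, Callable[[str], Verdict]],
    task: str,
    max_rounds: int = 10,
) -> tuple[bool, int, list[str]]:
    """Alternate work and independent review until every verifier accepts.

    Returns (accepted, rounds_used, rejection_log).
    """
    rejections: list[str] = []
    instruction = task
    for round_no in range(1, max_rounds + 1):
        result = do_work(instruction)
        # Each verifier reviews the same result independently of the others.
        verdicts = {name: check(result) for name, check in verifiers.items()}
        failed = {name: v for name, v in verdicts.items() if not v.accepted}
        if not failed:
            return True, round_no, rejections
        # Every rejection reason goes back to the worker as the next instruction.
        feedback = "; ".join(f"{name}: {v.feedback}" for name, v in failed.items())
        rejections.append(feedback)
        instruction = f"Fix: {feedback}"
    return False, max_rounds, rejections


# Toy stand-ins: the "architect" reports one bug per round until none remain.
open_bugs = ["ProcessPool is never closed", "class-variable contamination"]


def fake_worker(instruction: str) -> str:
    if instruction.startswith("Fix:") and open_bugs:
        open_bugs.pop(0)  # pretend the reported bug was fixed
    return "done"


def fake_architect(result: str) -> Verdict:
    return Verdict(False, open_bugs[0]) if open_bugs else Verdict(True)


accepted, rounds, log = verify_until_accepted(
    fake_worker,
    {"tester": lambda result: Verdict(True), "architect": fake_architect},
    "Build autosolve()",
)
print(accepted, rounds, log)
```

The key design choice is that verifiers never see each other's verdicts, so a passing test suite cannot mask an architectural problem the architect would flag.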

## 🦉 When to use kodo

You have a Claude Code Max subscription. You can't use it while you sleep.

kodo lets you set a goal, go to bed, and wake up to working code that's been independently tested and reviewed. The orchestrator (Gemini Flash) directs your subscription-covered Claude Code agents through multiple work cycles with built-in QA.

- 🌙 **Overnight runs**: Set a goal and leave it running for hours. Cycles checkpoint progress automatically.
- 🔍 **Built-in verification**: Independent architect and tester agents review work before accepting it, catching bugs the implementing agent is blind to.
- 🎭 **Role separation**: The orchestrator makes judgment calls, workers build code, and independent reviewers catch issues.
- 🧠 **Context efficiency**: Work is spread across multiple agent context windows, so tasks that might overwhelm a single agent's context can succeed when agents take turns with focused scopes.

## 🧑‍💻 When to just use Claude Code directly

- 📖 **Learning**: You want to stay in the loop and build intuition by watching decisions unfold.
- 🧭 **Exploration**: You don't know what you want yet and are discovering the shape of the solution as you go.
- 🎮 **Steering**: The task needs frequent course corrections that only a human at the keyboard can provide.

## 📦 Install

1. You need uv to install kodo.

**Linux / macOS:**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh # install uv (skip if you have it)
```

**Windows (PowerShell):**
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" # install uv (skip if you have it)
```

2. Install kodo using uv
```bash
uv tool install git+https://github.com/ikamensh/kodo
```

That's it. `kodo` is now on your PATH.

To also install the **SWE-bench benchmark harness** (`kodo-bench`):
```bash
uv tool install --with 'kodo[benchmark]' git+https://github.com/ikamensh/kodo
```

### Prerequisites

You need **at least one** agent backend installed:

| Backend | Role | Setup |
|---------|------|-------|
| 🤖 [Claude Code](https://code.claude.com/docs/en/setup) | Smart workers + architect | [instructions](docs/providers.md#claude-code-smart-workers--architect) |
| ⚡ [Cursor](https://cursor.com/docs/cli/installation) | Fast workers + testers | [instructions](docs/providers.md#cursor-fast-workers--testers) |
| 🌀 [OpenAI Codex](https://github.com/openai/codex/blob/main/docs/install.md) | Fast workers | [instructions](docs/providers.md#openai-codex-fast-workers) |
| 💎 [Gemini CLI](https://geminicli.com/docs/get-started/installation/) | Fast workers (free tier) | [instructions](docs/providers.md#gemini-cli-fast-workers) |
| 🌙 [Kimi](https://www.kimi.com/code/docs/en/kimi-cli/guides/getting-started.html) | Smart workers | [instructions](docs/providers.md#kimi-smart-workers) |
| 👻 [Kiro](https://kiro.dev/docs/cli/installation/) | Workers | [instructions](docs/providers.md#kiro-workers) |

Claude Code + one fast backend (Cursor, Codex, or Gemini CLI) is recommended. See [docs/providers.md](docs/providers.md) for detailed setup instructions, authentication, and troubleshooting.

For the **API orchestrator** (recommended), set a key in `.env` or your environment:
```bash
GOOGLE_API_KEY=... # Gemini orchestrator (recommended — fast and cheap)
ANTHROPIC_API_KEY=... # Claude API orchestrator (alternative)
```

> **Why API over CLI orchestrators?** CLI coding tools (Claude Code, Cursor, Codex) are built to solve problems themselves — they'll try to write code, micromanage agents, or go off-script instead of purely delegating. A plain API model stays in its lane as a coordinator: it thinks high level and delegates, closer to human user behavior.
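
A minimal sketch of how the key lookup could work, assuming a simple `.env` format, Gemini preferred over Claude when both keys are set (matching the "recommended" note), and real environment variables overriding `.env`. The function names are hypothetical and kodo's actual loading and precedence rules may differ.

```python
from __future__ import annotations

import os
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks, comments, and lines without '='."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop trailing comments ("KEY=abc # note") and surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def pick_orchestrator(env: dict[str, str]) -> str:
    """Prefer the recommended Gemini orchestrator, else fall back to the Claude API."""
    if env.get("GOOGLE_API_KEY"):
        return "gemini"
    if env.get("ANTHROPIC_API_KEY"):
        return "claude-api"
    raise RuntimeError("set GOOGLE_API_KEY or ANTHROPIC_API_KEY in .env or the environment")


def resolve_orchestrator(project_dir: Path) -> str:
    # Assumption: real environment variables override values from .env.
    env = {**read_dotenv(project_dir / ".env"), **os.environ}
    return pick_orchestrator(env)
```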

## 🚀 Usage

```bash
# Interactive mode (recommended) — walks you through goal, config, launch
kodo # run in current directory
kodo ./my-project # run in specific directory

# Non-interactive (for scripting, CI, overnight cron jobs)
kodo --goal 'Build a REST API for user management' ./my-project
kodo --goal-file requirements.md ./my-project
kodo --goal 'Build X' --team full --exchanges 50 --cycles 10 ./my-project

# Test — find bugs through realistic interaction (not unit tests)
kodo test # test current project
kodo test --focus 'auth module' # focus on specific area
kodo test --target src/api/ # scope to specific files/dirs

# Improve — code review for simplification, usability, architecture
kodo improve # review current project
kodo improve --focus 'CLI flags' # focus on specific area

# Fix findings from a previous test or improve run
kodo --fix-from RUN_ID # RUN_ID is printed at the end of test/improve runs

# Resume an interrupted run (looks in ~/.kodo/runs/)
kodo --resume # resume latest incomplete run in current dir
kodo --resume 20260218_205503 # resume specific run by ID
```
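
Run IDs such as `20260218_205503` appear to encode a start timestamp (`YYYYMMDD_HHMMSS`). Assuming that format, a sketch of picking the newest run directory under `~/.kodo/runs/` could look like this. The function names are hypothetical, and the filtering `kodo --resume` applies (incomplete runs, current directory only) depends on run metadata not described here, so it is left out.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path

# Assumption: run IDs encode their start time, e.g. 20260218_205503.
RUN_ID_FORMAT = "%Y%m%d_%H%M%S"


def parse_run_id(run_id: str) -> datetime | None:
    """Timestamp encoded in a run ID, or None if the name doesn't match."""
    try:
        return datetime.strptime(run_id, RUN_ID_FORMAT)
    except ValueError:
        return None


def latest_run(runs_dir: Path | None = None) -> Path | None:
    """Newest subdirectory whose name parses as a run ID."""
    runs_dir = runs_dir or Path.home() / ".kodo" / "runs"
    if not runs_dir.is_dir():
        return None
    stamped = [(parse_run_id(p.name), p) for p in runs_dir.iterdir() if p.is_dir()]
    stamped = [(ts, p) for ts, p in stamped if ts is not None]
    if not stamped:
        return None
    return max(stamped, key=lambda pair: pair[0])[1]
```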

### Interactive mode

The interactive CLI will:
1. Ask for your goal (or reuse an existing `goal.md`)
2. Optionally refine it via a Claude interview
3. Let you pick team, orchestrator, and limits
4. Show a summary and ask for confirmation before starting
5. Print a live progress table as agents work

### Non-interactive mode

Passing `--goal` or `--goal-file` enables non-interactive mode — no prompts, no confirmations. The AI still breaks down your goal into stages (unless `--skip-intake` is set), but without asking clarifying questions.

### All flags

```
kodo [project_dir] [options]

Goal (mutually exclusive):
--goal TEXT Goal text (inline)
--goal-file PATH Path to file containing goal
--improve Code review: simplification, usability, architecture
--test Find bugs through realistic interaction and workflows
--fix-from RUN_ID Fix findings from a previous test or improve run

Test/Improve options:
--focus TEXT Steer toward a specific area (e.g. 'error handling')
--target PATH Scope --test to specific files/dirs (repeatable)

Behavior:
--effort LEVEL low | standard (default) | high | max
--skip-intake Skip AI goal refinement
--auto-refine Auto-refine goal (no human input, for overnight runs)
--yes, -y Skip confirmation prompts
--no-auto-commit Disable auto-commit after stages

Output:
--json Structured JSON to stdout (implies --yes)
--resume [RUN_ID] Resume an interrupted run
--version Show version
```
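
For scripting, a small helper can assemble a non-interactive invocation from the documented flags and enforce the mutually exclusive goal group before anything runs. This is a sketch: `build_kodo_argv` is a hypothetical name, and it covers only `--goal`, `--goal-file`, `--effort`, and `--json`.

```python
from __future__ import annotations

import shlex

EFFORT_LEVELS = ("low", "standard", "high", "max")


def build_kodo_argv(
    project_dir: str,
    goal: str | None = None,
    goal_file: str | None = None,
    effort: str = "standard",
    json_output: bool = True,
) -> list[str]:
    """Assemble a non-interactive kodo command line from documented flags."""
    # --goal and --goal-file belong to the mutually exclusive "Goal" group.
    if (goal is None) == (goal_file is None):
        raise ValueError("pass exactly one of goal or goal_file")
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    argv = ["kodo"]
    if goal is not None:
        argv += ["--goal", goal]
    else:
        argv += ["--goal-file", goal_file]
    argv += ["--effort", effort]
    if json_output:
        argv.append("--json")  # structured JSON to stdout; implies --yes
    argv.append(project_dir)
    return argv


argv = build_kodo_argv("./my-project", goal="Build a REST API for user management")
print(shlex.join(argv))
```

The list can be handed to `subprocess.run(argv, check=True)`. The shape of the `--json` output isn't documented in this README, so parsing it is left out.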

> **⚠️ Heads up:** agents run with full permissions (`bypassPermissions` mode). They primarily work in your project directory but **can access any file on your system** (installing dependencies, editing configs, etc.). Make sure you have a git commit or backup before launching.

### `kodo test` — test like a real user

Tests your software the way a real user would — install it, exercise every feature, then probe edge cases.

1. **Setup & Discovery**: installs the software, builds testing tools (CLI wrappers, fixtures, sample data), maps all user-facing features and workflows
2. **Feature Walkthroughs**: exercises every feature end-to-end — follows documented workflows, tries every CLI command and flag, tests happy paths and common error cases
3. **Edge Cases & Error Paths**: probes boundaries — empty inputs, huge inputs, invalid types, missing files, concurrent usage, interruption mid-operation
4. **Triage & Regression Tests**: for confirmed bugs, writes a test that fails, fixes the code, verifies the test passes

If agents need tools they can't build (Docker, VPS, browser automation), they say so in the **Blocked Workflows** section of the report. On repeated runs, previously-tested features are skipped based on coverage tracking in `.kodo/test-coverage.md`.

```bash
kodo test # full test run
kodo test --focus 'authentication' # focus on area
kodo test --target src/api/ --target src/auth/ # scope to files
```

### `kodo --improve` — code review for significant improvements

Reviews your codebase like a senior developer joining the project. Focuses on simplification, usability, and architecture — not on running tests (use `kodo test` for that).

1. **Simplification**: unnecessary abstractions, duplicated logic, dead code, code that reimplements the standard library
2. **Usability**: redundant CLI flags, confusing API naming, poor error messages, missing defaults, docs that contradict code
3. **Architecture**: module boundaries, dependency directions, circular deps, scattered responsibilities
4. **Triage**: skeptically filters findings — most don't survive scrutiny
5. **Fix & Report**: auto-fixes safe issues, flags ambiguous ones as "needs decision"

```bash
kodo --improve # full review
kodo --improve --focus 'CLI interface' # focus on area
```

### Subcommands

```bash
kodo test # find bugs through realistic testing
kodo runs # list all past runs
kodo runs ./my-project # list runs for a specific project
kodo issue [RUN_ID] # report a bug (opens GitHub with run context pre-filled)
kodo backends # show available backends, models, API key status
kodo teams # list available teams
kodo teams add my-team # interactively create a custom team
kodo teams edit my-team # edit an existing team
kodo teams delete # pick user team files to remove (same listing style as `kodo teams`)
```

The default team, directed by the orchestrator:

```
🦉 Orchestrator (Gemini Flash)
│
├── 🔍 architect Survey codebase, review code, find bugs
├── 🧠 worker_smart Complex implementation (Claude Code)
├── ⚡ worker_fast Quick tasks, iterations (Cursor, Codex, or Gemini CLI)
├── 🧪 tester Run tests, verify behavior
└── 🌐 tester_browser Browser-based UI testing
```

### Effort levels

Control how hard agents work and how strict verification is:

| Level | Orchestrator behavior | Verification | Claude workers |
|-------|----------------------|-------------|----------------|
| `low` | Do exactly what's asked, don't over-engineer | Basic — tests passing is sufficient | `--effort low` |
| `standard` | Default behavior | Default | SDK default |
| `high` | Push agents to iterate, reject mediocre results | Thorough — verify each criterion with evidence | `--effort high` |
| `max` | Tackle hardest parts first, iterate aggressively | Skeptical — reject technically correct but mediocre work | `--effort max` |

Set via CLI (`--effort max`) or project config (`.kodo/config.json`):
```json
{ "effort": "max" }
```
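
A sketch of resolving the effort level, assuming the CLI flag takes precedence over `.kodo/config.json` and `standard` is the fallback (the precedence order is an assumption). The second helper mirrors the "Claude workers" column of the table. Both function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path

EFFORT_LEVELS = ("low", "standard", "high", "max")


def resolve_effort(project_dir: Path, cli_effort: str | None = None) -> str:
    """CLI flag first, then .kodo/config.json, then the 'standard' default."""
    effort = cli_effort
    if effort is None:
        config_path = project_dir / ".kodo" / "config.json"
        if config_path.is_file():
            effort = json.loads(config_path.read_text()).get("effort")
    effort = effort or "standard"
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort {effort!r}; expected one of {EFFORT_LEVELS}")
    return effort


def claude_worker_args(effort: str) -> list[str]:
    """Per the table: 'standard' keeps the SDK default, other levels pass --effort."""
    return [] if effort == "standard" else ["--effort", effort]
```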

**Key concepts:**

- **Session** — a stateful conversation with a backend (Claude, Cursor, Codex, Gemini CLI, Kimi, or Kiro). Tracks token usage, supports reset.
- **Agent** — a prompt + session + turn budget. Call `agent.run(task, project_dir)` to get work done.
- **Orchestrator** — an LLM that delegates to a team of agents via tool calls:
  - `ClaudeCodeOrchestrator`: runs on Claude Code with agents as MCP tools. No extra cost on a Max subscription.
  - `ApiOrchestrator`: runs on the Anthropic or Gemini API. The orchestrator is pay-per-token, but workers still use your subscription.
- **Cycle** — one unit of orchestrated work. Think of it as one dev session.
- **Run** — multiple cycles until done, with summaries bridging context between cycles.
- **Stage** — an independently verifiable piece of a plan. Stages run sequentially, or in parallel in git worktrees when grouped.
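
To make the vocabulary concrete, here is a toy model of how Session, Agent, Cycle, and Run relate. It is a hypothetical illustration, not kodo's classes: only the `agent.run(task, project_dir)` call shape comes from this README, the token count is a word-count stand-in, and stages and worktrees are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Tuple


@dataclass
class Session:
    """Stateful conversation with one backend; tracks (approximate) token usage."""
    backend: str
    history: list[str] = field(default_factory=list)
    tokens_used: int = 0

    def send(self, message: str) -> str:
        self.history.append(message)
        self.tokens_used += len(message.split())  # crude stand-in for real token counts
        return f"[{self.backend}] handled {len(message)} chars"

    def reset(self) -> None:
        self.history.clear()
        self.tokens_used = 0


@dataclass
class Agent:
    """A prompt + a session + a turn budget."""
    prompt: str
    session: Session
    max_turns: int = 15

    def run(self, task: str, project_dir: str) -> str:
        # A real agent would take up to max_turns tool-using turns; this takes one.
        return self.session.send(f"{self.prompt}\nProject: {project_dir}\nTask: {task}")


# A cycle takes (goal, previous summary) and returns (new summary, done?).
Cycle = Callable[[str, str], Tuple[str, bool]]


def run_until_done(cycle: Cycle, goal: str, max_cycles: int = 10) -> list[str]:
    """A run: repeat cycles, bridging context with summaries, until one reports done."""
    summaries: list[str] = []
    summary = ""
    for _ in range(max_cycles):
        summary, done = cycle(goal, summary)
        summaries.append(summary)
        if done:
            break
    return summaries


worker = Agent("You implement features.", Session("claude"))
reply = worker.run("Add a /users endpoint", "./my-project")


def demo_cycle(goal: str, previous: str) -> Tuple[str, bool]:
    n = previous.count("|") + 1
    return previous + f"cycle {n} done|", n >= 3


history = run_until_done(demo_cycle, "Build a REST API")
```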

## 🎨 Custom teams

You can customize which agents run by dropping a `team.json` file — no code changes needed.

**Lookup order:**
1. `{project}/.kodo/team.json` — project-level override
2. `~/.kodo/teams/{name}.json` — user-level named team
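
The lookup order amounts to "first existing file wins". A minimal sketch, assuming the named team is only consulted when a name is given and that a built-in team is used when neither file exists (both assumptions); `find_team_file` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


def find_team_file(
    project_dir: Path,
    team_name: str | None = None,
    home: Path | None = None,
) -> Path | None:
    """First existing team file, following the documented lookup order."""
    home = home or Path.home()
    candidates = [project_dir / ".kodo" / "team.json"]  # 1. project-level override
    if team_name:
        # 2. user-level named team
        candidates.append(home / ".kodo" / "teams" / f"{team_name}.json")
    for path in candidates:
        if path.is_file():
            return path
    return None  # assumption: the caller then falls back to a built-in team
```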

**Example:** adding a UX/UI designer agent to review user-facing code:

```json
{
  "name": "saga-with-designer",
  "agents": {
    "worker_fast": {
      "backend": "claude", "model": "sonnet",
      "description": "Fast worker for implementation tasks."
    },
    "worker_smart": {
      "backend": "claude", "model": "opus",
      "description": "Deep-thinking worker for complex tasks."
    },
    "tester": {
      "backend": "claude", "model": "sonnet",
      "description": "Runs tests and reports results.",
      "max_turns": 10
    },
    "architect": {
      "backend": "claude", "model": "opus",
      "description": "Reviews architecture, validates direction.",
      "max_turns": 10, "timeout_s": 600
    },
    "designer": {
      "backend": "claude", "model": "opus",
      "description": "UX/UI advisor. Reviews component structure, accessibility, interaction patterns. Provides file/line references.",
      "system_prompt": "You are a UX/UI design advisor. Review code for UI structure, accessibility, responsive design, and consistency. Reference specific files and lines. Fix minor issues yourself. Say 'ALL CHECKS PASS' if clean.",
      "max_turns": 10, "timeout_s": 600,
      "fallback_model": "sonnet"
    }
  }
}
```

The orchestrator sees all agents in the team and delegates to them as needed. You can add any specialized reviewer (security auditor, performance analyst, etc.) the same way.

**Agent fields:** `backend` and `model` are required. Optional: `description`, `system_prompt`, `max_turns` (default 15), `timeout_s`, `chrome` (for browser agents), `fallback_model`.
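
A sketch of checking a `team.json` against the field list above: `backend` and `model` must be present, and `max_turns` defaults to 15. Rejecting unknown keys is an extra assumption (kodo may simply ignore them), and `load_team_agents` is a hypothetical name.

```python
from __future__ import annotations

import json

REQUIRED_FIELDS = ("backend", "model")
OPTIONAL_FIELDS = {"description", "system_prompt", "max_turns", "timeout_s", "chrome", "fallback_model"}
DEFAULT_MAX_TURNS = 15


def load_team_agents(text: str) -> dict[str, dict]:
    """Parse team.json text, check each agent's fields, and fill in max_turns."""
    team = json.loads(text)
    agents: dict[str, dict] = {}
    for name, spec in team["agents"].items():
        missing = [f for f in REQUIRED_FIELDS if f not in spec]
        if missing:
            raise ValueError(f"agent {name!r} is missing required field(s): {missing}")
        # Assumption: unknown keys are treated as typos and rejected.
        unknown = set(spec) - set(REQUIRED_FIELDS) - OPTIONAL_FIELDS
        if unknown:
            raise ValueError(f"agent {name!r} has unknown field(s): {sorted(unknown)}")
        agents[name] = {"max_turns": DEFAULT_MAX_TURNS, **spec}
    return agents
```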

## 💰 Cost tracking

Kodo tracks costs in two buckets:

| Bucket | What | Example |
|--------|------|---------|
| **🔑 API** | Real money — pay-per-token orchestrator calls | Gemini Flash orchestrator: ~$0.13/run |
| **✨ Virtual** | **Not charged.** Claude Code SDK reports what API usage *would* cost — but on a Max/Pro subscription you pay nothing extra. | Claude Max workers: shows ~$1.69, actual spend $0 |

The progress table labels subscription-covered costs as **Virtual** to make this clear. Only the **API** bucket represents real spend.
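
The two-bucket split can be sketched as tagging each cost entry and totaling the buckets separately, using the example figures from the table. `CostEntry` and `summarize` are hypothetical names, not kodo's internals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CostEntry:
    source: str
    usd: float
    bucket: str  # "api" = billed per token, "virtual" = covered by the subscription


def summarize(entries: list[CostEntry]) -> dict[str, float]:
    """Total each bucket separately; only the 'api' total is real spend."""
    totals = {"api": 0.0, "virtual": 0.0}
    for entry in entries:
        totals[entry.bucket] += entry.usd
    return totals


entries = [
    CostEntry("Gemini Flash orchestrator", 0.13, "api"),
    CostEntry("Claude Max workers", 1.69, "virtual"),
]
totals = summarize(entries)
print(f"real spend: ${totals['api']:.2f} (virtual, not charged: ${totals['virtual']:.2f})")
```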

## 🔎 Analyzing past runs

```bash
# Open the interactive HTML viewer
python -m kodo.viewer ~/.kodo/runs/20260218_205503/log.jsonl
# Or serve on port 8080: python -m kodo.viewer --serve --port 8080
```
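
Because `log.jsonl` holds one JSON record per line, it can also be inspected without the viewer. The record schema isn't documented in this README, so this sketch assumes only that each non-blank line is a JSON object and counts top-level keys to get oriented. Both function names are hypothetical.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Load one JSON object per non-blank line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def key_histogram(records: list[dict]) -> Counter:
    """Count top-level keys to get a feel for an unfamiliar log schema."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts
```

For example, `key_histogram(read_jsonl(Path.home() / ".kodo/runs/20260218_205503/log.jsonl"))` shows which fields appear across a run's events.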
