https://github.com/ikamensh/kodo

Orchestrator for AI coding (claude code, cursor, codex, gemini)

Building while you sleep.



Python 3.13+
MIT License
Claude Code
Cursor
OpenAI Codex
Gemini CLI
Kimi
Kiro

---

# 🦉 kodo

Autonomous multi-agent coding that runs overnight on your Claude Code Max subscription. An orchestrator directs Claude Code agents through work cycles with independent verification, so you wake up to tested, reviewed code instead of a stale terminal.

### [SWE-bench Verified: Kodo 57% vs Cursor 46%](https://kodo-bench-h2h-430011644943.europe-west1.run.app/)

On a 100-task head-to-head using the same underlying model (Cursor `composer-1.5`), adding Kodo's orchestration layer solves 24% more real-world GitHub issues in relative terms (57 vs. 46 of 100). Same model, same prompt, same conditions: the difference is orchestration. [Full methodology and interactive results →](https://kodo-bench-h2h-430011644943.europe-west1.run.app/)
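
The 24% figure is a relative gain over the baseline, not a percentage-point gap. A quick check of the arithmetic, using only the two solve rates quoted in the headline:

```python
# Solve rates quoted in the headline (fractions of the 100 tasks).
kodo_rate = 0.57
cursor_rate = 0.46

# Absolute gap in percentage points, and relative gain over the baseline.
point_gap = (kodo_rate - cursor_rate) * 100
relative_gain = (kodo_rate - cursor_rate) / cursor_rate * 100

print(f"{point_gap:.0f} percentage points, {relative_gain:.0f}% relative")
# prints: 11 percentage points, 24% relative
```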

## Overview


Kodo modes overview: Goal, Improve, and Test

See [detailed mode diagrams](docs/modes_diagram.md) for the full pipeline of each mode.

## 🎬 How it works in practice

Real run from [blackopt](https://github.com/ikamensh/blackopt): building an auto-solving meta-optimizer with 4 new algorithms, adaptive scheduling, and 73 tests. **3 hours unattended, 2 cycles, succeeded.**

```
🔍 [00:00] orchestrator → architect
           "Survey the codebase - Solver interface, existing algorithms,
            where to add new ones."
📋 [03:04] architect reports back
           Full architecture survey, found 3 bugs in existing code

🔧 [03:14] orchestrator → worker_smart
           "Fix structural bugs identified by architect"
✅ [11:29] worker_smart: 82 turns of editing. All bugs fixed, tests pass.

⚡ [12:36] orchestrator → architect: "Analyze how to implement DE and PSO"
   [15:22] orchestrator → worker_fast: "Implement TabuSearch and EDA"
   [16:01] orchestrator → worker_smart: "Build autosolve() - concurrent
           portfolio, adaptive scheduling"

🏁 [35:20] orchestrator → done("autosolve complete, 4 new algorithms")
           → tester: runs tests ✅
           → tester_browser: runs tests ✅
           → architect: "ProcessPool is never closed - resource leak" ❌
           REJECTED

🔧 [45:37] orchestrator → worker_smart: "Fix the resource leak"
           → done() → architect: "class-variable contamination" ❌
           REJECTED

           ... 7 more verification rounds ...
           architect catches: time-slice state mutation, exponential
           offspring, crossover edge case - each progressively more subtle

🎉 [2:59:50] → done() → tester ✅ → tester_browser ✅ → architect ✅
           ACCEPTED - "4 new algorithms, autosolve() API, 73 tests pass"
```

The architect verifier caught **9 rounds of bugs** that the worker agent was blind to (resource leaks, class-variable contamination, state mutation), each subtler than the last. A single Claude Code session would likely have shipped with several of these.

## 🦉 When to use kodo

You have a Claude Code Max subscription. You can't use it while you sleep.

kodo lets you set a goal, go to bed, and wake up to working code that's been independently tested and reviewed. The orchestrator (Gemini Flash) directs your subscription-covered Claude Code agents through multiple work cycles with built-in QA.

- 🌙 **Overnight runs**: Set a goal, leave it running for hours. Cycles checkpoint progress automatically.
- 🔍 **Built-in verification**: Independent architect + tester agents review work before accepting. Catches bugs the implementing agent is blind to.
- 🎭 **Role separation**: An orchestrator makes judgment calls, workers build code, and independent reviewers catch issues.
- 🧠 **Context efficiency**: Work is spread across multiple agent context windows, so tasks that might overwhelm a single agent's context can succeed when agents take turns with focused scopes.

## 🧑‍💻 When to just use Claude Code directly

- 📖 **Learning**: You want to stay in the loop and build intuition by watching decisions unfold.
- 🧭 **Exploration**: You don't know what you want yet and are discovering the shape of the solution as you go.
- 🎮 **Steering**: The task needs frequent course corrections that only a human at the keyboard can provide.

## 📦 Install

1. You need uv to install kodo.

**Linux / macOS:**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh # install uv (skip if you have it)
```

**Windows (PowerShell):**
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" # install uv (skip if you have it)
```

2. Install kodo using uv
```bash
uv tool install git+https://github.com/ikamensh/kodo
```

That's it. `kodo` is now on your PATH.

To also install the **SWE-bench benchmark harness** (`kodo-bench`):
```bash
uv tool install --with 'kodo[benchmark]' git+https://github.com/ikamensh/kodo
```

### Prerequisites

You need **at least one** agent backend installed:

| Backend | Role | Setup |
|---------|------|-------|
| 🤖 [Claude Code](https://code.claude.com/docs/en/setup) | Smart workers + architect | [instructions](docs/providers.md#claude-code-smart-workers--architect) |
| ⚡ [Cursor](https://cursor.com/docs/cli/installation) | Fast workers + testers | [instructions](docs/providers.md#cursor-fast-workers--testers) |
| 🌀 [OpenAI Codex](https://github.com/openai/codex/blob/main/docs/install.md) | Fast workers | [instructions](docs/providers.md#openai-codex-fast-workers) |
| 💎 [Gemini CLI](https://geminicli.com/docs/get-started/installation/) | Fast workers (free tier) | [instructions](docs/providers.md#gemini-cli-fast-workers) |
| 🌙 [Kimi](https://www.kimi.com/code/docs/en/kimi-cli/guides/getting-started.html) | Smart workers | [instructions](docs/providers.md#kimi-smart-workers) |
| 👻 [Kiro](https://kiro.dev/docs/cli/installation/) | Workers | [instructions](docs/providers.md#kiro-workers) |

Claude Code + one fast backend (Cursor, Codex, or Gemini CLI) is recommended. See [docs/providers.md](docs/providers.md) for detailed setup instructions, authentication, and troubleshooting.

For the **API orchestrator** (recommended), set a key in `.env` or your environment:
```bash
GOOGLE_API_KEY=...      # Gemini orchestrator (recommended: fast and cheap)
ANTHROPIC_API_KEY=...   # Claude API orchestrator (alternative)
```

> **Why API over CLI orchestrators?** CLI coding tools (Claude Code, Cursor, Codex) are built to solve problems themselves: they'll try to write code, micromanage agents, or go off-script instead of purely delegating. A plain API model stays in its lane as a coordinator. It thinks at a high level and delegates, which is closer to how a human user behaves.

## 🚀 Usage

```bash
# Interactive mode (recommended): walks you through goal, config, launch
kodo                              # run in current directory
kodo ./my-project                 # run in specific directory

# Non-interactive (for scripting, CI, overnight cron jobs)
kodo --goal 'Build a REST API for user management' ./my-project
kodo --goal-file requirements.md ./my-project
kodo --goal 'Build X' --team full --exchanges 50 --cycles 10 ./my-project

# Test: find bugs through realistic interaction (not unit tests)
kodo test                         # test current project
kodo test --focus 'auth module'   # focus on specific area
kodo test --target src/api/       # scope to specific files/dirs

# Improve: code review for simplification, usability, architecture
kodo improve                      # review current project
kodo improve --focus 'CLI flags'  # focus on specific area

# Fix findings from a previous test or improve run
kodo --fix-from RUN_ID            # RUN_ID is printed at end of test/improve runs

# Resume an interrupted run (looks in ~/.kodo/runs/)
kodo --resume                     # resume latest incomplete run in current dir
kodo --resume 20260218_205503     # resume specific run by ID
```

### Interactive mode

The interactive CLI will:
1. Ask for your goal (or reuse an existing `goal.md`)
2. Optionally refine it via a Claude interview
3. Let you pick team, orchestrator, and limits
4. Show a summary and ask for confirmation before starting
5. Print a live progress table as agents work

### Non-interactive mode

Passing `--goal` or `--goal-file` enables non-interactive mode: no prompts, no confirmations. The AI still breaks down your goal into stages (unless `--skip-intake` is set), but without asking clarifying questions.
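
For scripted or scheduled launches, a thin wrapper can assemble the command from flags documented in this README. A minimal sketch: `build_kodo_command` and `launch` are hypothetical helper names, and the structure of the `--json` output is not described here, so the wrapper only returns the exit code and raw stdout.

```python
import subprocess
from pathlib import Path


def build_kodo_command(project: Path, goal_file: Path, cycles: int = 10) -> list[str]:
    """Assemble a non-interactive kodo invocation from documented flags."""
    return [
        "kodo",
        "--goal-file", str(goal_file),  # passing a goal enables non-interactive mode
        "--cycles", str(cycles),        # cap on the number of cycles
        "--json",                       # structured output on stdout, implies --yes
        str(project),
    ]


def launch(project: Path, goal_file: Path) -> tuple[int, str]:
    """Run kodo and return (exit code, raw stdout). Requires kodo on PATH."""
    cmd = build_kodo_command(project, goal_file)
    # check=False: inspect the return code instead of raising on failure.
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout
```

Calling `launch(Path("my-project"), Path("requirements.md"))` mirrors the `--goal-file` example above.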

### All flags

```
kodo [project_dir] [options]

Goal (mutually exclusive):
  --goal TEXT               Goal text (inline)
  --goal-file PATH          Path to file containing goal
  --improve                 Code review: simplification, usability, architecture
  --test                    Find bugs through realistic interaction and workflows
  --fix-from RUN_ID         Fix findings from a previous test or improve run

Test/Improve options:
  --focus TEXT              Steer toward a specific area (e.g. 'error handling')
  --target PATH             Scope --test to specific files/dirs (repeatable)

Configuration:
  --team TEAM               full (default) | quick | test
  --exchanges N             Max exchanges per cycle
  --cycles N                Max cycles
  --orchestrator BACKEND    api (default) | claude-code | gemini-cli | codex | cursor
  --orchestrator-model M    opus | sonnet | gemini-pro | gemini-flash

Behavior:
  --effort LEVEL            low | standard (default) | high | max
  --skip-intake             Skip AI goal refinement
  --auto-refine             Auto-refine goal (no human input, for overnight runs)
  --yes, -y                 Skip confirmation prompts
  --no-auto-commit          Disable auto-commit after stages

Output:
  --json                    Structured JSON to stdout (implies --yes)
  --resume [RUN_ID]         Resume an interrupted run
  --version                 Show version
```

> **⚠️ Heads up:** agents run with full permissions (`bypassPermissions` mode). They primarily work in your project directory but **can access any file on your system** (installing dependencies, editing configs, etc.). Make sure you have a git commit or backup before launching.

### `kodo test`: test like a real user

Tests your software the way a real user would: install it, exercise every feature, then probe edge cases.

1. **Setup & Discovery**: installs the software, builds testing tools (CLI wrappers, fixtures, sample data), maps all user-facing features and workflows
2. **Feature Walkthroughs**: exercises every feature end-to-end, following documented workflows, trying every CLI command and flag, and testing happy paths and common error cases
3. **Edge Cases & Error Paths**: probes boundaries such as empty inputs, huge inputs, invalid types, missing files, concurrent usage, and interruption mid-operation
4. **Triage & Regression Tests**: for confirmed bugs, writes a test that fails, fixes the code, verifies the test passes

If agents need tools they can't build (Docker, VPS, browser automation), they say so in the **Blocked Workflows** section of the report. On repeated runs, previously-tested features are skipped based on coverage tracking in `.kodo/test-coverage.md`.

```bash
kodo test # full test run
kodo test --focus 'authentication' # focus on area
kodo test --target src/api/ --target src/auth/ # scope to files
```

### `kodo --improve`: code review for significant improvements

Reviews your codebase like a senior developer joining the project. Focuses on simplification, usability, and architecture, not on running tests (use `kodo test` for that).

1. **Simplification**: unnecessary abstractions, duplicated logic, dead code, things that reimplement stdlib
2. **Usability**: redundant CLI flags, confusing API naming, poor error messages, missing defaults, docs that contradict code
3. **Architecture**: module boundaries, dependency directions, circular deps, scattered responsibilities
4. **Triage**: skeptically filters findings, since most don't survive scrutiny
5. **Fix & Report**: auto-fixes safe issues, flags ambiguous ones as "needs decision"

```bash
kodo --improve # full review
kodo --improve --focus 'CLI interface' # focus on area
```

### Subcommands

```bash
kodo test # find bugs through realistic testing
kodo runs # list all past runs
kodo runs ./my-project # list runs for a specific project
kodo issue [RUN_ID] # report a bug (opens GitHub with run context pre-filled)
kodo backends # show available backends, models, API key status
kodo teams # list available teams
kodo teams add my-team # interactively create a custom team
kodo teams edit my-team # edit an existing team
kodo teams delete # pick user team files to remove (same listing style as `kodo teams`)
```

```
🦉 Orchestrator (Gemini Flash)
│
├── 🔍 architect        Survey codebase, review code, find bugs
├── 🧠 worker_smart     Complex implementation (Claude Code)
├── ⚡ worker_fast      Quick tasks, iterations (Cursor, Codex, or Gemini CLI)
├── 🧪 tester           Run tests, verify behavior
└── 🌐 tester_browser   Browser-based UI testing
```

### Effort levels

Control how hard agents work and how strict verification is:

| Level | Orchestrator behavior | Verification | Claude workers |
|-------|----------------------|-------------|----------------|
| `low` | Do exactly what's asked, don't over-engineer | Basic: passing tests is sufficient | `--effort low` |
| `standard` | Default behavior | Default | SDK default |
| `high` | Push agents to iterate, reject mediocre results | Thorough: verify each criterion with evidence | `--effort high` |
| `max` | Tackle hardest parts first, iterate aggressively | Skeptical: reject technically correct but mediocre work | `--effort max` |

Set via CLI (`--effort max`) or project config (`.kodo/config.json`):
```json
{ "effort": "max" }
```
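
If you script project setup, the same setting can be written programmatically. A minimal sketch, assuming `.kodo/config.json` is a flat JSON object as in the snippet above; `set_effort` is a hypothetical helper, not a kodo API, and it preserves any other keys already in the file.

```python
import json
from pathlib import Path

EFFORT_LEVELS = ("low", "standard", "high", "max")  # levels from the table above


def set_effort(project: Path, level: str) -> Path:
    """Write the effort level into <project>/.kodo/config.json, keeping other keys."""
    if level not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")
    config_path = project / ".kodo" / "config.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config["effort"] = level
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config_path
```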

**Key concepts:**

- **Session**: a stateful conversation with a backend (Claude, Cursor, Codex, Gemini CLI, Kimi, or Kiro). Tracks token usage, supports reset.
- **Agent**: a prompt + session + turn budget. Call `agent.run(task, project_dir)` to get work done.
- **Orchestrator**: an LLM that delegates to a team of agents via tool calls:
  - `ClaudeCodeOrchestrator`: runs on Claude Code with agents as MCP tools. Free on Max subscription.
  - `ApiOrchestrator`: runs on Anthropic/Gemini API. Pay-per-token orchestrator, but workers still use your subscription.
- **Cycle**: one unit of orchestrated work. Think of it as one dev session.
- **Run**: multiple cycles until done, with summaries bridging context between cycles.
- **Stage**: an independently verifiable piece of a plan. Stages run sequentially, or in parallel in git worktrees when grouped.
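
To make the Session/Agent relationship concrete, here is a toy model of "a prompt + session + turn budget". It is purely illustrative: apart from the `agent.run(task, project_dir)` call shape mentioned above, every class, method, and field name is an assumption and does not describe kodo's actual classes.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


class Session(Protocol):
    """Hypothetical interface: a stateful conversation with one backend."""

    def send(self, message: str, cwd: Path) -> str: ...
    def reset(self) -> None: ...


@dataclass
class Agent:
    """Hypothetical shape: a prompt, a session, and a turn budget."""

    prompt: str
    session: Session
    max_turns: int = 15  # carried along, but not enforced in this toy

    def run(self, task: str, project_dir: Path) -> str:
        return self.session.send(f"{self.prompt}\n\nTask: {task}", project_dir)


class EchoSession:
    """Stand-in backend that echoes the task, so the sketch runs offline."""

    def send(self, message: str, cwd: Path) -> str:
        return f"[{cwd.name}] {message.splitlines()[-1]}"

    def reset(self) -> None:
        pass


tester = Agent(prompt="You are a tester.", session=EchoSession())
print(tester.run("run the test suite", Path("/tmp/my-project")))
```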

## 🎨 Custom teams

You can customize which agents run by dropping in a `team.json` file. No code changes are needed.

**Lookup order:**
1. `{project}/.kodo/team.json`: project-level override
2. `~/.kodo/teams/{name}.json`: user-level named team

**Example:** adding a UX/UI designer agent to review user-facing code:

```json
{
  "name": "saga-with-designer",
  "agents": {
    "worker_fast": {
      "backend": "claude", "model": "sonnet",
      "description": "Fast worker for implementation tasks."
    },
    "worker_smart": {
      "backend": "claude", "model": "opus",
      "description": "Deep-thinking worker for complex tasks."
    },
    "tester": {
      "backend": "claude", "model": "sonnet",
      "description": "Runs tests and reports results.",
      "max_turns": 10
    },
    "architect": {
      "backend": "claude", "model": "opus",
      "description": "Reviews architecture, validates direction.",
      "max_turns": 10, "timeout_s": 600
    },
    "designer": {
      "backend": "claude", "model": "opus",
      "description": "UX/UI advisor. Reviews component structure, accessibility, interaction patterns. Provides file/line references.",
      "system_prompt": "You are a UX/UI design advisor. Review code for UI structure, accessibility, responsive design, and consistency. Reference specific files and lines. Fix minor issues yourself. Say 'ALL CHECKS PASS' if clean.",
      "max_turns": 10, "timeout_s": 600,
      "fallback_model": "sonnet"
    }
  }
}
```

The orchestrator sees all agents in the team and delegates to them as needed. You can add any specialized reviewer (security auditor, performance analyst, etc.) the same way.

**Agent fields:** `backend` and `model` are required. Optional: `description`, `system_prompt`, `max_turns` (default 15), `timeout_s`, `chrome` (for browser agents), `fallback_model`.
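
A quick pre-flight check can catch a missing `backend` or `model` before a long run starts. A minimal sketch based only on the field list above: `check_team` is a hypothetical helper, and flagging unlisted fields as unknown is an assumption, since kodo may accept fields not mentioned here.

```python
REQUIRED = {"backend", "model"}
OPTIONAL = {"description", "system_prompt", "max_turns", "timeout_s", "chrome", "fallback_model"}
DEFAULT_MAX_TURNS = 15  # documented default for max_turns


def check_team(team: dict) -> list[str]:
    """Return a list of problems found in a team definition (empty if none)."""
    problems = []
    for agent_name, spec in team.get("agents", {}).items():
        missing = REQUIRED - set(spec)
        unknown = set(spec) - REQUIRED - OPTIONAL
        if missing:
            problems.append(f"{agent_name}: missing {sorted(missing)}")
        if unknown:
            problems.append(f"{agent_name}: unknown fields {sorted(unknown)}")
    return problems


team = {
    "name": "demo",
    "agents": {
        "tester": {"backend": "claude", "model": "sonnet", "max_turns": 10},
        "designer": {"backend": "claude"},  # model left out on purpose
    },
}
print(check_team(team))  # ["designer: missing ['model']"]
```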

## 💰 Cost tracking

Kodo tracks costs in two buckets:

| Bucket | What | Example |
|--------|------|---------|
| **🔑 API** | Real money: pay-per-token orchestrator calls | Gemini Flash orchestrator: ~$0.13/run |
| **✨ Virtual** | **Not charged.** Claude Code SDK reports what API usage *would* cost, but on a Max/Pro subscription you pay nothing extra. | Claude Max workers: shows ~$1.69, actual spend $0 |

The progress table labels subscription-covered costs as **Virtual** to make this clear. Only the **API** bucket represents real spend.
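
The two-bucket split amounts to summing reported costs separately by whether a subscription covers them. A minimal sketch using the example figures from the table; `CostEntry` and `summarize` are hypothetical names, not kodo's data model.

```python
from dataclasses import dataclass


@dataclass
class CostEntry:
    source: str
    usd: float
    virtual: bool  # True when covered by a subscription (reported, not charged)


def summarize(entries: list[CostEntry]) -> dict[str, float]:
    """Split reported costs into real API spend and virtual subscription usage."""
    api = sum(e.usd for e in entries if not e.virtual)
    virtual = sum(e.usd for e in entries if e.virtual)
    return {"api": round(api, 2), "virtual": round(virtual, 2)}


entries = [
    CostEntry("gemini-flash orchestrator", 0.13, virtual=False),
    CostEntry("claude max workers", 1.69, virtual=True),
]
print(summarize(entries))  # {'api': 0.13, 'virtual': 1.69}
```

Only the `api` total corresponds to money actually spent.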

## 🔎 Analyzing past runs

```bash
# Open the interactive HTML viewer
python -m kodo.viewer ~/.kodo/runs/20260218_205503/log.jsonl
# Or serve on port 8080: python -m kodo.viewer --serve --port 8080
```
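
For ad-hoc analysis outside the viewer, the log can also be read directly, assuming (from the `.jsonl` extension) that it holds one JSON object per line. The record schema is not documented here, so this sketch makes no assumptions about field names and only tallies which top-level keys appear; `tally_keys` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def tally_keys(log_path: Path) -> Counter:
    """Count how often each top-level key appears across JSONL records."""
    counts: Counter = Counter()
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if isinstance(record, dict):
                counts.update(record.keys())
    return counts
```

Running it on `~/.kodo/runs/<run_id>/log.jsonl` gives a first picture of what the log records contain before writing more targeted queries.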