https://github.com/l4ci/maestro

A stateless daemon that drives GitLab/GitHub issues through a human-in-the-loop lifecycle using Claude as the coding agent.

# Maestro

Maestro is a robot teammate for your code repositories. You assign it a ticket,
and it does the work: it opens a branch, writes the code with Claude, proves the
change works, and then hands the result back to you for review. Once you approve,
it merges. If it gets stuck or has a question, it asks and waits.

It works on top of **GitLab** and **GitHub**. You keep using issues and merge
requests the way you already do. Maestro just becomes another contributor on the
team — one that happens to be an AI.

---

## The one-paragraph version

You run a single long-lived program (the **daemon**) on a machine you control.
It reads a config file listing which repos to watch. Every so often it looks at
each repo and asks: *is there a ticket here assigned to my bot account?* If yes,
it picks it up and walks it through a fixed set of steps — write code, prove it,
ask for review, merge. The clever part is that Maestro keeps **almost no memory
of its own**. Everything it needs to know lives in the ticket, the merge request,
and the git history. So if the daemon crashes or you reboot the machine, it just
re-reads the repo and carries on exactly where it left off.

---

## Why it's built this way

Two ideas drive the whole design.

**1. The forge is the memory.** Maestro doesn't run a database. The "state" of
any ticket — whether it's new, in progress, waiting for review, or done — is
written directly into the forge as labels, merge-request status, and comments.
Anything stored on the local disk (cloned repos, log files) is treated as a
throwaway cache that can be deleted and rebuilt at any time.

```mermaid
flowchart LR
    subgraph durable["Durable: survives anything"]
        F["GitLab / GitHub<br/>tickets · labels · MRs · comments"]
        C["maestro.config.yaml<br/>which repos to watch"]
    end
    subgraph cache["Disposable cache: delete anytime"]
        W["workspaces/<br/>cloned repos"]
        L["logs/"]
    end
    D["Maestro daemon"]
    D -->|reads & writes| F
    D -->|reads| C
    D -.->|rebuildable| W
    D -.->|rebuildable| L
```

Because of this, a multi-day wait for a human reviewer costs nothing. The daemon
can die, sit idle for a week, then wake up and rebuild every ticket's status from
the forge.

**2. The AI starts fresh every time.** Maestro never tries to "resume" a Claude
session. Each time it needs the agent, it starts a brand-new, cold session. The
agent re-learns what it's doing by reading three things: the ticket, the merge
request description (which doubles as its to-do list), and the recent git diff.
This sounds wasteful but it's actually robust — there's no fragile session to
lose, and a human can read the exact same three sources to understand what
happened. Anything that should outlive a single ticket — coding conventions,
decisions the team has locked in — belongs in the repo's own `CLAUDE.md`, which
the cold agent reads automatically on every run (see [Repo-specific
conventions](#repo-specific-conventions)).

---

## The lifecycle: how a ticket becomes a merge

Every ticket moves through the same set of stages. The stage is visible as a label
on the ticket (`maestro:in-progress`, `maestro:in-review`, and so on — on GitLab
they appear scoped, as `maestro::in-progress`), so you can see it at a glance on
your board. This is the default single-agent flow; a
repo can opt into a longer pipeline with separate define, plan, implement, and
review agents — see [Per-role prompts and the stage
pipeline](#per-role-prompts-and-the-stage-pipeline-29) below.

```mermaid
stateDiagram-v2
    [*] --> New: assigned to bot
    New --> InProgress: open branch + draft MR, post "started"
    InProgress --> InProgress: write code, atomic commits
    InProgress --> Handoff: agent says "done"
    InProgress --> Blocked: agent has a question
    Handoff --> InReview: proof posted, then review requested
    InReview --> InProgress: changes requested
    InReview --> Done: you approve, then merge
    Blocked --> InProgress: human answers
    Done --> [*]: workspace cleaned up

    note right of Handoff
        Proof is always posted
        BEFORE you're pinged,
        so you're notified only
        once everything is ready.
    end note
```

In words:

- **New** — A ticket assigned to the bot, with no Maestro label yet. The daemon
creates a branch and a *draft* merge request, labels it in-progress, posts a
"started" comment, and begins work.
- **In progress** — The agent works the ticket one small commit at a time, ticking
off items in its to-do list (which lives in the MR description) and posting the
occasional progress note.
- **Handoff** — A brief, behind-the-scenes step. The agent says it's done, so
Maestro generates *proof* (see below), posts it on **both** the issue and the
MR, **then** requests your review on the merge request and posts a short
"ready for review" comment on the issue. The review request surfaces the MR
under your "review requests" and fires the forge's native notification; the
comment @-mentions you with a link and your response options, so you're
notified even if the review request can't land (no access to the repo, or a
shared bot account). From here you reply wherever is natural: approve or
request changes on the MR, or steer with a `/maestro` comment on either the
MR or the issue (on a dedicated bot account, a leading `@bot` mention works
too — see [Addressing the bot by name](#driving-maestro-from-a-merge-request--no-ticket-required)) —
all these channels reach the agent. The order matters: you're pinged last,
when there's actually something to look at.
- **In review** — Maestro waits. If you approve, it merges using that repo's own
git rules and the ticket auto-closes. If you request changes, it flips back to
in-progress and feeds your feedback to the agent. Three channels count as
"changes requested": a formal review (GitHub "Request changes" / an unresolved
GitLab review thread), an MR/PR comment starting with `/maestro` (the
shared-account escape hatch — the bot account can't review its own MR), and
an issue comment starting with `/maestro` (any author — the explicit prefix is
what keeps ordinary review chatter from spinning up agents).
- **Blocked** — The agent hit something it can't decide on its own. It posts the
question and waits for a human. No slot is consumed while it waits.
Maestro recognises your answer by its author: any reply from an account other
than the bot resumes work. If you **share the bot's account** (a per-host
`bot_user` pointing at your own user), start the reply with `/maestro` — a
body-start command is the one thing the agent can never produce (it has no
forge access; the daemon posts everything, and always behind a heading), so
it counts as provably human even from the bot's account.
- **Done** — Ticket closed, local workspace cleaned up.

---

## What happens during one "tick"

The daemon runs on a loop. One pass over the repos is called a **tick**. Here's
what a single ticket looks like as it gets picked up and worked:

```mermaid
sequenceDiagram
participant D as Daemon
participant F as Forge (GitLab/GitHub)
participant WS as Workspace
participant CL as Claude
participant P as Proof
participant H as Human

D->>F: Any ticket assigned to the bot?
F-->>D: Yes — ticket #42, no label
D->>WS: Clone repo, make a branch
D->>F: Open draft MR, label in-progress
loop until done or blocked
D->>CL: Start a fresh session (read ticket, MR, diff)
CL->>WS: Write code, commit
CL-->>D: status: done
end
D->>P: Run the proof (e.g. Playwright tests)
P-->>D: Artifacts
D->>F: Post proof on ticket + MR
D->>F: Assign MR to ticket author, mark ready
Note over D,H: Maestro now waits...
H->>F: Approve
D->>F: Merge per repo's git rules
F-->>D: Ticket auto-closes
D->>WS: Clean up the clone
```

The only thing the daemon ever hears back from Claude is a tiny status:
`done`, `needs_input`, or `in_progress` (a review session adds its pass/fail
verdict). Everything else it learns by reading the forge on the next tick.
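
To make that contract concrete, here is a minimal TypeScript sketch of how a daemon might validate the agent's final status. Only the three status values come from the text above; the JSON shape, the `verdict` field name, and the `parseAgentReport` helper are assumptions for illustration, not Maestro's actual wire format.

```typescript
type Status = "done" | "needs_input" | "in_progress";
type Verdict = "pass" | "fail";

interface AgentReport {
  status: Status;
  verdict?: Verdict; // only review sessions report one (field name assumed)
}

const STATUSES: readonly string[] = ["done", "needs_input", "in_progress"];

function isStatus(x: unknown): x is Status {
  return typeof x === "string" && STATUSES.includes(x);
}

function isVerdict(x: unknown): x is Verdict {
  return x === "pass" || x === "fail";
}

// Parses the agent's status payload (assumed to be a JSON object).
// Anything unexpected yields null, so malformed output is never read as "done".
function parseAgentReport(raw: string): AgentReport | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { status, verdict } = data as { status?: unknown; verdict?: unknown };
  if (!isStatus(status)) return null;
  if (verdict === undefined) return { status };
  if (isVerdict(verdict)) return { status, verdict };
  return null;
}
```

Rejecting anything unrecognised is the conservative choice: a garbled reply falls through to "no usable status" rather than triggering a handoff.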

---

## Driving Maestro from a merge request — no ticket required

Tickets aren't the only way in. Any **open MR/PR that has no backing issue** can
be handed to Maestro directly: assign the MR to the bot account and write a
comment that *starts with* `/maestro`:

```
/maestro the e2e tests fail on this branch — find out why and fix it
```

On the next tick, Maestro starts a cold agent on that MR's branch, follows the
instruction (investigate, or change code), pushes commits if it changed
anything, and **always** posts a reply — even if it only has findings, or
failed. One command, one reply. To ask for more, comment `/maestro …` again;
the newest unanswered command wins.
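
One plausible reading of "the newest unanswered command wins" is a single pass over the MR's comments in order: remember the latest body-start `/maestro` command, and clear it whenever the bot posts afterwards. The `MrComment` shape and `pendingCommand` function are hypothetical, and the sketch leaves out the trust rules described below.

```typescript
// Hypothetical comment shape; comments are assumed to be in chronological order.
interface MrComment {
  author: string;
  body: string;
}

// A command must *start* the comment: "/maestro" alone or followed by whitespace.
const COMMAND = /^\/maestro(?:\s+([\s\S]*))?$/;

// Returns the instruction of the newest command the bot has not answered,
// or null when nothing is pending. Any later bot comment counts as the reply:
// the daemon's own posts never begin with "/maestro", so even on a shared
// account a bot-authored "/maestro ..." can only have been typed by a human.
function pendingCommand(comments: MrComment[], botUser: string): string | null {
  let pending: string | null = null;
  for (const c of comments) {
    const m = COMMAND.exec(c.body.trimStart());
    if (m !== null) {
      pending = (m[1] ?? "").trim(); // a newer command replaces any older one
    } else if (c.author === botUser) {
      pending = null; // the bot replied after the last command
    }
  }
  return pending;
}
```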

Two verbs never reach the agent and are executed by the daemon itself,
instantly: **`/maestro merge`** and **`/maestro close`**. (The agent never
holds a forge token, so merging and closing are the daemon's job anyway.) A
draft MR refuses `merge` with a hint to mark it ready first.

The same trust rules as tickets apply: on a shared bot account the body-start
`/maestro` is what marks a comment as provably human, and a non-empty
`allowed_actors` list restricts who may command the bot. MRs that belong to a
Maestro ticket (a `maestro/issue-*` branch, or one that `Closes #N`) are *not*
picked up here — those stay with their ticket's lifecycle, where the same
`/maestro` comment counts as review feedback.

> **Addressing the bot by name.** If the bot runs on its **own dedicated
> account** (not shared with you), you can start a command with an `@`-mention of
> that account instead of `/maestro` — `@maestro-bot fix the failing test` works
> anywhere a `/maestro` command does. It's an alias, nothing more: the mention
> must lead the comment (a passing `@maestro-bot` mid-sentence stays ordinary
> chatter), and it only counts from someone *other than* the bot — which is
> exactly why it's a dedicated-account feature. On a **shared** account every
> comment is authored by the bot itself, so the mention can't prove a human typed
> it and `/maestro` stays the only hatch.
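
The trust rules above can be condensed into one classification step. This is a sketch under assumptions: `TrustConfig`, `parseCommand`, and the exact-match treatment of `merge`/`close` are illustrative choices, not Maestro's code.

```typescript
interface TrustConfig {
  botUser: string;
  sharedAccount: boolean;  // bot_user is your own account
  allowedActors: string[]; // empty: anyone may command the bot
}

type Command =
  | { kind: "daemon"; verb: "merge" | "close" } // run by the daemon itself, never the agent
  | { kind: "agent"; instruction: string };     // handed to a cold agent session

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the command a comment carries, or null for ordinary chatter
// and for authors the config does not trust.
function parseCommand(author: string, body: string, cfg: TrustConfig): Command | null {
  if (cfg.allowedActors.length > 0 && !cfg.allowedActors.includes(author)) return null;
  const text = body.trimStart();
  let rest: string | null = null;
  const slash = /^\/maestro(?:\s+|$)/.exec(text);
  if (slash !== null) {
    rest = text.slice(slash[0].length);
  } else if (!cfg.sharedAccount && author !== cfg.botUser) {
    // Dedicated account only: a *leading* @-mention is an alias for /maestro.
    const alias = new RegExp(`^@${escapeRegExp(cfg.botUser)}(?:\\s+|$)`);
    const mention = alias.exec(text);
    if (mention !== null) rest = text.slice(mention[0].length);
  }
  if (rest === null) return null;
  const verb = rest.trim().toLowerCase();
  // Assumption: only the bare verbs count; any other text goes to the agent.
  if (verb === "merge" || verb === "close") return { kind: "daemon", verb };
  return { kind: "agent", instruction: rest.trim() };
}
```

Note the asymmetry: on a shared account the `@`-mention branch is never tried, because every comment there is authored by the bot and so proves nothing about who typed it.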

---

## The pieces inside

Maestro is a set of small, independently testable parts. The most important one
is the **reconciler** — the "brain" that, given a snapshot of a ticket, decides
the single next action. It's pure logic with no side effects, which is why it can
be tested exhaustively and why it never changed when GitHub support was added on
top of GitLab.

```mermaid
flowchart TD
    Config["Config loader<br/>reads maestro.config.yaml"]
    WF["Workflow loader<br/>reads each repo's WORKFLOW.md"]
    Forge["Forge adapter<br/>GitLab + GitHub, one shared shape"]
    Rec["Reconciler<br/>(snapshot) to one action"]
    WSM["Workspace manager<br/>clone · branch · cleanup"]
    Run["Agent runner<br/>headless cold session"]
    Proof["Proof generator<br/>playwright · tests · diff · none"]

    Config --> Rec
    WF --> Rec
    Forge --> Rec
    Rec -->|"start work"| WSM
    WSM --> Run
    Run -->|"done"| Proof
    Proof --> Forge
    Rec -->|"merge / comment / label"| Forge
```

- **Forge adapter** — The only part that knows the difference between GitLab and
GitHub. It translates each into one shared shape so nothing above it has to
care which forge a repo lives on.
- **Reconciler** — Pure decision-making. Takes a ticket snapshot, returns at most
one action per tick.
- **Workspace manager** — Clones a repo into a per-ticket folder, handles the
branch, cleans up when the ticket is done. This is also the seam where, later,
you could swap host folders for isolated containers.
- **Agent runner** — Runs the same `claude` binary you use interactively, but
headless. Locally it uses your existing login and still loads your `CLAUDE.md`,
settings, skills, and permission modes. Set `defaults.agent.kind: codex` to run
OpenAI's Codex CLI (`codex exec`) instead — the daemon-global choice is the only
difference; the cold-session, status-contract, and proof flow are identical.
  > ⚠️ **Codex is unverified.** The Codex backend is unit-tested against the
  > published Codex SDK types but has not yet been run against a live `codex` CLI.
  > Treat it as experimental until [#122](https://github.com/l4ci/maestro/issues/122)
  > is closed.
- **Proof generator** — Pluggable per repo. Pick `playwright`, `test-output`,
`diff-summary`, or `none`.
- **CLI and Web** — Thin shells over the shared core (see below).
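
As an illustration of "snapshot in, one action out", here is a minimal reconciler for the default single-agent flow. The snapshot fields and action names are hypothetical; the sketch leaves out CI gating and the roled pipeline, and the tie-break when a review both approves and requests changes is an assumption.

```typescript
// Hypothetical snapshot of one ticket, re-read from the forge every tick.
interface Snapshot {
  closed: boolean;
  assignedToBot: boolean;
  label: "in-progress" | "in-review" | "blocked" | null; // null: no Maestro label yet
  agentDone: boolean;              // latest cold session reported "done"
  humanRepliedSinceBlock: boolean;
  changesRequested: boolean;
  approved: boolean;
  slotFree: boolean;               // within the concurrency limits
}

type Action =
  | "start"     // branch + draft MR, label in-progress, post "started"
  | "run-agent" // one more cold session
  | "handoff"   // post proof, then request review
  | "resume"    // feed the human's answer or feedback to the agent
  | "merge"     // per the repo's git rules
  | "queue"     // wants a slot, none free
  | "cleanup"   // ticket closed: delete the workspace
  | "idle";     // nothing to do this tick

// Pure function: no I/O, so every snapshot/action pair can be unit-tested.
function reconcile(s: Snapshot): Action {
  if (s.closed) return "cleanup";
  if (!s.assignedToBot) return "idle";
  if (s.label === null) return s.slotFree ? "start" : "queue";
  if (s.label === "blocked") return s.humanRepliedSinceBlock ? "resume" : "idle";
  if (s.label === "in-review") {
    if (s.changesRequested) return "resume"; // assumption: feedback beats approval
    return s.approved ? "merge" : "idle";
  }
  // label is "in-progress"
  if (s.agentDone) return "handoff";
  return s.slotFree ? "run-agent" : "queue";
}
```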

---

## Prerequisites

Maestro is an orchestrator — it drives a handful of command-line tools rather than
reimplementing them. These need to be installed and on your `PATH` before the
daemon will run:

| Tool | Why | Install |
|---|---|---|
| **Node.js ≥ 20** + **pnpm** | runs Maestro itself | [nodejs.org](https://nodejs.org) · [pnpm.io](https://pnpm.io/installation) |
| **git** | clone and branch each ticket's workspace | your package manager |
| **claude** | the default coding agent, run headless — *unless you set `agent.kind: codex`* | [Claude Code](https://claude.com/claude-code) |
| **codex** | alternative coding agent, run headless — *only if you set `agent.kind: codex`* | [OpenAI Codex CLI](https://github.com/openai/codex) |
| **glab** | talk to the GitLab API — *only if you watch GitLab repos* | [gitlab.com/gitlab-org/cli](https://gitlab.com/gitlab-org/cli) |
| **gh** | talk to the GitHub API — *only if you watch GitHub repos* | [cli.github.com](https://cli.github.com) |

You don't need to log into `glab`/`gh`: Maestro injects the token itself, reading it
from its own process environment (which you populate from `.env`). It only needs the binaries present. Run **`maestro doctor`** at any
time to check what's missing; the daemon also runs this check on startup and
refuses to boot (with a clear message) if a required tool is absent.
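
A check in the spirit of `maestro doctor` might look like the following. `onPath`, `requiredTools`, and `missingTools` are hypothetical names, and the lookup assumes a POSIX-style `PATH` (no Windows `.exe` resolution).

```typescript
import { accessSync, constants } from "node:fs";
import { delimiter, join } from "node:path";

// True if an executable named `name` exists in one of the PATH directories.
function onPath(name: string, pathVar: string): boolean {
  for (const dir of pathVar.split(delimiter)) {
    if (dir === "") continue;
    try {
      accessSync(join(dir, name), constants.X_OK);
      return true;
    } catch {
      // not in this directory (or not executable): keep looking
    }
  }
  return false;
}

// Mirrors the prerequisites table: git and the agent always, glab/gh only if needed.
function requiredTools(agent: "claude" | "codex", gitlab: boolean, github: boolean): string[] {
  const tools: string[] = ["git", agent];
  if (gitlab) tools.push("glab");
  if (github) tools.push("gh");
  return tools;
}

function missingTools(tools: string[], pathVar: string = process.env.PATH ?? ""): string[] {
  return tools.filter((tool) => !onPath(tool, pathVar));
}
```

A caller would print the returned names and exit non-zero when the list is non-empty, which is the fail-fast behaviour described above.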

---

## Getting started

The fast path from a fresh clone to a running daemon and dashboard.

```sh
# 1. Clone and set up (installs deps, builds, scaffolds .env, checks your tools)
git clone https://github.com/l4ci/maestro.git
cd maestro
./scripts/setup.sh

# 2. Add your secrets — paste the bot account's token(s)
$EDITOR .env # MAESTRO_GITLAB_TOKEN / MAESTRO_GITHUB_TOKEN

# 3. Point Maestro at your forge(s) — host + which env var holds each token
$EDITOR maestro.config.yaml # (see "Setting it up" below for the full schema)

# 4. Confirm every required tool is on PATH
node packages/cli/dist/cli.js doctor

# 5. Connect your first repo (creates its labels/board, commits the config change)
node packages/cli/dist/cli.js add gitlab.com/your-group/your-repo

# 6. Load the tokens into your shell, then start the daemon
set -a; . ./.env; set +a
node packages/cli/dist/cli.js daemon

# 7. In another terminal, start the dashboard and open it in a browser
node packages/cli/dist/cli.js dashboard # → http://127.0.0.1:4000
```

That's the whole loop. Assign an issue on your repo to the bot account, and watch
its state move across the dashboard as Maestro picks it up, works it, and hands it
back for review.

> **Tip — a shorter `maestro`:** `./scripts/setup.sh` links the CLI onto your
> PATH automatically when pnpm has a global bin dir (`pnpm setup` creates one —
> then re-run the setup script, or run `pnpm -C packages/cli link --global`
> yourself). No global bin dir? Alias it instead:
> `alias maestro='node /path/to/maestro/packages/cli/dist/cli.js'`.

---

## Setting it up

There are two config files. One is global to your Maestro install; the other
lives inside each repo you want watched.

### 1. The global config — `maestro.config.yaml`

This lists your repos and global defaults. It's committed to git. **Secrets never
go here** — the config only names the *environment variable* that holds a token,
never the token itself.

```yaml
defaults:
  poll_interval_active: 30s   # how often to check repos with live work
  poll_interval_idle: 5m      # how often to check quiet repos
  bot_user: maestro-bot
  concurrency:
    global_max: 2             # how many tickets to actively work at once
  agent:
    kind: claude              # coding agent: 'claude' (default) or 'codex' (OpenAI Codex CLI)
    # command: /usr/local/bin/claude   # optional: override the binary/path (defaults to the kind name)
    # NOTE: 'codex' is experimental and not yet verified against a live codex CLI; see issue #122.
forges:
  # Single entry per forge (shorthand)…
  github: { host: github.com, token_env: MAESTRO_GITHUB_TOKEN }
  # …or a list for multiple hosts of the same kind. A username only exists on its
  # own forge, so an entry may carry its own bot_user (else defaults.bot_user).
  gitlab:
    - { host: gitlab.com, token_env: MAESTRO_GITLAB_TOKEN }
    - { host: git.acme.internal, token_env: MAESTRO_ACME_TOKEN, bot_user: acme-bot }
repos:
  - url: gitlab.com/group/api
  - url: github.com/org/web
```

The actual token values go in a `.env` file, which is gitignored:

```sh
cp .env.example .env
# then fill in MAESTRO_GITLAB_TOKEN / MAESTRO_GITHUB_TOKEN
```
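
Because the config only names an environment variable, something has to turn each forge entry into a usable credential at startup. The sketch below assumes the YAML has already been parsed into plain objects; `resolveForges` and its types are illustrative, and failing hard on an unset variable is an assumed policy.

```typescript
interface ForgeEntry {
  host: string;
  token_env: string;  // name of the env var, never the token itself
  bot_user?: string;
}

type ForgeSpec = ForgeEntry | ForgeEntry[]; // shorthand object, or a list

interface ResolvedForge {
  kind: "gitlab" | "github";
  host: string;
  botUser: string;
  token: string;
}

// Flattens both forms, applies the defaults.bot_user fallback, and reads
// each token from the environment variable the config names.
function resolveForges(
  forges: { gitlab?: ForgeSpec; github?: ForgeSpec },
  defaultBotUser: string,
  env: Record<string, string | undefined>,
): ResolvedForge[] {
  const out: ResolvedForge[] = [];
  for (const kind of ["gitlab", "github"] as const) {
    const spec = forges[kind];
    if (spec === undefined) continue;
    for (const entry of Array.isArray(spec) ? spec : [spec]) {
      const token = env[entry.token_env];
      if (!token) throw new Error(`${kind} ${entry.host}: ${entry.token_env} is not set`);
      out.push({ kind, host: entry.host, botUser: entry.bot_user ?? defaultBotUser, token });
    }
  }
  return out;
}
```

In the daemon this would be called with `process.env`, which is why the tokens must already be loaded into its environment.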

### 2. The per-repo config — `WORKFLOW.md`

Each watched repo carries its own `WORKFLOW.md`, version-controlled alongside the
code. It tells Maestro how *that* repo wants to be worked: which branch to target,
how to merge, how to prove a change works, and any house rules for the agent
(test commands, conventions, definition of done). The prompt body of this file is
the agent's operating manual. When you run `maestro add`, a sensible default is
generated for you from a template.

---

### Per-role prompts and the stage pipeline (#29)

A `WORKFLOW.md` body may declare role sections:

```markdown
Shared conventions every agent gets.

## role: define
Refine the request into acceptance criteria. Ask, don't assume.

## role: plan
Produce the implementation plan and the checkbox todo.

## role: implement
Execute the plan, one atomic commit per step.

## role: review
Judge the diff against the plan. Block on real problems, not taste.
```

Text above the first role heading is shared by every agent. A repo **without**
role sections keeps the original single-agent flow unchanged — roles are opt-in
per repo.
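
Splitting a body into shared text and role sections is a small parsing job. This sketch assumes role headings look exactly like the example (`## role: <name>` on its own line); `parseRoles` and `promptFor` are hypothetical helpers, not Maestro's loader.

```typescript
interface RolePrompts {
  shared: string;             // text above the first role heading
  roles: Map<string, string>; // role name -> that section's instructions
}

// Splits a WORKFLOW.md prompt body on "## role: <name>" headings.
// No role headings -> empty map, i.e. the original single-agent flow.
function parseRoles(body: string): RolePrompts {
  const heading = /^##[ \t]+role:[ \t]*(\S+)[ \t]*$/gm;
  const found: { name: string; start: number; end: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = heading.exec(body)) !== null) {
    found.push({ name: m[1], start: m.index, end: m.index + m[0].length });
  }
  if (found.length === 0) return { shared: body.trim(), roles: new Map() };
  const roles = new Map<string, string>();
  found.forEach((h, i) => {
    const next = i + 1 < found.length ? found[i + 1].start : body.length;
    roles.set(h.name, body.slice(h.end, next).trim());
  });
  return { shared: body.slice(0, found[0].start).trim(), roles };
}

// One stage's cold session sees the shared conventions plus its own section.
function promptFor(p: RolePrompts, role: string): string {
  const own = p.roles.get(role);
  return own === undefined ? p.shared : `${p.shared}\n\n${own}`.trim();
}
```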

Declaring roles replaces the single generalist agent with a staged pipeline,
where each stage runs a cold session with only its own instructions:

```mermaid
flowchart LR
    B["backlog<br/>define agent drafts<br/>acceptance criteria"] -->|"human applies maestro:todo<br/>or replies /maestro approve"| T["todo<br/>plan agent writes<br/>the plan"]
    T -->|"branch + draft MR,<br/>plan from birth"| I["in progress<br/>implement agent,<br/>atomic commits"]
    I -->|"done, proof posted"| R{"internal review:<br/>a fresh agent<br/>judges the diff"}
    R -->|pass| H["handoff:<br/>human review"]
    R -->|"fail, round n"| I
    R -->|"bounce cap hit"| BL["blocked:<br/>over to you"]
```

- **Backlog** — new issues land here. The define agent refines the request into
acceptance criteria and posts them as an issue comment. Then it waits for a
human: apply the `maestro:todo` label (the daemon never sets that label itself,
so its presence proves a person signed off) or reply `/maestro approve`.
Labelling the issue `maestro:todo` at creation skips definition entirely.
- **Todo** — the plan agent writes the implementation plan. Only after that does
Maestro create the branch and draft MR, so the MR carries the plan from its
first second.
- **In progress** — implementation, as before. But when the agent says "done",
you are not pinged yet.
- **Internal review** — Maestro posts the proof, then starts a *separate* cold
session whose only job is to judge the diff. Pass → the normal handoff: you're
assigned, the MR is marked ready. Fail → the findings land as an issue comment
("round 1", "round 2", …) and the implement agent picks them up next tick.
After `review.max_rounds` consecutive fails (default 3, configurable in the
front matter), Maestro stops bouncing and flips the ticket to blocked with a
summary — it never auto-merges and never silently drops work. Any comment from
you resets the round count and resumes the loop.
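
The bounce cap amounts to counting consecutive failed rounds since your last comment. A minimal sketch, assuming a simplified timeline of events (the `TimelineEvent` kinds are hypothetical) and assuming a passing review also ends a streak:

```typescript
type TimelineEvent =
  | { kind: "review-fail" }   // an internal-review findings comment ("round n")
  | { kind: "review-pass" }
  | { kind: "human-comment" }; // any comment from you

// Counts review failures since the most recent human comment or pass.
function failedRounds(events: TimelineEvent[]): number {
  let rounds = 0;
  for (const e of events) {
    if (e.kind === "review-fail") rounds += 1;
    else rounds = 0; // a pass or a human reply starts a fresh count
  }
  return rounds;
}

// After a failed review: bounce back to the implement agent, or park as blocked.
function afterFailedReview(events: TimelineEvent[], maxRounds = 3): "bounce" | "blocked" {
  return failedRounds(events) >= maxRounds ? "blocked" : "bounce";
}
```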

The labels you'll see on a roled repo, in board order:

| Label | Meaning | Who sets it |
|---|---|---|
| `maestro:backlog` | being defined | daemon |
| `maestro:todo` | definition approved, awaiting plan | **a human** — this is the approval gate |
| `maestro:in-progress` | plan landed, implementation underway | daemon |
| `maestro:in-review` | proof posted; internal then human review | daemon |
| `maestro:blocked` | a question (or the bounce cap) needs a human | daemon |
| `maestro:queued` | wants a slot, none free | daemon |

`maestro:queued` is a capacity marker, not a stage: it can sit alongside any of
the others, means only "waiting for a free concurrency slot", and is retracted
when work actually starts (or when you unassign the bot). On GitLab the labels
are scoped (`maestro::backlog`), so they exclude each other automatically.

One design note worth knowing: in a roled repo the labels are *projections* for
your board, not the daemon's memory. The stage is re-derived every tick from
artifacts — does an MR exist, is it still a draft, was the AC draft approved —
so a crashed daemon, a stripped label, or an unblocked ticket all recover to
exactly the right place. The one exception is `maestro:todo`, which is itself an
artifact: a human put it there.
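
A rough sketch of re-deriving the stage from artifacts follows. The `Artifacts` fields are hypothetical stand-ins for what the daemon reads (the text above names only the MR, its draft state, and AC approval); `proofOnHead` and `openQuestion` in particular are assumptions.

```typescript
// Hypothetical artifacts read back from the forge. No maestro:* label is an
// input except maestro:todo, which only a human sets.
interface Artifacts {
  todoLabel: boolean;    // maestro:todo present (human approval)
  approveReply: boolean; // a "/maestro approve" reply on the AC draft
  mrExists: boolean;
  mrIsDraft: boolean;
  proofOnHead: boolean;  // proof already posted for the MR's head commit
  openQuestion: boolean; // an unanswered question or bounce-cap summary
}

type RoledStage = "backlog" | "todo" | "in-progress" | "in-review" | "blocked";

function deriveStage(a: Artifacts): RoledStage {
  if (a.openQuestion) return "blocked";
  if (a.mrExists) return !a.mrIsDraft || a.proofOnHead ? "in-review" : "in-progress";
  if (a.todoLabel || a.approveReply) return "todo";
  return "backlog";
}
```

Since the projected labels are not inputs, stripping one would not change what the next tick computes.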

## A walkthrough of the default `WORKFLOW.md`

A `WORKFLOW.md` has two parts: a **front-matter block** (the settings, in YAML
between the `---` fences) and a **prompt body** (plain Markdown below the fences,
which becomes the agent's instructions). Here's the default, annotated.
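
Separating the two parts is straightforward. A minimal sketch, assuming the file opens with a `---` line and the settings end at the next line that is exactly `---`; `splitWorkflow` is a hypothetical name, and the YAML text it returns would still need a YAML parser.

```typescript
interface WorkflowFile {
  frontMatter: string; // raw YAML between the fences
  body: string;        // Markdown prompt body
}

// Splits WORKFLOW.md into its YAML settings and its prompt body.
function splitWorkflow(text: string): WorkflowFile {
  const lines = text.replace(/\r\n/g, "\n").split("\n");
  // Assumption: a file without an opening fence is all prompt body.
  if (lines[0]?.trim() !== "---") return { frontMatter: "", body: text.trim() };
  const close = lines.indexOf("---", 1);
  if (close === -1) throw new Error("WORKFLOW.md: front matter has no closing ---");
  return {
    frontMatter: lines.slice(1, close).join("\n"),
    body: lines.slice(close + 1).join("\n").trim(),
  };
}
```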

### The front matter — settings

```yaml
---
forge: gitlab                # gitlab | github (guessed from the repo's host if left out)
project: group/repo          # GitLab path, OR GitHub org/repo
bot_user: maestro-bot        # the account tickets get assigned to
manage_board: true           # auto-create the labels (and, on GitLab, the board lists)

trigger:                     # the gate for what the bot is allowed to pick up
  assignee: bot              # the ticket must be assigned to bot_user
  require_label: null        # optional: also require this maintainer-added label
  allowed_actors: []         # optional: only trust triggers from these users (turn ON for public repos)

proof:                       # how this repo proves a change works
  type: playwright           # playwright | test-output | diff-summary | none
  command: "npx playwright test --reporter=line"

git:                         # this repo's own merge rules
  default_branch: main
  target: main               # which branch the MR/PR targets
  merge_strategy: squash     # squash | merge | rebase
  delete_source_branch: true

environment:                 # how to reach or boot a running instance (for proof)
  base_url: http://localhost:3000   # an already-running local instance, if any
  start_command: "npm run dev"      # else, how to start one
  seed_command: "npm run db:seed"   # load sample/dummy data
  health_check: "curl -sf localhost:3000/health"

claude:                      # how the agent runs
  command: "claude"          # same binary as interactive; the daemon runs it headless
  max_turns: 40              # safety cap on how long one session can churn
  stall_timeout_seconds: 120 # kill a session that's been silent this long (then retry once)
  permission_mode: acceptEdits  # how much the agent may do without asking

concurrency:
  max_active: 2              # most tickets this one repo will work at once

ci:                          # gate the handoff on the head commit's pipeline (default off)
  gate: false                # true: hold the handoff until CI is conclusive, bounce red CI back to the agent
  wait_timeout_seconds: 1200 # a pipeline still running after this hands off anyway (stuck/external CI)
  max_fix_rounds: 3          # red-CI bounces before the ticket is parked as blocked for a human
---
```

The blocks worth understanding:

- **`trigger`** is your safety gate. By default a ticket just needs to be assigned
to the bot. For anything public, turn on `require_label` and/or `allowed_actors`
so a stranger can't kick off work by assignment alone.
- **`proof`** is how Maestro *demonstrates* the change is good before pinging you.
`playwright` runs browser tests, `test-output` runs your test suite, `diff-summary`
just summarizes the change, and `none` skips it. If proof *generation itself*
crashes (a Playwright crash, a health-check timeout, a misconfigured command),
the first two failures retry quietly; the third consecutive one parks the ticket
as blocked with the failure posted on the issue, so a broken proof setup never
loops silently. Any reply from you un-parks it.
- **`claude.stall_timeout_seconds`** is a watchdog, not a turn limit: a session
that emits *nothing* for this long is killed and retried once. Size it above
your repo's slowest silent command — a cold dependency install or full build
can legitimately produce no output for minutes.
- **`environment`** only matters when proof needs a running app. If you already
keep a local instance up, point `base_url` at it; otherwise Maestro uses
`start_command` to boot one, `seed_command` to fill it with data, and
`health_check` to know it's ready.
- **`git`** lets each repo keep its own merge habits — Maestro never imposes one
global rule.
- **`ci`** is an opt-in extra gate (off by default). With `gate: true`, Maestro
reads the head commit's pipeline before handing off: a **passing** pipeline hands
off as usual, a **running** one *holds* the handoff (so you're never pinged
seconds before CI goes red) until it ages past `wait_timeout_seconds`, and a
**failed** one bounces straight back to the agent with the failing job logs
threaded in as context. After `max_fix_rounds` red bounces (counted since your
last comment, so any reply resets it) the ticket parks as blocked for you, the
same way the internal-review cap does. Works on GitLab pipelines and GitHub
check-runs; repos without CI are unaffected. While in review, a pipeline that
*regresses* (the target branch moved, someone pushed) bounces the same way —
but an approval still merges, so your branch-protection rules own merge-time
enforcement.
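
The `ci` rules for a finished implementation reduce to a small decision function. This sketch mirrors the bullet above; the names are hypothetical, and it models only the handoff path, not the in-review regression bounce.

```typescript
type PipelineState = "passed" | "running" | "failed" | "none"; // "none": repo has no CI

interface CiGateConfig {
  gate: boolean;              // ci.gate
  waitTimeoutSeconds: number; // ci.wait_timeout_seconds
  maxFixRounds: number;       // ci.max_fix_rounds
}

type HandoffDecision = "handoff" | "hold" | "bounce" | "blocked";

// redBounces: red-CI bounces already made since the human's last comment.
function ciGate(
  cfg: CiGateConfig,
  state: PipelineState,
  pipelineAgeSeconds: number,
  redBounces: number,
): HandoffDecision {
  if (!cfg.gate || state === "none") return "handoff"; // gate off, or nothing to wait for
  if (state === "passed") return "handoff";
  if (state === "running") {
    // Hold until conclusive, but a stuck or external pipeline hands off eventually.
    return pipelineAgeSeconds >= cfg.waitTimeoutSeconds ? "handoff" : "hold";
  }
  // failed: bounce to the agent with the job logs, until the cap parks it for a human
  return redBounces >= cfg.maxFixRounds ? "blocked" : "bounce";
}
```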

### The prompt body — the agent's instructions

Below the second `---` is plain Markdown that becomes the agent's operating
manual. The template ships with a shared spine (the same six steps from the
lifecycle above) plus a spot for your repo's house rules:

```markdown
# Agent operating protocol

You are working a single issue end-to-end in a cold session. Reconstruct all
context from the issue, the MR description (your durable plan/todo), recent
commits + diff, and the repo conventions below.

1. Orient — read the issue, the MR description, recent commits + diff, and the
conventions in this file.
2. First session only — gather context. If the task is ambiguous, post a comment
with questions, set maestro:blocked, and stop. Otherwise, write a plan +
checkbox todo list into the MR description.
3. Work the next unchecked item — one atomic commit per meaningful step.
4. After each step — tick the box in the MR description; post a short progress
comment if notable.
5. Done — all boxes checked + definition-of-done met → emit done.
6. Blocked anytime — need a human decision → comment the question, label
maestro:blocked, stop.

## Repo-specific conventions

- Test: `npm test`
- Lint: `npm run lint`
- Definition of done: tests + lint green; proof attached; MR todo all checked.
```

**You mostly edit the bottom section.** The numbered protocol is the shared
default — leave it alone unless you have a reason. The "Repo-specific conventions"
block is where you teach the agent about *this* codebase: the exact test and lint
commands, architecture notes, naming rules, and what "done" means here. Whatever a
new human teammate would need to know on day one belongs there.

> **Why the MR description matters so much:** notice that step 2 puts the plan and
> to-do list *into the MR description*, not into the agent's memory. That's
> deliberate. Because the agent starts cold every session, the MR description is
> its only durable scratchpad — and it's one you can read too. Open the MR and you
> see exactly what the agent thinks it's doing and how far along it is.

> **A second channel — `CLAUDE.md`.** The conventions block above rides in the
> WORKFLOW.md body, so Maestro is the one injecting it. But the agent is Claude
> Code, and Claude Code loads a `CLAUDE.md` from the repo root on its own, every
> run, with no help from Maestro. That makes `CLAUDE.md` the home for durable,
> repo-owned knowledge: conventions, decisions the team has settled, architecture
> notes, links into `docs/` or your ADRs. And because a `CLAUDE.md` can point at
> other files, you can wire up a whole tree of standing context and keep all of it
> in git. It fits *the forge is the memory*: the knowledge survives because it's
> committed, and it changes only when an MR merges it.

---

## Running it

The daemon is one process that watches **all** your repos. You never run one
daemon per repo.

```sh
# load the forge tokens, then start the daemon (watches everything in the config)
set -a; . ./.env; set +a
maestro daemon
```

The daemon reads tokens from its environment, not from the `.env` file directly,
hence the `set -a; . ./.env; set +a` line (a [service](#keeping-it-running-background-and-boot)
does this for you via `EnvironmentFile=`).

On startup it preflights your tools (`git`, `claude`, and the forge binaries you
need) and refuses to boot if any are missing — so a misconfigured host fails fast
with a clear message instead of silently looping.

Day-to-day you'll mostly use the CLI:

| Command | What it does |
|---|---|
| `maestro daemon` | Start the daemon — one process that watches every repo in the config and works assigned issues. Preflights tools, then loops. |
| `maestro add ` | Start watching a repo. Sets up its labels/board and commits the config change. Add `--public` to opt into a public repo (read the safety notes first). |
| `maestro list` | Show all watched repos and what's in flight. |
| `maestro status ` | Show one ticket's current stage. |
| `maestro logs ` | Show the agent's logs for a ticket. |
| `maestro run --attach` | Open an **interactive** Claude in that ticket's workspace so you can watch or drive it by hand. Local-dev only, not the daemon path. |
| `maestro dashboard` | Start the web dashboard (same as `node packages/web/dist/main.js`) — see below. |
| `maestro doctor` | Check that every required tool (`git`, `claude`, `glab`/`gh`) is on your `PATH`. Exits non-zero if anything's missing. |

### The dashboard

A small read-only **web dashboard** shows the same information in your browser —
a live status table of every watched repo and its tickets, plus an "add a repo"
form — and auto-refreshes every few seconds.

![The Maestro dashboard: two repos with their tickets and colour-coded lifecycle states](docs/assets/dashboard.png)

*Example view — each ticket shows its current lifecycle state, and each repo
summarises its counts. A repo whose forge can't be reached shows as "unreachable"
instead of looking idle.*

```sh
maestro dashboard # → http://127.0.0.1:4000
```

Override the bind address with `MAESTRO_WEB_HOST` / `MAESTRO_WEB_PORT`. The same
endpoint also serves the raw read-model as JSON to any non-browser client (handy
for scripting), so `curl localhost:4000` gives you the data the page renders.

**Adding repos from the dashboard is off by default.** The `GET` paths are
read-only and always open, but `POST /repos` (the "add a repo" form) mutates your
config and creates labels plus a bootstrap issue/PR on the forge — so it stays
disabled unless you opt in by setting `MAESTRO_DASHBOARD_TOKEN`:

```sh
MAESTRO_DASHBOARD_TOKEN="$(openssl rand -hex 32)" maestro dashboard
```

With no token set the write path doesn't exist (a `POST /repos` returns `404`) and
the add-repo form is hidden. With a token set, the form appears and each add must
carry it as `Authorization: Bearer ` (compared in constant time); a missing
header is `401`, a wrong token `403`. This keeps a read-only dashboard safe to
expose on a shared tailnet/LAN while gating the one write path behind a secret. On
an untrusted network, still prefer binding `127.0.0.1` and fronting it with
`tailscale serve` + ACLs.
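
The gate above reduces to a small decision function. This is a minimal sketch that assumes a Node.js server. `checkAddRepoAuth` is a hypothetical name. Treating a malformed header the same as a missing one (`401`) is also an assumption:

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical sketch of the POST /repos gate; not Maestro's real function.
type GateResult = 200 | 401 | 403 | 404;

function checkAddRepoAuth(configuredToken: string | undefined, authorization: string | undefined): GateResult {
  // No MAESTRO_DASHBOARD_TOKEN: the write path doesn't exist at all.
  if (!configuredToken) return 404;

  // Missing (or, by assumption here, malformed) header -> 401.
  const match = /^Bearer (.+)$/i.exec(authorization ?? "");
  if (!match) return 401;

  // timingSafeEqual throws on inputs of different length, so compare
  // fixed-size SHA-256 digests; this also avoids leaking the token's length.
  const given = createHash("sha256").update(match[1]).digest();
  const expected = createHash("sha256").update(configuredToken).digest();
  return timingSafeEqual(given, expected) ? 200 : 403;
}

console.log(checkAddRepoAuth(undefined, "Bearer abc")); // 404
console.log(checkAddRepoAuth("s3cret", undefined)); // 401
console.log(checkAddRepoAuth("s3cret", "Bearer wrong")); // 403
console.log(checkAddRepoAuth("s3cret", "Bearer s3cret")); // 200
```

In a request handler, the two arguments would be `process.env.MAESTRO_DASHBOARD_TOKEN` and the request's `Authorization` header. Hashing before `timingSafeEqual` is one common way to meet its equal-length requirement. Maestro may handle this differently.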

### Keeping it running: background and boot

`maestro daemon` runs in the foreground. For a quick detached session, `tmux`
(or `nohup maestro daemon &`) does the job — but it dies with the machine. The
proper way to survive reboots is a systemd **user** service; the repo ships a
ready unit at [`templates/maestro.service`](templates/maestro.service):

```sh
mkdir -p ~/.config/systemd/user
cp templates/maestro.service ~/.config/systemd/user/
$EDITOR ~/.config/systemd/user/maestro.service # set the three EDIT lines (paths)
systemctl --user daemon-reload
systemctl --user enable --now maestro

# start at boot without anyone logging in
loginctl enable-linger $USER

# watch it
journalctl --user -u maestro -f
```

Three things the unit handles that a bare `nohup` doesn't:

- **Tokens.** Nothing in Maestro reads the `.env` file itself — the tokens must
be in the daemon's process environment. In the foreground you load them into
your shell once (`set -a; . ./.env; set +a`); the unit does it declaratively
with `EnvironmentFile=`.
- **PATH.** The daemon shells out to `git`, `claude`, `glab`/`gh`, and a systemd
user session's default `PATH` doesn't include the usual install locations of two of them
(`~/.local/bin` for `claude`, `/snap/bin` for `glab`). The unit extends `PATH`;
if a tool still can't be found, the startup preflight fails fast and names it.
- **Restarts.** `Restart=on-failure` is safe precisely because of the design
above: the daemon keeps no state of its own, so a restarted process re-reads
the forge and picks up every ticket where it left off.
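
The PATH problem and the fail-fast preflight can be illustrated together. This is a minimal sketch that assumes a POSIX-style PATH lookup, with no Windows `PATHEXT` handling. `findOnPath` and `missingTools` are hypothetical helpers, not Maestro's actual preflight:

```typescript
import { accessSync, constants } from "node:fs";
import { homedir } from "node:os";
import { delimiter, join } from "node:path";

// Hypothetical preflight sketch: illustrative helpers, not Maestro's implementation.
function findOnPath(tool: string, pathEnv: string): string | undefined {
  for (const dir of pathEnv.split(delimiter)) {
    if (dir === "") continue; // skip empty entries instead of searching the cwd
    const candidate = join(dir, tool);
    try {
      accessSync(candidate, constants.X_OK); // throws if absent or not executable
      return candidate;
    } catch {
      // not in this directory; keep looking
    }
  }
  return undefined;
}

// Only one forge CLI is required: `glab` for GitLab or `gh` for GitHub.
function missingTools(forgeCli: "glab" | "gh", pathEnv: string): string[] {
  return ["git", "claude", forgeCli].filter((tool) => findOnPath(tool, pathEnv) === undefined);
}

// Extend PATH the way the unit does, then report any missing tools by name
// (a real preflight would exit non-zero at this point).
const extendedPath = [process.env.PATH ?? "", join(homedir(), ".local", "bin"), "/snap/bin"].join(delimiter);
const missing = missingTools("glab", extendedPath);
console.log(missing.length === 0 ? "preflight ok" : `missing on PATH: ${missing.join(", ")}`);
```

The sketch checks for the executable bit rather than mere existence, which is closer to how a shell resolves a command name.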

A user service (not a system one) is deliberate: the daemon runs Claude with
*your* login and settings, so it should run as your user, not root.

The dashboard has a matching unit,
[`templates/maestro-web.service`](templates/maestro-web.service) — same install
steps with `maestro-web` as the unit name. It carries commented `Environment=`
lines for the bind address and the optional `MAESTRO_DASHBOARD_TOKEN` write
gate, so out of the box the dashboard stays read-only.

---

## How the daemon decides what to work on

You can watch a hundred repos on a tiny machine. The thing that costs real
resources isn't *watching*, it's *working*. A repo with nothing assigned, or a
ticket sitting in review, costs only a cheap periodic check. Compute is spent only
while a ticket is actively being worked, and each active ticket holds one "slot".

```mermaid
flowchart LR
R1["repo: api<br/>2 tickets waiting"] --> Q
R2["repo: web<br/>1 ticket in review"] -. no slot needed .-> Idle
R3["repo: docs<br/>nothing assigned"] -. just polled .-> Idle
Q["Work queue"] --> S{"Free slot?<br/>(global_max: 2)"}
S -->|yes| Work["Active worker"]
S -->|no| Wait["Queued"]
```

If more tickets are ready than you have slots, the extras simply queue — and get
the `maestro:queued` label, so the queue is visible on the forge instead of
looking like silence. Nothing breaks — throughput is just capped. The right
number of slots depends on your machine's RAM, since each active worker runs
Claude plus possibly a browser for proof. On a 4 GB box, 1–2 is sensible.
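
The queueing rule can be sketched as a pure function. The names, the first-come order, and the example issue numbers are assumptions. Only the `global_max` setting and the `maestro:queued` label come from this README:

```typescript
// Hypothetical sketch of slot allocation under `global_max`.
// Assumptions: tickets in review or with nothing assigned never reach this
// function (they need no slot), and ready tickets are served first-come.
interface ReadyTicket {
  repo: string;
  issue: number;
}

function allocateSlots(ready: ReadyTicket[], activeWorkers: number, globalMax: number) {
  const free = Math.max(0, globalMax - activeWorkers);
  return {
    start: ready.slice(0, free), // get an active worker now
    queued: ready.slice(free), // wait, labelled `maestro:queued` on the forge
  };
}

// Like the diagram: repo "api" has 2 tickets waiting and global_max is 2,
// but suppose one worker is already busy on another repo.
const plan = allocateSlots(
  [
    { repo: "api", issue: 12 },
    { repo: "api", issue: 15 },
  ],
  1,
  2,
);
console.log(plan.start.length, plan.queued.length); // 1 1
```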

To scale up, you don't add daemons — you add **machines**, each running its own
single daemon watching a few repos. They never coordinate with each other; the
forge is the shared source of truth, so there's nothing to sync.
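
Nothing needs syncing because, in this design, a ticket's state is derived from forge data rather than remembered. The following is a sketch of that idea. Every label except `maestro:queued` is an illustrative assumption, as are the `ForgeTicket` shape and the mapping itself:

```typescript
// Hypothetical sketch: lifecycle state as a pure function of forge data.
// Only `maestro:queued` is a label named in this README; `maestro:working`
// and the ForgeTicket shape are illustrative assumptions.
type Lifecycle = "new" | "queued" | "working" | "in-review" | "done";

interface ForgeTicket {
  labels: string[];
  mergeRequest?: { state: "open" | "merged" | "closed" };
}

function deriveLifecycle(ticket: ForgeTicket): Lifecycle {
  if (ticket.mergeRequest?.state === "merged") return "done";
  if (ticket.mergeRequest?.state === "open") return "in-review";
  if (ticket.labels.includes("maestro:working")) return "working";
  if (ticket.labels.includes("maestro:queued")) return "queued";
  return "new";
}

// Any process reading the same forge data, including a freshly restarted
// daemon, computes the same state; there is no local copy to keep in sync.
const ticket: ForgeTicket = { labels: ["maestro:queued"] };
console.log(deriveLifecycle(ticket)); // queued
```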

---

## Maestro can manage itself

The Maestro project is just a git repo, which means Maestro can watch *itself*.
Want to add a repo or bump concurrency? File a ticket on the Maestro repo. The
agent edits `maestro.config.yaml`, opens a merge request, you approve, it merges,
and the daemon hot-reloads the new config. There's no separate admin panel —
managing Maestro *is* using Maestro.
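
Hot-reloading means comparing the newly merged config with the running one and acting on the difference. This is a minimal sketch. The `MaestroConfig` shape and the repo names are assumptions, not the real `maestro.config.yaml` schema:

```typescript
// Hypothetical sketch of applying a hot-reloaded config: diff the old and new
// settings and act only on what changed. MaestroConfig is an assumed shape.
interface MaestroConfig {
  repos: string[];
  global_max: number;
}

function diffConfig(before: MaestroConfig, after: MaestroConfig) {
  const oldRepos = new Set(before.repos);
  const newRepos = new Set(after.repos);
  return {
    addedRepos: after.repos.filter((r) => !oldRepos.has(r)), // start polling these
    removedRepos: before.repos.filter((r) => !newRepos.has(r)), // stop polling these
    globalMaxChanged: before.global_max !== after.global_max, // resize the slot pool
  };
}

// A merged ticket that adds a repo and bumps concurrency from 2 to 3.
const change = diffConfig(
  { repos: ["acme/api", "acme/web"], global_max: 2 },
  { repos: ["acme/api", "acme/web", "acme/docs"], global_max: 3 },
);
console.log(change);
```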

---

## A note on safety

Maestro runs autonomous Claude (and possibly a browser for proof) directly on the
host machine, unsandboxed. For your own **private** repos with a dedicated bot
account, that's a reasonable tradeoff. For **public** repos it's riskier, and
support for them is deliberately opt-in (`--public`):

- **Who can start work** is gated by forge permissions — assigning a ticket to the
bot requires write/triage access. You can tighten this further with a required
label or an allowlist of trusted users.
- **What a ticket says** is the harder problem. On a public repo, ticket text is
written by strangers, and the agent acts on it with the bot's credentials. The
real fix is per-ticket container isolation, which is a planned future step. Until
then, treat public-repo support as experimental and keep secrets out of the
workspace.
- **Permission mode.** Headless, the agent has no human to approve tool calls, so
it defaults to `bypassPermissions` (`--dangerously-skip-permissions`) —
otherwise it can't even `git commit` its work or run a proof. That means it runs
unsandboxed Bash on the host. Fine for a private repo you trust; for a public one,
override `claude.permission_mode` to a constrained mode (`acceptEdits`/`default`)
and accept that the agent can't commit or prove until the container sandbox lands.
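
This list describes two knobs: the optional trust gate and `claude.permission_mode`. Both can be sketched together. The mode names and the `bypassPermissions` default come from the text. `TrustPolicy`, its fields, the `agent-ok` label, and both function names are hypothetical:

```typescript
// Hypothetical sketch; names and fields are illustrative, not Maestro's schema.
interface TrustPolicy {
  requiredLabel?: string; // ticket must carry this label
  allowedUsers?: string[]; // assumed to be checked against whoever assigned the bot
}

interface Assignment {
  assignedBy: string;
  labels: string[];
}

// Runs after the forge has already enforced write/triage access for assigning.
function mayStartWork(a: Assignment, policy: TrustPolicy): boolean {
  if (policy.requiredLabel !== undefined && !a.labels.includes(policy.requiredLabel)) return false;
  if (policy.allowedUsers !== undefined && !policy.allowedUsers.includes(a.assignedBy)) return false;
  return true;
}

type PermissionMode = "bypassPermissions" | "acceptEdits" | "default";

// `claude.permission_mode` falls back to bypassPermissions, the headless default.
function resolvePermissionMode(configured: PermissionMode | undefined, publicRepo: boolean) {
  const mode: PermissionMode = configured ?? "bypassPermissions";
  const warnings: string[] = [];
  if (publicRepo && mode === "bypassPermissions") {
    warnings.push("public repo: agent would run unsandboxed Bash on stranger-written tickets");
  }
  if (mode !== "bypassPermissions") {
    warnings.push(`${mode}: agent cannot commit or run proofs headless`);
  }
  return { mode, warnings };
}

console.log(mayStartWork({ assignedBy: "alice", labels: ["agent-ok"] }, { requiredLabel: "agent-ok", allowedUsers: ["alice"] })); // true
console.log(resolvePermissionMode(undefined, true).mode); // bypassPermissions
```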

---

## For developers

It's a pnpm + TypeScript monorepo.

```
packages/core   the brain: reconciler, forge adapters, agent runner, proof,
                config + workflow loaders, daemon loop, tool preflight
packages/cli    maestro add | status | list | logs | run | dashboard | doctor + daemon entry
packages/web    read-only dashboard (HTML page + JSON API) + add-repo form
templates/      the default WORKFLOW.md used when onboarding a repo, plus the systemd units
scripts/        setup.sh — one-shot install + build + tool check
```

The CLI and web packages are intentionally thin — almost all real logic lives in
`core` so both interfaces behave identically.

```sh
pnpm install
pnpm typecheck # strict TypeScript
pnpm test # vitest
pnpm lint # biome
pnpm build # per package
```

### Where to read more

- **Design spec (the locked source of truth):**
[`docs/superpowers/specs/2026-06-03-maestro-design.md`](docs/superpowers/specs/2026-06-03-maestro-design.md)
- **Build roadmap and milestone history:** [`tasks/todo.md`](tasks/todo.md)
- **Architecture vocabulary and settled seams:** [`CONTEXT.md`](CONTEXT.md)