https://github.com/dbc-oduffy/coordinator-claude

An MIT-licensed Claude Code operating layer for PM-led AI engineering. Planning, enrichment, delegated execution, staged review, ship decisions, durable handoffs.

agent-orchestration agent-teams ai-agents anthropic claude-code claude-code-plugin claude-code-skills code-review code-review-automation deep-research developer-tools multi-agent notebooklm product-management quality-gates review-pipeline structured-development workflow-automation

# coordinator-claude

**A PM-native operating layer for AI engineering work.** An opinionated Claude Code plugin that turns product intent into scoped plans, delegated implementation, evidence, and ship/no-ship decisions. You're PM. Claude is EM. You stay technical enough to spot when something's wrong; the EM handles the rest.

## Who This Is For

You don't need to write code. You need to know enough to evaluate it — to spot when something smells wrong, to ask the right questions, to set direction. Amjad Masad (Replit's founder) [opined](https://youtu.be/PlDeqGQZ0CQ) that people who *don't* program may actually be better positioned for the LLM-coding world: they won't micromanage implementation. This plugin leans into that. It gives Claude standing authority to make engineering decisions — how to build, who to delegate to, when to refactor — while you hold product authority: what to build, what to cut, what ships. Just like an EM-PM dynamic.

What this system asks of the PM is altitude, not tactics. Architectural judgment, scope decisions, refactor-vs-patch calls, what-ships-and-what-doesn't. Slop is fought on the *plan* level — by treating each plan as an exhaustive software design document with named-reviewer pushback baked in, not by spot-checking commits afterward. The natural model defaults — act fast and move on, defer P2s into oblivion, size work in human-sprint timeframes — are the things this system is built to push back against. The PM sets direction; the EM and the named reviewers do the pushing.

This isn't a collection of prompt tricks. It's a **decision architecture**: routines (session orientation, plan review, multi-source code review, structured handoffs, daily flows) plus authority boundaries (EM owns implementation, PM owns scope/ship) plus reviewer personas that push back with fix-gates between them. Reviews are meaningful — blocking findings, not theatre. And the system actively reads its own accumulated wikis and lessons before drafting new plans (`prior-art-checker` pre-flight, `/learn-lessons` triage), so the scar tissue from past decisions informs the next one rather than sitting in a write-only memory file. It maps to how PM-EM collaboration actually feels in a real engineering team. The difference is that your "team" can work autonomously for hours, and you can review the output when you're ready.

## What This Is *Not*

- **Not another agent framework.** Claude Code already has subagents, plugins, skills, hooks, and MCP. This sits *on top of* that infrastructure — it doesn't compete with it.
- **Not a PRD-to-code pipeline.** Product intent is captured inside plan mode and acceptance criteria, not in a separate PRD funnel.
- **Not an autonomous coding agent.** The PM is in the loop on scope, tradeoffs, and ship decisions by design. Autonomous modes (`/mise-en-place`, `/autonomous`) exist for execution sprints, not for product authority.
- **Not a developer productivity tool.** The target user evaluates engineering work — they don't write it. If you want to be more productive at writing code yourself, you want vanilla Claude Code; this is for managing AI engineering work, not doing it.
- **Not aimed at the fully non-technical PM** (yet). The current sweet spot is a PM with technical altitude but not technical hands — someone who has architectural judgment, recognizes unwise plan-shape, and can spot when a plan, an architectural choice, or a tradeoff sounds wrong from how the EM describes it, without ever reading a diff. The PM does not review commits, does not perform code review, does not sign off on PRs. That altitude is the load-bearing constraint: code review and PR sign-off are delegated to AI peers and named-persona staff-tier reviewers (see below). Complex codebases require a PM who can tell when something looks wrong from the conversation, not from the patch.

🤖 Agents: start here → [AGENTS.md](AGENTS.md)

## Quick Start

You don't install this — your agent does. Open Claude Code in any project and paste:

```
Install coordinator-claude. The playbook is at
https://github.com/dbc-oduffy/coordinator-claude/blob/main/docs/agent-install.md
— read it, follow it, and queue /project-onboarding as the immediate next step
after I restart Claude Code.
```

Claude clones the repo, runs the installer, validates the result, and tells you when to restart. After restart, `/project-onboarding` bootstraps tracking infrastructure in your project.

**Auditing & uninstall** → [`docs/safety.md`](docs/safety.md) — what the installer changes, what it does not do, audit commands, and exact uninstall steps.

## Compatibility

| coordinator-claude | Claude Code | OS tested | Notes |
|--------------------|-------------|-----------|-------|
| v2.0.0 | tested with Claude Code 2026-05-07 release; minimum version not formally established | macOS, Linux, WSL, Windows (Git Bash) | Reference: `setup/install.sh`. **Requires bash 4.0+** — macOS ships bash 3.2 as `/bin/bash`, so Mac users `brew install bash` and run the installer with it (`"$(brew --prefix)/bin/bash" setup/install.sh`); the installer fails fast with this guidance if run under 3.2. Linux/WSL/Git Bash ship bash 4+. v2.0.0 has breaking changes — see [CHANGELOG](CHANGELOG.md#200--2026-05-07). |
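
The bash 4.0+ requirement can be checked before the installer is ever launched. The sketch below is not part of `setup/install.sh`; `bash_major_minor` and `meets_installer_requirement` are hypothetical helper names, and the input is assumed to look like the value of bash's `BASH_VERSION` variable.

```python
import re


def bash_major_minor(version: str) -> tuple:
    """Parse a BASH_VERSION-style string such as '3.2.57(1)-release'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized bash version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_installer_requirement(version: str, minimum: tuple = (4, 0)) -> bool:
    """True when the parsed (major, minor) pair is at least the minimum."""
    return bash_major_minor(version) >= minimum


# The stock macOS /bin/bash reports a 3.2.x version; Homebrew's is 5.x.
for candidate in ("3.2.57(1)-release", "5.2.15(1)-release"):
    verdict = "ok" if meets_installer_requirement(candidate) else "too old"
    print(candidate, verdict)
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where `"10.0"` sorts before `"4.0"`.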

## How a Session Works

Most tools hand you a bag of commands and wish you luck. This system has *routines* — woven into the session lifecycle so you don't have to remember them. Some fire automatically via lifecycle hooks (boot orientation, the context-pressure nudge); the rest are one-keystroke ceremonies the EM runs at the right moment. The distinction matters: *hooks* are what actually fire on their own; the slash commands below are invoked — by you, or by the EM on your behalf.

**Starting up.** When Claude opens a supported project, a `SessionStart` hook fires automatically, loading the current branch, pending handoffs, lessons from past sessions, project vitals, and an orientation cache. No cold start. Claude begins with forward-looking state already loaded instead of rebuilding it partway through the session. This is deliberate: long-context models do not use every position equally well. Liu et al. found that models use information most reliably when it sits at the beginning or end of the context and least reliably when it is buried in the middle ([Liu et al. 2023, "Lost in the Middle"](https://arxiv.org/abs/2307.03172)). The orientation hook front-loads context so the key state sits at the start of the window rather than somewhere in the middle.

**Planning.** You describe what you want. Claude enters plan mode — but the plan isn't just written and executed. You review it. In a real dev team, the PM doesn't just say "build auth" and disappear; they review the spec, push back on scope, ask about edge cases. That's what happens here. The `coordinator:plan` skill is a decision-tree super-skill (triage → substrate → compose → exit) that mechanically binds the trigger word "plan" to the full pipeline: the prior-art-checker pre-flight cross-references the plan against accumulated wikis, lessons, and improvement queues to surface conflicts *before* an Opus reviewer touches it; the Staff Engineer then reviews; the integrator applies findings. For bigger decisions, `/staff-session` spawns role-based engineers who independently develop positions and debate to consensus — like pulling your tech lead and director of engineering into a room.
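
To make the pre-flight idea concrete, here is a deliberately crude sketch of cross-referencing a draft plan against accumulated notes. It is not how `prior-art-checker` is implemented; the keyword-overlap heuristic, the function names, and the note contents are all assumptions made for illustration.

```python
import re


def keywords(text: str) -> set:
    """Lowercased words of four or more letters, a stand-in for real matching."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def prior_art_hits(plan: str, notes: dict, min_overlap: int = 2) -> list:
    """Return (source, shared_terms) for every note that overlaps the plan."""
    plan_terms = keywords(plan)
    hits = []
    for source, body in notes.items():
        shared = plan_terms & keywords(body)
        if len(shared) >= min_overlap:
            hits.append((source, sorted(shared)))
    return hits


# Illustrative note contents; only the file paths come from this README.
notes = {
    "tasks/lessons.md": "Retry logic around the auth token refresh caused duplicate sessions.",
    "docs/wiki/systematic-debugging.md": "Reproduce first, then find the root cause.",
}
plan = "Add token refresh with retry to the auth client."

for source, shared in prior_art_hits(plan, notes):
    print(f"check {source} before review: overlaps on {', '.join(shared)}")
```

The point of running something like this first is ordering: conflicts with past lessons surface before a more expensive reviewer spends attention on the plan.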

**Building.** Claude delegates to Sonnet subagents for implementation — cheaper, faster, fresh context. A `PreToolUse` hook nudges Claude away from doing implementation work directly, because the orchestrator's context is too valuable to spend on writing code. This is the same principle as a real EM: you don't want your engineering manager writing production code when they should be coordinating.
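
A nudge of this kind can be sketched as a small decision function. Claude Code passes hook input as JSON, including fields such as `hook_event_name` and `tool_name`; everything else here is an assumption, including the `IMPLEMENTATION_TOOLS` set, the wording of the message, and the function name `nudge_for`. The plugin's real hook may decide differently.

```python
from typing import Optional

# Assumption: tools that count as hands-on implementation for the orchestrator.
IMPLEMENTATION_TOOLS = {"Edit", "Write"}


def nudge_for(event: dict) -> Optional[str]:
    """Return a reminder when the orchestrator reaches for an editing tool."""
    if event.get("hook_event_name") != "PreToolUse":
        return None
    tool = event.get("tool_name", "")
    if tool in IMPLEMENTATION_TOOLS:
        return f"{tool} is implementation work; consider delegating it to a subagent."
    return None


# Example events shaped like hook input (only the fields this sketch reads).
print(nudge_for({"hook_event_name": "PreToolUse", "tool_name": "Edit"}))
print(nudge_for({"hook_event_name": "PreToolUse", "tool_name": "Read"}))
```

A real hook script would read the event with `json.load(sys.stdin)` and surface the message through the hook's output; that plumbing is left out so the decision logic stays testable on its own.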

**Reviewing.** Code review comes from named personas with rich behavioral profiles: a domain expert reviews first (e.g., a web-dev specialist for front-end work, or a data-science specialist for ML pipelines), all findings are applied, then a generalist reviews the clean artifact. Sequential, with mandatory fix gates. Research supports both the [persona mechanism](docs/research/2026-03-19-named-persona-performance.md) (literature-backed for judgment-routing tasks; for mechanical bug-finding our [own controlled experiment](docs/research/2026-03-26-persona-experiment-results.md) found no recall gain, which is why bug-sweeps use bare agents) and [multi-agent gains](https://www.anthropic.com/engineering/multi-agent-research-system) (on Anthropic's internal research eval, a multi-agent system with an Opus lead and Sonnet subagents outperformed single-agent Opus by 90.2%; that result is for research tasks, not code review). Plan-review with personas, the system's main use of named reviewers, leans on industry-standard PRD/SDD review patterns; we have not separately benchmarked it.
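
The "sequential, with mandatory fix gates" shape can be sketched in a few lines. This is an illustration of the control flow only: the reviewers below are toy functions standing in for the persona agents, and `run_review_chain`, `Artifact`, and `max_rounds` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    text: str
    history: List[str] = field(default_factory=list)


def run_review_chain(artifact, reviewers, apply_fixes, max_rounds=3):
    """Run reviewers in order; each must come back clean before the next starts."""
    for name, review in reviewers:
        for _ in range(max_rounds):
            findings = review(artifact)
            if not findings:
                break  # gate passed, move on to the next reviewer
            artifact = apply_fixes(artifact, findings)
            artifact.history.append(f"{name}: fixed {len(findings)} finding(s)")
        else:
            raise RuntimeError(f"{name} still has blocking findings")
    return artifact


# Toy reviewers: each returns a list of blocking findings (empty means clean).
def domain_reviewer(a):
    return ["unresolved TODO"] if "TODO" in a.text else []


def generalist_reviewer(a):
    return ["tab indentation"] if "\t" in a.text else []


def apply_fixes(a, findings):
    text = a.text
    if "unresolved TODO" in findings:
        text = text.replace("TODO", "done")
    if "tab indentation" in findings:
        text = text.replace("\t", "    ")
    return Artifact(text, list(a.history))


result = run_review_chain(
    Artifact("\tTODO: validate input"),
    [("domain", domain_reviewer), ("generalist", generalist_reviewer)],
    apply_fixes,
)
print(result.history)
```

Because the generalist only runs after the domain reviewer's findings are fixed, it never spends attention on problems that were already reported.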

**Staying coherent.** Long sessions hit a hard constraint: context compaction. When triggered, the model summarizes what it *thinks* happened — a retrospective reconstruction that loses intent. A `PostToolUse` hook monitors context pressure and prompts Claude to create a structured handoff *before* compaction fires: decisions made, state reached, explicit next steps. Each handoff chains forward from its predecessor. Research shows structured handoffs significantly outperform automatic summarization ([Kang et al. 2025, ACON](https://arxiv.org/abs/2510.00615); Sourcegraph [retired compaction](https://sourcegraph.com/blog) in their Amp agent in favor of explicit handoffs after measuring degradation).
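
The decision the context-pressure hook has to make reduces to a threshold check. The sketch below assumes the hook can see a token count and a window size, and the 0.75 threshold is an invented number; the plugin's actual trigger is not documented here.

```python
def should_prompt_handoff(tokens_used: int, context_window: int,
                          autonomous: bool = False,
                          threshold: float = 0.75) -> bool:
    """True when context use crosses the threshold and nudges are not suppressed."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    if autonomous:
        return False  # mirrors /autonomous suppressing handoff nudges
    return tokens_used / context_window >= threshold


print(should_prompt_handoff(160_000, 200_000))                   # above the threshold
print(should_prompt_handoff(160_000, 200_000, autonomous=True))  # nudge suppressed
```

The handoff has to be prompted while there is still room to write it, which is why the check runs ahead of compaction rather than in response to it.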

**Navigating the codebase.** This system invests heavily in documentation-as-navigation. Claude's natural mode is grep-heavy — searching text, reading prose, following paper trails. An architecture atlas, project tracker, orientation caches, and structured comments throughout the codebase give Claude something to *find* when it searches. We call this "grep bait." It's why the doc maintenance pipeline and architecture atlas exist: not bureaucracy, but navigation infrastructure that lets Claude plan from 60 lines of orientation instead of reading 20 source files cold. Research artifacts, lessons files, and handoff documents all serve double duty — they record decisions *and* create searchable landmarks for future sessions.

**Wrapping up.** `/session-end` captures lessons, updates documentation, and commits state. `/workday-complete` goes further: syncs docs, merges to main via PR, and optionally hibernates the machine. The cycle is continuous — each session starts where the last one left off.

## What You Need to Remember

The EM handles most of this on its own. Your key moves:

| Command | When | What It Does |
|---------|------|-------------|
| `/session-start` | Beginning of work | Deliberate orientation — triage handoffs, surface staleness, choose work. PM-invoked; *not* auto-fired. (Boot context loads automatically via a separate `SessionStart` hook; this command is the optional deeper orient.) |
| `/pickup` | Resuming work | Load a handoff artifact and continue where you left off |
| `/handoff` | Stepping away | Save session state for the next session |
| `/staff-session` | Big decisions | Multi-perspective planning or review from persona-based contributors |
| `/mise-en-place` | Heads-down time | Claude burns through the backlog autonomously — no input needed |
| `/autonomous` | Override | Suppress handoff nudges when you want Claude to push through compaction |

That's it for daily use. Everything else — delegation, review routing, doc maintenance — the EM drives as part of its workflow; context-pressure management is the one piece handled automatically, by a `PostToolUse` hook. You don't have to ask for any of it.

## Flows

Don't memorize commands; learn five flows. Most of what the system does, you'll touch through one of these.

**Flow 1 — Build a feature.** You describe intent → Claude enters plan mode and proposes acceptance criteria + scope mode → you review and approve → Claude delegates implementation → reviewers (domain expert first, generalist second) check the artifact with fix gates between → for user-visible work or patches that smell like they should be refactors, **the VP-Product Reviewer** (`coordinator:vp-product`) — scope challenger, naming optional via [`setup/name-personas.sh`](setup/name-personas.sh) — stress-tests the choice → `/merging-to-main` produces a ship verdict and you decide.

**Flow 2 — Fix a bug.** Reproduction first (don't trust the report) → root cause via the [systematic-debugging guide](docs/wiki/systematic-debugging.md) → scoped fix in production-patch mode (minimal diff, no opportunistic refactors) → regression check → reviewer → merge. For codebase-wide grinds, `/bug-blitz` autonomously works through the bug backlog with EM-serial commits at each wave gate.

**Flow 3 — Resume work.** The boot `SessionStart` hook automatically loads orientation, lessons, and pending handoffs into context → optionally run `/session-start` for a deeper triage → you pick up via `/pickup ` or pick from the menu → Claude lands mid-context and starts where the last session stopped.

**Flow 4 — Autonomous sprint.** `/mise-en-place` gathers ready work, builds a compaction-proof flight recorder, and executes through the backlog without stopping. `/autonomous` suppresses handoff nudges when you want it to push through context pressure.

**Flow 5 — Architecture change.** `/staff-session plan` (multi-perspective debate) → migration plan with rollback → architecture mode review → implementation → verification → architecture atlas update via `/update-docs`.

**Closing the day:** `/workday-complete` validates, syncs, runs the daily review, and merges. `/workweek-complete` is the larger weekly ceremony with version bump and release notes.

## How heavy is the workflow?

The system scales — a typo fix is a two-word instruction; a system rewrite is a multi-agent pipeline. Here are three representative tiers:

| Tier | Skill / command | Reviewer | Wall time |
|------|-----------------|----------|-----------|
| **Tiny edit** (typo, constant, rename) | Direct EM edit — no plan | None required | < 5 min |
| **Feature** (new command, new skill) | `/execute-plan` after PM approves a plan | Domain reviewer → the Staff Engineer (`coordinator:staff-eng`) generalist; the Director of Engineering (`coordinator:eng-director`) as backstop at High effort (sequential) | 30 min – 2 hrs |
| **System rewrite** (multi-plugin overhaul) | `/staff-session plan` → `/execute-plan` (with executor dispatch per [`docs/wiki/delegate-execution.md`](docs/wiki/delegate-execution.md)) | Full sequential chain + PM ship verdict | Half day+ |

See [`docs/wiki/task-tier-guidance.md`](docs/wiki/task-tier-guidance.md) for the full tier table, reviewer routing guide, and flow diagrams.

## All Commands (appendix)

| Command | Purpose | Why It Exists |
|---------|---------|---------------|
| `/session-start` | Orient to project, load context, choose work | Eliminate cold starts; position key context optimally |
| `/session-end` | Capture lessons, update docs, commit | Preserve institutional knowledge between sessions |
| `/pickup` | Resume from a handoff artifact | Structured continuity beats re-reading git log |
| `/handoff` | Save state before stepping away | Prospective capture > retrospective reconstruction |
| `/staff-session` | Multi-perspective planning or review | Debate surfaces tradeoffs a solo agent misses |
| `/mise-en-place` | Autonomous backlog execution | Front-load all context, then execute without interruption |
| `/autonomous` | Toggle autonomous mode | Trust escalation — suppress handoff nudges |
| `/workday-start` | Morning orientation and triage | Surface staleness, align priorities, suggest work |
| `/workday-complete` | End-of-day wrap-up | Daily housekeeping: validate, consolidate, append week-changelog |
| `/workweek-start` | Strategic weekly orient | Surface last week's results, set this week's priorities |
| `/workweek-complete` | Release-grade weekly close | Full validation, version bump, parallel code review, merge to main |
| `/review` | Review a plan / design / RFC | Domain expert → generalist, sequential with fix gates |
| `/review-code` | Review a code change / diff / PR | Same reviewer pipeline, code-shaped artifacts |
| `/execute-plan` | Execute a PM-approved plan | Direct implementation without re-planning |
| `/update-docs` | Repo-wide documentation maintenance | 11-phase pipeline that fights doc staleness |
| `/bug-sweep` | Systematic codebase bug hunt | Find and fix all AI-fixable bugs in-session |
| `/bug-blitz` | Autonomous bug-backlog grinder | Wave-based execution with EM-serial commits at each gate |
| `/dogfood` | Smoke-driven fix-through loop | Binary outcome: converge on a working capability or switch gears |
| `/learn-lessons` | Triage `tasks/lessons.md` as doctrine change-requests | Three modes: local (per-repo), central (cross-repo), recheck (deferred follow-ups) |
| `/code-health` | Night-shift code health review | Scan today's commits, dispatch reviewer, apply findings |
| `/architecture-audit` | Deep architecture analysis | Multi-phase agent pipeline for system health |
| `/distill` | Extract knowledge from session artifacts | Turn plans and handoffs into evergreen wiki docs |

## How this maps to real teams

| Real Team Practice | coordinator-claude Equivalent |
|--------------------|-------------------------------|
| **Daily standup** | `/session-start` — what happened, what's blocked, what's next |
| **Sprint planning** | `/staff-session plan` — persona-based engineers debate approach |
| **Spec review** | Plan mode with PM sign-off — Claude proposes, you approve |
| **Code review** | `/review-code` — domain expert first, generalist second, fix gates between |
| **Tech lead gut-check** | Any reviewer can be dispatched ad-hoc for a quick assessment |
| **Retrospective** | `/session-end` — capture lessons, update docs |
| **Heads-down sprint** | `/mise-en-place` — autonomous execution through the backlog |
| **Handoff between shifts** | `/handoff` — structured state capture, not "check the git log" |

The one role we don't have deeply embedded in workflows: **designer.** Meatspace designers are still better at that, and their judgment is going to remain difficult for LLMs to replicate. There's a vibe-design functionality, but it's not gonna rock your world.

## Under the hood: architecture details

**Inverted capability delegation.** The coordinator sees ~8 thin MCP tools; domain agents access 40+ via proxy with full schemas. This saves ~40K tokens from the coordinator's context and forces delegation by design. A `PreToolUse` hook nudges the coordinator when it reaches for domain tools directly.

**Proactive artifact generation.** Before compaction fires, a hook prompts structured handoff creation — a prospective document capturing decisions, state, and next steps. Each artifact chains from its predecessor (cascade obligation) and opens with a synthesis of the prior handoff (anti-amnesia chain). See our [handoff vs. compaction research](docs/research/2026-03-21-handoff-artifacts-vs-compaction.md).
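
The chaining described above can be sketched as a linked record. The field names follow the three things a handoff is said to capture (decisions, state, next steps); the `Handoff` class, its methods, and the rendered layout are assumptions, not the plugin's actual artifact format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Handoff:
    decisions: List[str]
    state: str
    next_steps: List[str]
    predecessor: Optional["Handoff"] = None

    def synthesis(self) -> str:
        """One-line summary that the next handoff opens with."""
        return f"{self.state} Next: {'; '.join(self.next_steps)}"

    def render(self) -> str:
        lines = []
        if self.predecessor is not None:
            # Opening with the prior handoff's synthesis keeps the chain unbroken.
            lines.append(f"Previously: {self.predecessor.synthesis()}")
        lines.append("Decisions: " + "; ".join(self.decisions))
        lines.append(f"State: {self.state}")
        lines.append("Next steps: " + "; ".join(self.next_steps))
        return "\n".join(lines)


first = Handoff(["use plan mode"], "Plan approved.", ["delegate implementation"])
second = Handoff(["split the migration"], "Half the tasks merged.",
                 ["finish remaining tasks"], predecessor=first)
print(second.render())
```

Each artifact carries only a synthesis of its predecessor, so the chain stays short while intent still propagates forward.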

**Role-based sequential review.** Reviewers carry rich behavioral profiles: not just "code reviewer" but distinct roles with expertise domains and review lenses. Sequential review with mandatory fix gates means each reviewer sees a clean artifact. See the [persona research](docs/research/2026-03-19-named-persona-performance.md) and [experiment results](docs/research/2026-03-26-persona-experiment-results.md). Role labels ship as defaults; an optional naming flow lets users bind personal names to roles if names are easier for them to keep track of.

**6-layer project knowledge.** Structure, architecture, activity, temporal, intent, state — none bulk-loaded. A tiered context model loads a ~60-line orientation cache at L1, pulls detailed artifacts on demand at L2, and reserves L3 for deep storage read by subagents. An 11-phase maintenance pipeline fights doc staleness automatically.
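
The tiering can be pictured as a loader that holds L1 eagerly, fetches L2 lazily, and only hands out references for L3. This is a sketch under assumptions: `TieredContext`, `detail`, `deep_reference`, and the `fake_fetch` reader are hypothetical names, and the real system's storage layout is not shown in this README.

```python
from typing import Callable, Dict


class TieredContext:
    """L1 is always in context; L2 is fetched lazily; L3 is only referenced."""

    def __init__(self, orientation: str, fetch: Callable[[str], str]):
        self.l1 = orientation            # small orientation cache, always loaded
        self._fetch = fetch              # hypothetical reader for L2 artifacts
        self._l2: Dict[str, str] = {}    # artifacts pulled so far

    def detail(self, name: str) -> str:
        """Pull an L2 artifact on first use and reuse it afterwards."""
        if name not in self._l2:
            self._l2[name] = self._fetch(name)
        return self._l2[name]

    @staticmethod
    def deep_reference(path: str) -> str:
        """L3 content is never loaded here; a subagent gets the path instead."""
        return f"Read {path} and report back a summary."


reads = []


def fake_fetch(name: str) -> str:
    reads.append(name)
    return f"contents of {name}"


ctx = TieredContext("branch: main\npending handoffs: 1", fake_fetch)
ctx.detail("docs/architecture.md")
ctx.detail("docs/architecture.md")   # second call is served from the cache
print(reads)
```

Nothing beyond the orientation text costs context until it is asked for, which is the property "none bulk-loaded" describes.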

**Agent Teams for planning.** Claude Code's [Agent Teams](https://docs.anthropic.com/en/docs/claude-code/agent-teams) enables multiple Claude sessions that communicate and coordinate. This system uses it for multi-perspective planning: persona-based debaters form independent positions, challenge each other, and a synthesizer cross-references into consensus. Also powers the bundled [deep-research](plugins/deep-research/) pipelines (internet, repo, structured, NotebookLM).

**Cross-model delegation.** Haiku for mechanical checks, Sonnet for most execution, Opus for judgment and synthesis. Codex CLI integration is available as an opt-in add-on (`setup/install.sh --enable-codex` plus the external [openai/codex-plugin-cc](https://github.com/openai/codex-plugin-cc) plugin) for a second-opinion channel and independent implementation path; default installs omit it.
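
The tier split amounts to a routing table. In the sketch below the task categories are taken from the sentence above, while the `TaskKind` enum, the lowercase tier labels, and `pick_model` are illustrative names rather than the plugin's configuration.

```python
from enum import Enum


class TaskKind(Enum):
    MECHANICAL_CHECK = "mechanical_check"
    EXECUTION = "execution"
    JUDGMENT = "judgment"
    SYNTHESIS = "synthesis"


# Tier labels only; real model identifiers are deliberately not guessed here.
MODEL_FOR = {
    TaskKind.MECHANICAL_CHECK: "haiku",
    TaskKind.EXECUTION: "sonnet",
    TaskKind.JUDGMENT: "opus",
    TaskKind.SYNTHESIS: "opus",
}


def pick_model(kind: TaskKind) -> str:
    """Look up the tier for a task kind; unknown kinds fail loudly."""
    return MODEL_FOR[kind]


for kind in TaskKind:
    print(kind.value, "->", pick_model(kind))
```

Keeping the mapping in one table makes the cost tradeoff easy to audit: the expensive tier appears only where judgment is the deliverable.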

See [docs/architecture.md](docs/architecture.md) for the full model. For the testable claims — what each agent role promises and which hook or script enforces each promise — see [docs/contracts.md](docs/contracts.md). For broader context, the [novelty research](docs/research/2026-03-20-agent-orchestration-novelty-unified.md) assesses all patterns against published prior art.

For the failure modes this system was designed around — false completion, silent scope expansion, test theater, review laundering, context amnesia, integration blindness, and a dozen others — see [docs/evolution/05-failure-modes.md](docs/evolution/05-failure-modes.md).

For evidence — what was actually built under this workflow, alongside the controlled persona experiment and the qualitative failure-modes ledger — see [docs/research/2026-05-08-built-with-coordinator.md](docs/research/2026-05-08-built-with-coordinator.md), [docs/research/2026-03-26-persona-experiment-results.md](docs/research/2026-03-26-persona-experiment-results.md), and [docs/evolution/05-failure-modes.md](docs/evolution/05-failure-modes.md). These are the three peer artifacts in the evidence corpus.

## Plugins

| Plugin | Purpose | When to Enable |
|--------|---------|----------------|
| **[coordinator](plugins/coordinator/)** | Core orchestration, reviewers, all workflow skills | Always |
| **[deep-research](plugins/deep-research/)** | Multi-agent research pipelines — internet (A), repo (B), structured (C) | Any project that needs grounded research |
| **[notebooklm](plugins/deep-research/notebooklm/)** | NotebookLM-backed research pipeline (D) — YouTube, podcasts, audio sources | When you need media Claude can't read directly |
| **[web-dev](plugins/web-dev/)** | Front-end architecture review + UX flow review | Web projects |
| **[data-science](plugins/data-science/)** | ML, statistics, data modeling review | ML/data work |

The coordinator plugin is always enabled. Domain plugins are toggled per-project via `.claude/coordinator.local.md`.
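
As a rough picture of per-project toggling, the sketch below reads a `project_type` value and maps it to domain plugins. It assumes the file opens with a `---` delimited frontmatter block, and the `web` and `data-science` values in `PLUGINS_BY_TYPE` are invented for the example; see `docs/customization.md` for the real format and accepted values.

```python
from typing import List, Optional

# Hypothetical mapping from project_type values to domain plugins.
PLUGINS_BY_TYPE = {"web": ["web-dev"], "data-science": ["data-science"]}


def read_project_type(markdown: str) -> Optional[str]:
    """Return the project_type from a leading frontmatter block, if present."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "project_type":
            return value.strip() or None
    return None


def enabled_plugins(markdown: str) -> List[str]:
    """The coordinator is always on; domain plugins depend on project_type."""
    return ["coordinator"] + PLUGINS_BY_TYPE.get(read_project_type(markdown), [])


print(enabled_plugins("---\nproject_type: web\n---\n# Project notes"))
print(enabled_plugins("# No frontmatter here"))
```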

## Customization

- **Name your reviewers (optional).** Role labels ship as the default — `bash setup/name-personas.sh "the Staff Engineer" "Alex" "the Director of Engineering" "Jordan"` binds chosen names to role labels across all plugin files. See the role table in [docs/customization.md](docs/customization.md) for all seven roles and their slugs.
- **Create your own domain reviewer.** The shipped domain plugins are the reference pattern: web-dev ships a Front-end Reviewer and a UX Reviewer, data-science ships a Data Science Reviewer (ML/statistics). Each is a self-contained plugin with an agent prompt, a `CLAUDE.md`, and a `.claude-plugin/plugin.json` — copy any of them as a starting point for your own specialization.
- **Per-project configuration.** Create `.claude/coordinator.local.md` with `project_type` to control which reviewers activate.

See [docs/customization.md](docs/customization.md) for templates, the full persona registry, and instructions for adding skills and CI checks.

## Companion Plugins

- **[clangd-lsp](https://github.com/anthropics/claude-code-plugins/tree/main/clangd-lsp)** — C++ code intelligence. Reviewer agents gain go-to-definition, find-references, and call hierarchy ‒ helpful for those (like us) using Claude Code with Unreal Engine.
- **[codex-plugin-cc](https://github.com/openai/codex-plugin-cc)** — Codex CLI integration for parallel execution and second-opinion reviews.
- **[Context7](https://github.com/upstash/context7)** — External library documentation lookup.

All are optional. Coordinator works without them; relevant features degrade gracefully.

## Directory structure

```
coordinator-claude/
├── plugins/
│ ├── coordinator/ # Core orchestration (always enabled)
│ │ ├── .claude-plugin/plugin.json
│ │ ├── agents/ # 18 — enricher, executor, docs-checker, reviewers, eng-director, vp-product, reviewer-routed workers
│ │ ├── commands/ # 12 workflow commands (hook/ceremony auto-runners)
│ │ ├── hooks/ # context pressure, orientation, commit validation, tier-usage telemetry
│ │ ├── pipelines/ # staff-session team protocol + prompt templates
│ │ └── skills/ # 29 skills (planning, review, debugging, TDD, etc.)
│ ├── deep-research/ # Pipelines A/B/C + 6 research agents
│ │ └── notebooklm/ # Pipeline D (media research via NotebookLM)
│ ├── web-dev/ # Front-end + UX flow reviewers
│ └── data-science/ # ML, statistics reviewer (the Data Science Reviewer)
├── docs/ # Architecture, customization, research
├── setup/ # Installer
└── assets/ # Social preview
```

## Troubleshooting

**Plugins not loading:**
- Check `enabledPlugins` in `~/.claude/settings.json`: each plugin's entry must be `true`
- Check `~/.claude/plugins/installed_plugins.json`: each plugin must have an entry with the correct `installPath`
- Restart Claude Code (changes take effect on next session)
- The installer (`setup/install.sh`) manages all config files automatically

**Plugin cache not syncing after editing source:**
- Claude Code caches plugins by version. Run `bash setup/dev-sync.sh` to sync.

**Per-project plugin selection:**
- Create `.claude/coordinator.local.md` with a `project_type` field
- Coordinator is always enabled; domain plugins activate per-project

## Research

This system's design is informed by published research and validated through controlled experiments:

- [Handoff artifacts vs. compaction](docs/research/2026-03-21-handoff-artifacts-vs-compaction.md) — why structured baton-passing beats automatic summarization
- [Named persona performance](docs/research/2026-03-19-named-persona-performance.md) — evidence that named behavioral profiles improve review quality
- [Agent orchestration novelty assessment](docs/research/2026-03-20-agent-orchestration-novelty-unified.md) — honest assessment of all patterns against prior art
- [Anthropic multi-agent alignment](docs/research/2026-04-01-anthropic-multi-agent-alignment.md) — independent convergence with Anthropic's production research system

---

[the PM O'Duffy](https://github.com/dbc-oduffy) & Claude