https://github.com/maxritter/pilot-shell
How real engineers run Claude Code: spec-driven planning, enforced TDD, persistent memory, and quality enforcement on all levels. Make Claude Code production-ready.
https://github.com/maxritter/pilot-shell
ai-agents ai-assistant ai-coding ai-coding-tools ai-engineering ai-tools anthropic anthropic-claude claude claude-ai claude-code claude-context claude-skills claudecode model-context-protocol spec-driven-development
Last synced: 7 days ago
JSON representation
How real engineers run Claude Code: spec-driven planning, enforced TDD, persistent memory, and quality enforcement on all levels. Make Claude Code production-ready.
- Host: GitHub
- URL: https://github.com/maxritter/pilot-shell
- Owner: maxritter
- License: other
- Created: 2025-10-18T08:30:23.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2026-05-21T11:50:16.000Z (13 days ago)
- Last Synced: 2026-05-21T19:59:02.921Z (12 days ago)
- Topics: ai-agents, ai-assistant, ai-coding, ai-coding-tools, ai-engineering, ai-tools, anthropic, anthropic-claude, claude, claude-ai, claude-code, claude-context, claude-skills, claudecode, model-context-protocol, spec-driven-development
- Language: TypeScript
- Homepage: https://pilot-shell.com
- Size: 153 MB
- Stars: 1,726
- Watchers: 19
- Forks: 145
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-mcp-lists - GitHub
- awesome-github-projects - pilot-shell - How real engineers run Claude Code and Codex: spec-driven planning, enforced TDD, persistent memory, and quality enforcement on all levels. Make your agents production-ready. โญ1,736 `TypeScript` ๐ฅ (๐ค AI & Machine Learning)
README

### How real engineers run Claude Code.
From requirement to production-grade code โ planned, tested, verified.
**Spec-driven plans. Enforced quality gates. Persistent knowledge.**
[](https://github.com/maxritter/pilot-shell/stargazers)
[](https://star-history.com/#maxritter/pilot-shell&Date)
[](https://github.com/maxritter/pilot-shell/releases)
[](https://github.com/maxritter/pilot-shell/pulls)
Install โข
Features โข
Docs โข
Blog โข
Website โข
Changelog
```bash
curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash
```
**macOS ยท Linux ยท Windows (WSL2)** โ installs in under 2 minutes.

---
---
## Why Pilot Shell
**Claude Code writes code fast** โ but without structure, it skips tests, loses context, and produces inconsistent results. Other frameworks add complexity (dozens of agents, thousands of lines of config) without meaningfully better output.
**Pilot Shell is different.** Every component solves a real problem with an engineered solution:
- **`/spec`** โ plans, implements, and verifies features end-to-end with TDD
- **`/fix`** โ bugfix workflow with TDD; bails out when complexity exceeds the standard fix lane
- **`/prd`** โ brainstorm ideas into clear requirements through with optional deep research
- **Spec collaboration** โ share specs with teammates, annotations flow back grouped by author
- **Quality hooks** โ enforce linting, formatting, type checking, and tests as quality gates
- **Context engineering** โ preserves decisions and knowledge across sessions
- **Code intelligence** โ semantic search (Semble) + code knowledge graph (CodeGraph)
- **Token optimization** โ 60โ90% cost reduction via RTK compression and Semble code search
- **Extensions** โ reusable rules, skills, and MCP servers with team sharing and customization
- **Console** โ local web dashboard with real-time notifications and session management
- **Pilot Bot** โ persistent automation agent with scheduled tasks and background jobs
Run `pilot` for Spec-Driven Development with `/spec`, or `pilot bot` for 24/7 automations.
---
Getting Started
### Prerequisites
**Claude Code:** Install [Claude Code](https://code.claude.com/docs/en/quickstart) using the **native installer** before setting up Pilot Shell. If you have the `npm` or `brew` version installed, uninstall it first. If no Claude Code installation is detected, the Pilot installer will attempt to set it up for you.
**Claude Subscription:** Solo developers should choose [Max 5x](https://claude.com/pricing) for moderate usage or [Max 20x](https://claude.com/pricing) for heavy usage. Teams should use [Team Premium](https://claude.com/pricing) (6.25x usage per member, SSO, admin tools, billing management). Companies with stricter compliance or procurement requirements should use [Enterprise](https://claude.com/pricing) (API based pricing applies per usage).
**Terminal (Recommended):** [cmux](https://cmux.com) works great with Pilot Shell โ its vertical tab layout lets you run multiple sessions side by side. Any modern terminal works: [Ghostty](https://ghostty.org/), [iTerm2](https://iterm2.com/), or the built-in macOS/Linux terminal.
**Claude Chrome (Recommended):** Install the [Claude Code Chrome extension](https://code.claude.com/docs/en/chrome) for browser automation and E2E testing. Pilot automatically detects it and uses it as the preferred tool. When the extension isn't available, Pilot falls back to [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp) (direct CDP access, Lighthouse, performance tracing), then [playwright-cli](https://github.com/microsoft/playwright-cli) (persistent sessions, tracing) or [agent-browser](https://agent-browser.dev/) (lightweight, fast startup).
**Codex Plugin (Included):** The [Codex plugin](https://github.com/openai/codex-plugin-cc) is installed automatically with Pilot. It provides adversarial code review powered by OpenAI Codex โ an independent second opinion during `/spec` planning and verification. Run `/codex:setup` once to authenticate, then enable reviewers in Console Settings โ Reviewers. A [ChatGPT Plus](https://chatgpt.com/#pricing) subscription ($20/mo) covers the Codex API usage.
### Installation
**Works with any existing project.** Pilot Shell is installed on top of Claude Code and uses its built-in concepts like commands, rules, hooks, skills, subagents, MCP, LSP and optimized settings to improve your experience:
```bash
curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash
```
Installs globally on macOS, Linux, and Windows (WSL2). All tools and rules go to `~/.pilot/` and `~/.claude/`. After installation, `cd` into any project and run `pilot` or `ccp` to start.
Downgrade
If you encounter an issue or unfixed bug in the latest version, you can always go back to a previous version (see [releases](https://github.com/maxritter/pilot-shell/releases)):
```bash
export VERSION=9.0.6
curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash
```
Uninstalling
Removes the Pilot binary, plugin files, managed commands/rules, settings and shell aliases:
```bash
curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/uninstall.sh | bash
```
Reset & Refresh
Over time, Claude Code's accumulated session logs and Pilot Shell's caches can slow things down. A periodic reset gives you a clean baseline:
```bash
# 1. Inside Claude Code, log out
/logout
# 2. Back up your current config (just in case)
mv ~/.claude.json ~/.claude.json.bak
mv ~/.claude ~/.claude.bak
mv ~/.pilot ~/.pilot.bak
# 3. Reinstall Pilot Shell from the official installer
curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash
# 4. Start Pilot, sign in to Claude, and re-activate your license
pilot
pilot activate
```
Once Pilot Shell is running smoothly again, you can delete the `.bak` copies. Forgot your license key? Recover it in the [Pilot members area](https://polar.sh/max-ritter/portal).
Using a Dev Container
Pilot Shell works inside Dev Containers. Copy the [`.devcontainer`](https://github.com/maxritter/pilot-shell/tree/main/.devcontainer) folder from this repository into your project, adapt it to your needs (base image, extensions, dependencies), and run the installer inside the container. The installer auto-detects the container environment and skips system-level dependencies like Homebrew.
For tighter isolation when working with untrusted code, combine the dev container with Claude Code's [`/sandbox`](https://code.claude.com/docs/en/sandboxing) โ `bubblewrap`, `socat`, `iptables`, and `ipset` are pre-installed in the Dockerfile so it works out of the box on Linux. See Anthropic's [development containers](https://code.claude.com/docs/en/devcontainer) and [sandboxing](https://code.claude.com/docs/en/sandboxing) docs for hardening patterns (egress allowlist, managed settings, persistent volumes).
What the installer does
7-step installer with progress tracking, rollback on failure, and idempotent re-runs:
1. **Prerequisites** โ Checks/installs Homebrew, Node.js, Python 3.12+, uv, git, jq
2. **Claude files** โ Installs into `~/.claude/` (native layout) โ rules, commands, hooks, MCP servers, agents
3. **Config files** โ Creates `.nvmrc` and project config
4. **Dependencies** โ Installs Semble, RTK, CodeGraph, [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp), [playwright-cli](https://github.com/microsoft/playwright-cli), [agent-browser](https://agent-browser.dev/), language servers
5. **Shell integration** โ Auto-configures bash, fish, and zsh with `pilot` alias
6. **VS Code extensions** โ Installs recommended extensions for your stack
7. **Finalize** โ Success message with next steps
---
How It Works
Just chat โ no plan, no approval gate. [Quick mode](https://pilot-shell.com/docs/workflows/quick-mode) is the default: quality hooks and TDD enforcement still apply, best for small tasks and exploration. For anything that needs a plan, use `/spec` โ not Claude Code's built-in plan mode.
### /spec โ Spec-Driven Development
**[`/spec`](https://pilot-shell.com/docs/workflows/spec) replaces Claude Code's built-in plan mode** (Shift+Tab) for new features, refactoring, and architectural work. It provides a complete planning workflow with TDD, verification, and code review. **[Collaborative spec review](https://pilot-shell.com/docs/features/spec-collaboration) shifts review left** โ share a single link, teammates annotate inline, feedback flows back into the Console grouped by author.
```bash
pilot
> /spec "Add user authentication with OAuth and JWT tokens"
> /spec "Migrate the REST API to GraphQL"
```
```
Discuss โ Plan โ Approve โ Implement (TDD) โ Verify โ Done
โ โ
โโโ Loopโโโ
```

How /spec works
`/spec` auto-detects whether the request is a feature or a bugfix and routes to the right workflow. The three phases below apply to both โ the verify step differs slightly (features get E2E scenarios; bugfixes get a Behavior Contract audit, see the `/fix` section below).
**Plan:** Explores codebase with semantic search โ asks clarifying questions โ writes detailed spec with scope, tasks, and definition of done โ for UI features, writes **E2E test scenarios** (step-by-step, browser-executable) that become the verification contract โ **spec-review sub-agent** validates completeness โ waits for your approval. Optional **Codex adversarial review** provides an independent second opinion when enabled.
**Implement:** Creates an isolated git worktree โ implements each task with strict TDD (RED โ GREEN โ REFACTOR) โ quality hooks auto-lint, format, and type-check every edit โ full test suite after each task.
**Verify:** Full test suite + actual program execution โ **unified review sub-agent** (compliance + quality + goal) โ for UI features, executes each E2E scenario step-by-step via browser automation (pass/fail tracked, results written to plan) โ auto-fixes findings โ squash merges to main on success.
Collaborative spec review โ catch flaws in the spec, not the PR
Send a single persistent share link to your teammates. They open it in any browser at `pilot-shell.com/s/` โ no Pilot Shell install required โ and annotate the spec inline. Submitted feedback flows back into your Console automatically (60-second poll), surfaces in the bell as `" left N annotations on "`, and lands in the annotation panel grouped by teammate.
- **One link, many reviewers.** Click "Share with Teammates" once. The button is permanently replaced by a copyable link. Forward it to anyone.
- **Zero handoff.** Teammates click Submit on pilot-shell.com โ no copy-URL-back step, no manual import.
- **Author-grouped feedback in the Console.** Keep what's useful, delete what isn't. Annotations land in `.annotations/.json` so the agent reads them at the next review checkpoint.
- **Shift left.** Wrong approach, missed edge case, unclear scope, weak architecture โ spot it before code is written, where a one-sentence annotation is the entire fix. Not just bugs: anything you'd flag in PR review costs less to flag here.
Links expire after 7 days. No accounts, no encryption โ the unguessable URL is the access token.
### /fix โ Bugfix Workflow
**[`/fix`](https://pilot-shell.com/docs/workflows/fix) is the bugfix command.** Investigate the bug, write the failing test, fix at the root cause, single-pass audit, done. No plan file, no approval mid-flow, no separate verify phase.
```bash
pilot
> /fix "annotation persistence drops fields between save and reload"
> /fix "off-by-one in pagination at boundary"
> /fix "wrong default for max_retries"
```
```text
Investigate โ RED โ Fix โ Audit โ Quality Gate โ Done
```
If investigation reveals the bug is multi-component or architectural, `/fix` stops cleanly and tells you to re-invoke with `/spec`. `/fix` is always quick; `/spec` is the full workflow.
How /fix works
For local bugs. Single file, obvious-once-traced root cause. No plan file, no approval mid-flow, no separate verify phase. TDD still enforced โ bugfixes without a failing test don't ship.
- **Investigate:** Reproduce the bug โ trace to root cause at `file:line` with `codegraph_context` + targeted reads โ state confidence (High/Medium required to proceed). For UI / async / race bugs, add temporary `SPEC-DEBUG:`-marked logs at component boundaries before tracing.
- **RED:** Write the failing test via an existing public entry point โ run, must fail with the documented symptom.
- **Fix:** Minimal change at the root cause. Symptom patches are forbidden. Reproducing test must pass, then the targeted test module. Diff sanity check (root-cause file in diff, no unplanned files, < 20 lines, symptom-patching grep) catches issues with the fix itself.
- **Verify End-to-End:** The primary correctness signal. Run the actual program with the original input (Claude Code Chrome โ Chrome DevTools MCP โ playwright-cli โ agent-browser for UI; CLI / API / REPL / job trigger for non-UI) and capture concrete evidence. A passing unit test alone is never accepted as proof.
- **Quality Gate:** Lint + types + build + full anti-regression suite, once.
- **Bail-out:** If investigation reveals the bug is multi-component, architectural, needs defense-in-depth at multiple layers, or two fix attempts have failed, `/fix` stops cleanly and tells you to re-invoke with `/spec`. It does not silently switch lanes.
How /spec handles bugs
When you type `/spec ""`, the full bugfix workflow runs โ for bugs that warrant a written plan, approval, code review, and the full verify ceremony.
- **Behavior Contract:** every plan pins down `Given / When / Currently / Expected / Anti-regression` โ the invariant the fix must produce and the behavior it must not break
- **Three uniform tasks** (always, regardless of bug size): Write Reproducing Test (RED) โ Implement Fix at Root Cause โ Quality Gate
- **Verify audit:** always-on `cp`+`trap` revert-test (proves the reproducing test would genuinely fail without the fix โ rules out retroactive rubber-stamp tests) + root-cause-at-source audit (flags symptom patches and caller-side workarounds) + original-symptom re-check โ no sub-agents, tests carry the proof
- **Iteration cap at 3:** after three failed verify cycles, the workflow stops and asks if the bug is architectural rather than letting you loop forever
### /prd โ Brainstorm Ideas Into Product Requirements Documents
[`/prd`](https://pilot-shell.com/docs/workflows/prd) is the brainstorming surface for ideas that aren't specs yet โ vague problem statements and fuzzy shapes. It pitches directions, pressure-tests them with you, and converges on a PRD you can hand to `/spec`. PRDs are saved to `docs/prd/` and visible in the Console's **Requirements** tab.
```bash
pilot
> /prd "Add real-time notifications for team updates"
> /prd "We need better onboarding โ users drop off after signup"
```
What /prd Does
**When to use `/prd` over `/spec`:** `/prd` is for **what** and **why**; `/spec` is for **how**. Reach for `/prd` first when you only have a problem statement, want to riff across multiple directions, or need scope boundaries defined before someone starts building.
**Flow:** two modes, picked automatically from how fuzzy the idea is:
1. **Ideate** โ free-form prose, Claude pitches 3-5 directions, you react (only runs when the idea is vague)
2. **Clarify โ Converge โ Write** โ structured multiple-choice questions once the shape is known, then the PRD is written
**Research tiers** (picked at the start):
| Tier | Behavior |
|------|----------|
| **Quick** | Skip research |
| **Standard** | Web search for competitors, prior art, best practices |
| **Deep** | Parallel research agents for comprehensive findings |
The final PRD covers problem statement, core user flows, scope boundaries, and technical context โ then offers to hand off directly to `/spec` for implementation.
### /setup-rules โ Generate Modular Rules
[`/setup-rules`](https://pilot-shell.com/docs/workflows/setup-rules) explores your codebase, discovers conventions, generates modular rules and documents MCP servers. Run once initially, then anytime your project changes significantly.
```bash
pilot
> /setup-rules
```
What /setup-rules Does
12 phases that read your codebase and produce comprehensive AI context:
0. **Reference** โ load best practices for rule structure, path-scoping, and quality standards
1. **Read existing rules** โ inventory all `.claude/rules/` files, detect structure and path-scoping. Also detects `CLAUDE.md` and `AGENTS.md` (the cross-framework agent context file used by Codex, Cursor, etc.)
2. **Migrate unscoped assets** โ prefix with project slug for better sharing
3. **Quality audit** โ check rules against best practices (size, specificity, stale references, conflicts)
4. **Explore codebase** โ hybrid semantic+lexical search with Semble, structural analysis with CodeGraph
5. **Compare patterns** โ discovered vs documented conventions
6. **Sync project rule** โ update `{slug}-project.md` with current tech stack, structure, commands. Migrates `CLAUDE.md` / `AGENTS.md` content into modular rules
7. **Sync MCP docs** โ smoke-test user MCP servers, document working tools
8. **Discover new rules** โ find undocumented patterns worth capturing
9. **Cross-check** โ validate all references, ensure consistency across generated files
10. **Sync AGENTS.md** โ if `AGENTS.md` already exists, offer to re-export the updated rules into it so non-Claude agents see the same context. Always asks first, never creates the file if absent, preserves user-authored sections
11. **Summary** โ report all changes made
**For monorepos:** Organizes rules in nested subdirectories by product and team, with `paths` frontmatter to scope rules to specific file types. Generates a `README.md` documenting the structure.
### /create-skill โ Reusable Skill Creator
[`/create-skill`](https://pilot-shell.com/docs/workflows/create-skill) builds a reusable skill from any topic โ explores the codebase and creates it interactively with you. If no topic is given, evaluates the current session for extractable knowledge.
```bash
pilot
> /create-skill "Automate the review and triaging of our PR Bot comments"
```
What /create-skill Does
6 phases that turn domain knowledge into a reusable skill:
1. **Reference** โ load use case categories, complexity spectrum, file structure template, description formula, security restrictions
2. **Understand** โ explore the codebase for relevant patterns, ask clarifying questions, or evaluate the current session for extractable knowledge
3. **Check existing** โ search project and global skills to avoid duplicates
4. **Create** โ write to `.claude/skills/` (project) or `~/.claude/skills/` (global), apply portability and determinism checklists
5. **Quality gates** โ structure checklist (SKILL.md naming, frontmatter fields), content checklist (error handling, examples, exclusions), triggering test (should/shouldn't trigger), iteration signals
6. **Test & iterate** โ run test prompts with sub-agents, evaluate results, optimize description triggering
**Use case categories:**
| Category | Best For |
| ----------------------------- | -------------------------------------------------------------------------- |
| **Document & Asset Creation** | Consistent reports, designs, code with embedded style guides and templates |
| **Workflow Automation** | Multi-step processes with validation gates and iterative refinement |
| **MCP Enhancement** | Workflow guidance on top of MCP tool access, multi-MCP coordination |
**Skill structure:** Each skill is a folder with a `SKILL.md` file (case-sensitive), optional `scripts/`, `references/`, and `assets/` directories. The YAML frontmatter description determines when Claude loads the skill โ it must include what the skill does, when to use it, and specific trigger phrases. Progressive disclosure keeps context lean: frontmatter loads always (~100 tokens), SKILL.md loads on activation, linked files load on demand.
### /benchmark โ Measure Rule & Skill Impact
[`/benchmark`](https://pilot-shell.com/docs/workflows/benchmark) runs your prompts with and without the target, grades outputs against falsifiable assertions, and shows a structured report you can absorb in 30 seconds โ labeled verdict, quadrant breakdown, and only the divergent assertions in the drill-down. Finishes with a concrete improvement plan so you know exactly what to change next.
```bash
pilot
> /benchmark pilot/skills/create-skill
> /benchmark pilot/rules/testing.md
```
What /benchmark Does
Six phases turn a rule or skill into a before/after comparison with an actionable plan:
1. **Intake** โ pick up an existing `benchmarks//evals.json` or author one
2. **Target discovery** โ classify as `skill` or `rules`
3. **Author evals** โ draft 3 falsifiable assertions; falsifiability gate ensures baseline actually fails
4. **Execute** โ run both configs in isolated sandboxes; grader subagent scores every assertion
5. **Present findings** โ three layers, scannable top-to-bottom:
| Layer | Content |
|---|---|
| **Verdict** | One labeled sentence with a recommended next step. Delta bands: ๐ข Strong (โฅ +0.50) / ๐ข Moderate (+0.20) / ๐ก Weak (+0.05) / โช Indistinguishable (ยฑ0.05) / ๐ด Regression (< โ0.05) |
| **Quadrant breakdown** | Counts each assertion as Signal (โ/โ) / Baseline (โ/โ) / Unreachable (โ/โ) / Regression (โ/โ). The dominant quadrant drives the plan |
| **Per-eval drill-down** | Only divergent assertions get a row; matching ones fold into header counts so the report stays under one screen |
6. **Improvement plan** โ โค 5 ranked proposals in a uniform format (`[TARGET]` or `[EVALS]` tag, location, current quote, replacement, "Lever" line). You pick: apply target edits, iterate on evals, both, or save the plan and stop. Re-runs land in a fresh `runs//` so iteration deltas stay legible.
**Isolation:** each run gets its own sandbox directory; a globally-installed copy of the target in `~/.claude/` is auto-hidden for the duration and restored afterward (with on-disk recovery manifest covering SIGKILL / power loss / segfault). Conditional-loading frontmatter (`path:` / `paths:`) is stripped from the copy installed into the `with` sandbox so the target loads unconditionally for every prompt โ without that, rules scoped to e.g. `paths: ["**/*.py"]` would stay dormant in both configs and the delta would collapse to 0.00. The source file is never modified.
**Key flags:** `--runs N` (default 1), `--configs with,without`, `--workers N`, `--model`, `--no-isolate-global`, `--restore-hidden`.
### Status Line
Pilot shell ships with its own advanced status line with real-time session metrics and spec progress:

All fields explained
**Line 1 โ Session Metrics** (separated by `|`):
| Widget | Description |
| ----------------- | ------------------------------------------------------------------------------- |
| **Model** | Active model in short form (`Opus 4.7 [1M]`, `Sonnet 4.6`) |
| **Context** | Effective context usage with progress bar and buffer indicator. Green < 80%, Yellow 80โ95%, Red 95%+ |
| **Lines changed** | `+added -removed` in session (hidden when `rate_limits` is available) |
| **Git** | Branch with staged (`+N`) / unstaged (`~N`) counts (hidden when `rate_limits` is available) |
| **5h / 7d usage** | Rate-limit percentage with pacing arrow and reset countdown (`5h: 42% โก 2h`). โก red = over pace, โฃ green = under pace. Read cross-platform from Claude Code's `rate_limits` stdin field (Pro/Max subscriptions on Claude Code 2.1.80+). Replaces lines+git when present. |
| **Cost** | Session cost in USD. Green < $1, Yellow $1โ5, Red $5+. Hidden when `rate_limits` is available โ on Pro/Max the subscription covers API usage, so the dollar figure is noise. |
| **Savings** | Token savings percentage from RTK proxy (`Savings: N%`). Always shown when RTK has data. |
**Line 2 โ Mode:**
- **Quick Mode:** `Quick Mode`
- **Spec Mode:** Plan name, type (`feature`/`bugfix`), phase (`plan`/`implement`/`verify`), progress bar, task count, and iteration count
**Line 3 โ Version & Session Info:**
`Pilot () ยท CC () ยท sessions ยท memories `
Pilot tier: Solo, Team, or Trial with time remaining. Claude subscription (Pro/Max/Team/Enterprise) detected via `claude auth status` and cached for 24 hours.
### Pilot Shell Console
A local web dashboard with different views and real-time notifications when Claude needs your input:

All views
Each view with project-specific data has an inline **Project Filter** dropdown โ switch projects without leaving the page. Dashboard stats tiles are clickable โ navigate directly to the relevant view.
| View | What it shows |
| ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| **Dashboard** | Global command center โ 8 clickable stat cards, 4 recent cards (Specifications, Requirements, Sessions, Memories) with "Show all" links. Active specs as pills in the top bar, notification bell in top right. |
| **Sessions** | Browse past sessions with search. Copy the session ID and use `/resume ` in Claude Code to jump back into any session. |
| **Memories** | Browsable observations โ decisions, discoveries, bugfixes โ with type filters and search. Each memory shows its session โ click to navigate. |
| **Requirements** | PRD documents with view/annotate modes. Selected shown as a tab, others in a Previous dropdown. |
| **Specifications** | All spec plans with task progress, phase tracking, and iteration history. **Annotate mode** lets you mark up plans visually before approving โ select text or click **+** on any block to write a note. **Share with Teammate** generates a compressed share link; **Receive Feedback** imports their annotations with accept/reject controls |
| **Extensions** | All extensions โ local, plugin, and remote โ with team sharing via git, diff view, push/pull, and color-coded categories. |
| **Changes** | Git diff viewer with staged/unstaged files, branch info, and worktree context. **Review mode** adds inline annotations on diff lines โ the agent reads them directly before marking a spec as verified |
| **Usage** | Daily token costs, model routing breakdown, and usage trends |
| **Help** | Documentation, guides, and quick-start resources |
| **Settings** | One scrollable page (Spec Workflow / Console). Spec workflow toggles (branch isolation, ask questions, plan approval, **Model Switching**), reviewer toggles (spec review, changes review, optional Codex). Active model is set via `/model` โ no model dropdowns here. |
Visual plan annotation
When a spec plan is pending approval, the Specifications tab defaults to **Annotate mode**. Two ways to annotate:
- **Select text** โ highlight any passage and write a note via the floating toolbar
- **Click +** โ hover over any block to add a note without selecting text
Annotations save automatically. The agent reads them at the next checkpoint, revises the plan accordingly, and asks for approval again. This turns plan review into a conversation โ you mark what needs changing, the agent addresses it, you review again.
Spec sharing
Share specs and requirements with teammates for async review โ no cloud service required:
1. **Share** โ Click **Share with Teammate** to generate a compressed link. The entire spec and annotations are encoded into the URL fragment, so no data is transmitted to any server.
2. **Review** โ Your colleague opens the link in their Console (or on pilot-shell.com), sees your annotations, and adds their own feedback.
3. **Import** โ Click **Receive Feedback** to import their annotations. Each annotation has per-item **accept/reject** controls โ accepted annotations merge into your plan, rejected ones are removed from both the sidebar and inline markers.
Everything works locally. The link is self-contained โ the recipient doesn't need access to your machine, repository, or any shared service.
Code review
After a spec completes all automated checks, the agent prompts you to review the changes in the **Changes** tab:
1. **Enable Review mode** โ toggle it in the Changes tab header
2. **Annotate diffs** โ click **+** on any diff line to add an inline note. Annotations save automatically.
3. **Agent addresses feedback** โ the agent reads every annotation and resolves them before marking the spec as verified
This gives you a final quality gate with direct, line-level feedback โ the same workflow as a PR review, but before the code ever leaves your machine.
### Extensions
Rules, commands, skills, and agents โ all plain markdown files in `.claude/` (project) or `~/.claude/` (global). The Console Extensions page lets you browse, edit, compare, and share everything from one place:

Extension categories
| Extension | Location | When it loads |
| ------------ | ------------------- | ------------------------------------------- |
| **Skills** | `.claude/skills/` | Automatically when relevant |
| **Rules** | `.claude/rules/` | Every session, or by file type |
| **Commands** | `.claude/commands/` | On demand via `/command-name` |
| **Agents** | `.claude/agents/` | Spawned as sub-agents for specialized tasks |
Use `/setup-rules` to auto-generate rules from your codebase. Use `/create-skill` to capture workflows as reusable skills.
Scopes: Global, Project, Plugin
**Project** extensions live in `.claude/` โ commit them so teammates get them on `git clone`. **Global** extensions live in `~/.claude/` โ personal and available across all projects. Move extensions between scopes with one click.
**Plugin** extensions come from installed Claude Code plugins (`claude plugin install `). They appear as read-only items โ visible but not editable.
Team sharing & APM (Team tier)
Connect a git repository to share extensions across your team:
- **Push** local extensions to the team remote
- **Pull** remote extensions to your machine (global or project scope)
- **Compare** local vs remote with a built-in side-by-side diff view
- **Conflict detection** โ when local and remote differ, choose which version to keep
**APM format** โ check one box and your remote becomes an [APM package](https://microsoft.github.io/apm/introduction/key-concepts/), directly installable via `apm install owner/repo` by anyone using Copilot, Cursor, OpenCode, or Claude. Extensions are automatically converted to APM conventions on push:
| Pilot Shell | APM Remote |
| --- | --- |
| `rules/my-rule.md` | `instructions/my-rule.instructions.md` |
| `commands/my-cmd.md` | `prompts/my-cmd.prompt.md` |
| `skills/my-skill/SKILL.md` | `skills/my-skill/SKILL.md` |
| `agents/my-agent.md` | `agents/my-agent.agent.md` |
APM-compatible frontmatter is injected automatically. An `apm.yml` manifest is generated. Toggling APM on/off migrates existing extensions in a single commit.
### Customization
Customize everything Pilot auto-installs โ tweak the built-in `/spec` workflow, modify existing rules, register additional hooks, add agents, and adjust the auto-applied `settings.json`, `claude.json`, and `.mcp.json`. Source can be a **git repo** (team-wide) or a **local directory** (personal, no git needed). Team and Enterprise plans.
```bash
pilot customize install # Install and apply
pilot customize update # Pull latest (git) or re-read (local)
pilot customize status # Active source + drift warnings
pilot customize remove # Restore Pilot defaults
```
What you can customize
| Target | How |
|--------|-----|
| **Skills** (built-in workflows like `/spec`, `/prd`) | Overlay ops in `customization.json`: `insert_after`, `insert_before`, `replace`, `disable` โ or ship an entirely new skill folder |
| **Rules** | New rules are additive; same filename as a core rule overrides it |
| **Hooks** | Scripts copy to `~/.claude/hooks/`; ship `hooks.json` to register them alongside Pilot's built-ins |
| **Agents** | Drop `.md` files to add review/helper agents into `~/.claude/agents/` |
| **MCP servers** | Top-level `.mcp.json` deep-merges into `~/.claude.json` `mcpServers` (adds team servers alongside Pilot's; pack values win on conflict) |
| **Claude settings** | Top-level `settings.json` and `claude.json` deep-merge into the user's files โ model prefs, permissions, env vars, etc. User state (oauth, projects) is preserved |
Replaced skill fragments stay pinned to upstream by hash. `pilot customize status` surfaces drift when Pilot upgrades a replaced step; `pilot customize diff /` shows what changed so you can port improvements. See the [Customization guide](https://pilot-shell.com/docs/features/customization) for the full schema.
Repository / folder structure
The layout is the same whether you publish it as a git repo or keep it as a local folder. Directory names map 1:1 to `~/.claude/`:
```
my-customization/
โโโ customization.json # Required: metadata + optional skill overlays
โโโ settings.json # Optional: deep-merges into ~/.claude/settings.json
โโโ claude.json # Optional: deep-merges into ~/.claude.json
โโโ .mcp.json # Optional: deep-merges into ~/.claude.json `mcpServers`
โโโ skills/ # โ ~/.claude/skills/
โ โโโ spec-plan/steps/
โ โ โโโ security-review.md # New step injected into spec-plan
โ โโโ team-deploy/ # Brand-new skill
โ โโโ manifest.json
โ โโโ orchestrator.md
โ โโโ steps/01-stage.md
โโโ rules/ # โ ~/.claude/rules/
โ โโโ team-standards.md # Additive
โ โโโ testing.md # Overrides core
โโโ hooks/ # โ ~/.claude/hooks/
โ โโโ team-lint-check.sh
โ โโโ hooks.json # Registers team-lint-check.sh alongside Pilot's hooks
โโโ agents/ # โ ~/.claude/agents/
โโโ team-reviewer.md
```
Only ship the files and directories you need. A repo with just `rules/` is a valid customization; so is one with just `.mcp.json`.
### Pilot Bot
Run Claude Code as a persistent 24/7 automation agent with scheduled tasks, background jobs and heartbeat monitoring:
```bash
pilot bot # Launch automation session (auto-initializes on first run)
```
Pilot Bot defines scheduled jobs, automates recurring tasks, and monitor system health around the clock. If the [Telegram Channels plugin](https://github.com/anthropics/claude-plugins-official/tree/main/external_plugins/telegram) is installed, the bot auto-detects it and enables bidirectional messaging. Similar to OpenClaw, but without the added complexity and costs.
Bot skills
| Skill | Purpose |
|-------|---------|
| `bot-boot` | Boot sequence โ health check, job registration, heartbeat setup |
| `bot-heartbeat` | Periodic health checks, notifies only when issues are detected |
| `bot-jobs` | Manage scheduled jobs โ add, remove, pause, resume, edit, list |
| `bot-channel-task` | Channel task flow โ acknowledge, execute, report (when Telegram is available) |
| `bot-defaults` | Standard bot behaviors (dedup, reporting, error handling) |
### Claude CLI Flag Passthrough
All Claude Code CLI flags work directly with `pilot` โ current and future. Pilot forwards any flag it doesn't recognize to the Claude CLI automatically.
```bash
pilot --channels plugin:telegram@claude-plugins-official
pilot --model opus --verbose
pilot --resume
```
### Headless Mode
Run Pilot non-interactively with `-p` for CI/CD pipelines, scripts, and automated workflows. All Claude Code CLI flags work โ `--output-format`, `--allowedTools`, `--channels`, `--continue`, `--bare`, etc.
```bash
pilot -p "Run tests and fix failures" --allowedTools "Bash,Read,Edit"
pilot -p "Summarize this project" --output-format json
pilot --channels plugin:telegram@official -p "Check messages"
```
---
## Demo
A full-stack project โ created from **scratch with a single prompt**, then extended with **3 features built in parallel** using `/spec` and Git worktrees. Every line of code tested and verified by Pilot, zero manual code edits. **[Check out the Demo Project here โ](https://github.com/maxritter/pilot-shell-demo)**
---
## Under the Hood
For full details on every component, see the **[Documentation](https://pilot-shell.com/docs/)**.
| Component | What it does |
| --- | --- |
| [**Pilot Console**](https://pilot-shell.com/docs/features/console) | Local web dashboard at `localhost:41777` โ 10 views (Dashboard, Sessions, Memories, Requirements, Specifications, Extensions, Changes, Usage, Help, Settings). Port is configurable in Console Settings (`CLAUDE_PILOT_WORKER_PORT`). SQLite-backed, nothing leaves your machine |
| [**Pilot Bot**](https://pilot-shell.com/docs/features/bot) | Persistent 24/7 automation agent with scheduled jobs, background tasks, heartbeat monitoring, and optional Telegram integration for bidirectional messaging |
| [**Status Line**](https://pilot-shell.com/docs/features/statusline) | Real-time session dashboard below every response โ model, context usage, git status, cost, spec progress, and savings metrics across 3 lines |
| [**Model Switching**](https://pilot-shell.com/docs/features/model-routing) | `/model` is the only switch. Recommended flow: Opus for planning (the `spec-mode-guard` hook hard-blocks `/spec` on non-Opus), then `/model sonnet[1m]` for implementation. Two resume paths: Option A โ switch model and type any prompt (fast, carries planning context); Option B โ `/clear` then `/spec ` (fresh session, lower cost). The opt-out **Model Switching** toggle handles the pause. |
| [**Rules & Standards**](https://pilot-shell.com/docs/features/rules) | 10 built-in rules for workflow, testing, verification, debugging, code review, documentation sync, tooling, and MCP routing + 5 coding standards activated by file type (Python, TypeScript, Go, Frontend, Backend) |
| [**Context Optimization**](https://pilot-shell.com/docs/features/context-optimization) | Lean context strategies โ RTK output compression, Semble chunk-only code search, conditional rule loading, progressive skill disclosure, lazy MCP tool loading. Compaction resilience for 200K windows |
| [**Remote Control**](https://pilot-shell.com/docs/features/remote-control) | Control Pilot sessions from your phone, tablet, or any browser โ send prompts, monitor progress, and receive notifications remotely |
| [**Hooks Pipeline**](https://pilot-shell.com/docs/features/hooks) | 15 hook registrations across 7 events โ quality checks on every file edit (ruff, ESLint, go vet), TDD enforcement, token optimization via RTK (60โ90% savings), session continuity, memory capture, and session lifecycle management |
| [**Extensions**](https://pilot-shell.com/docs/features/extensions) | Unified view of skills, rules, commands, and agents across global, project, plugin, and remote scopes. Team sharing via git with push, pull, diff, and APM-compatible export |
| [**Customization**](https://pilot-shell.com/docs/features/customization) | Customize what Pilot auto-installs โ tweak built-in skills (spec, prd, etc.), modify rules, register additional hooks, add agents, and adjust auto-applied MCP / Claude settings. Source is a git repo (team-wide) or local directory (personal). Skill overlays (`insert_after` / `insert_before` / `replace` / `disable`) modify core workflows without full-file forks; fragments stay pinned to upstream by hash with drift detection. Team and Enterprise plans |
| [**Pilot CLI**](https://pilot-shell.com/docs/features/cli) | Session management, headless mode (`-p`) for CI/CD and scripts, worktree isolation, licensing, context monitoring. Run `pilot` or `ccp` to start |
| [**MCP Servers**](https://pilot-shell.com/docs/features/mcp-servers) | 7 preconfigured MCP servers for library docs, persistent memory, web search, GitHub code search, web page fetching, code knowledge graphs, and hybrid code search (Semble), plus the Chrome DevTools MCP plugin for browser automation |
| [**Language Servers**](https://pilot-shell.com/docs/features/language-servers) | Real-time diagnostics for Python (basedpyright), TypeScript (vtsls), Go (gopls). Auto-installed, auto-configured |
| [**Open Source Tools**](https://pilot-shell.com/docs/features/open-source-tools) | 20+ open-source tools installed alongside Pilot โ Semble (hybrid code search), CodeGraph (code intelligence), RTK (token optimization), language servers, and system prerequisites |
---
## What Users Say
> "Spec-driven development in Pilot Shell is incredible. I'm so impressed that I have to resist the urge to fix every issue all at once."
> "Instead of just letting Claude Code run on its own, you've managed to make it work in a much more organized, consistent, and reliable way within a workflow, which I think is fantastic. What you've built is truly impressive."
> "I have fallen in love with Pilot and just can't stand the idea of having to go back to native Claude."
---
## License
See [LICENSE](LICENSE).
---
## FAQ
Does Pilot Shell send my code or data to external services?
**No code, files, prompts, project data, or personal information ever leaves your machine through Pilot Shell.** All development tools โ code search (Semble), code intelligence (CodeGraph), persistent memory (Pilot Shell Console), session state, and quality hooks โ run entirely locally. No OS info, no version strings, no analytics, no telemetry, no heartbeats. Pilot Shell works fully offline between periodic license checks. If you enable the optional [Codex plugin](https://github.com/openai/codex-plugin-cc), adversarial reviews are sent to OpenAI's API โ this is opt-in and disabled by default.
Do I need a separate Anthropic subscription?
Yes. Pilot Shell enhances Claude Code โ it doesn't replace it. You need an active Claude subscription โ [Max 5x or 20x](https://claude.com/pricing) for solo developers, [Team Premium](https://claude.com/pricing) for teams, or [Enterprise](https://claude.com/pricing) for organizations with compliance or procurement requirements. Pilot Shell adds quality automation on top of whatever Claude Code access you already have.
Does Pilot Shell support AI models beyond Claude?
Pilot Shell is built for Claude Code and uses Anthropic's Claude models (Opus, Sonnet) for all planning, implementation, and verification. The [Codex plugin](https://github.com/openai/codex-plugin-cc) is included and adds OpenAI-powered adversarial review during `/spec` โ an independent second opinion on your plans and code changes. Run `/codex:setup` to authenticate, then enable the reviewers in Console Settings โ Reviewers.
Does Pilot Shell work with existing projects?
Yes โ that's the primary use case. Pilot Shell doesn't scaffold or restructure your code. You install it, run `/setup-rules`, and it explores your codebase to discover your tech stack, conventions, and patterns. From there, every session has full context about your project. The more complex and established your codebase, the more value Pilot Shell adds โ quality hooks catch regressions, persistent memory preserves decisions across sessions, and `/spec` plans features against your real architecture.
Does Pilot Shell work with any programming language?
Pilot Shell's quality hooks (auto-formatting, linting, type checking) currently support Python, TypeScript/JavaScript, and Go out of the box. TDD enforcement, spec-driven development, persistent memory, context optimization, and all rules and standards work with any language that Claude Code supports. You can add custom hooks for additional languages.
Can I use Pilot Shell on multiple projects?
Yes. Pilot Shell installs once globally and works across all your projects โ you don't need to reinstall per project. All tools, rules, commands, and hooks live in `~/.pilot/` and `~/.claude/`, available everywhere. Just `cd` into any project and run `pilot`. Each project can optionally have its own `.claude/` rules, custom skills, and MCP servers for project-specific behavior. Run `/setup-rules` in each project to generate project-specific documentation and standards.
Why does Pilot Shell use bypass permissions mode?
Pilot Shell sets Claude Code to `bypassPermissions` mode by default so the `/spec` workflow can run autonomously โ planning, implementing, and verifying without pausing for permission prompts at every tool call. This is what enables the end-to-end spec-driven development experience.
**In Quick Mode (regular chat), you have full control.** Press `Shift+Tab` at any time to cycle through [Claude Code's permission modes](https://pilot-shell.com/docs/features/permission-modes):
| Mode | Behavior |
| ---------------- | ----------------------------------------------------- |
| **Plan** | Claude proposes changes, you approve before execution |
| **Accept Edits** | File edits auto-approved, other actions still prompt |
| **Normal** | Fine-grained permission prompts for each tool call |
You can also set a persistent default in `~/.claude/settings.json` by changing the `defaultMode` field to `acceptEdits`, `default`, `plan`, or `dontAsk`. Pilot Shell preserves your choice across updates โ the installer merges permissions additively and never overwrites user customizations.
Can I add my own rules, commands, skills, and agents?
Yes. Create your own in your project's `.claude/` folder โ rules, commands, skills, and agents are all plain markdown files. Your project-level assets load alongside Pilot Shell's built-in defaults and take precedence when they overlap. `/setup-rules` auto-discovers your codebase patterns and generates project-specific rules. `/create-skill` builds reusable skills from any topic interactively. View and manage all extensions on the Console Extensions page.
For monorepos, organize rules in nested subdirectories by product and team (e.g. `.claude/rules/my-product/team-x/`). Team-level rules must use `paths` frontmatter so they only load when working on relevant files. `/setup-rules` validates this structure, enforces path-scoping, and generates a `README.md` to document the organization.
Can I customize Pilot's built-in workflows and defaults?
Yes โ the **Customization** feature (available on the Team plan) lets you modify what Pilot Shell auto-installs, not just add alongside it. Tweak the built-in `/spec` workflow (insert a security-review step, replace the planning template, disable a step you don't need), adjust existing rules, register additional hooks, add review agents, change which MCP servers get configured, and override the auto-applied `settings.json` and `claude.json`. Source is either a **git repo** for your team or a **local directory** for personal use โ no git needed for a one-off tweak. Skill overlays stay pinned to Pilot's upstream by hash, so when Pilot ships an improvement to a step you replaced, `pilot customize status` flags the drift and `pilot customize diff` shows you what changed. See the [Customization guide](https://pilot-shell.com/docs/features/customization) for the full schema.
Can I use Pilot Shell inside a Dev Container?
Yes. Copy the `.devcontainer` folder from this repository into your project, adapt it to your needs (base image, extensions, dependencies), and install Pilot Shell inside the container. Everything works the same โ hooks, rules, MCP servers, persistent memory, and the Console dashboard all run inside the container. This is a great option for teams that want a consistent, reproducible development environment.
For tighter isolation when working with untrusted code, layer Claude Code's [`/sandbox`](https://code.claude.com/docs/en/sandboxing) on top โ the Dockerfile pre-installs `bubblewrap`, `socat`, `iptables`, and `ipset` so it works out of the box. See Anthropic's [development containers](https://code.claude.com/docs/en/devcontainer) and [sandboxing](https://code.claude.com/docs/en/sandboxing) docs for the hardening patterns.
Pilot feels slower after a few weeks โ what should I do?
Claude Code's session logs and Pilot's caches grow over time and can degrade performance. A periodic reset every few weeks restores a clean baseline:
1. Run `/logout` inside Claude Code.
2. Back up `~/.claude.json`, `~/.claude/`, and `~/.pilot/` (`mv` them to `.bak` copies).
3. Reinstall Pilot Shell with the official installer (`curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash`).
4. Run `pilot`, sign in to Claude again, and re-activate with `pilot activate `.
Once Pilot is running smoothly, delete the `.bak` copies. Forgot your license key? Recover it in the [Pilot members area](https://polar.sh/max-ritter/portal). Full step-by-step is in the [Reset & Refresh](#install) section above.
What's the difference between pilot and pilot bot?
**`pilot`** is for interactive development โ you chat with Claude, use `/spec` for planned work or quick mode for ad-hoc tasks, and drive every session yourself. It's your daily coding tool.
**`pilot bot`** is for unattended automation โ it launches Claude Code as a persistent background agent that runs 24/7. You define scheduled jobs (health checks, deployments, monitoring) and the bot executes them on a cron schedule with heartbeat monitoring. If the [Telegram plugin](https://github.com/anthropics/claude-plugins-official/tree/main/external_plugins/telegram) is installed, you can send tasks to the bot from your phone and receive results back. Think of `pilot` as your IDE and `pilot bot` as your ops assistant.
---
## Changelog
See the full changelog at [pilot.openchangelog.com](https://pilot.openchangelog.com/).
---
## Contributing
Found a bug or missing a feature? [Open an issue](https://github.com/maxritter/pilot-shell/issues) on GitHub.
---
**How real engineers run Claude Code.**