An open API service indexing awesome lists of open source software.

https://github.com/claude-code-expert/carve-harness

Harness Engineering for Project
https://github.com/claude-code-expert/carve-harness

Last synced: about 11 hours ago
JSON representation

Harness Engineering for Project

Awesome Lists containing this project

README

          


carve-harness — carve away the excess, keep the craft

Keep what's essential. Carve away the rest.

한국어 · English

## Update
> **Changelog** — latest 3 shown; full history in [CHANGELOG.md](CHANGELOG.md)
> - `2026-06-21` **v1.8.0** — Removed skill double slash-exposure (`/`+`/carve-`): prefixed all 16 skill directories to `carve-` so each collapses to a single `/carve-` slash (built-in collision class gone), and retired the command-shim layer (skills support `$ARGUMENTS` natively). Catalog ids stay bare; re-running `carve install` auto-cleans the old paths + legacy shims from prior installs (hash-guarded)
> - `2026-06-15` **v1.7.0** — Procedure-enforcement skill (`workflow`·`/carve-workflow`): a Fable5-style 7-step loop (goal→decompose→criteria→assumptions→execute→verify→risks) + minimal output format · completion gate · escalation. Conducts existing assets (iterate·sprint-contract); net-new principles (distrust·externalized-decisions·scope-convergence) are injected always-on into CLAUDE.md·sprint-contract·squad-evaluator
> - `2026-06-15` **v1.6.0** — v2.0 roadmap M12 (closed-loop feedback: extract `src/metrics.ts` aggregation + designer demote suggestions + surface in `carve update`/`report`, recommendations unchanged) · M11 Phase A (bench measurement infra: `collect.mjs`·`gen-fixture.mjs`·`test-trigger.sh` trigger 17/17·`report.mjs` axes 3·4) · first publish since 1.4.1, so it also delivers v1.5.0's 7-component deletion + orphan auto-clean

# carve-harness

**carve-harness** is a harness-engineering tool for developers that assembles only the essentials for development — nothing more.

> A CLI that analyzes a project and interactively selects and installs a harness (skills, hooks, subagents) tailored to that project.

**v1.8.0** · TypeScript (ESM, no build step) · Node >=22.18 · 289 tests / ~88.5% coverage

`carve` reads the codebase to detect the project type and tooling, then recommends suitable components.
It installs only what the user selects into `.claude/`. carve = carving general-purpose assets down to fit a project.

Optimize per project: say **"set up a harness for this project"** to compose the full harness (with optional
component selection), use `harness-audit` to actually check the integrity of the current install, or use
`harness-architect` to pick and prune components to fit the project.

The core behavior is verified in a single line:

```
carve install → stack detection → (component selection) → generate assets into .claude/
→ the generated verification hooks deterministically block dangerous commands with exit code 2
```

## Features

- Token efficiency built in: codesight (structure-map MCP) and LSP (cclsp MCP) are auto-registered at install time — precise search instead of grep, with no separate installation.
- Deterministic safety: dangerous commands (`rm -rf /`, fork bombs) and secret files (`.env`, keys) are forcibly blocked with exit code 2 (not just advisory).
- Tailored selective installation: detection → recommendation → user selection. No bulk install; idempotent reinstall and clean removal.
- anti-slop generation: AI slop in HTML, SVG, and documents is gated by a linter.
- 100% preservation of Squad subagents: 9 specialists (+evaluator) with keyword routing and chaining.
- Self-verification: before installing, the auditor scans generated assets for secrets, excessive permissions, hook injection, and shell syntax.
- Zero build: runs `.ts` directly. Distributed via both npx and bash.

## Cheatsheet — all commands

> An at-a-glance index of every command. For step-by-step install/removal see the **Install** and **Removal** sections below; for scenario usage see **Going deeper**. Flags not in the table don't exist (don't invent options).

### CLI commands (global install)

| Command | What it does | Key options |
|---------|--------------|-------------|
| `carve install` | Interactive selective install (detect → recommend → select; no bulk) | `--level ` · `--only a,b` · `--lsp-servers` |
| `carve init-claude` | Generate CLAUDE.md baseline + stack rules | `--lang ` (default `en-ko`) |
| `carve list` | List installable / installed components | — |
| `carve doctor` | Audit the install (security, permissions, shell syntax) | — |
| `carve diff` | 3-way compare installed vs current carve assets (read-only) | — |
| `carve update` | Refresh carve-updated assets in place; your edits kept as `.bak` | `--force` · `--yes` |
| `carve migrate` | Promote carve-manifest schema v1 → v2 (lossless) | — |
| `carve report` | Aggregate what the installed hooks actually blocked (opt-in, no network) | — |
| `carve uninstall` | Clean removal (carve files only · restore `.bak` · preserve your settings) | — |
| `carve --version` | Print version | `-v` |
| `carve --help` | Print help | `-h` |

### In a session (natural language or slash)

| What you want | How to call |
|---|---|
| Commit message | "write a commit message" · `/carve-commit` |
| Code review | "review this" · `/squad review` |
| Session handoff | "handoff" · `/carve-handoff` |
| Verify loop (`build→lint→test→typecheck`) | "run the verify loop" · `/verify` |
| Auto-fix until green | "fix it until it passes" · `iterate` |
| Procedural completion of long tasks (7 steps) | `/carve-workflow` |
| Slop-free HTML / docs | "make a slop-free html" (gated by `check-slop` after generation) |
| Call a Squad specialist | `/squad ` — review · plan · refactor · qa · debug · docs · gitops · audit · evaluator |

> **Automatic hooks (no need to call)**: `block-destructive`·`protect-secrets` deterministically block dangerous commands / secret files with `exit 2`; `pre-commit-lint`·`pre-push-test` enforce before commit/push; `auto-format` runs after save; `precompact-handoff` preserves state before compaction.

## Install — global install recommended

> Full manual: [INSTALL.md](./INSTALL.md) (Korean) · [INSTALL.en.md](./INSTALL.en.md) (English) — requirements, modes, troubleshooting.

Install `carve` as a permanent command and every `carve …` in these docs (especially the recurring `update`/`diff`/`doctor`) works as written. **For the full feature set, a global install is recommended.**

> **Two distinct installs**: `npm i -g carve-harness` installs the **carve CLI (the tool)** on your system once; `carve install` uses that CLI to install the **harness into the current project** (`.claude/`). The former is once per machine; the latter is per project.

### 1. First install

```bash
npm i -g carve-harness # install the carve CLI (the tool) — once per machine
carve install # interactive selective install into the current project (detect → recommend → select)
carve init-claude # CLAUDE.md baseline + language stack rules
carve doctor # inspect install (config, hook syntax)
```

After install, just **open Claude Code** in the project: hooks and MCP (codesight·LSP) turn on automatically, and skills/Squad are called via natural language or `/carve-`.

> **Just trying it once (no global)**: `npx carve-harness@latest install` (prefix every run with `npx carve-harness@latest `). But `npx` is one-shot — it leaves no `carve` on PATH, which is awkward for recurring commands like `update`/`uninstall`, so **`npm i -g` is recommended for full use**. (repo-clone/`curl` users can use the `install.sh` wrapper — see [INSTALL.md](./INSTALL.md).)

### 2. Install options (non-interactive · forced selection)

Skip the wizard and pin the level/components with flags. (For what the levels mean, see **Install levels** below.)

```bash
carve install --level full # Force level (minimal|standard|full)
carve install --only commit,handoff,tdd # Explicit selection (no bulk install)
carve install --lsp-servers # Auto-install LSP language servers
```

### 3. Update

The CLI (the tool) and the project harness are updated separately.

```bash
npm i -g carve-harness@latest # 1. update the carve CLI (the tool)
carve diff # 2. (optional) 3-way compare installed vs latest assets (read-only)
carve update # 3. update the project harness — carve-updated files only, your edits kept as .bak (--force · --yes)
```

#### Existing users (v1.3.4 hook-path fix)

Projects installed with v1.3.3 or earlier have hook commands written as **relative paths** (`bash .claude/hooks/carve-*.sh`) in settings.json. When Claude Code runs a hook from a directory other than the project root, they fail with `No such file or directory`. From v1.3.4 they are registered with absolute paths (`$CLAUDE_PROJECT_DIR`).

If you already installed, fix it either way:

```bash
# Recommended — auto-migrate, no reinstall
npm i -g carve-harness@latest # 1. get the fixed carve CLI (migration won't run without this)
carve update # 2. one-time in-place rewrite of carve hooks to absolute paths (idempotent · user hooks untouched)

# Or — clean reinstall
carve uninstall && carve install
```

## Removal

Remove just the harness, or the CLI (the tool) too.

```bash
carve uninstall # 1. remove the project harness — carve files only · restore .bak · preserve your settings
npm uninstall -g carve-harness # 2. (optional) remove the carve CLI (the tool)
```

## Daily workflow — natural language in a session

Four commands cover everyday use. Call them in natural language or via `/carve-`.

| What you want | How to call | What it does |
|---|---|---|
| Commit message | "write a commit message" · `/carve-commit` | Generates a Conventional Commit (quick inline) |
| Code review | "review this" · `/squad review` | Keyword auto-delegation → squad-review |
| Session handoff | "handoff" · `/carve-handoff` | Leaves progress/decisions/next steps for the next session |
| Remember a decision | "remember this" · `/memory` | Claude Code built-in memory |

And the **block / protect / format hooks run automatically — no need to call them**: dangerous commands (`rm -rf /`, fork bombs) and secret files (`.env`, keys) are stopped with `exit 2` (not advisory), the formatter runs on save, and lint/test are enforced before commit/push.

> This is "easy carve" — 3 lines to install, 4 for daily use. Go below only when you want to dig deeper.

## Going deeper — commands by scenario (the more you use, the more advanced)

Add them one at a time as needed. All are part of the install (by level) and cost no context until you call them (on-demand loading). Two ways to call: **skills** via natural language or `/carve-`; **Squad specialists** via `/squad ` or `/squad-` (the `squad-…` entries below are those members).

**Code quality & verification**
- `/verify` (Claude Code built-in) — `build→lint→test→typecheck` in one go ("run the verify loop")
- `iterate` — diagnose→fix→re-run until tests are green, report only the final result ("fix it until it passes")
- `workflow` (`/carve-workflow`) — redefine goal→decompose→criteria→assumptions→execute→verify→risks: a 7-step procedure for long-running tasks ("run it as a procedure", "Fable")
- `squad-refactor` extract/simplify · `squad-debug` root cause · `squad-evaluator` independent evaluation against completion criteria (Self-Eval Blindspot)

**Testing**
- `test-gen` tests from UAT criteria · `tdd` red-green-refactor first · `squad-qa` test execution & QA report

**Security**
- `squad-audit` security audit & vulnerability scan *(the `security-scan` skill is deprecated as of v1.4.0 — consolidated into squad-audit)*

**Release & collaboration**
- `squad-gitops` commits/PRs/changelog · `squad-docs` docs · `squad-plan` planning/user stories *(PR body via built-in `/pr` or squad-gitops — the `pr`·`changelog` skills are inactive)*

**Docs & visuals (anti-slop)**
- When generating HTML, SVG, card news, reports, and slides, AI slop (gradients, glow, watermarks, etc.) is removed and `check-slop` gates deterministically ("make a slop-free html")

**Multi-agent & cost**
- `parallel-agents` 3–4 in parallel + git worktree isolation · `model-route` Haiku/Sonnet/Opus routing · `evaluator-tuning` few-shot evaluator correction (optional — install by explicit selection) *(`coordinator` is deprecated — parallel-agents suffices)*

**Other helpers** — `caveman` ultra-compression (~75% fewer tokens) · `write-a-skill` skill scaffolding · `zoom-out` system-level view. *(tdd, caveman, write-a-skill, zoom-out are rewrites of [mattpocock/skills](https://github.com/mattpocock/skills) patterns, MIT)*

**9 Squad specialists** — call via `/squad [task]` (e.g. `/squad review`) or directly `/squad-` (e.g. `/squad-refactor src/`); keyword auto-delegation also works. Members: review · plan · refactor · qa · debug · docs · gitops · audit · evaluator (the actual agent name is `squad-`).

**Automatic hooks (event-based · no need to call)** — blocking hooks are deterministic `exit 2`, not advisory:
`block-destructive` (dangerous commands) · `protect-secrets` (.env/keys) · `pre-commit-lint` (before commit) · `pre-push-test` (before push) · `auto-format` (after save) · `precompact-handoff` (persists state before compaction) · `slack-notify` (at session end, if a webhook is set) · `auto-commit` (optional, OFF by default).

### Harness lifecycle management (CLI)

Commands for managing the harness itself after install. You can ignore these day-to-day — reach for them only when a new carve ships or you want to inspect the install.

```bash
carve list # List installable / installed components
carve diff # 3-way compare installed assets vs current carve assets (read-only)
carve update # Refresh carve-updated assets in place, preserving your edits (--force · --yes)
carve migrate # Promote carve-manifest schema v1 → v2
carve report # Aggregate what the installed hooks actually blocked (opt-in, no network)
carve uninstall # Clean removal — removes only carve files, restores .bak, preserves your settings
```

Within a session, the `harness-audit` skill checks install integrity (hook registration, shell syntax, assets).

## Install levels (auto-determined by profile, forceable with `--level`)

Core skills, the 9 Squad agents, and anti-slop are recommended at *every level*; what changes by level is the **number of hooks and additional skills**.
- `minimal` — small CLI/library/batch: core + 9 Squad + anti-slop + **3 essential hooks** (block, protect, handoff)
- `standard` (default) — general apps: minimal + **the remaining core hooks** (7 total: +lint, test, format, Slack)
- `full` — standard + **additional skills** (iterate, test-gen, tdd, parallel-agents, model-route, etc. — deprecated/hidden components are excluded automatically)

For the force-level (`--level`), explicit-selection (`--only`), and LSP-auto-install commands, see **Install → Install options** above.

> The score (the number in parentheses in `carve list`, ≥75) is carve's internal usefulness assessment. For per-level defaults and the full component list, see [INSTALL.en.md](./INSTALL.en.md).

Supported projects: CLI · web · mobile · responsive · desktop · batch.

## CLAUDE.md baseline + stack rules (`carve init-claude`)

After installation, running `carve init-claude` carves out a working-guideline baseline and per-language stack rules.

- `.claude/CLAUDE.md` — stack-agnostic baseline: think before coding, simplicity, surgical changes, TDD, commit discipline, response control, hallucination guard, safety guardrails.
- `.claude/rules/*.md` — 6 best-practice rules for the detected language (`techstack`, `project-structure`, `commands`, `code-style`, `safety`, `gotchas`) + a stack-agnostic `anti-ai-slop` rule (visual/document slop prevention).
- The root `CLAUDE.md` is automatically wired to `@import` these (idempotent). They are loaded together each session.

The stack is auto-selected by the detected language (TypeScript/JavaScript, Python, Go, Rust, Java, Dart; otherwise `_default`). Package manager and test/lint commands are substituted with values detected from the project. Within a session, the harness-architect skill guides the same flow as the "CLAUDE.md setup" step.

## anti-slop visual and document generation

When creating HTML, SVG, card news, reports, slides, and documents, it removes AI-characteristic decorations (gradients, glow/colored shadows,
glassmorphism, motion decorations, watermarks, marketing boilerplate) and builds hierarchy through size, spacing, alignment, and typography.
The rules are injected by the skill before generation, and after generation the `check-slop.mjs` linter checks deterministically.
A script gates it — not the model's eyeballing (warning mode, with an exception path for intentional use).

## Token efficiency (built in)

codesight and LSP are auto-registered at install time, so token-efficient search applies without the user installing anything separately.

- codesight MCP: pre-maps the project structure (routes, schemas, dependencies) → eliminates the cost of repeated grep searches (measured ~11x on average for large codebases).
- LSP (cclsp MCP): precise search via `findReferences`/`getDiagnostics` → about 500 tokens instead of 2,000+ tokens for grep.
- Guidance is added to `flight-rules.md` and `CLAUDE.md` so that all skills and Squad subagents prefer these over grep.
- Language server binaries are auto-installed for the detected language during interactive install (or `carve install --lsp-servers`).

> Verification of savings figures via large fixture benchmarks is planned. For small one-off tasks, the effect may be small due to the fixed MCP cost.

## Safety

- Dangerous commands (`rm -rf /`, fork bombs, etc.) and secret files (`.env`, keys) are blocked by PreToolUse hooks with exit code 2.
- Lint before commit and test before push are enforced.
- Before installation, the auditor scans generated assets for secret exposure, excessive permissions, and hook injection (must pass to install).

## Architecture

```
analyzer → catalog → (wizard selection) → designer → generator → auditor → installer
```

It distinguishes two layers: Layer A is the carve CLI itself (`bin/`, `src/`, `assets/`, `vendor/`),
and Layer B is the artifacts carve installs into the target project (`/.claude/`).
For details see [ARCHITECTURE.md](./ARCHITECTURE.md), and for requirements see [requirement.md](./requirement.md).

## Development

Written in TypeScript (ESM), but with no build step during development. It runs `.ts` directly via type stripping in Node >=22.18.
(For distribution, type stripping is blocked in `node_modules`, so `prepack` compiles `.ts`→`.js` and ships it.)

```bash
npm test # Unit + E2E (node --test)
npm run test:cov # Coverage gate (>=80)
npm run check # Type check (tsc --noEmit)
npm run build # Distribution compile (tsconfig.build.json, in-place .js)
```

Milestone progress log: [docs/milestones/](./docs/milestones/)

## Release (npm publishing)

Publishing is **automatically published from main by GitHub Actions when a version tag (`vX.Y.Z`) is pushed** (`.github/workflows/release.yml`).
`npm publish` automatically runs `prepublishOnly` (type check + tests) and `prepack` (build), so it will not publish if tests fail.

For the full order (develop development → main promotion → tag publishing), see **[docs/release/RELEASE.md](./docs/release/RELEASE.md)**.

## Quantitative evaluation (internal measurement)

Internally measured against 6 axes ([carve-harness-benchmark-criteria.md](./docs/guide/carve-harness-benchmark-criteria.md)).
Deterministic items are reproduced via `node bench/run.mjs`. Measured 2026-05-31 · **at v1.1.0** (not yet re-measured for later versions; the measured axes are architecture-level, so largely still valid — figure re-validation is planned).

**Evaluation axes**

| Axis | What is measured | carve differentiator |
|----|-----------|-------------|
| 1. Speed/efficiency | Token, time, $, KV-cache, context injection cost | ★ Core — "carved lightweight" |
| 2. Control/safety | Block accuracy, permission leak rate, false blocks, determinism | Deterministic hooks vs advisory (0% vs N% leakage) |
| 3. Prompt verification | Trigger accuracy, false firing, routing, instruction adherence | Borrows the Squad test-router pattern |
| 4. Context verification | Occupancy, compaction retention rate, premature completion, on-demand loading | Adheres to the 40% rule |
| 5. Functional E2E | Skill firing, hook triggering, E2E pass, regression safety | Playwright verification |
| 6. Composition quality | Composition accuracy (F1), over-generation, omission, idempotency, audit | ★ carve-specific — competing harnesses have nothing to measure here at all |

**Measurement results**

| Axis | Score | Measured value |
|----|:--:|--------|
| 1. Speed/efficiency | Pending | Install footprint full 49 → minimal selection 7 files (**85.7% reduction**) |
| 2. Control/safety | **100** | Block 100% · leakage 0% · false block 0% · determinism 100% |
| 3. Prompt verification | **100** | Keyword routing 100% · false firing 0% |
| 4. Context verification | Pending | 14 on-demand skills split into individual files |
| 5. Functional E2E | **100** | Tests 96/96 · hook firing 8/8 |
| 6. Composition quality | **100** | Type-detection F1 100% · audit 0 issues · idempotency 100% · no over-generation |

### Score rationale (why these results)

- **2. Control/safety = 100**: 13 dangerous seeds (8 destructive commands + 5 secret files) were injected and all blocked with `exit 2` (block 100%, leakage 0%);
9 safe seeds had 0% false blocks, and `rm -rf /` repeated 5 times was blocked every time (determinism 100%). Because they are **deterministic code hooks** rather than advisory, leakage is structurally 0.
- **3. Prompt verification = 100**: 8 keyword seeds (review, test, debug, security, refactoring, planning, docs, commit) were fed into the Squad router and
all delegated to the correct agent (routing 100%), with 0% false firing for 3 non-trigger seeds. (Instruction adherence requires an LLM session → only routing and false firing measured.)
- **5. Functional E2E = 100**: all 96 unit+E2E tests pass, with syntax and `exit code` firing verified for 8 hooks. Includes PoC pass scenarios.
(Playwright live-app verification was replaced with harness-behavior E2E since there is no target app.)
- **6. Composition quality = 100**: type-detection F1 100% across 5 fixtures (cli/web/mobile/desktop/batch), 0 auditor ERRORs in generated assets,
identical `settings.json` on reinstall (idempotency 100%), and no over-generation since only what was chosen with `--only` is installed.
- **1. Speed/efficiency = Pending**: the structural basis of the carving effect (recommended 49 files → minimal selection 7 files, 85.7% reduction) was measured, but
the core metrics (token, time, $, KV-cache) require running the same task with other harnesses via an LLM to compare → score pending.
- **4. Context = Pending**: the on-demand loading structure (14 skills in individual files) was measured, but occupancy, the 40% rule, compaction retention, and premature completion
require live-session measurement → score pending.

> Honest disclosure: the self-measurable axes 2, 3, 5, and 6 are deterministically full marks. The comparative/live metrics of axes 1 and 4 are
> withheld without estimation (criteria §10). Proving comparative advantage still requires running `bench/` against other harnesses.
> Per-metric one-line evaluation table: [carve-harness-benchmark-results.md](./docs/guide/carve-harness-benchmark-results.md).

### Live cross-harness measurement (n=5, CRUD, same model, `claude -p`)

| harness | $/task (median) | tokens (median) | E2E success | leak rate (axis 2) |
|---------|:--:|:--:|:--:|:--:|
| no-harness | $0.101 | 3,554 | 5/5 | 100% |
| squad | $0.148 | 6,106 | 5/5 | 100% |
| **carve** | $0.159 | 7,076 | 5/5 | **0%** |
| ecc | $0.382 | 13,314 | 5/5 | — |

> **Scope/interpretation**: this is a measurement of one-off, simple CRUD (n=5) in a small project. In this range the fixed harness overhead (context, MCP) is large, so **it is normal for the harness to be at a disadvantage on token/$** (small projects rarely show token gains from a harness), and the token advantage appears in **medium-to-large codebases**. What carve proves here is not token savings but **safety (0% leakage), equal success, and being lighter than ECC**. Leak rate = the proportion of dangerous commands that pass unblocked (only carve has a deterministic blocking hook; ecc's "—" is not comparable due to a different mechanism).

- **carve vs ECC**: cost **58%↓** · tokens **47%↓** · equal success (5/5) — ECC injects globally (129 skills + rules), while carve carves and installs only what is needed → empirically proving "tailored lightweight."
- **carve vs no-harness**: for one-off CRUD, context injection raises cost (1.57×), but carve's advantage is not tokens but **safety (0% leakage vs 100% — only carve has deterministic hooks)**.
- The v1.0 codesight/LSP token-efficiency savings are an addition made after the above measurement, to be re-measured with large fixtures (small one-offs show little effect due to the fixed MCP cost).
- Measurement method and all 28 metrics: [carve-harness-benchmark-results.md](./docs/guide/carve-harness-benchmark-results.md).

## Credits

Some additional skills (`tdd`, `caveman`, `write-a-skill`, `zoom-out`) were inspired by patterns from [mattpocock/skills](https://github.com/mattpocock/skills) (MIT) and
rewritten into the carve format.

## License

MIT