https://github.com/joshft/correctless
The agent that writes the code never reviews it. Spec-driven TDD with agent separation, adversarial QA, and dynamic rigor for Claude Code.
- Host: GitHub
- URL: https://github.com/joshft/correctless
- Owner: joshft
- License: mit
- Created: 2026-03-27T17:54:37.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-06-21T16:42:34.000Z (7 days ago)
- Last Synced: 2026-06-21T18:20:41.378Z (7 days ago)
- Topics: ai-development, ai-workflow, claude, claude-code, code-quality, developer-tools, security, tdd
- Language: Shell
- Homepage: https://joshft.github.io/correctless/
- Size: 4.69 MB
- Stars: 64
- Watchers: 0
- Forks: 2
- Open Issues: 31
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
Composable [Claude Code](https://docs.anthropic.com/en/docs/claude-code) skills that enforce a correctness-oriented development workflow. Spec before you code. Test before you implement. Never let an agent grade its own work.
Built with Correctless.
## The Problem
AI coding assistants are fast but sloppy. They write code that works for the happy path, skip edge cases, and silently introduce bugs that don't surface until production. The same model that wrote the code will review it and say "looks good" — because it's confirming its own decisions.
Correctless fixes this by structuring the workflow so that **every phase is executed by a different agent with a different lens**:
- The **spec agent** asks "what does correct mean?" and researches current best practices before any code exists
- The **review agent** reads the spec cold and checks for security gaps, unstated assumptions, and untestable rules
- The **test agent** writes tests from the spec without knowing the implementation plan
- The **test auditor** checks whether those tests would actually catch bugs or just pass against mocks
- The **implementation agent** makes the tests pass without having written them
- The **QA agent** hunts for bugs with neither the test author's nor the implementer's blind spots
- The **verification agent** checks spec-to-code correspondence without insider knowledge
Same model — but the framing determines what the agent finds.
## Quick Start
You need [Claude Code](https://docs.anthropic.com/en/docs/claude-code) and a Claude Max subscription ($100-200/mo).
### Install
```
/plugin marketplace add joshft/correctless
/plugin install correctless
/csetup
```
Alternative: Git clone
```bash
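# Clone the skills into the project, then run the setup script
git clone https://github.com/joshft/correctless.git .claude/skills/workflow
.claude/skills/workflow/setup
# Then, inside Claude Code (a slash command, not a shell command):
/csetup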
```
Correctless runs at standard intensity by default. To raise it, set `"intensity"` to `"high"` or `"critical"` in the `workflow` section of `.correctless/config/workflow-config.json`.
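If you would rather script the change than edit the file by hand, the sketch below shows one way to do it with `jq` (already a stated requirement for the hooks). `set_intensity` is a hypothetical helper, not something Correctless ships, and it assumes the config file already exists and holds valid JSON.

```bash
# Sketch: set workflow.intensity in the Correctless config with jq.
# set_intensity is a hypothetical helper, not part of Correctless.
set_intensity() {
  local level="$1"
  local config="${2:-.correctless/config/workflow-config.json}"
  case "$level" in
    standard|high|critical) ;;
    *) echo "unknown intensity: $level" >&2; return 1 ;;
  esac
  local tmp
  tmp="$(mktemp)" || return 1
  # Write to a temp file first so a jq failure never truncates the config.
  if jq --arg level "$level" '.workflow.intensity = $level' "$config" > "$tmp"; then
    cat "$tmp" > "$config"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage: `set_intensity high` from the project root. Other keys in the file are left untouched.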
### First Feature
```
git checkout -b feature/my-feature
/cspec
```
### Update
```
/plugin uninstall correctless
/plugin marketplace remove correctless
/plugin marketplace add joshft/correctless
/plugin install correctless
```
Then restart Claude Code. Git clone users: `cd .claude/skills/workflow && git pull && ./setup`
## One Plugin, Three Intensity Levels
Correctless ships as a single plugin with 33 skills. You choose the intensity that matches your project's risk profile. Five skills (listed under Intensity-Gated below) are gated behind intensity thresholds: they check your project's `workflow.intensity` setting and warn if invoked below their minimum.
| Intensity | Overhead | What You Get | Best For |
|-----------|----------|--------------|----------|
| **standard** | ~10-15 min | 19 core skills: spec, review, TDD, verify, docs, debug, refactor, release | SaaS, APIs, CLI tools, content sites |
| **high** | ~30-60 min | + adversarial spec review, convergence auditing, architecture tracking | Auth, payments, sensitive data |
| **critical** | ~1-2 hours | + Alloy formal modeling, live red team assessment | Security infrastructure, crypto, proxies |
Skills like `/cpostmortem` and `/cdevadv` are available at all intensity levels — they're about learning from the past, not adding rigor to the present.
**Put another way:** Standard intensity is like having someone next to you going through a checklist to make sure your project has some sanity. Critical intensity is like taking your Claude Max subscription tokens, setting them on fire, collecting the ash, and using it to create a tiny diamond.
## How It Works
### The Standard Workflow
```mermaid
graph LR
A["/cspec
Write spec"] --> B["/creview
Skeptical review"]
B --> C["/ctdd"]
C --> D["/cverify
Rule coverage"]
D --> E["/cdocs
Update docs"]
E --> F["Merge"]
subgraph "/ctdd — Enforced TDD"
direction LR
C1["RED
Write tests"] --> C2["Test Audit
Would tests catch bugs?"]
C2 --> C3["GREEN
Implement"]
C3 --> C4["/simplify"]
C4 --> C5["QA
Hostile review"]
C5 -.->|"Issues found"| C3
end
C --- C1
style A fill:#339af0,color:#fff
style B fill:#e599f7,color:#000
style C1 fill:#ff6b6b,color:#fff
style C3 fill:#51cf66,color:#fff
style C5 fill:#ffd43b,color:#000
style F fill:#51cf66,color:#fff
```
Each box is a separate agent. The test writer doesn't know the implementation plan. The QA agent didn't write the tests. A PreToolUse hook blocks source code edits until tests exist; this is enforced by a bash hook rather than requested in a prompt. See the [Standard Workflow Guide](docs/standard-workflow.md) for state machine diagrams, hook architecture, and phase gating details.
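To make the gating idea concrete, here is a minimal sketch of the kind of decision such a hook has to make. It is not the code in `workflow-gate.sh`: the function names, the phase spellings, and the test-file patterns are all assumptions chosen for illustration.

```bash
# Sketch of a phase gate decision. Phase names and test-file patterns are
# illustrative assumptions, not Correctless's actual rules.

is_test_file() {
  case "$(basename "$1")" in
    *.test.*|*.spec.*|*_test.*|test_*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints "allow" or "block" for an edit to FILE during PHASE.
gate_decision() {
  local phase="$1" file="$2"
  case "$phase" in
    "")
      # No active workflow: edits pass freely.
      echo allow ;;
    RED)
      # Tests may be written; source stays locked until tests exist.
      if is_test_file "$file"; then echo allow; else echo block; fi ;;
    GREEN)
      # Implementation phase; this sketch applies no restriction here.
      echo allow ;;
    QA)
      # QA is read-only.
      echo block ;;
    *)
      # Unknown phase: fail closed (an assumption of this sketch).
      echo block ;;
  esac
}
```

A real PreToolUse hook would read the tool call as JSON on stdin and signal a block through its exit status (Claude Code's hook documentation describes exit code 2 as the blocking signal). The sketch leaves that wiring out so the decision logic stays readable.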
### The Critical Workflow
```mermaid
graph LR
A["/cspec
Typed invariants"] --> B["/cmodel
Alloy modeling"]
B --> C["/creview-spec
6-agent adversarial"]
C --> D["/ctdd
+ mutation testing"]
D --> E["/cverify
+ drift detection"]
E --> F["/cupdate-arch"]
F --> G["/cdocs"]
G --> H["/caudit
Olympics"]
style B fill:#ff922b,color:#fff
style C fill:#e599f7,color:#000
style H fill:#ff922b,color:#fff
```
### Intensity Detection
You don't have to pick intensity manually for every feature. `/cspec` evaluates signals in your spec — file paths touching auth/payments, STRIDE threat keywords, compliance references, antipattern history — and recommends the right intensity. You confirm or override.
```mermaid
graph LR
A["Feature request"] --> B["/cspec"]
B --> C{"Intensity
detection"}
C -->|"CRUD endpoint"| D["standard"]
C -->|"Auth + payments"| E["high"]
C -->|"Crypto + HIPAA"| F["critical"]
D --> G["Review + TDD"]
E --> H["+ /caudit, /creview-spec"]
F --> I["+ /cmodel, /credteam"]
style D fill:#51cf66,color:#fff
style E fill:#ffd43b,color:#000
style F fill:#ff6b6b,color:#fff
```
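As a rough illustration of signal-based detection, the sketch below scans a spec file for keywords and prints a recommendation. The keyword lists and the `recommend_intensity` name are assumptions; the real `/cspec` logic also weighs antipattern history and other signals that a keyword scan cannot capture.

```bash
# Sketch: recommend an intensity from keywords in a spec file.
# The keyword lists are illustrative assumptions, not /cspec's real signals.
recommend_intensity() {
  local spec_file="$1"
  # The strongest signal wins, so check for critical first.
  if grep -Eiwq 'crypto(graphy|graphic)?|hipaa|pci' "$spec_file"; then
    echo critical
  elif grep -Eiwq 'auth(entication|orization)?|payments?|spoofing|tampering' "$spec_file"; then
    echo high
  else
    echo standard
  fi
}
```

The result is only a recommendation. As described above, you confirm or override it.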
### Defense in Depth
Prompt-level instructions fade as context fills — enforcement that depends on the model is a suggestion. Correctless uses four independent layers:
```mermaid
graph TD
A["Agent wants to
edit source file
during QA phase"] --> B["Layer 1: Gate
PreToolUse hook"]
B -->|"BLOCK"| C["Edit prevented
Model can't bypass bash"]
B -->|"ALLOW
(Bash slip-through)"| D["File modified"]
D --> E["Layer 2: Audit Trail
PostToolUse hook"]
E --> F["Logged with phase context
Alert shown to user"]
F --> G["/cwtf reads trail
reports deviations"]
P["Layer 3: Path-scoped rules
(higher-adherence advisory)"] -.->|"Loaded when a
scoped file is opened"| A
H["Layer 4: Skill Instructions
(prompt-level, advisory)"] -.->|"Subject to
context fade"| A
style B fill:#ff6b6b,color:#fff
style C fill:#ff6b6b,color:#fff
style E fill:#ffd43b,color:#000
style P fill:#ffa94d,color:#000
style H fill:#dee2e6,color:#000
```
**Fresh agents per phase** add resilience: each phase spawns a new agent at 0% context with fresh instructions via `context: fork`. A QA agent starting at 0% context has its hostile-lens instructions in full view. A single agent at 70% context may have forgotten it was supposed to be hostile.
### The Compounding Effect
Escaped bugs become antipatterns. Antipatterns become spec rules. Spec rules become tests. Six months in, the workflow knows your project's failure modes better than any individual developer.
```mermaid
graph LR
A["Bug escapes"] --> B["/cpostmortem"]
B --> C["antipatterns.md
(class fix)"]
B --> D["CLAUDE.md
(learning)"]
C --> E["/cspec reads
antipatterns"]
D --> F["Every future
session loads"]
E --> G["Feature N+1
prevents
same bug class"]
F --> G
style B fill:#ff922b,color:#fff
style G fill:#51cf66,color:#fff
```
## Skills
Correctless provides 33 skills across the workflow. Each is a slash command you invoke directly; the tables below group them by purpose.
### Core Workflow
| Skill | When to Use | What It Does |
|-------|------------|--------------|
| [`/csetup`](docs/skills/csetup.md) | First run, or re-run for health check | 19-point health check, convention mining, project scaffolding |
| [`/cspec`](docs/skills/cspec.md) | Starting a new feature | Testable rules with research agent, intensity detection |
| [`/creview`](docs/skills/creview.md) | After /cspec | Skeptical review + OWASP security checklist |
| [`/ctdd`](docs/skills/ctdd.md) | After review approves spec | RED, test audit, GREEN, /simplify, QA, probes, mini-audit — all enforced |
| [`/cverify`](docs/skills/cverify.md) | After /ctdd completes | Spec-to-code verification, drift detection |
| [`/cdocs`](docs/skills/cdocs.md) | After /cverify | Update README, AGENT_CONTEXT, ARCHITECTURE, feature docs |
| [`/carchitect`](docs/skills/carchitect.md) | New project or missing architecture doc | Structured ARCHITECTURE.md — reverse-engineer or greenfield, entrypoints YAML |
| [`/cauto`](docs/skills/cauto.md) | After spec review approved | Orchestrate full pipeline: TDD, verify, docs, PR — with flexible phase resume |
| [`/crelease`](docs/skills/crelease.md) | Ready to tag a version | Version bump, changelog, sanity checks, annotated tag |
### Code Quality
| Skill | When to Use | What It Does |
|-------|------------|--------------|
| [`/cquick`](docs/skills/cquick.md) | Small, well-understood changes | TDD without the ceremony — scope-guarded at 50 LOC / 3 files |
| [`/crefactor`](docs/skills/crefactor.md) | Restructuring without behavior change | Characterization tests, behavioral equivalence, agent separation |
| [`/cdebug`](docs/skills/cdebug.md) | Stuck on a bug | Root cause, hypothesis, git bisect, TDD fix, class fix |
| [`/cpr-review`](docs/skills/cpr-review.md) | Reviewing an incoming PR | Architecture, security, tests, antipatterns, dep bumps |
### Open Source
| Skill | When to Use | What It Does |
|-------|------------|--------------|
| [`/ccontribute`](docs/skills/ccontribute.md) | Contributing to another project | Learn conventions first, match patterns, pre-flight, generate PR |
| [`/cmaintain`](docs/skills/cmaintain.md) | Reviewing a contribution | Scope check, conventions, maintenance burden, pre-written comments |
### Observability
| Skill | When to Use | What It Does |
|-------|------------|--------------|
| [`/cstatus`](docs/skills/cstatus.md) | Anytime | Current phase, next steps, problem detection |
| [`/chelp`](docs/skills/chelp.md) | Need a quick reference | Workflow pipeline, all commands |
| [`/csummary`](docs/skills/csummary.md) | After a feature or mid-feature | What the workflow caught, by phase |
| [`/cmetrics`](docs/skills/cmetrics.md) | Monthly or for ROI analysis | Token cost, bugs caught, session analytics, trends |
| [`/cwtf`](docs/skills/cwtf.md) | Suspect agents took shortcuts | Did agents actually follow instructions? |
| [`/cexplain`](docs/skills/cexplain.md) | Onboarding or exploring a codebase | Guided mermaid diagrams, prose walkthroughs, HTML export |
| [`/cdashboard`](docs/skills/cdashboard.md) | Visualize project health | HTML dashboard with metrics + artifact browser |
| [`/ctriage`](docs/skills/ctriage.md) | Deferred findings piling up | Wizard-style bulk triage of deferred review findings |
| [`/cprune`](docs/skills/cprune.md) | Periodic maintenance | Archive stale docs, clean orphaned artifacts, fix count drift |
| [`/cchores`](docs/skills/cchores.md) | Autonomous backlog grooming | Picks one suitable open issue, fixes it via /cdebug TDD, opens a single PR |
### Analysis
| Skill | When to Use | What It Does |
|-------|------------|--------------|
| [`/cpostmortem`](docs/skills/cpostmortem.md) | After a bug escapes | Trace which phase missed it, add antipattern + class fix |
| [`/cdevadv`](docs/skills/cdevadv.md) | Periodic deep analysis | Devil's advocate — challenge architecture and strategy |
| [`/cmodelupgrade`](docs/skills/cmodelupgrade.md) | After a model upgrade or `version_bumped` advisory | Per-feature regression report comparing pipeline metrics against the baseline for the current `{model}+HARNESS_VERSION` |
### Intensity-Gated
| Skill | Min Intensity | What It Does |
|-------|---------------|--------------|
| [`/caudit`](docs/skills/caudit.md) | high | Olympics convergence audit (QA / Hacker / Performance / UX presets) |
| [`/creview-spec`](docs/skills/creview-spec.md) | high | 6-agent adversarial spec review (+ optional cross-model codex review) |
| [`/cupdate-arch`](docs/skills/cupdate-arch.md) | high | Keep ARCHITECTURE.md current after features land |
| [`/cmodel`](docs/skills/cmodel.md) | critical | Alloy formal modeling for state machines and protocols |
| [`/credteam`](docs/skills/credteam.md) | critical | Live adversarial red team with source code access |
## Platform Integration
Correctless hooks into Claude Code's infrastructure for real-time feedback and long-term learning. Unless a section is marked opt-in, the features below are **automatic** after `/csetup`.
### Hooks
```mermaid
graph TB
subgraph "Claude Code Hooks"
A["PreToolUse"] --> H["sensitive-file-guard.sh
Edit/Write tool-path guard"]
A --> B["workflow-gate.sh
Phase enforcement"]
C["PostToolUse"] --> D["audit-trail.sh
Adherence feedback"]
C --> G["auto-format.sh
Auto-formatting"]
E["Statusline"] --> F["statusline.sh
Phase + cost + context"]
end
H -->|"block/allow Edit/Write only"| M[".env, keys, credentials
(Bash writes unguarded)"]
B -->|"block/allow"| I["Every file edit"]
D -->|"alerts"| J["Real-time violations"]
G -->|"formats"| L["Edited files"]
F -->|"live display"| K["Always visible"]
style H fill:#e64980,color:#fff
style B fill:#ff6b6b,color:#fff
style D fill:#ffd43b,color:#000
style G fill:#74c0fc,color:#000
style F fill:#51cf66,color:#fff
```
| Hook | Runs | Purpose |
|------|------|---------|
| **sensitive-file-guard.sh** | Before every Edit/Write tool call (Bash is never inspected) | Guards the Edit/Write tool path: blocks `Edit`/`Write`/`MultiEdit`/`NotebookEdit`/`CreateFile` writes to `.env`, credentials, keys, and certificates. This catches accidental or naive Edit/Write tool calls. All Bash-mediated writes (redirects, writer commands, interpreters, git) are unguarded, which is an accepted non-goal (AP-040). The input-parse path fails closed; an unparsable `custom_patterns` config degrades to DEFAULTS-only matching (never fully open) |
| **workflow-gate.sh** | Before every file edit | Blocks writes that violate the current phase (RED blocks source, QA blocks everything) |
| **audit-trail.sh** | After every tool call | Logs modifications with phase context, alerts on violations |
| **token-tracking.sh** | After Agent tool completion | Logs subagent token usage, cost, and duration to JSONL for `/cmetrics` analysis |
| **auto-format.sh** | After Edit/Write/MultiEdit | Runs project formatter (Prettier, Black, gofmt, etc.) with allowlist validation |
| **statusline.sh** | Continuously | Shows phase, QA round, cost, context %, lines delta |
| **workflow-advance.sh** | On command | State machine — validates transitions, enforces gates |
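The state machine row can be pictured as a small transition table. The sketch below follows the standard workflow diagram (including the QA loop back to GREEN), but the lowercase phase names and both function names are assumptions, not the identifiers `workflow-advance.sh` uses.

```bash
# Sketch of transition validation for the standard workflow.
# Phase names are illustrative; the real state machine may differ.

# Prints the phases reachable from PHASE, space-separated.
allowed_next() {
  case "$1" in
    spec)       echo "review" ;;
    review)     echo "red" ;;
    red)        echo "test-audit" ;;
    test-audit) echo "green" ;;
    green)      echo "simplify" ;;
    simplify)   echo "qa" ;;
    qa)         echo "green verify" ;;  # back to green when QA finds issues
    verify)     echo "docs" ;;
    docs)       echo "merge" ;;
    *)          echo "" ;;
  esac
}

# Succeeds only if FROM -> TO is a listed transition.
can_advance() {
  local from="$1" to="$2" candidate
  for candidate in $(allowed_next "$from"); do
    if [ "$candidate" = "$to" ]; then
      return 0
    fi
  done
  return 1
}
```

With this table, `can_advance red green` fails because the test audit sits between the two phases, which is the kind of skipped step a gate is meant to catch.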
### Statusline
The statusline shows your workflow state at a glance:
```
project/ feature/auth Opus 34% RED QA:R0 $0.42 +87/-12
```
The line shows the phase (color-coded), QA round count, session cost, feature cost (background-cached), lines delta, and context usage, with a warning at 70%.
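For readers curious how such a line is assembled, here is a minimal sketch that reproduces the example above with `printf`. It assumes the percentage after the model name is context usage, and the `!` warning marker is invented for illustration; `statusline.sh` itself may format things differently and also handles color and cost caching.

```bash
# Sketch: build a statusline string from already-known values.
# Field order mirrors the example above; the "!" marker is made up.
render_statusline() {
  local project="$1" branch="$2" model="$3" context_pct="$4"
  local phase="$5" qa_round="$6" cost="$7" added="$8" removed="$9"
  local warn=""
  # Flag context usage once it reaches the 70% warning threshold.
  if [ "$context_pct" -ge 70 ]; then
    warn="!"
  fi
  printf '%s/ %s %s %s%%%s %s QA:R%s $%s +%s/-%s\n' \
    "$project" "$branch" "$model" "$context_pct" "$warn" \
    "$phase" "$qa_round" "$cost" "$added" "$removed"
}

render_statusline project feature/auth Opus 34 RED 0 0.42 87 12
```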
### Real-Time Adherence Feedback
The audit trail hook monitors every modification and alerts immediately:
- `tdd-qa: Source file modified — middleware.ts (this phase should be read-only)`
- `GREEN: Test file edited — auth.test.ts (prohibited — TEST_BUG escalation expected)`
- `QA: Read middleware.ts (3 of 7 modified files reviewed)` (high+ intensity)
### Session Analytics
[`/cmetrics`](docs/skills/cmetrics.md) reads Claude Code's session data for exact token costs, outcome rates, and a **Correctless vs Freeform** comparison table.
### Compounding Learning
Postmortem findings, conventions, and audit learnings append to CLAUDE.md and load into every future session automatically. The spec agent just *knows* that "auth features in this project need middleware ordering checks" without being told.
### Git Integration (opt-in)
- **Git trailers** in commit messages: `Spec:`, `Rules-covered:`, `Verified-by:`
- **Git notes** attaching verification summaries to commits
- **Git bisect** in `/cdebug` for automated regression finding
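The trailer format is plain text at the end of the commit message, so it is easy to picture. The sketch below builds such a message; the helper name and the sample values (`specs/login.md`, `R1,R2`, `cverify`) are hypothetical, and what Correctless actually writes into each trailer is not specified here.

```bash
# Sketch: format a commit message with the three trailers listed above.
# All values are supplied by the caller; none are Correctless defaults.
commit_message_with_trailers() {
  local subject="$1" spec="$2" rules="$3" verifier="$4"
  # A blank line separates the subject from the trailer block.
  printf '%s\n\nSpec: %s\nRules-covered: %s\nVerified-by: %s\n' \
    "$subject" "$spec" "$rules" "$verifier"
}

commit_message_with_trailers "Add login rate limit" "specs/login.md" "R1,R2" "cverify"
```

Piping the output to `git commit -F -` would create the commit, and `git interpret-trailers --parse` can read the trailers back from a message.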
### MCP Servers (opt-in)
`/csetup` offers to configure two MCP servers that improve analysis across all skills:
- **Serena** — symbol-level code queries (call graphs, references, symbol lookup). 15 skills use it for precise analysis with 40-60% token savings on larger projects.
- **Context7** — current library documentation on demand. The `/cspec` research agent gets real docs for real versions.
Both are free, open source, run locally, and fall back silently when unavailable.
### Output Redaction
External-facing skills ([`/cpr-review`](docs/skills/cpr-review.md), [`/ccontribute`](docs/skills/ccontribute.md), [`/cmaintain`](docs/skills/cmaintain.md)) automatically redact paths, credentials, hostnames, and session IDs before posting.
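A redaction pass of this kind can be approximated with a stream filter. The sketch below covers only two of the four categories (home-directory paths and simple `key=value` credentials), and its patterns and `[REDACTED]` markers are assumptions rather than the rules Correctless applies.

```bash
# Sketch: redact home-directory usernames and simple credentials from stdin.
# Patterns are illustrative and deliberately narrow.
redact() {
  sed -E \
    -e 's#/(home|Users)/[^/[:space:]]+#/\1/[USER]#g' \
    -e 's/((api[_-]?key|token|password)[=:][[:space:]]*)[^[:space:]]+/\1[REDACTED]/g'
}

printf 'password=hunter2 in /home/alice/app/.env\n' | redact
```

Hostnames and session IDs would need their own patterns, and any real redactor should be tested against false negatives before its output is posted anywhere public.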
## Project Health Check
`/csetup` runs 19 checks across 6 categories on first run:
| Category | Checks |
|----------|--------|
| **Security** | Hardcoded secrets, API keys in source, .env committed, missing .gitignore patterns |
| **Code Quality** | Linter configured, formatter configured, dependency audit |
| **Testing** | Test runner detected, test files exist, coverage configured |
| **CI/CD** | CI pipeline exists, runs tests, runs linter |
| **Documentation** | README exists, ARCHITECTURE.md, CONTRIBUTING.md |
| **Git Hygiene** | .gitignore present, no large binaries, branch protection |
For existing projects, setup also mines your codebase for conventions (commit message style, test patterns, import ordering) and bootstraps architecture documentation.
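To give a feel for what a single check looks like, here is a sketch in the spirit of the Security and Git Hygiene rows. The function name, messages, and exact matching rule are assumptions; the checks `/csetup` runs are likely more thorough.

```bash
# Sketch of one health check: is .gitignore present, and does it list .env?
# Matches only an exact ".env" line, which is stricter than real gitignore rules.
check_env_ignored() {
  local dir="${1:-.}"
  if [ ! -f "$dir/.gitignore" ]; then
    echo "FAIL: .gitignore missing"
    return 1
  fi
  if ! grep -qxF '.env' "$dir/.gitignore"; then
    echo "FAIL: .env not covered by .gitignore"
    return 1
  fi
  echo "PASS: .gitignore present and covers .env"
}
```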
## State Management
Check your workflow with [`/cstatus`](docs/skills/cstatus.md) or the statusline. For advanced debugging:
```bash
.correctless/hooks/workflow-advance.sh diagnose "file" # Why a file is blocked
.correctless/hooks/workflow-advance.sh override "why" # Temporary gate bypass (10 tool calls)
.correctless/hooks/workflow-advance.sh spec-update "why" # Spec was wrong mid-TDD
.correctless/hooks/workflow-advance.sh reset # Remove all state for current branch
```
### Quick Fixes During an Active Workflow
If the gate is blocking a typo fix:
```bash
.correctless/hooks/workflow-advance.sh override "quick bugfix: fixing typo in error message"
```
The override bypasses the gate for 10 tool calls. When no workflow is active, the gate allows all edits freely.
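One way to picture a bounded override is as a counter that each tool call decrements. The sketch below is purely illustrative: how `workflow-advance.sh` stores and spends the budget is not documented here, and the state file and function names are hypothetical.

```bash
# Sketch: a bypass budget stored in a file and spent one call at a time.
# The state-file approach is an assumption, not Correctless's mechanism.

# Grants COUNT bypasses (default 10) by writing the budget to STATE_FILE.
grant_override() {
  echo "${2:-10}" > "$1"
}

# Spends one bypass. Succeeds while budget remains, fails once it is gone.
consume_override() {
  local state_file="$1" remaining
  [ -f "$state_file" ] || return 1
  remaining="$(cat "$state_file")"
  [ "$remaining" -gt 0 ] 2>/dev/null || return 1
  echo $((remaining - 1)) > "$state_file"
}
```

A gate built this way would call `consume_override` before its normal phase check and allow the edit only while the call succeeds.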
## Language Support
| Language | Test Runner | Mutation Tool | PBT Library |
|----------|-------------|---------------|-------------|
| Go | `go test` | go-mutesting | rapid |
| TypeScript | jest/vitest | Stryker | fast-check |
| Python | pytest | mutmut | hypothesis |
| Rust | cargo test | cargo-mutants | proptest |
Mutation testing and property-based testing (PBT) helpers are available at high+ intensity. Standard intensity works with any language that has a test runner.
## Requirements
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) CLI
- A **Claude Max subscription** ($100/mo or $200/mo plan). Correctless spawns multiple agents per feature — $200/mo is recommended at high+ intensity.
- A project with a test runner
- **jq** (JSON processor) — required for all hooks. Install: `brew install jq` (macOS), `apt install jq` (Ubuntu)
- **Bash 4+** — required for hooks. macOS ships Bash 3.2 by default; install modern bash: `brew install bash`
Optional (high+/critical):
- [Alloy Analyzer](https://alloytools.org/) for formal modeling
- Mutation testing tool for your language
- Isolated environment (Docker/VPS) for red team assessments
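The `jq` and Bash requirements above are easy to verify before installing. The following is a small preflight sketch; both function names are hypothetical, and `/csetup` may already perform equivalent checks.

```bash
# Sketch: preflight checks for the jq and Bash 4+ requirements.

# Succeeds if VERSION (default: the running shell's) has major version 4 or higher.
bash_is_supported() {
  local version="${1:-$BASH_VERSION}"
  local major="${version%%.*}"
  [ "$major" -ge 4 ] 2>/dev/null
}

# Reports every missing requirement and fails if any check failed.
preflight() {
  local status=0
  if ! command -v jq >/dev/null 2>&1; then
    echo "missing: jq" >&2
    status=1
  fi
  if ! bash_is_supported; then
    echo "bash ${BASH_VERSION:-unknown} is too old; 4+ is required" >&2
    status=1
  fi
  return "$status"
}
```

On a stock macOS shell, `bash_is_supported 3.2.57` fails, which matches the note above about installing a newer Bash with Homebrew.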
## Good to Know
**Opt-in per feature.** Correctless is passive when no workflow is active. Start a workflow with `/cspec` on a feature branch — skip it on branches where you don't need it. All normal Claude Code behavior is unchanged outside active workflows.
**CI unchanged.** Correctless runs entirely inside Claude Code sessions via local hooks. It does not modify your CI pipeline, add CI steps, or require any CI changes.
**After merge.** Delete the feature branch and start fresh with a new branch + `/cspec`. Workflow state files in `.correctless/artifacts/` are branch-scoped and harmless — they provide history for `/cmetrics` and `/csummary`.
### Uninstall
To fully remove Correctless from a project, follow these steps **in order** (the order matters — removing `.correctless/` before cleaning settings.json will lock you out of Claude Code because the fail-closed hooks point to scripts inside `.correctless/`):
```bash
# 1. Remove the plugin registration (run inside Claude Code; this is a slash command, not a shell command)
/plugin uninstall correctless
# 2. Clean .claude/settings.json FIRST — remove all hook entries that reference
# .correctless/hooks/ or hooks/ (workflow-gate, sensitive-file-guard,
# audit-trail, auto-format, statusline, token-tracking, workflow-advance).
# Edit manually with a text editor, or delete .claude/settings.json entirely
# if you have no other Claude Code hooks.
# 3. Remove the "## Correctless" and "## Correctless Learnings" sections from CLAUDE.md
# 4. Remove Correctless-added entries from .mcp.json (serena, context7) if present
# — keep the file if you have other MCP servers configured
# 5. Remove .serena.yml if it exists (created by /csetup Serena integration)
# 6. Remove Correctless .gitignore entries (.correctless/artifacts/*, .serena/, etc.)
# 7. NOW remove the project files
rm -rf .correctless/
```
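Because the order matters, it can help to confirm that step 2 is really done before running the final `rm -rf`. The sketch below checks whether a settings file still mentions any of the hook scripts named in step 2. The function name is hypothetical, and a plain text search like this is only a sanity check, not a full parse of the settings file.

```bash
# Sketch: detect leftover Correctless hook entries before deleting .correctless/.
# Succeeds if SETTINGS still references one of the hook scripts.
settings_still_hooked() {
  local settings="${1:-.claude/settings.json}"
  local hooks='(workflow-gate|sensitive-file-guard|audit-trail|auto-format|statusline|token-tracking|workflow-advance)\.sh'
  [ -f "$settings" ] && grep -Eq "$hooks" "$settings"
}

# Example guard (commented out so the sketch never deletes anything):
# settings_still_hooked || rm -rf .correctless/
```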
## Glossary
| Term | Meaning |
|------|---------|
| **Agent separation** | Each workflow phase runs in a fresh Claude session. The test writer doesn't know the implementation plan; the QA agent didn't write the tests. Prevents confirmation bias. |
| **Instance fix** | Fix the one bug here and now. |
| **Class fix** | Fix the entire category of this bug — add a structural test that prevents recurrence. |
| **Convergence** | Run multiple audit rounds until findings stabilize (no new critical/high issues). |
| **Drift** | Code that no longer matches documented architecture. Detected by `/cverify`, tracked in drift-debt.json. |
| **Antipattern** | A known bug class from your project's history. Stored in `.correctless/antipatterns.md`, checked by every future spec and review. |
| **Spec** | A document defining what "correct" means for a feature: testable rules, edge cases, security assumptions. A spec that can't be tested is incomplete. |
| **Invariant** | A rule that must always be true: "auth tokens expire after 24 hours." Specs are lists of invariants. |
| **Intensity** | The configured rigor level: standard, high, or critical. Higher intensity unlocks more skills but costs more tokens and time. |
| **Mini-audit** | After QA clears, six default adversarial specialist agents (cross-component interaction, hostile input, resource bounds, upgrade compatibility, UX review, integration depth) hunt for issues the QA lens misses — plus up to 2 recommended lenses from the review phase tailored to the feature's risk profile. Runs at the end of `/ctdd`. |
| **Mutation testing** | Introduce small bugs into code and check if tests catch them. If a test passes with a mutation, that test is weak. Available at high+ intensity via the adversarial probe round. |
| **STRIDE** | Threat modeling framework: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege. |
| **RED / GREEN** | TDD phases. RED = write tests that fail. GREEN = write code to make tests pass. |
## Status
**Correctless 3.0.0 — Early release.** 33 skills, 3 intensity levels, ~5,000 automated tests, 8 hooks. Real-world usage ongoing — [file issues as you find them](https://github.com/joshft/correctless/issues).
## License
MIT