# prompt-review

**A self-improving multi-specialist prompt review system for Claude Code.**

Automatically review your prompts for clarity, security, testing completeness, and domain requirements using parallel AI specialists.
The system learns from your decisions and continuously improves its feedback.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Node Version](https://img.shields.io/badge/node-%3E%3D22-brightgreen)](https://nodejs.org/)
[![Tests](https://img.shields.io/badge/tests-17%2F17%20passing-brightgreen)](./tests/)
[![Claude SDK](https://img.shields.io/badge/Claude-SDK%20%3E%3D0.30.0-blueviolet)](https://github.com/anthropics/anthropic-sdk-typescript)

---

## What It Does

`prompt-review` is a Claude Code plugin that reviews your prompts with a team of specialist LLM reviewers. Instead of a single generic review, you get structured feedback from security, testing, clarity, and domain experts, all working in parallel.

The system doesn't just review once. It tracks which suggestions you accept or reject, learns your preferences, and **adapts its review weights over time** to better match your needs. Optionally, it can engage reviewers in structured debates when they disagree, using an LLM judge to extract quality signals that improve future reviews.

### Key Features

- **Clarity Gate** — Pre-screening runs only the Clarity reviewer first, rejecting or flagging ambiguous prompts before the full review (saves time & resources)
- **Parallel Specialist Review** — Six specialist roles evaluate your prompt simultaneously (security, testing, clarity, domain expertise, UX, documentation)
- **Adaptive Learning** — Learns from your accept/reject decisions and reweights reviewer importance accordingly
- **Conflict Resolution** — Optional debate mode when reviewers disagree, with an LLM judge extracting quality insights
- **Zero Required Dependencies** — CommonJS and Node.js built-ins only (the Anthropic SDK is optional), no framework overhead
- **Automated Tests** — Test suite (`npm test`) covering core features and edge cases
- **Local Audit Trail** — Full review history stored locally with findings, stats, and outcomes

---

## How It Works

```
┌─────────────────────────────────────────────────────────────┐
│  YOUR PROMPT                                                │
│  (submitted via hook or skill)                              │
└────────────────┬────────────────────────────────────────────┘
                 │
        ┌────────▼────────┐
        │  FAN-OUT        │
        │  6 Specialists  │ (parallel)
        └────────┬────────┘
                 │
     ┌───────────┼───────────┐
     │           │           │
┌────▼─────┐ ┌───▼─────┐ ┌───▼─────┐
│ Security │ │ Testing │ │ Clarity │  + Domain SME, UX, Docs
└────┬─────┘ └───┬─────┘ └───┬─────┘
     │           │           │
     │    ┌──────▼──────┐    │
     │    │   DEBATE    │    │  (optional, if conflicts)
     │    │   [Judge]   │    │
     │    └──────┬──────┘    │
     │           │           │
     └───────────┼───────────┘
                 │
        ┌────────▼────────┐
        │  MERGE          │
        │  Composite      │
        │  Findings       │
        └────────┬────────┘
                 │
    ┌────────────▼────────────┐
    │  PRESENT DIFF           │
    │  (User accepts/edits)   │
    └────────────┬────────────┘
                 │
    ┌────────────▼────────────┐
    │  AUDIT LOG              │
    │  (findings + stats +    │
    │   user decision)        │
    └────────────┬────────────┘
                 │
┌────────────────▼────────────────┐
│  LEARNING SYSTEM (periodic)     │
│  ├─ Phase 1: Audit logging      │
│  ├─ Phase 2: Adapt weights      │
│  └─ Phase 3: Debate → proposals │
└─────────────────────────────────┘
```

### Clarity Gate (Optional Pre-Screening)

The gate is enabled by default, so reviews run in **two stages**:

1. **Stage 1: Clarity Gate** (fast, 5-10 seconds)
   - Runs only the Clarity reviewer
   - Checks for ambiguity, vagueness, and undefined terms
   - Blocks prompts with **blocker** severity
   - Warns on **major** severity (you can choose to refine or proceed)
   - Proceeds to the full review on **minor/nit** severity

2. **Stage 2: Full Review** (30-40 seconds, if the gate passes)
   - All 6 specialist reviewers run in parallel
   - Findings are merged and presented to you as a diff

**Why the gate?** It saves time and resources by catching vague prompts early. A prompt like "do stuff with stuff" is rejected at the gate with a clear message: "Please specify what 'stuff' is and what you want to do." You refine it, the second attempt passes the gate, and it gets the full review.

**Configuration:**
```json
{
  "clarity_gate": {
    "enabled": true,
    "strict_mode": false,
    "reject_on": ["blocker"],
    "warn_on": ["major"],
    "auto_refine": true,
    "show_reasoning": true
  }
}
```

To disable the gate, set `clarity_gate.enabled` to `false` in `config.json`.

### The Three-Phase Learning System

The tool continuously improves through three integrated learning phases:

#### Phase 1: Audit Logging
Records detailed findings from each review, tracking which suggestions were accepted or rejected.
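
A minimal sketch of this kind of append-only audit log follows. It assumes one JSON object per line in a per-day file named `logs/YYYYMMDD.jsonl`, which matches the `cat logs/20260225.jsonl` example under CLI Commands. It also assumes the day is taken in UTC. `logFileFor`, `appendAuditRecord`, and `readAuditRecords` are hypothetical helpers, not the plugin's actual `writeAuditLog` in `cost.cjs`.

```javascript
'use strict';

const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: one JSON Lines file per UTC day, e.g. logs/20260225.jsonl.
function logFileFor(dir, date = new Date()) {
	const stamp = date.toISOString().slice(0, 10).replaceAll('-', '');
	return path.join(dir, `${stamp}.jsonl`);
}

// Append one review record as a single line; earlier lines are never rewritten.
function appendAuditRecord(dir, record, date = new Date()) {
	fs.mkdirSync(dir, { recursive: true });
	const file = logFileFor(dir, date);
	fs.appendFileSync(file, JSON.stringify(record) + '\n');
	return file;
}

// Read every non-empty line back as a parsed record.
function readAuditRecords(file) {
	return fs
		.readFileSync(file, 'utf8')
		.split('\n')
		.filter((line) => line.trim() !== '')
		.map((line) => JSON.parse(line));
}

// Usage: write two records into a temporary log directory and read them back.
const logDir = fs.mkdtempSync(path.join(os.tmpdir(), 'prompt-review-logs-'));
const day = new Date('2026-02-25T09:15:30Z');
const file = appendAuditRecord(logDir, { outcome: 'approved', composite_score: 7.8 }, day);
appendAuditRecord(logDir, { outcome: 'approved', composite_score: 6.1 }, day);

console.log(path.basename(file)); // 20260225.jsonl
console.log(readAuditRecords(file).length); // 2
```

Appending one line per review keeps earlier entries untouched, which suits a log that later phases only read.
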
These records form the data foundation for all learning. A sample entry:

```json
{
  "timestamp": "2026-02-25T09:15:30Z",
  "reviewers_active": ["security", "testing", "clarity"],
  "findings_detail": [
    {
      "reviewer_role": "security",
      "severity": "blocker",
      "issue": "Prompt allows reading .env file",
      "op": "AddGuardrail"
    }
  ],
  "suggestions_accepted": ["SEC-001"],
  "suggestions_rejected": ["CLR-003"],
  "outcome": "approved",
  "composite_score": 7.8
}
```

#### Phase 2: GEA Reflection (Adaptive Weights)
Analyzes your acceptance patterns and suggests new reviewer weights based on each reviewer's precision (the share of its findings you accept):

```bash
$ node adapt.cjs 30

Reviewer Effectiveness (last 30 days)
  security      precision 0.91  (10/11 accepted)  ← trusted
  testing       precision 0.25  (2/8 accepted)    ← below threshold
  clarity       precision 0.44  (4/9 accepted)    ← below threshold
  domain_sme    precision 0.83  (5/6 accepted)    ← trusted

Weight Suggestions:
  security:      1.0 → 1.77 (↑ increase)
  testing:       1.0 → 0.53 (↓ decrease)
  clarity:       1.0 → 0.70 (↓ decrease)
  domain_sme:    1.0 → 1.58 (↑ increase)

Apply changes? [y/n]
```

Once you apply the changes, subsequent reviews use the new weights.

#### Phase 3: CoMAS Debate (Policy Learning)
When reviewers disagree, a structured debate extracts quality signals that are used to improve reviewer prompts:

```
Debate: Security vs Testing on environment variable handling

Security's argument:  "Env vars are attack vectors. Every access should
                       be validated against a whitelist."

Testing's counter:    "But developers need flexible access
                       during testing phases."

Judge's feedback (to Security):
  Quality: 8.2/10
  Signals: "Your precision is excellent. Maintain current calibration."

Judge's feedback (to Testing):
  Quality: 4.1/10
  Signals: "Add specific test examples to arguments. Currently too vague."
```

Policy proposals are written to `reviewers/prompts/<role>.txt` for human review before adoption. Nothing is changed automatically.

---

## Scoring System & Validation

### Understanding Composite Scores

Each review produces a **composite score** (0–10), a weighted average of the individual specialist scores:

```
composite = Σ(score_i × weight_i) / Σ(weight_i)
```

**Example:** Security scores 8.0 (weight 1.2) and Clarity scores 6.5 (weight 1.0):
```
composite = (8.0 × 1.2 + 6.5 × 1.0) / (1.2 + 1.0) = 16.1 / 2.2 ≈ 7.32 (out of 10)
```

### Score Interpretation

| Range | Meaning | Description |
|-------|---------|-------------|
| **0–3** | Poor | Prompt is entirely vague or ambiguous |
| **4–6** | Needs Work | Significant issues affecting output quality |
| **7–9** | Good | Minor improvements possible |
| **10** | Excellent | Precise verbs, clear scope, output specified |

### What Affects Scores?

Each specialist rates the prompt on different criteria. The weights listed here are examples. The weights actually used come from `scoring.weights` in `config.json`.

- **Security** (weight 2.0): Are authentication, encryption, and input validation addressed?
- **Testing** (weight 1.5): Are acceptance criteria and test cases defined?
- **Domain SME** (weight 1.5): Does the prompt match domain best practices?
- **Documentation** (weight 1.0): Are output formats and usage documented?
- **UX** (weight 1.0, conditional): If a UI or component is involved, are accessibility and responsiveness covered?
- **Clarity** (weight 1.0): Is scope defined? Are outputs specified? Are terms unambiguous?

Higher-weight reviewers (security and testing in this example) have more influence on the composite.

### Validating Score Accuracy

**Run a calibration check:**

```bash
node adapt.cjs 30 --benchmark
```

This compares your current adapted weights against equal weighting (all 1.0).
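
As a rough illustration of what that comparison measures, the composite formula above can be evaluated once with adapted weights and once with equal weights. This is a minimal sketch. `compositeScore` is a hypothetical helper, not the plugin's `computeCompositeScore` in `editor.cjs`. The sketch also assumes that any role missing from the weight map counts as 1.0.

```javascript
'use strict';

// Hypothetical helper: weighted mean of reviewer scores (each 0–10).
// Roles missing from `weights` default to 1.0 (an assumption for this sketch).
function compositeScore(results, weights = {}) {
	let weighted = 0;
	let totalWeight = 0;
	for (const { role, score } of results) {
		const w = weights[role] ?? 1.0;
		weighted += score * w;
		totalWeight += w;
	}
	return totalWeight === 0 ? 0 : weighted / totalWeight;
}

const results = [
	{ role: 'security', score: 8.0 },
	{ role: 'clarity', score: 6.5 },
];

const adapted = compositeScore(results, { security: 1.2, clarity: 1.0 });
const equal = compositeScore(results); // every weight treated as 1.0

console.log(adapted.toFixed(2)); // 7.32
console.log(equal.toFixed(2)); // 7.25
```

With the adapted weights, Security's higher score pulls the composite slightly above the equal-weight baseline.
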
The benchmark shows which specialist roles have the highest real impact on acceptance rate.

**Check post-adaptation impact:**

```bash
node adapt.cjs --history
```

This shows before/after precision for each weight change, which helps verify whether adapted weights actually improved feedback quality.

### How Precision Is Measured

**Precision** = `accepted findings / proposed findings`

Example: if Security found 5 issues and the user accepted 3:
```
precision = 3 / 5 = 0.60 (60%)
```

**Important limitations:**
- Does NOT measure recall (how many issues Security missed)
- Does NOT measure finding importance (all findings are weighted equally)
- May be skewed when users accept findings for reasons other than correctness

**For a fuller picture, also look at:**

1. **Acceptance rate by severity**
   - High-severity findings: what % of "blocker" findings did users actually fix?
   - Low-severity findings: what % of "nit" suggestions were helpful?

2. **Coverage ratio** (recall proxy)
   - In what % of reviews did this reviewer find at least one accepted issue?
   - Low coverage + high precision = "plays it safe" (risky pattern)

3. **Rejection reason** (when tracked)
   - "invalid" — Finding was wrong (a true precision miss)
   - "deferred" — Valid but out of scope (not a precision miss)
   - "conflict" — Conflicted with another finding (not a precision miss)

### Fairness & Bias Detection

If one specialist dominates (>40% of the composite score), you'll see a warning:

```
⚠ Fairness: Security dominates composite (>40%)
```

This means one reviewer's opinion outweighs the others.
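
How a reviewer's share of the composite is computed is not specified here. The minimal sketch below assumes, purely for illustration, that a share is the reviewer's weighted score divided by the sum of all weighted scores, flagged when it exceeds the 40% threshold. `contributionShares` and `dominantReviewers` are hypothetical names, not functions from the plugin.

```javascript
'use strict';

// Assumed fairness threshold: ">40% of composite" from the warning above.
const DOMINANCE_THRESHOLD = 0.4;

// Hypothetical helper: each reviewer's weighted score as a fraction of the total.
function contributionShares(results, weights = {}) {
	const weighted = results.map(({ role, score }) => ({
		role,
		value: score * (weights[role] ?? 1.0),
	}));
	const total = weighted.reduce((sum, r) => sum + r.value, 0);
	return weighted.map(({ role, value }) => ({
		role,
		share: total === 0 ? 0 : value / total,
	}));
}

// Roles whose share exceeds the threshold.
function dominantReviewers(results, weights = {}) {
	return contributionShares(results, weights)
		.filter((r) => r.share > DOMINANCE_THRESHOLD)
		.map((r) => r.role);
}

const results = [
	{ role: 'security', score: 9.0 },
	{ role: 'testing', score: 6.0 },
	{ role: 'clarity', score: 7.0 },
];
const weights = { security: 2.0, testing: 1.0, clarity: 1.0 };

for (const { role, share } of contributionShares(results, weights)) {
	console.log(`${role}: ${(share * 100).toFixed(1)}%`);
}
// security: 58.1%, testing: 19.4%, clarity: 22.6%
console.log(dominantReviewers(results, weights)); // [ 'security' ]
```

With only three reviewers, equal shares are already about 33%, so a single heavily weighted role crosses 40% quickly.
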
To rebalance, first check each reviewer's contribution share:

```bash
node index.cjs --stats
```

Then adjust `scoring.weights` in `config.json`. For example, lower security from 2.0 to 1.2 and raise clarity from 1.0 to 1.2:

```json
{
  "scoring": {
    "weights": {
      "security": 1.2,
      "clarity": 1.2
    }
  }
}
```

### Running Evaluations

**Check system health over the last 30 days:**

```bash
node adapt.cjs 30
```

This shows:
- Reviews analyzed
- Precision per role
- Weight suggestions based on performance

**See historical trends:**

```bash
node index.cjs --stats
```

This shows:
- Score trend by week
- Most common issues
- Reviewer effectiveness
- Acceptance rates by severity

---

## Quick Start

### 1. Installation (1 minute)

```bash
# Copy the plugin directory into a working location
cp -r ~/.claude/plugins/prompt-review ~/git/prompt-review
cd ~/git/prompt-review

# Activate the project's Node.js version (via nvm)
source ~/.nvm/nvm.sh && nvm use

# Verify it works (all tests should pass)
npm test
```

### 2. First Review (30 seconds)

Invoke the `/prompt-review:review` skill in Claude Code:

```
/prompt-review:review
```

Or end a prompt with `!!!`:

```
Write a function that validates email addresses. It should:
- Check for proper format (RFC 5322)
- Verify domain exists via DNS
- Return detailed error messages
!!!
```

You'll see output like this:

```
📋 Prompt Review Results

Security (8/10):
  🔴 BLOCKER: No rate limiting on DNS checks
    Add a cache or request throttle to prevent DoS

Testing (4/10):
  🟡 IMPORTANT: No mention of test cases
    Suggest testing: invalid formats, timeout handling, unicode domains

Clarity (6/10):
  🟡 IMPORTANT: "Verify domain exists" is vague
    Clarify: synchronous vs async? cached? timeout threshold?

Domain SME (9/10):
  ✓ Good requirements

Composite Score: 6.75/10

Accept changes? [Accept] [Edit] [Reject]
```

### 3. Learn From Your Decisions

Once at least 5 reviews have recorded accept/reject decisions (the default `min_reviews_for_adaptation`), the system can adapt its weights:

```bash
# See effectiveness metrics
node index.cjs --stats

# Preview weight changes (dry run)
node adapt.cjs 30

# Apply the changes
node adapt.cjs 30 --apply
```

After you apply the changes, subsequent reviews use the updated weights.

---

## Understanding Triggers: Hook vs Skill vs API

A review can be triggered in three ways, and each behaves differently. Knowing which mode you are in tells you what runs and where.

### Hook Mode (Automatic)

When you type a prompt in Claude Code with `!!!` at the end:

```
Write a function that validates email addresses !!!
```

**What happens:**
1. The `UserPromptSubmit` hook runs on every submitted message and detects the trailing `!!!`
2. Claude receives instructions to run the `/prompt-review:review` skill
3. You see a notification to proceed with the review
4. You must explicitly invoke the skill to start it

**Key points:**
- ✓ Automatic detection (no extra step needed to flag a prompt)
- ✓ Works in all modes (no API key needed)
- ✓ Safe and predictable (never executes code without your approval)
- ✗ Requires manual skill invocation (detection is automatic, execution is not)
- ✗ Cannot run the full async pipeline (uses subscription mode)

### Skill Mode (Manual)

When you explicitly invoke the skill with a prompt:

```
/prompt-review:review "Write a function that validates email addresses"
```

**What happens:**
1. You explicitly request the review
2. If `ANTHROPIC_API_KEY` is set and `config.mode='api'`, reviewers run in parallel and complete inline
3. If there is no API key or `config.mode='subscription'`, you see instructions to use the skill
4. Debate mode (Phase 3) can run if conflicts are detected

**Key points:**
- ✓ Explicit, clear intent
- ✓ Can use API mode if configured
- ✓ Full async pipeline available (in API mode)
- ✗ Manual invocation (you must remember to use it)

### API Mode (Full Async)

When `ANTHROPIC_API_KEY` is set and `config.mode='api'`:

```bash
export ANTHROPIC_API_KEY="sk-..."
node index.cjs "Write a function that validates email addresses"
```

**What happens:**
1. All 6 reviewers run in parallel
2. Results are merged with conflict detection
3. Optional: the debate phase runs if conflicts exist
4. Full output (refined prompt + findings) is returned
5. The review is logged to the audit trail

**Key points:**
- ✓ True async execution (all reviewers run in parallel)
- ✓ Full debate mode support
- ✓ Direct CLI/programmatic access
- ✗ Requires an API key
- ✗ Not available via the hook

### Mode Comparison Table

| Trigger | Mode | API Key | Execution | Debate | Best For |
|---------|------|---------|-----------|--------|----------|
| Hook | Subscription | Optional | Via Claude Code skill | Yes* | Quick feedback while coding |
| Skill | Subscription | Optional | Via Claude Code skill | Yes* | Explicit reviews without an API key |
| Skill/CLI | API | Required | Parallel API calls, results returned inline | Yes | Full pipeline in CI/automation |

\*Debate runs only if conflicts are detected (and `debate.enabled` is true); proposals are logged for review.

### Configuration

To control which mode is used, edit `~/.claude/plugins/prompt-review/config.json`. The snippet below is annotated for illustration; JSON itself does not allow comments, so leave them out of the real file:

```jsonc
{
  "mode": "subscription",  // or "api" for the full async pipeline
  "api_fallback": true,    // fall back to subscription mode if no API key is set
  "reviewers": {
    "security": { "enabled": true, "conditional": false }
    // ...other reviewers
  }
}
```

**`mode` options:**
- `"subscription"` (default) — Use Claude Code's skill system for reviews
- `"api"` — Use the direct async pipeline (requires `ANTHROPIC_API_KEY`)

---

## Installation

### Option 1: For Users (Recommended)

Install from Claude Code's plugin marketplace (when available).

### Option 2: For Local Development

```bash
# Clone the repository
git clone https://github.com/brianruggieri/prompt-review.git
cd prompt-review

# Optional: install the Anthropic SDK (no other dependencies are required)
npm install @anthropic-ai/sdk

# Link this checkout into Claude Code's plugin directory
mkdir -p ~/.claude/plugins
ln -s "$(pwd)" ~/.claude/plugins/prompt-review

# Verify installation
source ~/.nvm/nvm.sh && nvm use
npm test
```

### Option 3: Manual Integration

Copy the plugin directory to `~/.claude/plugins/prompt-review`:

```bash
cp -r prompt-review ~/.claude/plugins/prompt-review
```

The plugin loads the next time Claude Code restarts.

---

## Usage

### Basic Review (Hook)

Submit any prompt with `!!!` at the end:

```
I need a function that parses CSV files with:
- Comma and tab delimiters
- Header row detection
- Type inference for columns
- Memory efficiency for large files
!!!
```

The hook detects the trigger and prompts Claude to run the review skill, as described under Hook Mode.

### Skills

```text
# Review current prompt
/prompt-review:review

# Show statistics and effectiveness dashboard
/prompt-review:stats

# Preview weight changes (Phase 2)
/prompt-review:adapt [days=30]

# Apply weight changes
/prompt-review:adapt 30 --apply
```

### CLI Commands

```bash
# Run tests
npm test

# Show statistics dashboard
node index.cjs --stats

# Preview adaptation (dry run)
node adapt.cjs
node adapt.cjs 60        # Use last 60 days
node adapt.cjs 30        # Use last 30 days

# Apply weight adaptation
node adapt.cjs --apply
node adapt.cjs 30 --apply

# Check review history
ls logs/
cat logs/20260225.jsonl  # See reviews from 2026-02-25
```

---

## Configuration

Edit `config.json` to customize reviewer behavior:

```json
{
  "scoring": {
    "weights": {
      "security": 1.0,
      "testing": 1.0,
      "clarity": 1.0,
      "domain_sme": 1.0,
      "frontend_ux": 0.8,
      "documentation": 0.6
    },
    "weights_history": []
  },
  "reflection": {
    "min_reviews_for_adaptation": 5,
    "precision_threshold": 0.70,
    "weight_clamp_min": 0.5,
    "weight_clamp_max": 3.0,
    "auto_adapt": false,
    "auto_adapt_interval_days": 30
  },
  "debate": {
    "enabled": false,
    "max_pairs": 2,
    "model": "claude-haiku-4-5",
    "judge_model": "claude-sonnet-4-6",
    "min_conflicts_to_trigger": 1
  }
}
```

### Key Settings

| Setting | Default | Purpose |
|---------|---------|---------|
| `scoring.weights.*` | 0.6–1.0 | Importance of each reviewer (1.0 = baseline) |
| `min_reviews_for_adaptation` | 5 | Minimum reviews before weights adapt |
| `precision_threshold` | 0.70 | Flag reviewers whose precision (acceptance rate) falls below this value |
| `debate.enabled` | false | Enable optional debate and policy learning |
| `auto_adapt` | false | Automatically apply weight changes every `auto_adapt_interval_days` (30) |

---

## Real-World Examples

### Example 1: Security Issue Detection

**Prompt:**
```
Function to authenticate users. Accepts username/password,
compares to stored hash, returns auth token.
```

**Security Reviewer (8/10):**
```
🔴 BLOCKER: No constant-time comparison
  Use crypto.timingSafeEqual to prevent timing attacks

🔴 BLOCKER: Token generation not shown
  Specify: random? signed? expiration?

🟡 IMPORTANT: No rate limiting mentioned
  Add: max attempts, backoff, IP blocking
```

**User result:** Accepts all 3 findings. As similar decisions accumulate, Security is marked high-precision (0.89) and its weight rises to 1.65 over time.

### Example 2: Learning From Your Preferences

**Month 1:** 8 reviews, security precision 0.75, testing precision 0.40
- System learns: you trust security more than testing
- Weights: security 1.2, testing 0.6

**Month 2:** 12 reviews with updated weights
- Reviews now emphasize security findings more
- You approve 85% of security findings
- Testing findings are marked less important (less friction)
- Result: Better aligned with your preferences

### Example 3: Debate Resolution

**Conflict:** Security wants strict environment variable validation. Testing wants flexibility for mocking.

**Judge Output:**
- Security argument quality: 8.2/10 (specific, evidence-backed)
- Testing argument quality: 4.1/10 (vague, missing examples)
- Policy signal: "Testing role should add concrete examples to arguments"

**Outcome:** A proposal is generated to improve the testing prompt.
Once adopted, testing arguments become more specific.

---

## Architecture

See [`ARCHITECTURE.md`](./.claude/ARCHITECTURE.md) for detailed system design, including:

- Single-run flow (complete pipeline)
- Multi-run learning loop (how learning accumulates)
- Knowledge flow (data → patterns → metrics → adaptation)
- Three-phase system explanation with diagrams
- Detailed file structure and responsibilities

### System Layers

```
USER INTERFACE (hook + skills)
    ↓
PIPELINE (orchestrator, editor, renderer)
    ↓
LEARNING SYSTEM (audit logging → reflection → debate)
    ↓
CONFIGURATION (weights, reflection settings, debate config)
    ↓
AUDIT LOGS (local, gitignored, never shared)
```

### Core Modules

| Module | Purpose | Key Functions |
|--------|---------|---------------|
| `index.cjs` | Main entry point | `handleHook`, `handleSkill`, `runFullPipeline` |
| `orchestrator.cjs` | Fan-out logic | `runReviewersApi` (parallel specialist calls) |
| `editor.cjs` | Merge & scoring | `mergeCritiques`, `computeCompositeScore` |
| `cost.cjs` | Audit logging | `writeAuditLog`, `updateAuditOutcome` |
| `stats.cjs` | Analytics | `generateStats`, `renderDashboard` |
| `reflection.cjs` | Phase 2 | `generateReflectionReport`, `computeWeightSuggestions` |
| `adapt.cjs` | Phase 2 CLI | `previewAdaptation`, `applyAdaptation` |
| `debate.cjs` | Phase 3 | `selectDebatePairs`, `runDebatePhase` |
| `judge.cjs` | Phase 3 | `runJudge`, `buildJudgePrompt` |
| `policy.cjs` | Phase 3 | `generatePromptProposal`, `computePolicyInsights` |

---

## Performance & Costs

### Token Usage Per Review

Typical review with 4 reviewers:

```
FAN-OUT:           ~1,200 input tokens  +   ~450 output tokens
MERGE:               ~300 input tokens  +   ~150 output tokens
JUDGE (optional):  ~1,000 input tokens  +   ~400 output tokens
──────────────────────────────────────────────────────────────
TOTAL:             ~2,500 input tokens  + ~1,000 output tokens  ≈ $0.05 per review
```

Audit logs track exact costs
per review.

### Scalability

- Designed for 1-2 reviews per day per developer
- Audit logs accumulate ~30 reviews/month
- Reflection analysis takes <1 second
- Debate rounds run in parallel (2-4 seconds per pair)
- No external dependencies; logs and analysis stay local (reviews themselves call the Claude API)

---

## Troubleshooting

### "No reviewers ran, all failed"

**Cause:** API key not configured or invalid (API mode).

**Solution:** Set the `ANTHROPIC_API_KEY` environment variable:
```bash
export ANTHROPIC_API_KEY="sk-..."
```

### "Cannot read findings_detail"

**Cause:** Audit logs created before Phase 1 (detailed findings logging) was implemented.

**Solution:** Make sure you're running the latest version:
```bash
cd ~/.claude/plugins/prompt-review
git pull origin main
npm test
```

### "Insufficient data" for adaptation

**Cause:** Fewer than 5 reviews with recorded outcomes.

**Solution:** Complete 5+ reviews and record your accept/reject decisions:
```bash
node adapt.cjs 30
# Will show: sufficient_data: false
# After 5 reviews with outcomes → sufficient_data: true
```

### "Weight changes don't affect reviews"

**Cause:** The adaptation was only previewed (dry run), or no review has run since the changes were applied.

**Solution:** Apply the changes; weights take effect on the next review:
```bash
node adapt.cjs 30 --apply
# Now run your next review to see updated weights in use
```

### "Debate proposals aren't being used"

**Cause:** `debate.enabled: false` (the default).

**Solution:** Enable debate mode in `config.json`:
```json
{
  "debate": {
    "enabled": true
  }
}
```

Then review proposals in `reviewers/prompts/<role>.txt` before adopting them.

---

## FAQ

**Q: Will this slow down my prompting workflow?**
A: No. Reviews are optional (triggered by `!!!` or the skill). Unless you explicitly request a review, none runs.

**Q: Can it access my prompts outside review time?**
A: No. Prompts are only reviewed when you explicitly trigger a review.
Audit logs are stored locally in `~/.claude/plugins/prompt-review/logs/` (gitignored).

**Q: What if I don't like the feedback?**
A: Reject it. Your rejections are tracked and influence future reviewer weights, so the system learns your preferences.

**Q: Can I customize reviewer prompts?**
A: Yes. Edit the files in `reviewers/` (`security.cjs`, `testing.cjs`, etc.). Changes take effect on the next review.

**Q: What if reviewers keep disagreeing?**
A: Enable debate mode. The judge extracts quality signals that feed proposals for improving reviewer prompts over time.

**Q: Does this work offline?**
A: The review phase requires the Claude API (online). Adaptation (Phase 2) works fully offline using local audit logs. Debate also requires the API.

**Q: Can I reset my weights?**
A: Yes. Edit `config.json` and set all weights to 1.0, or delete the `weights_history` array.

**Q: How often should I run adaptation?**
A: Monthly is recommended. Run `node adapt.cjs 30` after 20-30 reviews for stable data.

---

## Contributing

Contributions are welcome!
Areas for enhancement:

- [ ] Additional specialist reviewers (accessibility, performance, etc.)
- [ ] Web UI for reviewing and managing suggestions
- [ ] Integration with version control for automatic reviews on PRs
- [ ] Feedback export (JSON, CSV) for analysis
- [ ] Custom reviewer prompt templates
- [ ] Comparative analysis (this prompt vs. similar prompts)

### Development

```bash
# Clone and set up
git clone https://github.com/brianruggieri/prompt-review.git
cd prompt-review
source ~/.nvm/nvm.sh && nvm use
npm test

# Create a branch
git checkout -b feat/your-feature

# Make changes, then run the test suite (all tests must pass)
npm test

# Stage, commit, and push
git add -A
git commit -m "Add: your feature description"
git push origin feat/your-feature

# Open a PR
```

### Code Standards

- **No external dependencies** — Node.js built-ins + optional Anthropic SDK only
- **CommonJS** — `require`/`module.exports`, no `import`/`export`
- **Testing** — New features require tests, written with Node's built-in `assert` only
- **Indentation** — Tabs (2-column width)
- **Comments** — Only where logic isn't self-evident

---

## License

MIT License — See [LICENSE](./LICENSE) file for details.

---

## Resources

- **Architecture Deep-Dive:** [.claude/ARCHITECTURE.md](./.claude/ARCHITECTURE.md)
- **Phase 1 Spec:** [.claude/phase-1-audit-logging.md](./.claude/phase-1-audit-logging.md)
- **Phase 2 Spec:** [.claude/phase-2-gea-reflection.md](./.claude/phase-2-gea-reflection.md)
- **Phase 3 Spec:** [.claude/phase-3-comas-debate.md](./.claude/phase-3-comas-debate.md)
- **Test Suite:** [tests/run.cjs](./tests/run.cjs)

---

## Support

For issues, questions, or suggestions:

1. Check the [Troubleshooting](#troubleshooting) section above
2. Review [ARCHITECTURE.md](./.claude/ARCHITECTURE.md) for a detailed system explanation
3. Open an [Issue](https://github.com/brianruggieri/prompt-review/issues) with:
   - Error message or unexpected behavior
   - Steps to reproduce
   - Your `config.json` (with any API key removed)
   - Output of `npm test`

---

**Built with ❤️ for prompt engineers and developers who want their prompts to be better.**