{"id":45241142,"url":"https://github.com/kevinrabun/judges","last_synced_at":"2026-04-05T14:02:01.443Z","repository":{"id":339449426,"uuid":"1161966307","full_name":"KevinRabun/judges","owner":"KevinRabun","description":"MCP server with specialized judges to evaluate AI-generated code for security, cost, scalability, cloud readiness, and best practices.","archived":false,"fork":false,"pushed_at":"2026-04-01T01:47:44.000Z","size":12661,"stargazers_count":5,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-04-01T22:15:50.932Z","etag":null,"topics":["ai-agent","ai-code-review","ai-generated-code","code-quality","code-review","judge","mcp","mcp-server","security","software-architecture","static-analysis","typescript"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KevinRabun.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-19T18:11:43.000Z","updated_at":"2026-04-01T01:47:32.000Z","dependencies_parsed_at":"2026-02-25T01:00:30.399Z","dependency_job_id":null,"html_url":"https://github.com/KevinRabun/judges","commit_stats":null,"previous_names":["kevinrabun/judges"],"tags_count":246,"template":false,"template_full_name":null,"purl":"pkg:github/KevinRabun/judges","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KevinRabun%2Fju
dges","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KevinRabun%2Fjudges/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KevinRabun%2Fjudges/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KevinRabun%2Fjudges/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KevinRabun","download_url":"https://codeload.github.com/KevinRabun/judges/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KevinRabun%2Fjudges/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31437927,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T13:13:19.330Z","status":"ssl_error","status_checked_at":"2026-04-05T13:13:17.778Z","response_time":75,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agent","ai-code-review","ai-generated-code","code-quality","code-review","judge","mcp","mcp-server","security","software-architecture","static-analysis","typescript"],"created_at":"2026-02-20T21:27:47.764Z","updated_at":"2026-04-05T14:02:01.407Z","avatar_url":"https://github.com/KevinRabun.png","language":"TypeScript","funding_links":[],"categories":["カテゴリ"],"sub_categories":["💻 \u003ca name=\"code--ide\"\u003e\u003c/a\u003eコード・IDE"],"readme":"# Judges Panel\n\nAn MCP (Model Context Protocol) server that provides a panel 
of **45 specialized judges** to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed. Combines **deterministic pattern matching \u0026 AST analysis** (instant, offline, zero LLM calls) with **LLM-powered deep-review prompts** that let your AI assistant perform expert-persona analysis across all 45 domains.\n\n**Highlights:**\n\n- Includes an **App Builder Workflow (3-step)** demo for release decisions, plain-language risk summaries, and prioritized fixes — see [Try the Demo](#2-try-the-demo).\n- Includes **V2 context-aware evaluation** with policy profiles, evidence calibration, specialty feedback, confidence scoring, and uncertainty reporting.\n- Includes **public repository URL reporting** to clone a repo, run the full tribunal, and output a consolidated markdown report.\n- **200+ deterministic auto-fix patches** (see `src/patches/index.ts`) plus LLM-powered deep review.\n\n\u003e 🧪 Many commands in `printHelp` are experimental/roadmap. By default, we show GA commands only. 
Set `JUDGES_SHOW_EXPERIMENTAL=1` to reveal stubs; these may not be wired yet.\n\n[![CI](https://github.com/KevinRabun/judges/actions/workflows/ci.yml/badge.svg)](https://github.com/KevinRabun/judges/actions/workflows/ci.yml)\n[![npm](https://img.shields.io/npm/v/@kevinrabun/judges)](https://www.npmjs.com/package/@kevinrabun/judges)\n[![npm downloads](https://img.shields.io/npm/dw/@kevinrabun/judges)](https://www.npmjs.com/package/@kevinrabun/judges)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Tests](https://img.shields.io/badge/tests-2482-brightgreen)](https://github.com/KevinRabun/judges/actions)\n\n\u003e 🔰 **Packages**\n\u003e - **CLI**: `@kevinrabun/judges-cli` → binary `judges` (use `npx @kevinrabun/judges-cli eval --file app.ts`).\n\u003e - **MCP/API**: `@kevinrabun/judges` → programmatic API + MCP server (`npm install @kevinrabun/judges`).\n\u003e - **VS Code extension**: see [`vscode-extension/`](vscode-extension/README.md).\n\u003e - **GitHub Action**: `uses: KevinRabun/judges@main` (see [CI quickstart](#github-action)).\n\n---\n\n## Quickstart\n\n### CLI (one-off)\n```bash\n# Using the CLI package (recommended)\nnpx @kevinrabun/judges-cli eval --file src/app.ts\n\n# Show GA commands only (default)\nnpx @kevinrabun/judges-cli --help\n\n# Show experimental/roadmap commands\nJUDGES_SHOW_EXPERIMENTAL=1 npx @kevinrabun/judges-cli --help\n\n# License scan (supply-chain \u0026 license compliance)\nnpx @kevinrabun/judges-cli license-scan --dir .\n```\n\n\u003e **CLI vs API:** If you want to embed Judges in your app (MCP/API), install `@kevinrabun/judges`. 
For the command-line, use `@kevinrabun/judges-cli` (binary `judges`).\n\n### GitHub Action\n```yaml\nname: Judges\non: [pull_request, push]\njobs:\n  judges:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      - uses: KevinRabun/judges@main\n        with:\n          path: .\n          diff-only: true           # evaluate only changed lines in PRs (default true)\n          fail-on-findings: true    # fail on critical/high findings\n          upload-sarif: true        # upload SARIF to GitHub Code Scanning\n```\n\n### Programmatic API (MCP server included)\n```bash\nnpm install @kevinrabun/judges\n```\n```ts\nimport { evaluateCode } from \"@kevinrabun/judges/api\";\nconst verdict = evaluateCode(\"const password = 'ProdSecret';\", \"typescript\");\nconsole.log(verdict.overallVerdict, verdict.overallScore);\n```\n\n### MCP server\n\nThe MCP server runs on stdio and is started by your MCP client (VS Code, Claude Desktop, etc.).\nConfigure it in your MCP settings (e.g. `mcp.json`):\n\n```json\n{\n  \"servers\": {\n    \"judges\": {\n      \"type\": \"stdio\",\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"@kevinrabun/judges\"]\n    }\n  }\n}\n```\n\nOr run the server directly:\n```bash\nnpx @kevinrabun/judges\n# Starts the MCP server on stdio\n```\n\n\u003e Config file: `.judgesrc.json` (supports `${ENV_VAR}` substitution via `expandEnvPlaceholders`). See [Configuration](#configuration).\n\n---\n\n## Why Judges?\n\nAI code generators (Copilot, Cursor, Claude, ChatGPT, etc.) write code fast — but they routinely produce **insecure defaults, missing auth, hardcoded secrets, and poor error handling**. 
Human reviewers catch some of this, but nobody reviews 45 dimensions consistently.\n\n| | ESLint / Biome | SonarQube | Semgrep / CodeQL | **Judges** |\n|---|---|---|---|---|\n| **Scope** | Style + some bugs | Bugs + code smells | Security patterns | **45 domains**: security, cost, compliance, a11y, API design, cloud, UX, … |\n| **AI-generated code focus** | No | No | Partial | **Purpose-built** for AI output failure modes |\n| **Setup** | Config per project | Server + scanner | Cloud or local | **One command**: `npx @kevinrabun/judges-cli eval file.ts` |\n| **Auto-fix patches** | Some | No | No | **200+ deterministic patches** — instant, offline |\n| **Non-technical output** | No | Dashboard | No | **Plain-language findings** with What/Why/Next |\n| **MCP native** | No | No | No | **Yes** — works inside Copilot, Claude, Cursor |\n| **SARIF output** | No | Yes | Yes | **Yes** — upload to GitHub Code Scanning |\n| **Cost** | Free | $$$$ | Free/paid | **Free / MIT** |\n\n**Judges doesn't replace linters** — it covers the dimensions linters don't: authentication strategy, data sovereignty, cost patterns, accessibility, framework-specific anti-patterns, and architectural issues across multiple files.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/terminal-output.svg\" alt=\"Judges — Terminal Output\" width=\"680\" /\u003e\n\u003c/p\u003e\n\n---\n\n## Quick Start\n\n\u003e Prereqs: Node.js **\u003e=18** (\u003e=20 recommended), `npx` available. The `judges` CLI binary ships with **@kevinrabun/judges-cli** (preferred) and also works via `npx @kevinrabun/judges`.\n\u003e\n\u003e Packages:\n\u003e - **CLI:** `npm install -g @kevinrabun/judges-cli` (or `npx @kevinrabun/judges-cli ...`)\n\u003e - **MCP/API:** `npm install @kevinrabun/judges`\n\nUse `@kevinrabun/judges` for the MCP server and programmatic API. 
Use `@kevinrabun/judges-cli` when you want the `judges` terminal command.\n\n### Try it now (no clone needed)\n\n```bash\n# Install the CLI globally\nnpm install -g @kevinrabun/judges-cli\n\n# Evaluate any file\njudges eval src/app.ts\n\n# Pipe from stdin\ncat api.py | judges eval --language python\n\n# Single judge\njudges eval --judge cybersecurity server.ts\n\n# SARIF output for CI\njudges eval --file app.ts --format sarif \u003e results.sarif\n\n# HTML report with severity filters and dark/light theme\njudges eval --file app.ts --format html \u003e report.html\n\n# Fail CI on findings (exit code 1)\njudges eval --fail-on-findings src/api.ts\n\n# Suppress known findings via baseline\njudges eval --baseline baseline.json src/api.ts\n\n# Use a named preset\njudges eval --preset security-only src/api.ts\n\n# Use a config file\njudges eval --config .judgesrc.json src/api.ts\n\n# Set a minimum score threshold (exit 1 if below)\njudges eval --min-score 80 src/api.ts\n\n# One-line summary for scripts\njudges eval --summary src/api.ts\n\n# Agentic skills (orchestrated judge sets)\njudges skill ai-code-review --file src/app.ts\njudges skill security-review --file src/api.ts --format json\njudges skill release-gate --file src/app.ts\njudges skills   # list available skills\n# Full catalog: docs/skills.md\n\n# List all 45 judges\njudges list\n```\n\n### Additional CLI Commands\n\n```bash\n# Interactive project setup wizard\njudges init\n\n# Preview auto-fix patches (dry run)\njudges fix src/app.ts\n\n# Apply patches directly\njudges fix src/app.ts --apply\n\n# License compliance scan (copyleft/unknown detection)\njudges license-scan --format json --risk high\n\n# Watch mode — re-evaluate on file save\njudges watch src/\n\n# Project-level report (local directory)\njudges report . 
--format html --output report.html\n\n# Evaluate a unified diff (pipe from git diff)\ngit diff HEAD~1 | judges diff\n\n# Analyze dependencies for supply-chain risks\njudges deps --path . --format json\n\n# Run GitHub App server (zero-config PR reviews)\njudges app serve --port 4567\n\n# Run GitHub PR review (gh CLI required)\njudges review --pr 123 --repo owner/name --diff-only\n\n# Auto-tune presets and configs\njudges tune --dir . --apply\n\n# Create a baseline file to suppress known findings\njudges baseline create --file src/api.ts -o baseline.json\n\n# Generate CI template files\njudges ci-templates --provider github\njudges ci-templates --provider gitlab\njudges ci-templates --provider azure\njudges ci-templates --provider bitbucket\n\n# Generate per-judge rule documentation\njudges docs\njudges docs --judge cybersecurity\njudges docs --output docs/\n\n# Install shell completions\njudges completions bash   # eval \"$(judges completions bash)\"\njudges completions zsh\njudges completions fish\njudges completions powershell\n\n# Install pre-commit hook\njudges hook install\n\n# Uninstall pre-commit hook\njudges hook uninstall\n```\n\n\u003e 🔎 Tip: The CLI help now defaults to **GA commands only**. 
To see experimental/roadmap commands, run:\n\u003e\n\u003e ```bash\n\u003e JUDGES_SHOW_EXPERIMENTAL=1 judges --help\n\u003e ```\n\n### GitHub App (self-hosted webhook)\n\nRun a zero-config PR reviewer as a GitHub App:\n\n```bash\n# Run the webhook server locally\njudges app serve --port 4567\n```\n\n**Required env vars:**\n- `JUDGES_APP_ID` – GitHub App ID\n- `JUDGES_PRIVATE_KEY` or `JUDGES_PRIVATE_KEY_PATH` – PEM private key\n- `JUDGES_WEBHOOK_SECRET` – signature verification secret\n\nOptional:\n- `JUDGES_MIN_SEVERITY` (default: `medium`)\n- `JUDGES_MAX_COMMENTS` (default: 25)\n- `JUDGES_TEST_DRY_RUN=1` to avoid live network calls during tests\n\nFor local testing, you can expose \u003ccode\u003ehttp://localhost:4567/webhook\u003c/code\u003e via \u003ccode\u003engrok http 4567\u003c/code\u003e and configure the GitHub App webhook URL accordingly.\n\n### Use in GitHub Actions\n\nAdd Judges to your CI pipeline with zero configuration:\n\n```yaml\n# .github/workflows/judges.yml\nname: Judges Code Review\non: [pull_request]\n\njobs:\n  judges:\n    runs-on: ubuntu-latest\n    permissions:\n      contents: read\n      security-events: write  # only if using upload-sarif\n    steps:\n      - uses: actions/checkout@v4\n      - uses: KevinRabun/judges@main\n        with:\n          path: src/api.ts        # file or directory\n          format: text             # text | json | sarif | markdown\n          upload-sarif: true       # upload to GitHub Code Scanning\n          fail-on-findings: true   # fail CI on critical/high findings\n```\n\n**Outputs** available for downstream steps: `verdict`, `score`, `findings`, `critical`, `high`, `sarif-file`.\n\n### Use with Docker (no Node.js required)\n\n```bash\n# Build the image\ndocker build -t judges .\n\n# Evaluate a local file\ndocker run --rm -v $(pwd):/code judges eval --file /code/app.ts\n\n# Pipe from stdin\ncat api.py | docker run --rm -i judges eval --language python\n\n# List judges\ndocker run --rm judges 
list\n```\n\n### Or use as an MCP server\n\n### 1. Install and Build\n\n```bash\ngit clone https://github.com/KevinRabun/judges.git\ncd judges\nnpm install\nnpm run build\n```\n\n### 2. Try the Demo\n\nRun the included demo to see all 45 judges evaluate a purposely flawed API server:\n\n```bash\nnpm run demo\n```\n\nThis evaluates [`examples/sample-vulnerable-api.ts`](examples/sample-vulnerable-api.ts) — a file intentionally packed with security holes, performance anti-patterns, and code quality issues — and prints a full verdict with per-judge scores and findings.\n\nThe demo now also includes an **App Builder Workflow (3-step)** section. In a single run, you get both tribunal output and workflow output:\n- Release decision (`Ship now` / `Ship with caution` / `Do not ship`)\n- Plain-language summaries of top risks\n- Prioritized remediation tasks and AI-fixable `P0/P1` items\n\n**Sample workflow output (truncated):**\n\n```text\n╔══════════════════════════════════════════════════════════════╗\n║             App Builder Workflow Demo (3-Step)             ║\n╚══════════════════════════════════════════════════════════════╝\n\n  Decision       : Do not ship\n  Verdict        : FAIL (47/100)\n  Risk Counts    : Critical 24 | High 27 | Medium 55\n\n  Step 2 — Plain-Language Findings:\n  - [CRITICAL] DATA-001: Hardcoded password detected\n      What: ...\n      Why : ...\n      Next: ...\n\n  Step 3 — Prioritized Tasks:\n  - P0 | DEVELOPER | Effort L | DATA-001\n      Task: ...\n      Done: ...\n\n  AI-Fixable Now (P0/P1):\n  - P0 DATA-001: ...\n```\n\n**Sample tribunal output (truncated):**\n\n```\n╔══════════════════════════════════════════════════════════════╗\n║           Judges Panel — Full Tribunal Demo                 ║\n╚══════════════════════════════════════════════════════════════╝\n\n  Overall Verdict : FAIL\n  Overall Score   : 43/100\n  Critical Issues : 15\n  High Issues     : 17\n  Total Findings  : 83\n  Judges Run      : 33\n\n  Per-Judge Breakdown:\n  
────────────────────────────────────────────────────────────────\n  ❌ Judge Data Security              0/100    7 finding(s)\n  ❌ Judge Cybersecurity              0/100    7 finding(s)\n  ❌ Judge Cost Effectiveness        52/100    5 finding(s)\n  ⚠️  Judge Scalability              65/100    4 finding(s)\n  ❌ Judge Cloud Readiness           61/100    4 finding(s)\n  ❌ Judge Software Practices        45/100    6 finding(s)\n  ❌ Judge Accessibility              0/100    8 finding(s)\n  ❌ Judge API Design                 0/100    9 finding(s)\n  ❌ Judge Reliability               54/100    3 finding(s)\n  ❌ Judge Observability             45/100    5 finding(s)\n  ❌ Judge Performance               27/100    5 finding(s)\n  ❌ Judge Compliance                 0/100    4 finding(s)\n  ⚠️  Judge Testing                  90/100    1 finding(s)\n  ⚠️  Judge Documentation            70/100    4 finding(s)\n  ⚠️  Judge Internationalization     65/100    4 finding(s)\n  ⚠️  Judge Dependency Health        90/100    1 finding(s)\n  ❌ Judge Concurrency               44/100    4 finding(s)\n  ❌ Judge Ethics \u0026 Bias             65/100    2 finding(s)\n  ❌ Judge Maintainability           52/100    4 finding(s)\n  ❌ Judge Error Handling            27/100    3 finding(s)\n  ❌ Judge Authentication             0/100    4 finding(s)\n  ❌ Judge Database                   0/100    5 finding(s)\n  ❌ Judge Caching                   62/100    3 finding(s)\n  ❌ Judge Configuration Mgmt         0/100    3 finding(s)\n  ⚠️  Judge Backwards Compat         80/100    2 finding(s)\n  ⚠️  Judge Portability              72/100    2 finding(s)\n  ❌ Judge UX                        52/100    4 finding(s)\n  ❌ Judge Logging Privacy            0/100    4 finding(s)\n  ❌ Judge Rate Limiting             27/100    4 finding(s)\n  ⚠️  Judge CI/CD                    80/100    2 finding(s)\n```\n\n### 3. 
Run the Tests\n\n```bash\nnpm test\n```\n\nRuns automated tests covering all judges, AST parsers, markdown formatters, and edge cases.\n\n### 4. Connect to Your Editor\n\n#### VS Code (recommended — zero config)\n\nInstall the **[Judges Panel](https://marketplace.visualstudio.com/items?itemName=kevinrabun.judges-panel)** extension from the Marketplace. It provides:\n\n- **Inline diagnostics \u0026 quick-fixes** on every file save\n- **`@judges` chat participant** — type `@judges` in Copilot Chat, or just ask for a \"judges panel review\" and Copilot routes automatically\n- **Auto-configured MCP server** — all 45 expert-persona prompts available to Copilot with zero setup\n\n```bash\ncode --install-extension kevinrabun.judges-panel\n```\n\n#### VS Code — manual MCP config\n\nIf you prefer explicit workspace config (or want teammates without the extension to benefit), create `.vscode/mcp.json`:\n\n```json\n{\n  \"servers\": {\n    \"judges\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"@kevinrabun/judges\"]\n    }\n  }\n}\n```\n\n#### Claude Desktop\n\nAdd to `claude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"judges\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"@kevinrabun/judges\"]\n    }\n  }\n}\n```\n\n#### Cursor / other MCP clients\n\nUse the same `npx` command for any MCP-compatible client:\n\n```json\n{\n  \"command\": \"npx\",\n  \"args\": [\"-y\", \"@kevinrabun/judges\"]\n}\n```\n\n### 5. 
Use Judges in GitHub Copilot PR Reviews\n\nYes — users can include Judges as part of GitHub-based review workflows, with one important caveat:\n\n- The hosted `copilot-pull-request-reviewer` on GitHub does not currently let you directly attach arbitrary local MCP servers the same way VS Code does.\n- The practical pattern is to run Judges in CI on each PR, publish a report/check, and have Copilot + human reviewers use that output during review.\n\n#### Option A (recommended): PR workflow check + report artifact\n\nCreate `.github/workflows/judges-pr-review.yml`:\n\n```yaml\nname: Judges PR Review\n\non:\n  pull_request:\n    types: [opened, synchronize, reopened]\n\njobs:\n  judges:\n    runs-on: ubuntu-latest\n    permissions:\n      contents: read\n      pull-requests: write\n\n    steps:\n      - name: Checkout\n        uses: actions/checkout@v4\n\n      - name: Setup Node\n        uses: actions/setup-node@v4\n        with:\n          node-version: 20\n          cache: npm\n\n      - name: Install\n        run: npm ci\n\n      - name: Generate Judges report\n        run: |\n          npx tsx -e \"import { generateRepoReportFromLocalPath } from './src/reports/public-repo-report.ts';\n          const result = generateRepoReportFromLocalPath({\n            repoPath: process.cwd(),\n            outputPath: 'judges-pr-report.md',\n            maxFiles: 600,\n            maxFindingsInReport: 150,\n          });\n          console.log('Overall:', result.overallVerdict, result.averageScore);\"\n\n      - name: Upload report artifact\n        uses: actions/upload-artifact@v4\n        with:\n          name: judges-pr-report\n          path: judges-pr-report.md\n```\n\nThis gives every PR a reproducible Judges output your team (and Copilot) can reference.\n\n#### Option B: Add Copilot custom instructions in-repo\n\nAdd `.github/instructions/judges.instructions.md` with guidance such as:\n\n```markdown\nWhen reviewing pull requests:\n1. 
Read the latest Judges report artifact/check output first.\n2. Prioritize CRITICAL and HIGH findings in remediation guidance.\n3. If findings conflict, defer to security/compliance-related Judges.\n4. Include rule IDs (e.g., DATA-001, CYBER-004) in suggested fixes.\n```\n\nThis helps keep Copilot feedback aligned with Judges findings.\n\n---\n\n## CLI Reference\n\nAll commands support `--help` for usage details.\n\n### `judges eval`\n\nEvaluate a file with all 45 judges or a single judge.\n\n| Flag | Description |\n|------|-------------|\n| `--file \u003cpath\u003e` / positional | File to evaluate |\n| `--judge \u003cid\u003e` / `-j \u003cid\u003e` | Single judge mode |\n| `--language \u003clang\u003e` / `-l \u003clang\u003e` | Language hint (auto-detected from extension) |\n| `--format \u003cfmt\u003e` / `-f \u003cfmt\u003e` | Output format: `text`, `json`, `sarif`, `markdown`, `html`, `pdf`, `junit`, `codeclimate`, `github-actions` |\n| `--output \u003cpath\u003e` / `-o \u003cpath\u003e` | Write output to file |\n| `--fail-on-findings` | Exit with code 1 if verdict is FAIL |\n| `--baseline \u003cpath\u003e` / `-b \u003cpath\u003e` | JSON baseline file — suppress known findings |\n| `--summary` | Print a single summary line (ideal for scripts) |\n| `--config \u003cpath\u003e` | Load a `.judgesrc` / `.judgesrc.json` config file |\n| `--preset \u003cname\u003e` | Use a named preset (see [Named Presets](#named-presets) for all 22 options) |\n| `--min-score \u003cn\u003e` | Exit with code 1 if overall score is below this threshold |\n| `--verbose` | Print timing and debug information |\n| `--quiet` | Suppress non-essential output |\n| `--no-color` | Disable ANSI colors |\n\n### `judges init`\n\nInteractive wizard that generates project configuration:\n- `.judgesrc.json` — rule customization, disabled judges, severity thresholds\n- `.github/workflows/judges.yml` — GitHub Actions CI workflow\n- `.gitlab-ci.judges.yml` — GitLab CI pipeline (optional)\n- 
`azure-pipelines.judges.yml` — Azure Pipelines (optional)\n\n### `judges fix`\n\nPreview or apply auto-fix patches from deterministic findings.\n\n| Flag | Description |\n|------|-------------|\n| positional | File to fix |\n| `--apply` | Write patches to disk (default: dry run) |\n| `--judge \u003cid\u003e` | Limit to a single judge's findings |\n\n### `judges watch`\n\nContinuously re-evaluate files on save.\n\n| Flag | Description |\n|------|-------------|\n| positional | File or directory to watch (default: `.`) |\n| `--judge \u003cid\u003e` | Single judge mode |\n| `--fail-on-findings` | Exit non-zero if any evaluation fails |\n\n### `judges report`\n\nRun a full project-level tribunal on a local directory.\n\n| Flag | Description |\n|------|-------------|\n| positional | Directory path (default: `.`) |\n| `--format \u003cfmt\u003e` | Output format: `text`, `json`, `html`, `markdown` |\n| `--output \u003cpath\u003e` | Write report to file |\n| `--max-files \u003cn\u003e` | Maximum files to analyze (default: 600) |\n| `--max-file-bytes \u003cn\u003e` | Skip files larger than this (default: 300000) |\n\n### `judges hook`\n\nManage a Git pre-commit hook that runs Judges on staged files.\n\n```bash\njudges hook install    # add pre-commit hook\njudges hook uninstall  # remove pre-commit hook\n```\n\nDetects Husky (`.husky/pre-commit`) and falls back to `.git/hooks/pre-commit`. 
Uses marker-based injection so it won't clobber existing hooks.\n\n### `judges diff`\n\nEvaluate only the changed lines from a unified diff (e.g., `git diff` output).\n\n| Flag | Description |\n|------|-------------|\n| `--file \u003cpath\u003e` | Read diff from file instead of stdin |\n| `--format \u003cfmt\u003e` | Output format: `text`, `json`, `sarif`, `junit`, `codeclimate` |\n| `--output \u003cpath\u003e` | Write output to file |\n\n```bash\ngit diff HEAD~1 | judges diff\njudges diff --file changes.patch --format sarif\n```\n\n### `judges deps`\n\nAnalyze project dependencies for supply-chain risks.\n\n| Flag | Description |\n|------|-------------|\n| `--path \u003cdir\u003e` | Project root to scan (default: `.`) |\n| `--format \u003cfmt\u003e` | Output format: `text`, `json` |\n\n```bash\njudges deps --path .\njudges deps --path ./backend --format json\n```\n\n### `judges baseline`\n\nCreate a baseline file to suppress known findings in future evaluations.\n\n```bash\njudges baseline create --file src/api.ts\njudges baseline create --file src/api.ts -o .judges-baseline.json\n```\n\n### `judges ci-templates`\n\nGenerate CI/CD configuration templates for popular providers.\n\n```bash\njudges ci-templates --provider github   # .github/workflows/judges.yml\njudges ci-templates --provider gitlab   # .gitlab-ci.judges.yml\njudges ci-templates --provider azure    # azure-pipelines.judges.yml\njudges ci-templates --provider bitbucket # bitbucket-pipelines.yml (snippet)\n```\n\n### `judges docs`\n\nGenerate per-judge rule documentation in Markdown.\n\n| Flag | Description |\n|------|-------------|\n| `--judge \u003cid\u003e` | Generate docs for a single judge |\n| `--output \u003cdir\u003e` | Write individual `.md` files per judge |\n\n```bash\njudges docs                          # all judges to stdout\njudges docs --judge cybersecurity    # single judge\njudges docs --output docs/judges/    # write files to directory\n```\n\n### `judges completions`\n\nGenerate 
shell completion scripts.\n\n```bash\neval \"$(judges completions bash)\"        # Bash\neval \"$(judges completions zsh)\"         # Zsh\njudges completions fish | source         # Fish\njudges completions powershell            # PowerShell (Register-ArgumentCompleter)\n```\n\n### Named Presets\n\nUse `--preset` to apply pre-configured evaluation settings:\n\n| Preset | Description |\n|--------|-------------|\n| `strict` | All severities, all judges — maximum thoroughness |\n| `lenient` | Only high and critical findings — fast and focused |\n| `security-only` | Security-focused — disables non-security judges (cost, scalability, docs, a11y, i18n, UX, etc.) |\n| `startup` | Skip compliance, sovereignty, i18n judges — move fast |\n| `compliance` | Only compliance, data-sovereignty, authentication — regulatory focus |\n| `performance` | Only performance, scalability, caching, cost-effectiveness |\n| `react` | Tuned for React/Next.js apps — enables accessibility, XSS protection |\n| `express` | Tuned for Express.js APIs — middleware security, auth, CORS, rate limiting |\n| `fastapi` | Tuned for Python FastAPI — input validation, async patterns, API security |\n| `django` | Tuned for Django apps — template security, ORM misuse, CSRF |\n| `spring-boot` | Tuned for Java Spring Boot — injection, configuration, actuator security |\n| `rails` | Tuned for Ruby on Rails — mass assignment, CSRF, SQL injection |\n| `nextjs` | Tuned for Next.js — server/client security, API routes, SSR/ISR |\n| `terraform` | Tuned for Terraform/OpenTofu IaC — infrastructure security, compliance |\n| `kubernetes` | Tuned for K8s manifests — security contexts, RBAC, resource limits |\n| `onboarding` | Smart defaults for first-time adoption — suppresses noisy rules |\n| `fintech` | Financial services — PCI DSS, cryptography, authentication, audit |\n| `healthtech` | Healthcare — HIPAA compliance, data sovereignty, encryption, audit trails |\n| `saas` | Multi-tenant SaaS — tenant isolation, rate 
limiting, scalability |\n| `government` | Government/public sector — compliance, sovereignty, authentication |\n| `open-source` | Open-source projects — documentation, backwards compatibility, security, dependency health |\n| `ai-review` | AI-generated code review — hallucination detection, security, authentication, correctness |\n\n```bash\njudges eval --preset security-only src/api.ts\njudges eval --preset strict --format sarif src/app.ts \u003e results.sarif\n```\n\n### CI Output Formats\n\n#### JUnit XML\n\nGenerate JUnit XML for Jenkins, Azure DevOps, GitHub Actions, or GitLab test result viewers:\n\n```bash\njudges eval --format junit src/api.ts \u003e results.xml\n```\n\nEach judge maps to a `\u003ctestsuite\u003e`, each finding becomes a `\u003ctestcase\u003e` with `\u003cfailure\u003e` for critical/high severity.\n\n#### CodeClimate / GitLab Code Quality\n\nGenerate CodeClimate JSON for GitLab Code Quality or similar tools:\n\n```bash\njudges eval --format codeclimate src/api.ts \u003e codequality.json\n```\n\n#### Score Badges\n\nGenerate SVG or text badges for your README:\n\n```typescript\nimport { generateBadgeSvg, generateBadgeText } from \"@kevinrabun/judges/badge\";\n\nconst svg = generateBadgeSvg(85);          // shields.io-style SVG\nconst text = generateBadgeText(85);        // \"✓ judges 85/100\"\nconst svg2 = generateBadgeSvg(75, \"quality\"); // custom label\n```\n\n---\n\n## The Judge Panel\n\n\u003c!-- JUDGES_TABLE_START --\u003e\n| Judge | Domain | Rule Prefix | What It Evaluates |\n|-------|--------|-------------|-------------------|\n| **Data Security** | Data Security \u0026 Privacy | `DATA-` | Encryption, PII handling, secrets management, access controls |\n| **Cybersecurity** | Cybersecurity \u0026 Threat Defense | `CYBER-` | Injection attacks, XSS, CSRF, auth flaws, OWASP Top 10 |\n| **Cost Effectiveness** | Cost Optimization \u0026 Resource Efficiency | `COST-` | Algorithm efficiency, N+1 queries, memory waste, caching strategy |\n| 
**Scalability** | Scalability \u0026 Performance | `SCALE-` | Statelessness, horizontal scaling, concurrency, bottlenecks |\n| **Cloud Readiness** | Cloud-Native Architecture \u0026 DevOps | `CLOUD-` | 12-Factor compliance, containerization, graceful shutdown, IaC |\n| **Software Practices** | Software Engineering Best Practices \u0026 Secure SDLC | `SWDEV-` | SOLID principles, type safety, error handling, input validation |\n| **Accessibility** | Accessibility (a11y) | `A11Y-` | WCAG compliance, screen reader support, keyboard navigation, ARIA |\n| **API Design** | API Design \u0026 Contracts | `API-` | REST conventions, versioning, pagination, error responses |\n| **Reliability** | Reliability \u0026 Resilience | `REL-` | Error handling, timeouts, retries, circuit breakers |\n| **Observability** | Monitoring \u0026 Diagnostics | `OBS-` | Structured logging, health checks, metrics, tracing |\n| **Performance** | Runtime Performance | `PERF-` | N+1 queries, sync I/O, caching, memory leaks |\n| **Compliance** | Regulatory \u0026 License Compliance | `COMP-` | GDPR/CCPA, PII protection, consent, data retention, audit trails |\n| **Data Sovereignty** | Data, Technological \u0026 Operational Sovereignty | `SOV-` | Data residency, cross-border transfers, vendor key management, AI model portability, identity federation, circuit breakers, audit trails, data export |\n| **Testing** | Test Quality \u0026 Coverage | `TEST-` | Test coverage, assertions, test isolation, naming |\n| **Documentation** | Documentation \u0026 Developer Experience | `DOC-` | JSDoc/docstrings, magic numbers, TODOs, code comments |\n| **Internationalization** | i18n \u0026 Localization | `I18N-` | Hardcoded strings, locale handling, currency formatting |\n| **Dependency Health** | Supply Chain \u0026 Dependencies | `DEPS-` | Version pinning, deprecated packages, supply chain |\n| **Concurrency** | Concurrency \u0026 Thread Safety | `CONC-` | Race conditions, unbounded parallelism, missing await |\n| 
**Ethics \u0026 Bias** | AI/ML Fairness \u0026 Ethics | `ETHICS-` | Demographic logic, dark patterns, inclusive language |\n| **Maintainability** | Code Maintainability \u0026 Technical Debt | `MAINT-` | Any types, magic numbers, deep nesting, dead code, file length |\n| **Error Handling** | Error Handling \u0026 Fault Tolerance | `ERR-` | Empty catch blocks, missing error handlers, swallowed errors |\n| **Authentication** | Authentication \u0026 Authorization | `AUTH-` | Hardcoded creds, missing auth middleware, token in query params |\n| **Database** | Database Design \u0026 Query Efficiency | `DB-` | SQL injection, N+1 queries, connection pooling, transactions |\n| **Caching** | Caching Strategy \u0026 Data Freshness | `CACHE-` | Unbounded caches, missing TTL, no HTTP cache headers |\n| **Configuration Management** | Configuration \u0026 Secrets Management | `CFG-` | Hardcoded secrets, missing env vars, config validation |\n| **Backwards Compatibility** | Backwards Compatibility \u0026 Versioning | `COMPAT-` | API versioning, breaking changes, response consistency |\n| **Portability** | Platform Portability \u0026 Vendor Independence | `PORTA-` | OS-specific paths, vendor lock-in, hardcoded hosts |\n| **UX** | User Experience \u0026 Interface Quality | `UX-` | Loading states, error messages, pagination, destructive actions |\n| **Logging Privacy** | Logging Privacy \u0026 Data Redaction | `LOGPRIV-` | PII in logs, token logging, structured logging, redaction |\n| **Rate Limiting** | Rate Limiting \u0026 Throttling | `RATE-` | Missing rate limits, unbounded queries, backoff strategy |\n| **CI/CD** | CI/CD Pipeline \u0026 Deployment Safety | `CICD-` | Test infrastructure, lint config, Docker tags, build scripts |\n| **Code Structure** | Structural Analysis | `STRUCT-` | Cyclomatic complexity, nesting depth, function length, dead code, type safety |\n| **Agent Instructions** | Agent Instruction Markdown Quality \u0026 Safety | `AGENT-` | Instruction hierarchy, 
conflict detection, unsafe overrides, scope, validation, policy guidance |\n| **AI Code Safety** | AI-Generated Code Quality \u0026 Security | `AICS-` | Prompt injection, insecure LLM output handling, debug defaults, missing validation, unsafe deserialization of AI responses |\n| **Framework Safety** | Framework-Specific Security \u0026 Best Practices | `FW-` | React hooks ordering, Express middleware chains, Next.js SSR/SSG pitfalls, Angular/Vue lifecycle patterns, Django/Flask/FastAPI safety, Spring Boot security, ASP.NET Core auth \u0026 CORS, Go Gin/Echo/Fiber patterns |\n| **IaC Security** | Infrastructure as Code | `IAC-` | Terraform, Bicep, ARM template misconfigurations, hardcoded secrets, missing encryption, overly permissive network/IAM rules |\n| **Security** | General Security Posture | `SEC-` | Holistic security assessment — insecure data flows, weak cryptography, unsafe deserialization |\n| **Hallucination Detection** | AI-Hallucinated API \u0026 Import Validation | `HALLU-` | Detects hallucinated APIs, fabricated imports, and non-existent modules from AI code generators |\n| **Intent Alignment** | Code–Comment Alignment \u0026 Stub Detection | `INTENT-` | Detects mismatches between stated intent and implementation, placeholder stubs, TODO-only functions |\n| **API Contract Conformance** | API Design \u0026 REST Best Practices | `API-` | API endpoint input validation, REST conformance, request/response contract consistency |\n| **Multi-Turn Coherence** | Code Coherence \u0026 Consistency | `COH-` | Self-contradicting patterns, duplicate definitions, dead code, inconsistent naming |\n| **Model Fingerprint Detection** | AI Code Provenance \u0026 Model Attribution | `MFPR-` | Detects stylistic fingerprints characteristic of specific AI code generators |\n| **Over-Engineering** | Simplicity \u0026 Pragmatism | `OVER-` | Unnecessary abstractions, wrapper-mania, premature generalization, over-complex patterns |\n| **Logic Review** | Semantic Correctness 
\u0026 Logic Integrity | `LOGIC-` | Inverted conditions, dead code, name-body mismatch, off-by-one, incomplete control flow |\n| **False-Positive Review** | False Positive Detection \u0026 Finding Accuracy | `FPR-` | Meta-judge reviewing pattern-based findings for false positives: string literal context, comment/docstring matches, test scaffolding, IaC template gating |\n\u003c!-- JUDGES_TABLE_END --\u003e\n\n---\n\n## How It Works\n\nThe tribunal operates in three layers:\n\n1. **Pattern-Based Analysis** — All tools (`evaluate_code`, `evaluate_code_single_judge`, `evaluate_project`, `evaluate_diff`) perform heuristic analysis using regex pattern matching to catch common anti-patterns. This layer is instant, deterministic, and runs entirely offline with zero external API calls.\n\n2. **AST-Based Structural Analysis** — The Code Structure judge (`STRUCT-*` rules) uses real Abstract Syntax Tree parsing to measure cyclomatic complexity, nesting depth, function length, parameter count, dead code, and type safety with precision that regex cannot achieve. All supported languages — **TypeScript, JavaScript, Python, Rust, Go, Java, C#, and C++** — are parsed via **tree-sitter WASM grammars** (real syntax trees compiled to WebAssembly, in-process, zero native dependencies). A scope-tracking structural parser is kept as a fallback when WASM grammars are unavailable. No external AST server required.\n\n3. **LLM-Powered Deep Analysis (Prompts)** — The server exposes MCP prompts (e.g., `judge-data-security`, `judge-cybersecurity`) that provide each judge's expert persona as a system prompt. When used by an LLM-based client (Copilot, Claude, Cursor, etc.), the host LLM performs deeper, context-aware probabilistic analysis beyond what static patterns can detect. 
This is where the `systemPrompt` on each judge comes alive — Judges itself makes no LLM calls, but it provides the expert criteria so your AI assistant can act as 45 specialized reviewers.\n\n---\n\n## Composable by Design\n\nJudges Panel is a **dual-layer** review system: instant **deterministic tools** (offline, no API keys) for pattern and AST analysis, plus **45 expert-persona MCP prompts** that unlock LLM-powered deep analysis when connected to an AI client. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.\n\n### Built-in AST Analysis (v2.0.0+)\n\nUnlike earlier versions that recommended a separate AST MCP server, Judges Panel now includes **real AST-based structural analysis** out of the box:\n\n- **TypeScript, JavaScript, Python, Rust, Go, Java, C#, C++** — All parsed with a **unified tree-sitter WASM engine** for full syntax-tree analysis (functions, complexity, nesting, dead code, type safety). 
Falls back to a scope-tracking structural parser when WASM grammars are unavailable\n\nThe Code Structure judge (`STRUCT-*`) uses these parsers to accurately measure:\n\n| Rule | Metric | Threshold |\n|------|--------|-----------|\n| `STRUCT-001` | Cyclomatic complexity | \u003e 10 per function (high) |\n| `STRUCT-002` | Nesting depth | \u003e 4 levels (medium) |\n| `STRUCT-003` | Function length | \u003e 50 lines (medium) |\n| `STRUCT-004` | Parameter count | \u003e 5 parameters (medium) |\n| `STRUCT-005` | Dead code | Unreachable statements (low) |\n| `STRUCT-006` | Weak types | `any`, `dynamic`, `Object`, `interface{}`, `unsafe` (medium) |\n| `STRUCT-007` | File complexity | \u003e 40 total cyclomatic complexity (high) |\n| `STRUCT-008` | Extreme complexity | \u003e 20 per function (critical) |\n| `STRUCT-009` | Extreme parameters | \u003e 8 parameters (high) |\n| `STRUCT-010` | Extreme function length | \u003e 150 lines (high) |\n\n### Recommended MCP Stack\n\nWhen your AI coding assistant connects to multiple MCP servers, each one contributes its specialty:\n\n```\n┌─────────────────────────────────────────────────────────┐\n│                   AI Coding Assistant                   │\n│              (Claude, Copilot, Cursor, etc.)            
│\n└──────┬──────────────────┬──────────┬───────────────────┘\n       │                  │          │\n       ▼                  ▼          ▼\n  ┌──────────────┐  ┌────────┐  ┌────────┐\n  │   Judges     │  │  CVE / │  │ Linter │\n  │   Panel      │  │  SBOM  │  │ Server │\n  │ ─────────────│  └────────┘  └────────┘\n  │ 44 Heuristic │   Vuln DB     Style \u0026\n  │   judges     │   scanning    correctness\n  │ + AST judge  │\n  └──────────────┘\n   Patterns +\n   structural\n   analysis\n```\n\n| Layer | What It Does | Example Servers |\n|-------|-------------|-----------------|\n| **Judges Panel** | 45-judge quality gate — security patterns, AST analysis, cost, scalability, a11y, compliance, sovereignty, ethics, dependency health, agent instruction governance, AI code safety, framework safety | This server |\n| **CVE / SBOM** | Vulnerability scanning against live databases — known CVEs, license risks, supply chain | OSV, Snyk, Trivy, Grype MCP servers |\n| **Linting** | Language-specific style and correctness rules | ESLint, Ruff, Clippy MCP servers |\n| **Runtime Profiling** | Memory, CPU, latency measurement on running code | Custom profiling MCP servers |\n\n### What This Means in Practice\n\nWhen you ask your AI assistant *\"Is this code production-ready?\"*, the agent can:\n\n1. **Judges Panel** → Scan for hardcoded secrets, missing error handling, N+1 queries, accessibility gaps, compliance issues, **plus** analyze cyclomatic complexity, detect dead code, and flag deeply nested functions via AST\n2. **CVE Server** → Check every dependency in `package.json` against known vulnerabilities\n3. **Linter Server** → Enforce team style rules, catch language-specific gotchas\n\nEach server returns structured findings. 
The AI synthesizes everything into a single, actionable review — no single server needs to do it all.\n\n---\n\n## MCP Tools\n\n### `evaluate_v2`\nRun a **V2 context-aware tribunal evaluation** designed to raise feedback quality toward lead engineer/architect-level review:\n\n- Policy profile calibration (`default`, `startup`, `regulated`, `healthcare`, `fintech`, `public-sector`)\n- Context ingestion (architecture notes, constraints, standards, known risks, data-boundary model)\n- Runtime evidence hooks (tests, coverage, latency, error rate, vulnerability counts)\n- Specialty feedback aggregation by judge/domain\n- Confidence scoring and explicit uncertainty reporting\n\nSupports:\n- **Code mode**: `code` + `language`\n- **Project mode**: `files[]`\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `code` | string | conditional | Source code for single-file mode |\n| `language` | string | conditional | Programming language for single-file mode |\n| `files` | array | conditional | `{ path, content, language }[]` for project mode |\n| `context` | string | no | High-level review context |\n| `includeAstFindings` | boolean | no | Include AST/code-structure findings (default: true) |\n| `minConfidence` | number | no | Minimum finding confidence to include (0-1, default: 0) |\n| `policyProfile` | enum | no | `default`, `startup`, `regulated`, `healthcare`, `fintech`, `public-sector` |\n| `evaluationContext` | object | no | Structured architecture/constraint context |\n| `evidence` | object | no | Runtime/operational evidence for confidence calibration |\n\n### `evaluate_app_builder_flow`\nRun a **3-step app-builder workflow** for technical and non-technical stakeholders:\n\n1. Tribunal review (code/project/diff)\n2. Plain-language translation of top risks\n3. 
Prioritized remediation tasks with AI-fixable P0/P1 extraction\n\nSupports:\n- **Code mode**: `code` + `language`\n- **Project mode**: `files[]`\n- **Diff mode**: `code` + `language` + `changedLines[]`\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `code` | string | conditional | Full source content (code/diff mode) |\n| `language` | string | conditional | Programming language (code/diff mode) |\n| `files` | array | conditional | `{ path, content, language }[]` for project mode |\n| `changedLines` | number[] | no | 1-based changed lines for diff mode |\n| `context` | string | no | Optional business/technical context |\n| `maxFindings` | number | no | Max translated top findings (default: 10) |\n| `maxTasks` | number | no | Max generated tasks (default: 20) |\n| `includeAstFindings` | boolean | no | Include AST/code-structure findings (default: true) |\n| `minConfidence` | number | no | Minimum finding confidence to include (0-1, default: 0) |\n\n### `evaluate_public_repo_report`\nClone a **public repository URL**, run the full judges panel across eligible source files, and generate a consolidated markdown report.\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `repoUrl` | string | yes | Public repository URL (`https://...`) |\n| `branch` | string | no | Optional branch name |\n| `outputPath` | string | no | Optional path to write report markdown |\n| `maxFiles` | number | no | Max files analyzed (default: 600) |\n| `maxFileBytes` | number | no | Max file size in bytes (default: 300000) |\n| `maxFindingsInReport` | number | no | Max detailed findings in output (default: 150) |\n| `credentialMode` | string | no | Credential detection mode: `standard` (default) or `strict` |\n| `includeAstFindings` | boolean | no | Include AST/code-structure findings (default: true) |\n| `minConfidence` | number | no | Minimum finding confidence to include (0-1, default: 0) |\n| 
`enableMustFixGate` | boolean | no | Enable must-fix gate summary for high-confidence dangerous findings (default: false) |\n| `mustFixMinConfidence` | number | no | Confidence threshold for must-fix gate triggers (0-1, default: 0.85) |\n| `mustFixDangerousRulePrefixes` | string[] | no | Optional dangerous rule prefixes for gate matching (e.g., `AUTH`, `CYBER`, `DATA`) |\n| `keepClone` | boolean | no | Keep cloned repo on disk for inspection |\n\n**Quick examples**\n\nGenerate a report from CLI:\n\n```bash\nnpm run report:public-repo -- --repoUrl https://github.com/microsoft/vscode --output reports/vscode-judges-report.md\n\n# stricter credential-signal mode (optional)\nnpm run report:public-repo -- --repoUrl https://github.com/openclaw/openclaw --credentialMode strict --output reports/openclaw-judges-report-strict.md\n\n# judge findings only (exclude AST/code-structure findings)\nnpm run report:public-repo -- --repoUrl https://github.com/openclaw/openclaw --includeAstFindings false --output reports/openclaw-judges-report-no-ast.md\n\n# show only findings at 80%+ confidence\nnpm run report:public-repo -- --repoUrl https://github.com/openclaw/openclaw --minConfidence 0.8 --output reports/openclaw-judges-report-high-confidence.md\n\n# include must-fix gate summary in the generated report\nnpm run report:public-repo -- --repoUrl https://github.com/openclaw/openclaw --enableMustFixGate true --mustFixMinConfidence 0.9 --mustFixDangerousPrefix AUTH --mustFixDangerousPrefix CYBER --output reports/openclaw-judges-report-mustfix.md\n\n# opinionated quick-start mode (recommended first run)\nnpm run report:quickstart -- --repoUrl https://github.com/openclaw/openclaw --output reports/openclaw-quickstart.md\n```\n\nCall from MCP client:\n\n```json\n{\n  \"tool\": \"evaluate_public_repo_report\",\n  \"arguments\": {\n    \"repoUrl\": \"https://github.com/microsoft/vscode\",\n    \"branch\": \"main\",\n    \"maxFiles\": 400,\n    \"maxFindingsInReport\": 120,\n    
\"credentialMode\": \"strict\",\n    \"includeAstFindings\": false,\n    \"minConfidence\": 0.8,\n    \"enableMustFixGate\": true,\n    \"mustFixMinConfidence\": 0.9,\n    \"mustFixDangerousRulePrefixes\": [\"AUTH\", \"CYBER\", \"DATA\"],\n    \"outputPath\": \"reports/vscode-judges-report.md\"\n  }\n}\n```\n\nTypical response summary includes:\n- overall verdict and average score\n- analyzed file count and total findings\n- per-judge score table\n- highest-risk findings and lowest-scoring files\n\nSample report snippet:\n\n```text\n# Public Repository Full Judges Report\n\nGenerated from https://github.com/microsoft/vscode on 2026-02-21T12:00:00.000Z.\n\n## Executive Summary\n- Overall verdict: WARNING\n- Average file score: 78/100\n- Total findings: 412 (critical 3, high 29, medium 114, low 185, info 81)\n```\n\n### `get_judges`\nList all available judges with their domains and descriptions.\n\n### `evaluate_code`\nSubmit code to the **full judges panel**. all 45 judges evaluate independently and return a combined verdict.\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `code` | string | yes | The source code to evaluate |\n| `language` | string | yes | Programming language (e.g., `typescript`, `python`) |\n| `context` | string | no | Additional context about the code |\n| `includeAstFindings` | boolean | no | Include AST/code-structure findings (default: true) |\n| `minConfidence` | number | no | Minimum finding confidence to include (0-1, default: 0) |\n| `config` | object | no | Inline configuration (see [Configuration](#configuration)) |\n\n### `evaluate_code_single_judge`\nSubmit code to a **specific judge** for targeted review.\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `code` | string | yes | The source code to evaluate |\n| `language` | string | yes | Programming language |\n| `judgeId` | string | yes | See [judge IDs](#judge-ids) below |\n| 
`context` | string | no | Additional context |\n| `minConfidence` | number | no | Minimum finding confidence to include (0-1, default: 0) |\n| `config` | object | no | Inline configuration (see [Configuration](#configuration)) |\n\n### `evaluate_project`\nSubmit multiple files for **project-level analysis**. All 45 judges evaluate each file, and a cross-file architectural analysis detects code duplication, inconsistent error handling, and dependency cycles.\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `files` | array | yes | Array of `{ path, content, language }` objects |\n| `context` | string | no | Optional project context |\n| `includeAstFindings` | boolean | no | Include AST/code-structure findings (default: true) |\n| `minConfidence` | number | no | Minimum finding confidence to include (0-1, default: 0) |\n| `config` | object | no | Inline configuration (see [Configuration](#configuration)) |\n\n### `evaluate_diff`\nEvaluate only the **changed lines** in a code diff. Runs all 45 judges on the full file but filters findings to the lines you specify. Ideal for PR reviews and incremental analysis.\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `code` | string | yes | The full file content (post-change) |\n| `language` | string | yes | Programming language |\n| `changedLines` | number[] | yes | 1-based line numbers that were changed |\n| `context` | string | no | Optional context about the change |\n| `includeAstFindings` | boolean | no | Include AST/code-structure findings (default: true) |\n| `minConfidence` | number | no | Minimum finding confidence to include (0-1, default: 0) |\n| `config` | object | no | Inline configuration (see [Configuration](#configuration)) |\n\n### `analyze_dependencies`\nAnalyze a dependency manifest file for supply-chain risks, version pinning issues, typosquatting indicators, and dependency hygiene. 
Supports `package.json`, `requirements.txt`, `Cargo.toml`, `go.mod`, `pom.xml`, and `.csproj` files.\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `manifest` | string | yes | Contents of the dependency manifest file |\n| `manifestType` | string | yes | File type: `package.json`, `requirements.txt`, etc. |\n| `context` | string | no | Optional context |\n\n### `evaluate_git_diff`\nEvaluate only **changed lines** from a git diff. Provide either `repoPath` for a live git diff or `diffText` for a pre-computed unified diff.\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `repoPath` | string | conditional | Absolute path to the git repository |\n| `base` | string | no | Git ref to diff against (default: `HEAD~1`) |\n| `diffText` | string | conditional | Pre-computed unified diff text |\n| `confidenceFilter` | number | no | Minimum confidence threshold for findings (0–1) |\n| `autoTune` | boolean | no | Apply feedback-driven auto-tuning (default: false) |\n| `maxPromptChars` | number | no | Max character budget for LLM prompts (default: 100000, 0 = unlimited) |\n| `config` | object | no | Inline configuration |\n\n### `re_evaluate_with_context`\nRe-run the tribunal with **prior findings as context** for iterative refinement. 
Supports dispute resolution, developer context injection, and focus-area filtering.\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `code` | string | yes | Source code to re-evaluate |\n| `language` | string | yes | Programming language |\n| `disputedRuleIds` | string[] | no | Rule IDs the developer disputes as false positives |\n| `acceptedRuleIds` | string[] | no | Rule IDs the developer accepts |\n| `developerContext` | string | no | Free-form explanation of developer intent |\n| `focusAreas` | string[] | no | Specific areas to focus on (e.g., `[\"security\"]`) |\n| `confidenceFilter` | number | no | Minimum confidence threshold (default: 0.5) |\n| `filePath` | string | no | File path for context-aware evaluation |\n| `deepReview` | boolean | no | Include LLM deep-review prompt section |\n| `relatedFiles` | array | no | Cross-file context `{ path, snippet, relationship? }[]` |\n| `maxPromptChars` | number | no | Max character budget for LLM prompts (default: 100000, 0 = unlimited) |\n\n### Additional MCP Tools\n\n| Tool | Description |\n|------|-------------|\n| `evaluate_file` | Read a file from disk and submit it to the full panel. Auto-detects language from extension. |\n| `evaluate_code_streaming` | Streaming evaluation — returns per-judge results as each judge completes with running aggregates. |\n| `evaluate_focused` | Run only specified judges. Use after an initial full evaluation to re-check specific areas. |\n| `evaluate_batch` | Evaluate multiple code files in a single call. Returns per-file verdicts plus aggregate statistics. |\n| `evaluate_then_fix` | Evaluate code and automatically generate fix patches for all findings with auto-fix support. |\n| `evaluate_with_progress` | Evaluate with progress callbacks for long-running evaluations. |\n| `evaluate_policy_aware` | Policy-aware evaluation with named profiles (startup, regulated, healthcare, fintech, public-sector). 
|\n| `fix_code` | Evaluate code and apply all available auto-fix patches. Returns fixed code with applied/remaining summary. |\n| `explain_finding` | Explain a finding in plain language with OWASP/CWE references, risk context, and remediation guidance. |\n| `triage_finding` | Set triage status of a finding (accepted-risk, deferred, wont-fix, false-positive) with attribution. |\n| `record_feedback` | Record user feedback (true-positive, false-positive, wont-fix) to calibrate confidence scores. |\n| `get_finding_stats` | Finding lifecycle statistics: open, fixed, recurring, and triaged counts plus trends. |\n| `get_suppression_analytics` | Analyze suppression patterns: FP rates by rule, suppression rates, auto-suppress candidates. |\n| `list_triaged_findings` | List triaged findings, optionally filtered by triage status. |\n| `benchmark_gate` | Run benchmarks against quality thresholds. Returns pass/fail with F1, precision, recall metrics. |\n| `run_benchmark` | Run the full benchmark suite with per-judge, per-category, per-difficulty breakdowns. |\n| `scaffold_judge` | Generate boilerplate files to add a new judge: definition, evaluator skeleton, and registration. |\n| `scaffold_plugin` | Generate a starter plugin template with custom rules, judges, and lifecycle hooks. |\n| `session_status` | Current evaluation session state: evaluation count, frameworks, verdict history, stability. |\n| `list_files` | List files and directories in the workspace for project exploration. |\n| `read_file` | Read file contents from the workspace. 
|\n\n#### Judge IDs\n\n`data-security` · `cybersecurity` · `security` · `cost-effectiveness` · `scalability` · `cloud-readiness` · `software-practices` · `accessibility` · `api-design` · `api-contract` · `reliability` · `observability` · `performance` · `compliance` · `data-sovereignty` · `testing` · `documentation` · `internationalization` · `dependency-health` · `concurrency` · `ethics-bias` · `maintainability` · `error-handling` · `authentication` · `database` · `caching` · `configuration-management` · `backwards-compatibility` · `portability` · `ux` · `logging-privacy` · `rate-limiting` · `ci-cd` · `code-structure` · `agent-instructions` · `ai-code-safety` · `framework-safety` · `iac-security` · `hallucination-detection` · `intent-alignment` · `multi-turn-coherence` · `model-fingerprint` · `over-engineering` · `logic-review` · `false-positive-review`\n\n---\n\n## MCP Prompts\n\nEach judge has a corresponding prompt for LLM-powered deep analysis:\n\n\u003c!-- PROMPTS_TABLE_START --\u003e\n| Prompt | Description |\n|--------|-------------|\n| `judge-data-security` | Deep data security review |\n| `judge-cybersecurity` | Deep cybersecurity review |\n| `judge-cost-effectiveness` | Deep cost optimization review |\n| `judge-scalability` | Deep scalability review |\n| `judge-cloud-readiness` | Deep cloud readiness review |\n| `judge-software-practices` | Deep software practices review |\n| `judge-accessibility` | Deep accessibility/WCAG review |\n| `judge-api-design` | Deep API design review |\n| `judge-reliability` | Deep reliability \u0026 resilience review |\n| `judge-observability` | Deep observability \u0026 monitoring review |\n| `judge-performance` | Deep performance optimization review |\n| `judge-compliance` | Deep regulatory compliance review |\n| `judge-data-sovereignty` | Deep data, technological \u0026 operational sovereignty review |\n| `judge-testing` | Deep testing quality review |\n| `judge-documentation` | Deep documentation quality review |\n| 
`judge-internationalization` | Deep i18n review |\n| `judge-dependency-health` | Deep dependency health review |\n| `judge-concurrency` | Deep concurrency \u0026 async safety review |\n| `judge-ethics-bias` | Deep ethics \u0026 bias review |\n| `judge-maintainability` | Deep maintainability \u0026 tech debt review |\n| `judge-error-handling` | Deep error handling review |\n| `judge-authentication` | Deep authentication \u0026 authorization review |\n| `judge-database` | Deep database design \u0026 query review |\n| `judge-caching` | Deep caching strategy review |\n| `judge-configuration-management` | Deep configuration \u0026 secrets review |\n| `judge-backwards-compatibility` | Deep backwards compatibility review |\n| `judge-portability` | Deep platform portability review |\n| `judge-ux` | Deep user experience review |\n| `judge-logging-privacy` | Deep logging privacy review |\n| `judge-rate-limiting` | Deep rate limiting review |\n| `judge-ci-cd` | Deep CI/CD pipeline review |\n| `judge-code-structure` | Deep AST-based structural analysis review |\n| `judge-agent-instructions` | Deep review of agent instruction markdown quality and safety |\n| `judge-ai-code-safety` | Deep review of AI-generated code risks: prompt injection, insecure LLM output handling, debug defaults, missing validation |\n| `judge-framework-safety` | Deep review of framework-specific safety: React hooks, Express middleware, Next.js SSR/SSG, Angular/Vue, Django, Spring Boot, ASP.NET Core, Flask, FastAPI, Go frameworks |\n| `judge-iac-security` | Deep review of infrastructure-as-code security: Terraform, Bicep, ARM template misconfigurations |\n| `judge-security` | Deep holistic security posture review: insecure data flows, weak cryptography, unsafe deserialization |\n| `judge-hallucination-detection` | Deep review of AI-hallucinated APIs, fabricated imports, non-existent modules |\n| `judge-intent-alignment` | Deep review of code–comment alignment, stub detection, placeholder functions |\n| 
`judge-api-contract` | Deep review of API contract conformance, input validation, REST best practices |\n| `judge-multi-turn-coherence` | Deep review of code coherence: self-contradictions, duplicate definitions, dead code |\n| `judge-model-fingerprint` | Deep review of AI code provenance and model attribution fingerprints |\n| `judge-over-engineering` | Deep review of unnecessary abstractions, wrapper-mania, premature generalization |\n| `judge-logic-review` | Deep review of logic correctness, semantic mismatches, and dead code in AI-generated code |\n| `judge-false-positive-review` | Meta-judge review of pattern-based findings for false positive detection and accuracy |\n\u003c!-- PROMPTS_TABLE_END --\u003e\n\n---\n\n## Configuration\n\nCreate a `.judgesrc.json` (or `.judgesrc`) file in your project root to customize evaluation behavior. See [`.judgesrc.example.json`](.judgesrc.example.json) for a copy-paste-ready template, or reference the [JSON Schema](judgesrc.schema.json) for full IDE autocompletion.\n\n```json\n{\n  \"$schema\": \"https://github.com/KevinRabun/judges/blob/main/judgesrc.schema.json\",\n  \"preset\": \"strict\",\n  \"minSeverity\": \"medium\",\n  \"disabledRules\": [\"COST-*\", \"I18N-001\"],\n  \"disabledJudges\": [\"accessibility\", \"ethics-bias\"],\n  \"ruleOverrides\": {\n    \"SEC-003\": { \"severity\": \"critical\" },\n    \"DOC-*\": { \"disabled\": true }\n  },\n  \"languages\": [\"typescript\", \"python\"],\n  \"format\": \"text\",\n  \"failOnFindings\": false,\n  \"baseline\": \"\"\n}\n```\n\n| Field | Type | Default | Description |\n|-------|------|---------|-------------|\n| `$schema` | `string` | — | JSON Schema URL for IDE validation |\n| `preset` | `string` | — | Named preset (see [Named Presets](#named-presets) for all 22 options) |\n| `minSeverity` | `string` | `\"info\"` | Minimum severity to report: `critical` · `high` · `medium` · `low` · `info` |\n| `disabledRules` | `string[]` | `[]` | Rule IDs or prefix wildcards to 
suppress (e.g. `\"COST-*\"`, `\"SEC-003\"`) |\n| `disabledJudges` | `string[]` | `[]` | Judge IDs to skip entirely (e.g. `\"cost-effectiveness\"`) |\n| `ruleOverrides` | `object` | `{}` | Per-rule overrides keyed by rule ID or wildcard: `{ disabled?: boolean, severity?: string }` |\n| `languages` | `string[]` | `[]` | Restrict analysis to specific languages (empty = all) |\n| `format` | `string` | `\"text\"` | Default output format: `text` · `json` · `sarif` · `markdown` · `html` · `pdf` · `junit` · `codeclimate` · `github-actions` |\n| `failOnFindings` | `boolean` | `false` | Exit code 1 when verdict is `fail`; useful for CI gates |\n| `baseline` | `string` | `\"\"` | Path to a baseline JSON file; matching findings are suppressed |\n\nAll evaluation tools (CLI and MCP) accept the same configuration fields via `--config \u003cpath\u003e` or an inline `config` parameter.\n\n---\n\n## Advanced Features\n\n### Inline Suppressions\n\nSuppress specific findings directly in source code using comment directives:\n\n```typescript\nconst x = eval(input); // judges-ignore SEC-001\n// judges-ignore-next-line CYBER-002\nconst y = dangerousOperation();\n// judges-file-ignore DOC-*    ← suppress globally for this file\n```\n\nSupported comment styles: `//`, `#`, `/* */`. Directives accept comma-separated rule IDs and wildcards (`*`, `SEC-*`).\n\n### Auto-Fix Patches\n\nCertain findings include machine-applicable patches in the `patch` field:\n\n| Pattern | Auto-Fix |\n|---------|----------|\n| `new Buffer(x)` | → `Buffer.from(x)` |\n| `http://` URLs (non-localhost) | → `https://` |\n| `Math.random()` | → `crypto.randomUUID()` |\n\nPatches include `oldText`, `newText`, `startLine`, and `endLine` for automated application.\n\n### Cross-Evaluator Deduplication\n\nWhen multiple judges flag the same issue (e.g., both Data Security and Cybersecurity detect SQL injection on line 15), findings are automatically deduplicated. 
The highest-severity finding wins, and the description is annotated with cross-references (e.g., *\"Also identified by: CYBER-003\"*).\n\n### Taint Flow Analysis\n\nThe engine performs inter-procedural taint tracking to trace data from user-controlled sources (e.g., `req.body`, `process.env`) through transformations to security-sensitive sinks (e.g., `eval()`, `exec()`, SQL queries). Taint flows are used to boost confidence on true-positive findings and suppress false positives where sanitization is detected.\n\n### Positive Signal Detection\n\nCode that demonstrates good practices receives score bonuses (capped at +15):\n\n| Signal | Bonus |\n|--------|-------|\n| Parameterized queries | +3 |\n| Security headers (helmet) | +3 |\n| Auth middleware (passport, etc.) | +3 |\n| Proper error handling | +2 |\n| Input validation libs (zod, joi, etc.) | +2 |\n| Rate limiting | +2 |\n| Structured logging (pino, winston) | +2 |\n| CORS configuration | +1 |\n| Strict mode / strictNullChecks | +1 |\n| Test patterns (describe/it/expect) | +1 |\n\n### Framework-Aware Rules\n\nJudges include framework-specific detection for Express, Django, Flask, FastAPI, Spring, ASP.NET, Rails, and more. Framework middleware (e.g., `helmet()`, `express-rate-limit`, `passport.authenticate()`) is recognized as mitigation, reducing false positives.\n\n### Cross-File Import Resolution\n\nIn project-level analysis, imports are resolved across files. 
If one file imports a security middleware module from another file in the project, findings about missing security controls are automatically adjusted with reduced confidence.\n\n---\n\n## Scoring\n\nEach judge scores the code from **0 to 100**:\n\n| Severity | Score Deduction |\n|----------|----------------|\n| Critical | −30 points |\n| High | −18 points |\n| Medium | −10 points |\n| Low | −5 points |\n| Info | −2 points |\n\n**Verdict logic:**\n- **FAIL** — Any critical finding, or score \u003c 60\n- **WARNING** — Any high finding, any medium finding, or score \u003c 80\n- **PASS** — Score ≥ 80 with no critical, high, or medium findings\n\nThe **overall tribunal score** is the average of all 45 judges. The overall verdict fails if **any** judge fails.\n\n---\n\n## Project Structure\n\n```\njudges/\n├── src/\n│   ├── index.ts              # MCP server entry point — tools, prompts, transport\n│   ├── api.ts                # Programmatic API entry point\n│   ├── cli.ts                # CLI argument parser and command router\n│   ├── types.ts              # TypeScript interfaces (Finding, JudgeEvaluation, etc.)\n│   ├── config.ts             # .judgesrc configuration parser and validation\n│   ├── errors.ts             # Custom error types (ConfigError, EvaluationError, ParseError)\n│   ├── language-patterns.ts  # Multi-language regex pattern constants and helpers\n│   ├── judge-registry.ts     # Unified JudgeRegistry — single source of truth for all judges\n│   ├── plugins.ts            # Plugin API façade (delegates to JudgeRegistry)\n│   ├── scoring.ts            # Confidence scoring and calibration\n│   ├── dedup.ts              # Finding deduplication engine\n│   ├── fingerprint.ts        # Finding fingerprint generation\n│   ├── comparison.ts         # Tool comparison benchmark data\n│   ├── cache.ts              # Evaluation result caching\n│   ├── calibration.ts        # Confidence calibration from feedback data\n│   ├── fix-history.ts        # Auto-fix 
application history tracking\n│   ├── ast/                  # AST analysis engine (built-in, no external deps)\n│   │   ├── index.ts          # analyzeStructure() — routes to correct parser\n│   │   ├── types.ts          # FunctionInfo, CodeStructure interfaces\n│   │   ├── tree-sitter-ast.ts    # Tree-sitter WASM parser (all 8 languages)\n│   │   ├── structural-parser.ts  # Fallback scope-tracking parser\n│   │   ├── cross-file-taint.ts   # Cross-file taint propagation analysis\n│   │   └── taint-tracker.ts      # Single-file taint flow tracking\n│   ├── evaluators/           # Analysis engine for each judge\n│   │   ├── index.ts          # evaluateWithJudge(), evaluateWithTribunal(), evaluateProject(), etc.\n│   │   ├── shared.ts         # Scoring, verdict logic, markdown formatters\n│   │   └── *.ts              # One analyzer per judge (45 files)\n│   ├── formatters/           # Output formatters\n│   │   ├── sarif.ts              # SARIF 2.1.0 output\n│   │   ├── html.ts               # Self-contained HTML report (dark/light theme, filters)\n│   │   ├── junit.ts              # JUnit XML output (Jenkins, Azure DevOps, GitHub Actions)\n│   │   ├── codeclimate.ts        # CodeClimate/GitLab Code Quality JSON\n│   │   ├── diagnostics.ts        # Diagnostics formatter\n│   │   └── badge.ts              # SVG and text badge generator\n│   ├── commands/             # CLI subcommands\n│   │   ├── init.ts               # Interactive project setup wizard\n│   │   ├── fix.ts                # Auto-fix patch preview and application\n│   │   ├── watch.ts              # Watch mode — re-evaluate on save\n│   │   ├── report.ts             # Project-level local report\n│   │   ├── hook.ts               # Pre-commit hook install/uninstall\n│   │   ├── ci-templates.ts       # GitLab, Azure, Bitbucket CI templates\n│   │   ├── diff.ts               # Evaluate unified diff (git diff)\n│   │   ├── deps.ts               # Dependency supply-chain analysis\n│   │   ├── baseline.ts      
     # Create baseline for finding suppression\n│   │   ├── completions.ts        # Shell completions (bash/zsh/fish/PowerShell)\n│   │   ├── docs.ts               # Per-judge rule documentation generator\n│   │   ├── feedback.ts           # False-positive tracking \u0026 finding feedback\n│   │   ├── benchmark.ts          # Detection accuracy benchmark suite\n│   │   ├── rule.ts               # Custom rule authoring wizard\n│   │   ├── language-packs.ts     # Language-specific rule pack presets\n│   │   └── config-share.ts       # Shareable team/org configuration\n│   ├── presets.ts            # Named evaluation presets (strict, lenient, security-only, …)\n│   ├── patches/\n│   │   └── index.ts              # 201 deterministic auto-fix patch rules\n│   ├── tools/                # MCP tool registrations\n│   │   ├── register.ts           # Tool registration orchestrator\n│   │   ├── register-evaluation.ts    # Evaluation tools (evaluate_code, etc.)\n│   │   ├── register-workflow.ts      # Workflow tools (app builder, reports, etc.)\n│   │   ├── prompts.ts            # MCP prompt registrations (per-judge prompts)\n│   │   └── schemas.ts            # Zod schemas for tool parameters\n│   ├── reports/\n│   │   └── public-repo-report.ts   # Public repo clone + full tribunal report generation\n│   └── judges/               # Judge definitions (id, name, domain, system prompt)\n│       ├── index.ts          # Side-effect imports + re-exports (JUDGES, getJudge, getJudgeSummaries)\n│       └── *.ts              # One self-registering definition per judge (45 files)\n├── scripts/\n│   ├── generate-public-repo-report.ts  # Run: npm run report:public-repo -- --repoUrl \u003curl\u003e\n│   ├── daily-popular-repo-autofix.ts   # Run: npm run automation:daily-popular\n│   └── debug-fp.ts                     # Debug false-positive findings\n├── examples/\n│   ├── sample-vulnerable-api.ts  # Intentionally flawed code (triggers all judges)\n│   ├── demo.ts                   # Run: 
npm run demo\n│   └── quickstart.ts             # Quick-start evaluation example\n├── tests/\n│   ├── judges.test.ts            # Core judge evaluation tests\n│   ├── negative.test.ts          # Negative / FP-avoidance tests\n│   ├── subsystems.test.ts        # Subsystem integration tests\n│   ├── extension-logic.test.ts   # VS Code extension logic tests\n│   └── tool-routing.test.ts      # MCP tool routing tests\n├── grammars/                 # Tree-sitter WASM grammar files\n│   ├── tree-sitter-typescript.wasm\n│   ├── tree-sitter-cpp.wasm\n│   ├── tree-sitter-python.wasm\n│   ├── tree-sitter-go.wasm\n│   ├── tree-sitter-rust.wasm\n│   ├── tree-sitter-java.wasm\n│   └── tree-sitter-c_sharp.wasm\n├── judgesrc.schema.json      # JSON Schema for .judgesrc config files\n├── server.json               # MCP Registry manifest\n├── package.json\n├── tsconfig.json\n└── README.md\n```\n\n---\n\n## Scripts\n\n| Command | Description |\n|---------|-------------|\n| `npm run build` | Compile TypeScript to `dist/` |\n| `npm run dev` | Watch mode — recompile on save |\n| `npm test` | Run the full test suite |\n| `npm run demo` | Run the sample tribunal demo |\n| `npm run report:public-repo -- --repoUrl \u003curl\u003e` | Generate a full tribunal report for a public repository URL |\n| `npm run report:quickstart -- --repoUrl \u003curl\u003e` | Run opinionated high-signal report defaults for fast adoption |\n| `npm run automation:daily-popular` | Analyze up to 10 rotating popular repos/day and open up to 5 remediation PRs per repo |\n| `npm start` | Start the MCP server |\n| `npm run clean` | Remove `dist/` |\n| `judges init` | Interactive project setup wizard |\n| `judges fix \u003cfile\u003e` | Preview auto-fix patches (add `--apply` to write) |\n| `judges watch \u003cdir\u003e` | Watch mode — re-evaluate on file save |\n| `judges report \u003cdir\u003e` | Full tribunal report on a local directory |\n| `judges hook install` | Install a Git pre-commit hook |\n| `judges diff` | 
Evaluate changed lines from a unified diff |\n| `judges deps` | Analyze dependencies for supply-chain risks |\n| `judges baseline create` | Create a baseline for finding suppression |\n| `judges ci-templates` | Generate CI pipeline templates |\n| `judges docs` | Generate per-judge rule documentation |\n| `judges completions \u003cshell\u003e` | Shell completion scripts |\n| `judges feedback submit` | Mark findings as true positive, false positive, or won't fix |\n| `judges feedback stats` | Show false-positive rate statistics |\n| `judges benchmark run` | Run the detection accuracy benchmark suite |\n| `judges rule create` | Interactive custom rule creation wizard |\n| `judges rule list` | List custom evaluation rules |\n| `judges pack list` | List available language packs |\n| `judges config export` | Export config as a shareable package |\n| `judges config import \u003csrc\u003e` | Import a shared configuration |\n| `judges compare` | Compare judges against other code review tools |\n| `judges list` | List all 45 judges with domains and descriptions |\n\n---\n\n## Daily Popular Repo Automation\n\nThis repo includes a scheduled workflow at `.github/workflows/daily-popular-repo-autofix.yml` that:\n- selects up to 10 repositories per day from a default pool of 100+ popular repos (or a manually supplied target),\n- runs the full Judges evaluation across supported source languages,\n- applies only conservative, single-line remediations that reduce matching finding counts,\n- opens up to 5 PRs per repository with attribution to both Judges and the target repository,\n- skips repositories unless they are public and PR creation is possible with existing GitHub auth (no additional auth flow),\n- enforces hard runtime caps of 10 repositories/day and 5 PRs/repository.\n\nEach run writes `daily-autofix-summary.json` (or `SUMMARY_PATH`) with per-repository telemetry, including:\n- `runAggregate` — compact run-level totals and cross-repo top prioritized rules,\n- 
`runAggregate.totalCandidatesDiscovered` and `runAggregate.totalCandidatesAfterLocationDedupe` — signal how much overlap was removed before attempting fixes,\n- `runAggregate.totalCandidatesAfterPriorityThreshold` — candidates that remain after applying minimum priority score,\n- `runAggregate.dedupeReductionPercent` — percent reduction from location dedupe for quick runtime-efficiency tracking,\n- `runAggregate.priorityThresholdReductionPercent` — percent reduction from minimum-priority filtering after dedupe,\n- `priorityRulePrefixesUsed` — dangerous rule prefixes used during prioritization,\n- `minPriorityScoreUsed` — minimum `candidatePriorityScore` applied for candidate inclusion,\n- `candidatesDiscovered`, `candidatesAfterLocationDedupe`, and `candidatesAfterPriorityThreshold` — per-repo candidate counts after each filter stage,\n- `topPrioritizedRuleCounts` — most common rule IDs among ranked candidates,\n- `topPrioritizedCandidates` — top ranked candidate samples (rule, severity, confidence, file, line, priority score).\n\nOptional runtime control:\n- `AUTOFIX_MIN_PRIORITY_SCORE` — minimum candidate priority score required after dedupe (default: `0`, disabled).\n\nRequired secret:\n- `JUDGES_AUTOFIX_GH_TOKEN` — GitHub token with permission to fork/push/create PRs for target repositories.\n\nManual run:\n```bash\ngh workflow run \"Judges Daily Full-Run Autofix PRs\" -f targetRepoUrl=https://github.com/owner/repo\n```\n\n---\n\n## Programmatic API\n\nJudges can be consumed as a library (not just via MCP). 
Import from `@kevinrabun/judges/api`:\n\n```typescript\nimport {\n  evaluateCode,\n  evaluateProject,\n  evaluateCodeSingleJudge,\n  getJudge,\n  JUDGES,\n  findingsToSarif,\n} from \"@kevinrabun/judges/api\";\n\n// Source text to evaluate\nconst code = \"const x = eval(input);\";\n\n// Full tribunal evaluation\nconst verdict = evaluateCode(code, \"typescript\");\nconsole.log(verdict.overallScore, verdict.overallVerdict);\n\n// Single judge\nconst result = evaluateCodeSingleJudge(\"cybersecurity\", code, \"typescript\");\n\n// SARIF output for CI integration\nconst sarif = findingsToSarif(verdict.evaluations.flatMap(e =\u003e e.findings));\n```\n\n### Package Exports\n\n| Entry Point | Description |\n|---|---|\n| `@kevinrabun/judges/api` | Programmatic API (default) |\n| `@kevinrabun/judges/server` | MCP server entry point |\n| `@kevinrabun/judges/sarif` | SARIF 2.1.0 formatter |\n| `@kevinrabun/judges/junit` | JUnit XML formatter |\n| `@kevinrabun/judges/codeclimate` | CodeClimate/GitLab Code Quality JSON |\n| `@kevinrabun/judges/badge` | SVG and text badge generator |\n| `@kevinrabun/judges/diagnostics` | Diagnostics formatter |\n| `@kevinrabun/judges/plugins` | Plugin system API (see [Plugin Guide](docs/plugin-guide.md)) |\n| `@kevinrabun/judges/fingerprint` | Finding fingerprint utilities |\n| `@kevinrabun/judges/comparison` | Tool comparison benchmarks |\n\n### SARIF Output\n\nConvert findings to [SARIF 2.1.0](https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html) for GitHub Code Scanning, Azure DevOps, and other CI/CD tools:\n\n```typescript\nimport { writeFileSync } from \"node:fs\";\nimport { findingsToSarif, evaluationToSarif, verdictToSarif } from \"@kevinrabun/judges/sarif\";\n\n// `verdict` is the result of evaluateCode() from the example above\nconst sarif = verdictToSarif(verdict, \"src/app.ts\");\nwriteFileSync(\"results.sarif\", JSON.stringify(sarif, null, 2));\n```\n\n---\n\n## Custom Error Types\n\nAll thrown errors extend `JudgesError` with a machine-readable `code` property:\n\n| Error Class | Code | When |\n|---|---|---|\n| `ConfigError` | `JUDGES_CONFIG_INVALID` | Malformed `.judgesrc` or 
invalid inline config |\n| `EvaluationError` | `JUDGES_EVALUATION_FAILED` | Unknown judge, analyzer crash |\n| `ParseError` | `JUDGES_PARSE_FAILED` | Unparseable source code or input data |\n\n```typescript\nimport { evaluateCode, ConfigError, EvaluationError } from \"@kevinrabun/judges/api\";\n\nconst code = \"const x = eval(input);\"; // source text to evaluate\ntry {\n  evaluateCode(code, \"typescript\");\n} catch (e) {\n  if (e instanceof ConfigError) console.error(\"Config issue:\", e.code);\n  else if (e instanceof EvaluationError) console.error(\"Evaluation failed:\", e.code);\n  else throw e; // not a Judges error; let it propagate\n}\n```\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkevinrabun%2Fjudges","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkevinrabun%2Fjudges","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkevinrabun%2Fjudges/lists"}