{"id":50879453,"url":"https://github.com/cdeust/prd-spec-generator","last_synced_at":"2026-06-15T12:30:55.665Z","repository":{"id":354366170,"uuid":"1208915722","full_name":"cdeust/prd-spec-generator","owner":"cdeust","description":"Stateless reducer that turns a feature description into a 9-file PRD. 17 MCP tools · multi-judge verification with weighted-average + Bayesian consensus · deterministic Hard Output Rules · research-evidence-backed strategy selection · 248 tests · part of the ai-architect ecosystem.","archived":false,"fork":false,"pushed_at":"2026-06-02T20:39:47.000Z","size":2040,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-02T21:26:24.382Z","etag":null,"topics":["ai-architect","anthropic","claude-code","consensus","llm-tools","mcp","mcp-server","multi-judge","prd","product-requirements","stateless-reducer","typescript","validation","verification","zetetic"],"latest_commit_sha":null,"homepage":"https://github.com/cdeust/prd-spec-generator","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cdeust.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-12T22:52:48.000Z","updated_at":"2026-06-02T20:39:44.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/cdeust/prd-spec-generator","commit_stats":null,"previous_names":["cdeust/prd-spec-generator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cdeust/prd-spec-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdeust%2Fprd-spec-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdeust%2Fprd-spec-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdeust%2Fprd-spec-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdeust%2Fprd-spec-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cdeust","download_url":"https://codeload.github.com/cdeust/prd-spec-generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cdeust%2Fprd-spec-generator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34363537,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-15T02:00:07.085Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-architect","anthropic","claude-code","consensus","llm-tools","mcp","mcp-server","multi-judge","prd","product-requirements","stateless-reducer","typescript","validation","verification","zetetic"],"created_at":"2026-06-15T12:30:55.584Z","updated_at":"2026-06-15T12:30:55.650Z","avatar_url":"https://github.com/cdeust.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/banner.svg\" alt=\"prd-spec-generator — a stateless reducer that turns a feature description into a PRD the rest of your pipeline can act on\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/License-MIT-blue.svg\" alt=\"MIT License\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/TypeScript-5.9+-3178c6.svg\" alt=\"TypeScript 5.9+\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Node-20.x_·_22.x-339933.svg\" alt=\"Node 20/22\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Tests-583_passing-brightgreen\" alt=\"583 tests\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Packages-10-orange\" alt=\"10 packages\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/MCP_Tools-17-8A2BE2\" alt=\"17 MCP tools\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Validators-Hard_Output_Rules-red\" alt=\"Hard Output Rules\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Phase_4-Closed_Loop_Calibration-success\" alt=\"Phase 4 closed-loop calibration shipped\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#what-an-agent-can-ask-it\"\u003eWhat An Agent Asks\u003c/a\u003e · \u003ca href=\"#getting-started\"\u003eGetting Started\u003c/a\u003e · \u003ca href=\"#the-pipeline\"\u003ePipeline\u003c/a\u003e · \u003ca href=\"#the-mcp-tools\"\u003eTools\u003c/a\u003e · \u003ca href=\"#multi-judge-verification\"\u003eVerification\u003c/a\u003e · \u003ca href=\"#calibration--falsification\"\u003eCalibration\u003c/a\u003e · \u003ca href=\"#architecture\"\u003eArchitecture\u003c/a\u003e · \u003ca href=\"#the-zetetic-standard\"\u003eZetetic Standard\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eCompanion projects:\u003c/strong\u003e\u003cbr\u003e\n  \u003ca href=\"https://github.com/cdeust/Cortex\"\u003eCortex\u003c/a\u003e — persistent memory that injects past decisions into every PRD\u003cbr\u003e\n  \u003ca href=\"https://github.com/cdeust/zetetic-team-subagents\"\u003ezetetic-team-subagents\u003c/a\u003e — 97 genius reasoning patterns that judge each claim\u003cbr\u003e\n  \u003ca href=\"https://github.com/cdeust/automatised-pipeline\"\u003eautomatised-pipeline\u003c/a\u003e — the codebase intelligence layer this generator consumes upstream\n\u003c/p\u003e\n\n---\n\nEvery AI agent that drafts a PRD eventually invents a function that doesn't exist, claims latency it can't measure, or writes acceptance criteria that don't tie back to the requirements they're supposed to test. The output sounds confident. It is not actionable. The next stage in the pipeline — code generation, ticket import, sprint planning — silently inherits the hallucination, ships it, and pays for it later.\n\n**prd-spec-generator** is a TypeScript MCP server that fixes this at the structural level. The pipeline is a stateless reducer (`step(state, result?) → next_state, action`) driven by a host (Claude Code or any MCP-speaking agent). Sections are produced one at a time, validated by deterministic Hard Output Rules before the host ever sees them, and every load-bearing claim is judged by a panel of genius reasoning agents drawn from `zetetic-team-subagents` against the codebase graph from `automatised-pipeline`. **Phase 4** then closes the loop: per-judge reliability is calibrated from history, retry budgets are derived from survival statistics, KPI gates are tuned against frozen baselines, and held-out partitions are mechanically sealed so no calibration result can be peeked at before evaluation.\n\n**10 packages. 17 MCP tools. 10 pipeline steps. Multi-judge verification with consensus. Closed-loop calibration with externally-grounded falsifiers. 583 tests. Every numeric constant traces to a citation, a benchmark, or a `// source: provisional heuristic` admission.**\n\n---\n\n## Phase 4 — closed-loop reliability calibration (shipped)\n\nThe verification subsystem is no longer a one-shot pass/fail report. Every claim resolution can flush an observation back to a calibration repository, every consensus run can pull calibrated posteriors from history, and every closed loop runs an external control arm so the calibration's effect is *measured*, not assumed.\n\n- **Per-judge Bayesian reliability calibration** — Beta(7,3) prior with sensitivity / specificity split per `claim_type`. Posteriors stored in a SQLite-backed `ReliabilityRepository`; observations flushed on every claim resolution.\n- **MAX_ATTEMPTS retry calibration** — Kaplan-Meier survival math (`kmEstimate` / `kmMedianAttempts` / `logRankTest` with Greenwood + Brookmeyer-Crowley CIs); Schoenfeld sample-size derivation event-rate-corrected to ~519 (was 823) against the measured `event_rate=0.4762`, CP CI `[0.4456, 0.5069]`.\n- **KPI gate tuning** — Clopper-Pearson exact CIs; per-machine-class wall_time normalization with 5-bucket `detectMachineClass`; frozen-baseline content-hash assertion; `loadCalibratedGates` + `hold_provisional` ratchet protection.\n- **Plan-mismatch fire-rate** — measured via XmR control charts (Wheeler 1995, Western Electric 1956) with a synthetic injection round-trip pre-flight that catches drift between the diagnostic prefix and the regex matcher.\n- **Externally-grounded held-out subsets** — Ajv schema oracle, mathjs oracle, `tsc` subprocess code oracle, `validateSection` spec oracle. `OracleUnavailableError` typed throw replaces stub-mode fabrication. This is the layer that breaks annotator-circularity — judges and oracles share no inference path.\n- **CC-3 forced-exploration control arms** — every closed loop carves out a 20% partition that reverts to the prior. Without it, calibration-on-calibration looks like progress whether or not it actually is.\n- **Cross-arm comparison metrics** — `computeAblationComparison` / `computeReliabilityComparison` / `computeKpiGateComparison` produce paired-bootstrap CIs (Efron \u0026 Tibshirani 1993 §16.4; deterministic mulberry32 RNG; 12-decimal reproducibility pin). Outcome is a falsifiable recommendation: `calibrated_helps`, `prior_helps`, or `inconclusive_underpowered`.\n- **Mechanically-enforced held-out partition seals** — three sealed lock files (`maxattempts-heldout.lock.json`, `kpigates-heldout.lock.json`, `heldout-partition.lock.json`) commit a sha256 of the partition before evaluation. The `SEAL_VERIFIED` typeof sentinel is the only way to compute cross-arm metrics on a sealed partition; passing anything else is a type error at the boundary.\n- **Production-mode dispatcher** — `makeProductionDispatcher` + `AgentInvoker` interface. The CLI `--mode production|canned` flag selects whether calibration sees real verdicts or canned ones; the canned arm is preserved for offline reproducibility.\n\n---\n\n## What an agent can ask it\n\n```\nstart_pipeline(feature_description, codebase_path?)\n  → returns the first NextAction; the host executes it and feeds the result\n    back via submit_action_result. Nine steps later: 9 PRD files written.\n\nsubmit_action_result(run_id, result)\n  → drives the reducer one more step. The host sees only SUBSTANTIVE actions\n    (ask_user, call_pipeline_tool, call_cortex_tool, spawn_subagents,\n     write_file, done, failed). emit_message is coalesced into the\n     messages array; the host never has to \"advance past\" a banner.\n\nvalidate_prd_section(content, section_type)\n  → deterministic Hard Output Rules — zero LLM calls, pure regex/parsing.\n  → returns: violations[], hasCriticalViolations, totalScore.\n\nvalidate_prd_document(sections[])\n  → cross-section checks: SP arithmetic, AC numbering, FR-AC coverage,\n    test traceability. Catches what per-section validation misses.\n\ncoordinate_context_budget(prd_context, completed_sections[])\n  → per-section retrieval/generation token budgets so Cortex recall and\n    section drafting don't fight over the same context window.\n\nmap_failure_to_retrieval(violations[])\n  → closes the validation→retrieval feedback loop. When a section fails\n    validation, this returns the corrective Cortex query that would\n    have prevented the failure.\n```\n\n---\n\n## Getting started\n\n### Install (marketplace — recommended)\n\n```bash\nclaude plugin marketplace add cdeust/prd-spec-generator\nclaude plugin install prd-spec-generator\n```\n\nRestart your Claude Code session. The 17 MCP tools register on first\nstdio handshake. Then:\n\n```\n/generate-prd build OAuth login for the admin console\n```\n\nThe plugin's bundled MCP server at `mcp-server/index.js` is self-contained\n(only `better-sqlite3` is an optional native dependency for the evidence\nrepository — gracefully degrades to in-memory mode when absent).\n\n### Companion ecosystem\n\nFor full effect, install the three companion plugins so the pipeline can\nconsume codebase intelligence, persistent memory, and the genius-agent\npanel:\n\n```bash\nclaude plugin marketplace add cdeust/automatised-pipeline    # codebase graph intel\nclaude plugin marketplace add cdeust/Cortex                  # persistent memory\nclaude plugin marketplace add cdeust/zetetic-team-subagents  # the genius + team agents\n\nclaude plugin install automatised-pipeline\nclaude plugin install cortex\nclaude plugin install zetetic-team-subagents\n```\n\nEach plugin is independently useful; together they are the ai-architect\necosystem. See [Companion ecosystem](#companion-ecosystem) above.\n\n### Building from source\n\nFor development or to run the audit cycle locally:\n\n```bash\ngit clone https://github.com/cdeust/prd-spec-generator.git\ncd prd-spec-generator\npnpm install --frozen-lockfile\npnpm build      # builds all 9 buildable packages via tsc\npnpm bundle     # produces the standalone mcp-server/index.js\npnpm test       # 583 tests + 2 integration skipped (live MCP integration\n                # env-gated by AIPRD_PIPELINE_BIN)\n```\n\n`pnpm verify` runs all of the above (install + build + bundle + test) —\nsame as CI.\n\n**Prerequisites for source builds:** Node.js 20.x or 22.x, pnpm v10+\n(`corepack enable \u0026\u0026 corepack prepare pnpm@10`).\n\n### Smoke-test offline\n\n```bash\n# Reducer end-to-end without a real host (uses the canned dispatcher):\npnpm test --filter @prd-gen/orchestration smoke\n\n# Benchmark KPI run:\npnpm test --filter @prd-gen/benchmark pipeline-kpis\n```\n\nBoth run in \u003c2s on an M-series Mac. No LLM calls, no MCP traffic — the\nreducer is fully driven by canned ActionResults so you can audit behaviour\noffline.\n\n---\n\n## The pipeline\n\nThe reducer produces nine sequential steps. Each step emits at most one substantive action; the host executes it and feeds the result back. A typical trial-tier feature run (11 sections) takes ~62 host-visible iterations.\n\n| # | Step | What it produces |\n|---|------|------------------|\n| **1** | `banner` | Welcome banner with run ID + feature description + capability summary |\n| **2** | `context_detection` | Detects PRD type from trigger words; asks user when ambiguous |\n| **3** | `input_analysis` | Calls `index_codebase` (automatised-pipeline) when a path is provided; sets `codebase_graph_path` |\n| **4** | `feasibility_gate` | Detects epic-scope inputs (≥2 EPIC_SIGNALS); asks user to focus |\n| **5** | `clarification` | Compose-then-answer rounds (4–10 depending on tier); short-circuits on \"proceed\" |\n| **6** | `budget` | Per-section retrieval/generation token allocation via Cortex paper's 60/30/10 split |\n| **7** | `section_generation` | One section at a time: Cortex recall → engineer draft → validate → (retry up to 3) |\n| **8** | `jira_generation` | Synthesises JIRA tickets from requirements + user_stories + acceptance_criteria |\n| **9** | `file_export` | Writes 9 files (6 core + 3 companion) per SKILL.md Phase 4 |\n| **10** | `self_check` | Two-phase multi-judge verification (see below); typed `verification` field on `done` |\n\nEvery step is independently testable (`stepOnce(state, result?)` returns the same shape as the runner). The runner coalesces `emit_message` actions internally so the host never sees a no-op.\n\n---\n\n## The MCP tools\n\nThree surfaces. The reducer drives the full pipeline; the validation + verification\n+ budget tools can be consumed directly by other systems without entering the\npipeline; the diagnostics surface exposes config + health + history.\n\n```\nReducer (3):\n  start_pipeline             Initialize a run; returns first NextAction\n  submit_action_result       Drive the reducer one step; returns next NextAction\n  get_pipeline_state         Read-only state snapshot for diagnostics\n\nValidation (2):\n  validate_prd_section       Hard Output Rules — single section\n  validate_prd_document      Cross-section checks (SP/AC/FR/test traceability)\n\nVerification (3):\n  plan_section_verification  Extract claims + select judge panels\n  plan_document_verification Same, document-wide\n  conclude_verification      Aggregate JudgeVerdict[] → VerificationReport;\n                             accepts optional `claims` array carrying\n                             `external_grounding` so oracle-resolved ground\n                             truth can replace LLM-only consensus where\n                             schema/math/code/spec oracles are available\n\nBudget + feedback (2):\n  coordinate_context_budget  Per-section token allocation\n  map_failure_to_retrieval   Validation failure → corrective Cortex query\n\nDiagnostics (7):\n  get_config, read_skill_config, check_health, get_prd_context_info,\n  list_available_strategies, get_quality_history, get_strategy_effectiveness\n```\n\nEach tool takes structured Zod-validated arguments and returns a typed response. No tool calls an LLM — section drafts and judge verdicts come back via the host's `spawn_subagents` action so the same pipeline runs against any agent runtime.\n\n---\n\n## Multi-judge verification\n\nThe `self_check` step is a two-phase contract. Phase A plans the verification batch and persists a snapshot of `(claim_ids, judges)` to state. Phase B receives the verdicts, parses them against the snapshot, and aggregates via the consensus engine.\n\n```\nplan_document_verification(sections[])\n  → extracts atomic Claims (FR-001, AC-005, NFR-LATENCY-1, ...)\n  → selects a panel per claim type:\n      architecture        → liskov + alexander + dijkstra + architect\n      performance         → fermi + carnot + curie + erlang\n      security            → wu + ibnalhaytham + security-auditor\n      data_model          → mendeleev + dba + lavoisier\n      acceptance_criteria → toulmin + popper + test-engineer\n      ...\n\n[host spawns the panel; each agent returns a JSON verdict]\n\nconclude_verification(verdicts[])\n  → Per claim, runs consensus():\n      strategy: weighted_average (default) | bayesian\n      fail_threshold: 0.5  (≥50% confidence-weighted FAIL → forces FAIL)\n      precautionary tie-breaker: more-severe verdict wins\n  → distribution_suspicious flag fires when 100% PASS over ≥5 claims\n  → returns ConsensusVerdict[] with full distribution + dissenting list\n```\n\nThe verdict taxonomy is deliberately five-level — not binary. NFR claims (latency, fps, throughput, storage) **MUST NOT receive PASS**: they are SPEC-COMPLETE if a measurement method is specified, NEEDS-RUNTIME otherwise. Judges that default to PASS for everything are caught by the `distribution_suspicious` detector and flagged in the typed `done.verification` field.\n\n---\n\n## Calibration \u0026 falsification\n\nThe verification subsystem is itself a hypothesis: that *consensus weighted by historically-calibrated reliability* outperforms *consensus weighted by a uniform prior*. Phase 4 is the closed loop that tests it.\n\n1. **Observe.** Each verification run can flush per-judge observations (claim_id, claim_type, judge_id, verdict, oracle_truth?) to a SQLite reliability repository. Observations carry an `external_grounding` field that propagates from `Claim` through the orchestrator to the oracle resolution path; when an external oracle (Ajv schema, mathjs, `tsc`, `validateSection`) can resolve the claim, its truth replaces LLM-only consensus.\n2. **Calibrate.** On subsequent runs, calibrated posteriors weight consensus per judge, per `claim_type`. A 20% control-arm partition is forced-explored using the prior (`getReliabilityForRun` / `getRetryArmForRun` decide which arm a given run lands in deterministically from `run_id`). Without the control arm, calibration-on-calibration looks like progress whether or not it actually is.\n3. **Compare.** Cross-arm metrics (`computeAblationComparison`, `computeReliabilityComparison`, `computeKpiGateComparison`) run paired-bootstrap CIs (Efron \u0026 Tibshirani 1993 §16.4; deterministic mulberry32 RNG; 12-decimal reproducibility pin) and emit one of three falsifiable recommendations: `calibrated_helps`, `prior_helps`, or `inconclusive_underpowered`.\n4. **Seal.** Held-out partitions are committed to lock files (`maxattempts-heldout.lock.json` for §4.2, `kpigates-heldout.lock.json` for §4.5, `heldout-partition.lock.json` for the §4.1 50-claim externally-grounded corpus) with a sha256 hash of the partition. The cross-arm metric functions accept a `SEAL_VERIFIED` typeof sentinel as a parameter; the only way to obtain that sentinel is to verify the seal first. Peeking at a held-out partition before evaluation is a type error.\n5. **Ground.** Where an external oracle can resolve a claim deterministically, it does. Where it cannot, `OracleUnavailableError` is thrown rather than fabricating a stub-mode truth. This is the line that breaks annotator-circularity: judges trained against (or biased toward) LLM-style reasoning cannot poison calibration that uses non-LLM truth.\n\nThe lock files, the seal-verification dance, and the control-arm partition together mean: when a Phase 4 cross-arm comparison says \"calibrated_helps with 95% CI excluding zero,\" the claim is *measured*, not vibes-checked. When it says \"inconclusive_underpowered,\" that is also a falsifiable claim — you need more data, not more confidence.\n\n---\n\n## Architecture\n\nTen workspace packages, each independently buildable, with strict Clean Architecture layering enforced by package boundaries.\n\n```\ncore              ← domain types, schemas, agent identities\n                    │  no I/O, no infrastructure dependency\n                    │  Zod-validated; the only place where verdict /\n                    │  section_type / capability shapes are defined\n                    ▼\nvalidation        ← Hard Output Rules (per-section + cross-section)\n                    │  pure functions; no I/O\n                    ▼\nstrategy          ← thinking-strategy selector (genius pattern routing)\n                    │\nmeta-prompting    ← prompt builders for clarification / draft / jira\n                    │  pure string composition\n                    ▼\nverification      ← claim extraction + judge selection +\n                    │  consensus engine (weighted_average + Bayesian)\n                    │  + buildJudgePrompt\n                    ▼\norchestration     ← stateless reducer, 9 step handlers, runner\n                    │  step(state, result?) → next_state, action\n                    │  emit_message coalescing; canned-dispatcher utility\n                    ▼\necosystem-adapters← StdioMcpClient, AutomatisedPipelineClient, CortexClient\n                    │  the only package allowed to do I/O\n                    ▼\nmcp-server        ← composition root; 17 tools registered;\n                    │  evidence repository (better-sqlite3, optional)\n                    ▼\nbenchmark         ← pipeline KPI measurements + golden-fixture HOR scoring\n                    │  + calibration/ subtree (Phase 4):\n                    │    · ReliabilityRepository (SQLite, observation flush)\n                    │    · Kaplan-Meier + log-rank + Schoenfeld N\n                    │    · Clopper-Pearson exact CI + XmR control charts\n                    │    · paired-bootstrap (Efron-Tibshirani 1993 §16.4)\n                    │    · external oracles (Ajv / mathjs / tsc / validate)\n                    │    · machine-class detector + frozen-baseline gates\n                    │    · sealed held-out lock files (sha256 + SEAL_VERIFIED)\n                    │    · production-mode dispatcher + AgentInvoker seam\n                    │  Audit lineage: JSONL + .xmr sidecars per run.\nskill             ← SKILL.md + slash-command definitions for Claude Code\n```\n\n### Dependency rule (absolute)\n\nEvery package's `package.json` is checked: `core` depends only on `zod`; `verification` depends only on `core`; `orchestration` depends on `core`/`validation`/`verification`/`meta-prompting` (NOT on `ecosystem-adapters`); `ecosystem-adapters` depends on `core` + `verification`; `mcp-server` is the only place where everything composes.\n\nThe Phase 3+4 cross-audit found and fixed two layer violations:\n- `orchestration` was importing `extractJsonObject` and `buildJudgePrompt` from `ecosystem-adapters` — pure utilities lived in the wrong package; moved to `core` and `verification` respectively.\n- Pure domain types (`Claim`, `JudgeVerdict`, `JudgeRequest`, `AgentIdentity`) lived in `ecosystem-adapters/contracts/subagent.ts`; moved to `core/domain/agent.ts`. The infrastructure package now re-exports them as a backward-compat shim.\n\n---\n\n## What this fixes that previous PRD generators don't\n\n| Failure mode | What we do |\n|---|---|\n| **Section drift between turns** | Single immutable `PipelineState` snapshot per step; reducer is pure; host can replay any step |\n| **Hallucinated symbols** | `validate_prd_section` runs Hard Output Rules; symbols cross-checked against `automatised-pipeline` graph if `codebase_path` is set |\n| **NFRs claiming PASS without measurement** | Verdict taxonomy refuses PASS for latency/throughput/fps/storage; consensus engine forwards SPEC-COMPLETE / NEEDS-RUNTIME |\n| **Confirmatory bias (every judge says PASS)** | `distribution_suspicious` flag fires at 100% PASS over ≥5 claims; surfaced in typed `done.verification.distribution_suspicious` |\n| **Acceptance criteria not traceable to requirements** | Cross-document validator checks FR-AC coverage and AC numbering gaps |\n| **Tests claiming \"comprehensive\" without listing what they cover** | Test-traceability rule: every section's claimed test must reference an FR or AC ID |\n| **Retries that use the same context as the failure** | `map_failure_to_retrieval` closes the validator→Cortex feedback loop; corrective queries before retry |\n| **Magic-number budgets (\"we'll use 4K tokens for retrieval\")** | `coordinate_context_budget` produces per-section allocations from the canonical SECTIONS_BY_CONTEXT plan |\n\n---\n\n## How it composes with the rest of the ecosystem\n\n```\n                            ┌────────────────────────────────────┐\n                            │         Claude Code (host)         │\n                            └────────────────────┬───────────────┘\n                                                 │ stdio MCP\n              ┌──────────────────────────────────┼──────────────────────────────────┐\n              ▼                                  ▼                                  ▼\n   ┌───────────────────┐              ┌────────────────────┐              ┌───────────────────┐\n   │   automatised-    │   graph_path │   prd-spec-        │  recall      │      Cortex       │\n   │   pipeline        │ ───────────► │   generator        │ ◄─────────── │   (memory engine) │\n   │   (Rust MCP)      │              │   (TS MCP)         │              │   (Python MCP)    │\n   │                   │   symbols    │                    │  excerpts    │                   │\n   │   read-only       │ ◄──────────► │   stateless        │ ───────────► │   thermodynamic   │\n   │   intelligence    │              │   reducer          │              │   memory          │\n   └───────────────────┘              └─────────┬──────────┘              └───────────────────┘\n                                                │\n                                                │ spawn_subagents\n                                                ▼\n                                  ┌─────────────────────────────┐\n                                  │   zetetic-team-subagents    │\n                                  │   97 genius + 19 team       │\n                                  │   Each judge cites its      │\n                                  │   primary paper.            │\n                                  └─────────────────────────────┘\n```\n\nEach project owns one concern. `automatised-pipeline` knows what's true about the code. Cortex knows what we already decided. zetetic-team-subagents knows how to reason about a specific shape of claim. **prd-spec-generator** is the deterministic glue that turns those three signals into a PRD an agent can act on.\n\n---\n\n## The Zetetic Standard\n\nEvery load-bearing constant in this codebase carries a `// source:` annotation. Three forms are accepted:\n\n```typescript\n// source: \u003ccitation\u003e          // a paper, a spec, a referenced design doc\n// source: benchmark \u003cpath\u003e    // a committed benchmark whose output produced this value\n// source: provisional heuristic — \u003ccalibration plan\u003e\n                               // honest admission; tells the next reader\n                               // (a) why the value is what it is today and\n                               // (b) what evidence would change it\n```\n\nThe cross-audit found and tagged every previously bare constant. Examples:\n\n```typescript\n// pipeline-kpis.ts\nconst KPI_GATES = {\n  /** source: provisional heuristic. Smoke baseline = 62 iterations on\n   *  trial+codebase; cap is 100 (~60% headroom). dijkstra cross-audit\n   *  derived a structural max of 9 emit_message hops; the substantive-\n   *  action count builds on that. Phase 4.5 will replace with measured\n   *  P95 + 1σ. */\n  iteration_count_max: 100,\n  ...\n};\n\n// verification/consensus.ts\n/** source: provisional heuristic — Beta(7,3) (mean 0.7, ESS=10,\n *  moderately informative toward reliability). Phase 4.1 will replace\n *  with per-agent Beta(α+correct, β+incorrect) calibrated from history. */\nconst DEFAULT_RELIABILITY_PRIOR_MEAN = 0.7;\n```\n\nThe four pillars (consistent / true / useful / necessary) and the seven rules of zetetic inquiry are inherited from the [zetetic-team-subagents standard](https://github.com/cdeust/zetetic-team-subagents#the-zetetic-standard). Provisional values are not silently propagated as truth.\n\n---\n\n## What this system does not do\n\nThe same standard applied to itself.\n\n1. **It does not write code.** This generator produces a PRD. The downstream coding agent (separate system) reads the PRD, the graph, and Cortex memory; it writes the implementation. Symbols in the PRD are validated against the graph but never edited by us.\n2. **It does not validate prose quality.** Hard Output Rules check structural invariants (FR numbering, AC traceability, NFR shape, cross-references). They do not check whether a sentence is well-written or persuasive. That is what the multi-judge phase is for, and even there the judges return verdicts on *claims* — atomic assertions — not on style.\n3. **The judge phase is end-to-end testable but the judges are not deterministic.** In tests we use a canned dispatcher that returns 100% PASS by construction; the `distribution_suspicious` detector exists precisely because real judge panels can also degenerate into confirmatory consensus, and we do not pretend otherwise.\n4. **The KPI gates were provisional; Phase 4.5 has shipped.** `iteration_count_max`, `wall_time_ms_max`, and `mean_section_attempts_max` were originally canned-dispatcher baselines. They are now calibrated against the K=100 frozen baseline with Clopper-Pearson exact CIs, per-machine-class wall_time normalization, and `loadCalibratedGates` + `hold_provisional` ratchet protection. The §4.5 lock file commits a content-hash of the baseline; mutating it post hoc fails the seal verification. Where data is still thin, gates remain `hold_provisional` rather than locked. See [docs/PHASE_4_PLAN.md](docs/PHASE_4_PLAN.md) for the full pre-registration.\n5. **Citation presence ≠ citation validity.** A `// source: Knuth 1998` comment satisfies the convention whether or not Knuth 1998 exists or supports the value. We enforce that the citation IS THERE; the cross-audit cycle (genius + team review every phase) is what keeps it honest.\n\n---\n\n## Reproducing the audit cycle\n\nThe repo ships a multi-agent cross-audit workflow. After every non-trivial phase:\n\n```bash\n# Engineering team review:\n#   architect, code-reviewer, refactorer, test-engineer, security-auditor,\n#   devops-engineer, dba (when relevant)\n\n# Genius team review:\n#   feynman (integrity), curie (measurement), popper (falsifiability),\n#   dijkstra (correctness), shannon (signal), deming (variation),\n#   poincare (qualitative), ...\n```\n\nEach agent reads the current state of the code (not from memory) and produces a ranked finding list. The Phase 3+4 cycle generated 30 findings; 28 were closed in the same cycle (4 CRIT + 13 HIGH + 11 MED). Two are deferred to Phase 4 calibration with the evidence required to close them documented in docs/PHASE_4_PLAN.md.\n\n---\n\n## Project layout\n\n```\npackages/\n├── core/                  Domain types · schemas · agent identities · evidence repo\n├── validation/            Hard Output Rules · per-section + cross-section validators\n├── verification/          Claim extraction · judge selection · consensus engine\n├── meta-prompting/        Prompt builders (clarification / draft / jira)\n├── strategy/              Thinking-strategy selector\n├── orchestration/         Stateless reducer · 9 step handlers · runner · canned-dispatcher\n├── ecosystem-adapters/    StdioMcpClient · AutomatisedPipelineClient · CortexClient\n├── mcp-server/            Composition root · 17 MCP tools registered\n├── benchmark/             Pipeline KPI measurement · golden-fixture HOR scoring\n│   └── calibration/       Phase 4: ReliabilityRepository · KM survival ·\n│                          Clopper-Pearson · XmR · paired-bootstrap ·\n│                          external oracles · sealed held-out partitions ·\n│                          production-mode dispatcher\n└── skill/                 SKILL.md · slash-command definitions\n```\n\n---\n\n## License\n\nMIT.\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003cem\u003eDon't ship a PRD that hallucinates a function it can't measure.\u003cbr\u003e\n  Ship one whose every claim was judged by Pearl, Curie, Liskov, and a panel of seven others, validated against the call graph, and grounded in what Cortex remembers from yesterday.\u003c/em\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcdeust%2Fprd-spec-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcdeust%2Fprd-spec-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcdeust%2Fprd-spec-generator/lists"}