https://github.com/zscole/adversarial-spec
A Claude Code plugin that iteratively refines product specifications by debating between multiple LLMs until all models reach consensus.
anthropic claude-ai claude-code claude-code-plugin claude-skills llm orchestration
- Host: GitHub
- URL: https://github.com/zscole/adversarial-spec
- Owner: zscole
- License: mit
- Created: 2026-01-11T04:24:30.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-01-22T06:33:17.000Z (about 2 months ago)
- Last Synced: 2026-01-22T20:58:12.723Z (about 2 months ago)
- Topics: anthropic, claude-ai, claude-code, claude-code-plugin, claude-skills, llm, orchestration
- Language: Python
- Homepage:
- Size: 120 KB
- Stars: 453
- Watchers: 3
- Forks: 40
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-claude-code - **adversarial-spec**
# adversarial-spec
A Claude Code plugin that iteratively refines product specifications through multi-model debate until consensus is reached.
**Key insight:** A single LLM reviewing a spec will miss things. Multiple LLMs debating a spec will catch gaps, challenge assumptions, and surface edge cases that any one model would overlook. The result is a document that has survived rigorous adversarial review.
**Claude is an active participant**, not just an orchestrator. Claude provides independent critiques, challenges opponent models, and contributes substantive improvements alongside external models.
## Quick Start
```bash
# 1. Add the marketplace and install the plugin
claude plugin marketplace add zscole/adversarial-spec
claude plugin install adversarial-spec
# 2. Set at least one API key
export OPENAI_API_KEY="sk-..."
# Or use OpenRouter for access to multiple providers with one key
export OPENROUTER_API_KEY="sk-or-..."
# 3. Run it
/adversarial-spec "Build a rate limiter service with Redis backend"
```
## How It Works
```
You describe product --> Claude drafts spec --> Multiple LLMs critique in parallel
        |                                              |
        |                                              v
        |                           Claude synthesizes + adds own critique
        |                                              |
        |                                              v
        |                              Revise and repeat until ALL agree
        |                                              |
        +--------------------------------------------->|
                                                       v
                                              User review period
                                                       |
                                                       v
                                             Final document output
```
1. Describe your product concept or provide an existing document
2. (Optional) Start with an in-depth interview to capture requirements
3. Claude drafts the initial document (PRD or tech spec)
4. Document is sent to opponent models (GPT, Gemini, Grok, etc.) for parallel critique
5. Claude provides independent critique alongside opponent feedback
6. Claude synthesizes all feedback and revises
7. Loop continues until ALL models AND Claude agree
8. User review period: request changes or run additional cycles
9. Final converged document is output
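The loop in steps 4 to 7 can be sketched in a few lines of Python. This is only an illustration of the "repeat until everyone agrees" control flow, not the plugin's actual code: `Critique`, `run_debate`, `keyword_critic`, and `append_revision` are hypothetical names, and the critics are stubs that look for a keyword instead of calling an LLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Critique:
    model: str
    agrees: bool
    notes: str = ""


# A critic takes the current spec text and returns a Critique.
Critic = Callable[[str], Critique]


def run_debate(
    spec: str,
    critics: list[Critic],
    revise: Callable[[str, list[Critique]], str],
    max_rounds: int = 10,
) -> tuple[str, int, bool]:
    """Revise until every critic agrees. Returns (spec, rounds_used, converged)."""
    for round_no in range(1, max_rounds + 1):
        # Sequential here for simplicity; the README describes parallel critique.
        critiques = [critic(spec) for critic in critics]
        if all(c.agrees for c in critiques):
            return spec, round_no, True
        spec = revise(spec, [c for c in critiques if not c.agrees])
    return spec, max_rounds, False


def keyword_critic(name: str, keyword: str) -> Critic:
    """Stub critic (hypothetical): agrees once the spec mentions `keyword`."""
    def critic(spec: str) -> Critique:
        if keyword in spec.lower():
            return Critique(name, True)
        return Critique(name, False, f"missing section on {keyword}")
    return critic


def append_revision(spec: str, objections: list[Critique]) -> str:
    """Stand-in for the synthesis step: add a section for each objection."""
    topics = [c.notes.removeprefix("missing section on ") for c in objections]
    return spec + "".join(f"\n## {t.title()}\n(to be detailed)" for t in topics)


final_spec, rounds, converged = run_debate(
    "# Rate limiter",
    [keyword_critic("model-a", "security"), keyword_critic("model-b", "error handling")],
    append_revision,
)
```

The `max_rounds` cap is an assumption added for safety in the sketch; the README itself only states that the loop continues until all models and Claude agree.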
## Requirements
- Python 3.10+
- `litellm` package: `pip install litellm`
- API key for at least one LLM provider
## Supported Models
| Provider | Env Var | Example Models |
|------------|------------------------|----------------------------------------------|
| OpenAI | `OPENAI_API_KEY` | `gpt-4o`, `gpt-4-turbo`, `o1` |
| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514`, `claude-opus-4-20250514` |
| Google | `GEMINI_API_KEY` | `gemini/gemini-2.0-flash`, `gemini/gemini-pro` |
| xAI | `XAI_API_KEY` | `xai/grok-3`, `xai/grok-beta` |
| Mistral | `MISTRAL_API_KEY` | `mistral/mistral-large`, `mistral/codestral` |
| Groq | `GROQ_API_KEY` | `groq/llama-3.3-70b-versatile` |
| OpenRouter | `OPENROUTER_API_KEY` | `openrouter/openai/gpt-4o`, `openrouter/anthropic/claude-3.5-sonnet` |
| Codex CLI | ChatGPT subscription | `codex/gpt-5.2-codex`, `codex/gpt-5.1-codex-max` |
| Gemini CLI | Google account | `gemini-cli/gemini-3-pro-preview`, `gemini-cli/gemini-3-flash-preview` |
| DeepSeek | `DEEPSEEK_API_KEY` | `deepseek/deepseek-chat` |
| Zhipu | `ZHIPUAI_API_KEY` | `zhipu/glm-4`, `zhipu/glm-4-plus` |
Check which keys are configured:
```bash
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" providers
```
## AWS Bedrock Support
For enterprise users who need to route all model calls through AWS Bedrock (e.g., for security compliance or inference gateway requirements):
```bash
# Enable Bedrock mode
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock enable --region us-east-1
# Add models enabled in your Bedrock account
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock add-model claude-3-sonnet
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock add-model claude-3-haiku
# Check configuration
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock status
# Disable Bedrock mode
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" bedrock disable
```
When Bedrock is enabled, **all model calls route through Bedrock** and no direct API calls are made. Use friendly names like `claude-3-sonnet`, which are automatically mapped to Bedrock model IDs.
Configuration is stored at `~/.claude/adversarial-spec/config.json`.
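The friendly-name mapping might look roughly like the sketch below. The table contents and the `resolve_model` function are assumptions for illustration (run `debate.py bedrock list-models` to see the plugin's real built-in mappings); the two model IDs are published Bedrock identifiers, and the `bedrock/` prefix is the form LiteLLM expects.

```python
# Illustrative mapping only; the plugin's built-in table may differ.
FRIENDLY_TO_BEDROCK = {
    "claude-3-sonnet": "anthropic.claude-3-sonnet-20240229-v1:0",
    "claude-3-haiku": "anthropic.claude-3-haiku-20240307-v1:0",
}


def resolve_model(name: str, bedrock_enabled: bool) -> str:
    """Return the model string to call (hypothetical helper).

    With Bedrock mode off the name passes through unchanged; with it on,
    only names present in the mapping are allowed.
    """
    if not bedrock_enabled:
        return name
    if name not in FRIENDLY_TO_BEDROCK:
        raise ValueError(f"{name!r} is not enabled for Bedrock")
    return "bedrock/" + FRIENDLY_TO_BEDROCK[name]


print(resolve_model("claude-3-sonnet", bedrock_enabled=True))
```

Rejecting unmapped names when Bedrock mode is on reflects the guarantee above that no direct API calls are made.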
## OpenRouter Support
[OpenRouter](https://openrouter.ai) provides unified access to multiple LLM providers through a single API. This is useful for:
- Accessing models from multiple providers with one API key
- Comparing models across different providers
- Automatic fallback and load balancing
- Cost optimization across providers
**Setup:**
```bash
# Get your API key from https://openrouter.ai/keys
export OPENROUTER_API_KEY="sk-or-..."
# Use OpenRouter models (prefix with openrouter/)
python3 debate.py critique --models openrouter/openai/gpt-4o,openrouter/anthropic/claude-3.5-sonnet < spec.md
```
**Popular OpenRouter models:**
- `openrouter/openai/gpt-4o` - GPT-4o via OpenRouter
- `openrouter/anthropic/claude-3.5-sonnet` - Claude 3.5 Sonnet
- `openrouter/google/gemini-2.0-flash` - Gemini 2.0 Flash
- `openrouter/meta-llama/llama-3.3-70b-instruct` - Llama 3.3 70B
- `openrouter/qwen/qwen-2.5-72b-instruct` - Qwen 2.5 72B
See the full model list at [openrouter.ai/models](https://openrouter.ai/models).
## Codex CLI Support
[Codex CLI](https://github.com/openai/codex) allows ChatGPT Pro subscribers to use OpenAI models without separate API credits. Models prefixed with `codex/` are routed through the Codex CLI.
**Setup:**
```bash
# Install Codex CLI (requires ChatGPT Pro subscription)
npm install -g @openai/codex
# Use Codex models (prefix with codex/)
python3 debate.py critique --models codex/gpt-5.2-codex,gemini/gemini-2.0-flash < spec.md
```
**Reasoning effort:**
Control how much thinking time the model uses with `--codex-reasoning`:
```bash
# Available levels: low, medium, high, xhigh (default: xhigh)
python3 debate.py critique --models codex/gpt-5.2-codex --codex-reasoning high < spec.md
```
Higher reasoning effort produces more thorough analysis but uses more tokens.
**Available Codex models:**
- `codex/gpt-5.2-codex` - GPT-5.2 via Codex CLI
- `codex/gpt-5.1-codex-max` - GPT-5.1 Max via Codex CLI
Check Codex CLI installation status:
```bash
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" providers
```
## Gemini CLI Support
[Gemini CLI](https://github.com/google-gemini/gemini-cli) allows Google account holders to use Gemini models without separate API credits. Models prefixed with `gemini-cli/` are routed through the Gemini CLI.
**Setup:**
```bash
# Install Gemini CLI
npm install -g @google/gemini-cli && gemini auth
# Use Gemini CLI models (prefix with gemini-cli/)
python3 debate.py critique --models gemini-cli/gemini-3-pro-preview < spec.md
```
**Available Gemini CLI models:**
- `gemini-cli/gemini-3-pro-preview` - Gemini 3 Pro via CLI
- `gemini-cli/gemini-3-flash-preview` - Gemini 3 Flash via CLI
Check Gemini CLI installation status:
```bash
python3 "$(find ~/.claude -name debate.py -path '*adversarial-spec*' 2>/dev/null | head -1)" providers
```
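The prefix conventions used in the last three sections (`openrouter/`, `codex/`, `gemini-cli/`) suggest a simple dispatch on the model string. The sketch below is a guess at that logic, with hypothetical function names and route labels; note that `gemini/...` (API key) and `gemini-cli/...` (CLI) are distinct prefixes.

```python
def parse_models(arg: str) -> list[str]:
    """Split a comma-separated --models value, dropping blanks."""
    return [m.strip() for m in arg.split(",") if m.strip()]


def route_for(model: str) -> str:
    """Pick a transport from the model prefix (route labels are hypothetical)."""
    if model.startswith("codex/"):
        return "codex-cli"
    if model.startswith("gemini-cli/"):
        return "gemini-cli"
    if model.startswith("openrouter/"):
        return "openrouter"
    return "direct-api"


models = parse_models("codex/gpt-5.2-codex, gemini/gemini-2.0-flash,openrouter/openai/gpt-4o")
routes = {m: route_for(m) for m in models}
```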
## OpenAI-Compatible Endpoints
For models that expose an OpenAI-compatible API (local LLMs, self-hosted models, alternative providers), set `OPENAI_API_BASE`:
```bash
# Point to a custom endpoint
export OPENAI_API_KEY="your-key"
export OPENAI_API_BASE="https://your-endpoint.com/v1"
# Use with any model name
python3 debate.py critique --models gpt-4o < spec.md
```
This works with:
- Local LLM servers (Ollama, vLLM, text-generation-webui)
- OpenAI-compatible providers
- Self-hosted inference endpoints
## Usage
**Start from scratch:**
```
/adversarial-spec "Build a rate limiter service with Redis backend"
```
**Refine an existing document:**
```
/adversarial-spec ./docs/my-spec.md
```
You will be prompted for:
1. **Document type**: PRD (business/product focus) or tech spec (engineering focus)
2. **Interview mode**: Optional in-depth requirements gathering session
3. **Opponent models**: Comma-separated list (e.g., `gpt-4o,gemini/gemini-2.0-flash,xai/grok-3`)
More models bring more perspectives and make convergence stricter, because every model must agree before the debate ends.
## Document Types
### PRD (Product Requirements Document)
For stakeholders, PMs, and designers.
**Sections:** Executive Summary, Problem Statement, Target Users/Personas, User Stories, Functional Requirements, Non-Functional Requirements, Success Metrics, Scope (In/Out), Dependencies, Risks
**Critique focuses on:** Clear problem definition, well-defined personas, measurable success criteria, explicit scope boundaries, no technical implementation details
### Technical Specification
For developers and architects.
**Sections:** Overview, Goals/Non-Goals, System Architecture, Component Design, API Design (full schemas), Data Models, Infrastructure, Security, Error Handling, Performance/SLAs, Observability, Testing Strategy, Deployment Strategy
**Critique focuses on:** Complete API contracts, data model coverage, security threat mitigation, error handling, specific performance targets, no ambiguity for engineers
## Core Features
### Interview Mode
Before the debate begins, opt into an in-depth interview session to capture requirements upfront.
**Covers:** Problem context, users/stakeholders, functional requirements, technical constraints, UI/UX, tradeoffs, risks, success criteria
The interview uses probing follow-up questions and challenges assumptions. After completion, Claude synthesizes answers into a complete spec before starting the adversarial debate.
### Claude's Active Participation
Each round, Claude:
1. Reviews opponent critiques for validity
2. Provides independent critique (what did opponents miss?)
3. States agreement/disagreement with specific points
4. Synthesizes all feedback into revisions
Display format:
```
--- Round N ---
Opponent Models:
- [GPT-4o]: critiqued: missing rate limit config
- [Gemini]: agreed
Claude's Critique:
Security section lacks input validation strategy. Adding OWASP top 10 coverage.
Synthesis:
- Accepted from GPT-4o: rate limit configuration
- Added by Claude: input validation, OWASP coverage
- Rejected: none
```
### Early Agreement Verification
If a model agrees within the first 2 rounds, Claude treats that agreement with skepticism. The model is pressed to:
- Confirm it read the entire document
- List specific sections reviewed
- Explain why it agrees
- Identify any remaining concerns
This prevents false convergence from models that rubber-stamp without thorough review.
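A minimal sketch of that rule, assuming the check is a simple round-number threshold: `needs_press` and `press_prompt` are hypothetical names, and the prompt wording is invented for illustration from the four bullet points above.

```python
SKEPTICAL_ROUNDS = 2  # "within the first 2 rounds"

PRESS_QUESTIONS = [
    "Confirm that you read the entire document.",
    "List the specific sections you reviewed.",
    "Explain why you agree.",
    "Identify any remaining concerns.",
]


def needs_press(round_no: int, agreed: bool) -> bool:
    """Agreement only triggers the follow-up when it comes suspiciously early."""
    return agreed and round_no <= SKEPTICAL_ROUNDS


def press_prompt(model: str) -> str:
    """Build the follow-up message sent to a model that agreed early."""
    lines = [f"{model}, you agreed early. Before that counts:"]
    lines += [f"{i}. {q}" for i, q in enumerate(PRESS_QUESTIONS, start=1)]
    return "\n".join(lines)


if needs_press(round_no=1, agreed=True):
    print(press_prompt("gpt-4o"))
```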
### User Review Period
After all models agree, you enter a review period with three options:
1. **Accept as-is**: Document is complete
2. **Request changes**: Claude updates the spec and you iterate without a full debate cycle
3. **Run another cycle**: Send the updated spec through another adversarial debate
### Additional Review Cycles
Run multiple cycles with different strategies:
- First cycle with fast models (gpt-4o), second with stronger models (o1)
- First cycle for structure/completeness, second for security focus
- Fresh perspective after user-requested changes
### PRD to Tech Spec Flow
When a PRD reaches consensus, you're offered the option to continue directly into a Technical Specification based on the PRD. This creates a complete documentation pair in a single session.
## Advanced Features
### Critique Focus Modes
Direct models to prioritize specific concerns:
```bash
--focus security # Auth, input validation, encryption, vulnerabilities
--focus scalability # Horizontal scaling, sharding, caching, capacity
--focus performance # Latency targets, throughput, query optimization
--focus ux # User journeys, error states, accessibility
--focus reliability # Failure modes, circuit breakers, disaster recovery
--focus cost # Infrastructure costs, resource efficiency
```
### Model Personas
Have models critique from specific professional perspectives:
```bash
--persona security-engineer # Thinks like an attacker
--persona oncall-engineer # Cares about debugging at 3am
--persona junior-developer # Flags ambiguity and tribal knowledge
--persona qa-engineer # Missing test scenarios
--persona site-reliability # Deployment, monitoring, incidents
--persona product-manager # User value, success metrics
--persona data-engineer # Data models, ETL implications
--persona mobile-developer # API design for mobile
--persona accessibility-specialist # WCAG, screen readers
--persona legal-compliance # GDPR, CCPA, regulatory
```
Custom personas also work: `--persona "fintech compliance officer"`
### Context Injection
Include existing documents for models to consider:
```bash
--context ./existing-api.md --context ./schema.sql
```
Use cases:
- Existing API documentation the new spec must integrate with
- Database schemas the spec must work with
- Design documents or prior specs for consistency
- Compliance requirements documents
### Session Persistence and Resume
Long debates can crash or may need to be paused. Sessions save state automatically:
```bash
# Start a named session
echo "spec" | python3 debate.py critique --models gpt-4o --session my-feature-spec
# Resume where you left off
python3 debate.py critique --resume my-feature-spec
# List all sessions
python3 debate.py sessions
```
Sessions save:
- Current spec state
- Round number
- All configuration (models, focus, persona, etc.)
- History of previous rounds
Sessions are stored in `~/.config/adversarial-spec/sessions/`.
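To make the saved state concrete, here is a sketch that round-trips a session through JSON. The `Session` fields mirror the list above, but the field names, the one-file-per-session layout, and the JSON format are all assumptions; the example writes to a temporary directory rather than the real sessions folder.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Session:
    session_id: str
    spec: str                                    # current spec state
    round: int = 0                               # round number
    config: dict = field(default_factory=dict)   # models, focus, persona, ...
    history: list = field(default_factory=list)  # previous rounds


def save_session(session: Session, directory: Path) -> Path:
    """Write the session as <session_id>.json (hypothetical layout)."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{session.session_id}.json"
    path.write_text(json.dumps(asdict(session), indent=2), encoding="utf-8")
    return path


def load_session(session_id: str, directory: Path) -> Session:
    data = json.loads((directory / f"{session_id}.json").read_text(encoding="utf-8"))
    return Session(**data)


original = Session(
    session_id="my-feature-spec",
    spec="# Rate limiter\n...",
    round=2,
    config={"models": ["gpt-4o"], "focus": "security"},
    history=[{"round": 1, "agreed": []}],
)

with tempfile.TemporaryDirectory() as tmp:
    save_session(original, Path(tmp))
    restored = load_session("my-feature-spec", Path(tmp))
```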
### Auto-Checkpointing
When using sessions, each round's spec is saved to `.adversarial-spec-checkpoints/`:
```
.adversarial-spec-checkpoints/
├── my-feature-spec-round-1.md
├── my-feature-spec-round-2.md
└── my-feature-spec-round-3.md
```
Use these to roll back if a revision makes things worse.
### Preserve Intent Mode
Convergence can sand off novel ideas when models interpret "unusual" as "wrong". The `--preserve-intent` flag makes removal expensive:
```bash
--preserve-intent
```
When enabled, models must:
1. **Quote exactly** what they want to remove or substantially change
2. **Justify the harm** - not just "unnecessary" but what concrete problem it causes
3. **Distinguish error from preference** - only remove things that are factually wrong, contradictory, or risky
4. **Ask before removing** unusual but functional choices: "Was this intentional?"
This shifts the default from "sand off anything unusual" to "add protective detail while preserving distinctive choices."
Use when:
- Your spec contains intentional unconventional choices
- You want models to challenge your ideas, not homogenize them
- Previous rounds removed things you wanted to keep
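The README describes these rules as instructions to the models. If one wanted to check a proposed removal mechanically, a sketch could look like the following. Everything here (`RemovalProposal`, `accept_removal`, the list of vague justifications) is a hypothetical illustration, not something the plugin is known to implement.

```python
from dataclasses import dataclass


@dataclass
class RemovalProposal:
    quote: str  # exact text the model wants removed
    harm: str   # the concrete problem the text causes
    kind: str   # "error" or "preference"


# Justifications too vague to count as a concrete harm (illustrative list).
VAGUE_JUSTIFICATIONS = {"", "unnecessary", "unusual", "not needed"}


def accept_removal(spec: str, proposal: RemovalProposal) -> tuple[bool, str]:
    """Apply rules 1 to 3 above; rule 4 (asking the author) is the fallback."""
    if proposal.quote not in spec:
        return False, "quote not found verbatim in spec"
    if proposal.harm.strip().lower() in VAGUE_JUSTIFICATIONS:
        return False, "no concrete harm stated"
    if proposal.kind != "error":
        return False, "preference, not error: ask the author if it was intentional"
    return True, "ok"


spec = "Tokens expire after 90 days. Passwords are stored in plaintext for debugging."
risky = RemovalProposal(
    quote="Passwords are stored in plaintext for debugging.",
    harm="Any database leak exposes every user's password.",
    kind="error",
)
taste = RemovalProposal(quote="Tokens expire after 90 days.", harm="unusual", kind="preference")

risky_result = accept_removal(spec, risky)
taste_result = accept_removal(spec, taste)
```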
### Cost Tracking
Every critique round displays token usage and estimated cost:
```
=== Cost Summary ===
Total tokens: 12,543 in / 3,221 out
Total cost: $0.0847
By model:
gpt-4o: $0.0523 (8,234 in / 2,100 out)
gemini/gemini-2.0-flash: $0.0324 (4,309 in / 1,121 out)
```
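A summary in this shape can be assembled from per-model token counts and a price table. The sketch below uses placeholder model names and made-up prices (USD per million tokens), so it will not reproduce the figures above; `Usage`, `PRICES`, `cost_of`, and `cost_summary` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    model: str
    tokens_in: int
    tokens_out: int


# HYPOTHETICAL prices: (input, output) in USD per million tokens.
PRICES = {"model-a": (2.00, 8.00), "model-b": (0.50, 1.50)}


def cost_of(usage: Usage) -> float:
    price_in, price_out = PRICES[usage.model]
    return (usage.tokens_in / 1_000_000 * price_in
            + usage.tokens_out / 1_000_000 * price_out)


def cost_summary(usages: list[Usage]) -> str:
    total_in = sum(u.tokens_in for u in usages)
    total_out = sum(u.tokens_out for u in usages)
    total_cost = sum(cost_of(u) for u in usages)
    lines = [
        "=== Cost Summary ===",
        f"Total tokens: {total_in:,} in / {total_out:,} out",
        f"Total cost: ${total_cost:.4f}",
        "By model:",
    ]
    for u in usages:
        lines.append(f"  {u.model}: ${cost_of(u):.4f} ({u.tokens_in:,} in / {u.tokens_out:,} out)")
    return "\n".join(lines)


summary = cost_summary([Usage("model-a", 500_000, 250_000), Usage("model-b", 200_000, 100_000)])
print(summary)
```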
### Saved Profiles
Save frequently used configurations:
```bash
# Create a profile
python3 debate.py save-profile strict-security \
--models gpt-4o,gemini/gemini-2.0-flash \
--focus security \
--doc-type tech
# Use a profile
python3 debate.py critique --profile strict-security < spec.md
# List profiles
python3 debate.py profiles
```
Profiles are stored in `~/.config/adversarial-spec/profiles/`.
### Diff Between Rounds
See exactly what changed between spec versions:
```bash
python3 debate.py diff --previous round1.md --current round2.md
```
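For a sense of what a round-to-round diff looks like, the standard library's `difflib` produces a unified diff of two spec versions. This is a stand-in illustration; the README does not say how `debate.py diff` is implemented or what format it prints.

```python
import difflib


def spec_diff(previous: str, current: str,
              from_name: str = "round1.md", to_name: str = "round2.md") -> str:
    """Return a unified diff between two spec versions as one string."""
    return "".join(
        difflib.unified_diff(
            previous.splitlines(keepends=True),
            current.splitlines(keepends=True),
            fromfile=from_name,
            tofile=to_name,
        )
    )


round1 = "# Spec\nRate limit: TBD\nBackend: Redis\n"
round2 = "# Spec\nRate limit: 100 req/s per client\nBackend: Redis\n"
diff_text = spec_diff(round1, round2)
print(diff_text)
```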
### Export to Task List
Extract actionable tasks from a finalized spec:
```bash
cat spec-output.md | python3 debate.py export-tasks --models gpt-4o --doc-type prd
```
Output includes title, type, priority, description, and acceptance criteria.
Use `--json` for structured output suitable for importing into issue trackers.
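The exact JSON schema is not documented here, so the sketch below only illustrates one plausible shape using the five fields named above. The key names, the `Task` dataclass, and the sample task are assumptions, not the tool's actual output.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Task:
    title: str
    type: str
    priority: str
    description: str
    acceptance_criteria: list[str] = field(default_factory=list)


def tasks_to_json(tasks: list[Task]) -> str:
    """Serialize tasks as a JSON array (hypothetical schema)."""
    return json.dumps([asdict(t) for t in tasks], indent=2)


sample_tasks = [
    Task(
        title="Implement per-client rate limit check",
        type="feature",
        priority="high",
        description="Reject requests that exceed the configured limit.",
        acceptance_criteria=[
            "Requests over the limit receive HTTP 429",
            "Counters are stored in Redis",
        ],
    )
]

payload = tasks_to_json(sample_tasks)
print(payload)
```

A flat array of objects like this is straightforward to loop over when creating issues through a tracker's API.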
## Telegram Integration (Optional)
Get notified on your phone and inject feedback during the debate.
**Setup:**
1. Message @BotFather on Telegram, send `/newbot`, follow prompts
2. Copy the bot token
3. Run: `python3 "$(find ~/.claude -name telegram_bot.py -path '*adversarial-spec*' 2>/dev/null | head -1)" setup`
4. Message your bot, run setup again to get your chat ID
5. Set environment variables:
```bash
export TELEGRAM_BOT_TOKEN="..."
export TELEGRAM_CHAT_ID="..."
```
**Features:**
- Async notifications when rounds complete (includes cost)
- 60-second window to reply with feedback (incorporated into next round)
- Final document sent to Telegram when debate concludes
## Output
Final document is:
- Complete, following full structure for document type
- Vetted by all models until unanimous agreement
- Ready for stakeholders without further editing
Output locations:
- Printed to terminal
- Written to `spec-output.md` (PRD) or `tech-spec-output.md` (tech spec)
- Sent to Telegram (if enabled)
Debate summary includes rounds completed, cycles run, models involved, Claude's contributions, cost, and key refinements made.
## CLI Reference
```bash
# Core commands
debate.py critique --models MODEL_LIST --doc-type TYPE [OPTIONS] < spec.md
debate.py critique --resume SESSION_ID
debate.py diff --previous OLD.md --current NEW.md
debate.py export-tasks --models MODEL --doc-type TYPE [--json] < spec.md
# Info commands
debate.py providers # List providers and API key status
debate.py focus-areas # List focus areas
debate.py personas # List personas
debate.py profiles # List saved profiles
debate.py sessions # List saved sessions
# Profile management
debate.py save-profile NAME --models ... [--focus ...] [--persona ...]
# Bedrock management
debate.py bedrock status # Show Bedrock configuration
debate.py bedrock enable --region REGION # Enable Bedrock mode
debate.py bedrock disable # Disable Bedrock mode
debate.py bedrock add-model MODEL # Add model to available list
debate.py bedrock remove-model MODEL # Remove model from list
debate.py bedrock list-models # List built-in model mappings
```
**Options:**
- `--models, -m` - Comma-separated model list (auto-detects from available API keys if not specified)
- `--doc-type, -d` - prd or tech
- `--codex-reasoning` - Reasoning effort for Codex models (low, medium, high, xhigh; default: xhigh)
- `--focus, -f` - Focus area (security, scalability, performance, ux, reliability, cost)
- `--persona` - Professional persona
- `--context, -c` - Context file (repeatable)
- `--profile` - Load saved profile
- `--preserve-intent` - Require justification for removals
- `--session, -s` - Session ID for persistence and checkpointing
- `--resume` - Resume a previous session
- `--press, -p` - Anti-laziness check (press models that agree early to confirm they reviewed the full document)
- `--telegram, -t` - Enable Telegram
- `--json, -j` - JSON output
## File Structure
```
adversarial-spec/
├── .claude-plugin/
│   └── plugin.json               # Plugin metadata
├── README.md
├── LICENSE
└── skills/
    └── adversarial-spec/
        ├── SKILL.md              # Skill definition and process
        └── scripts/
            ├── debate.py         # Multi-model debate orchestration
            └── telegram_bot.py   # Telegram notifications
```
## License
MIT