https://github.com/homeofe/ai-security-arena

AI Security Arena: Interactive web interface for AI-powered Red Team vs Blue Team security battles. Model selection, live battle logs, scoring, and scenario building.
https://github.com/homeofe/ai-security-arena

ai arena blue-team cybersecurity elvatis llm multi-model nextjs red-team security

Last synced: 3 months ago
JSON representation

AI Security Arena: Interactive web interface for AI-powered Red Team vs Blue Team security battles. Model selection, live battle logs, scoring, and scenario building.

Host: GitHub
URL: https://github.com/homeofe/ai-security-arena
Owner: homeofe
License: apache-2.0
Created: 2026-03-18T09:39:26.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-03-30T17:37:26.000Z (3 months ago)
Last Synced: 2026-03-30T18:29:33.166Z (3 months ago)
Topics: ai, arena, blue-team, cybersecurity, elvatis, llm, multi-model, nextjs, red-team, security
Language: TypeScript
Size: 1010 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md

Awesome Lists containing this project

README

# AI Security Arena

> Interactive web interface for AI-powered Red Team vs Blue Team security battles.
> Combines [ai-red-team](https://github.com/homeofe/ai-red-team) and [ai-blue-team](https://github.com/homeofe/ai-blue-team) in one place.

---

## Screenshots

### Arena Setup
![Arena Setup](docs/screenshots/arena-setup-final.jpg)

### Battle Report (IoT Factory Floor: Claude Opus 4.6 vs Claude Opus 4.6)
![Battle Report](docs/screenshots/battle-report-full.jpg)

### Match History (with saved battles)
![Match History](docs/screenshots/history-with-match.jpg)

### Replay Viewer (VCR controls, step-by-step playback)
![Replay Viewer](docs/screenshots/replay-viewer.jpg)

### Leaderboard (model rankings after battles)
![Leaderboard](docs/screenshots/leaderboard-with-data.jpg)

### Scenario Builder
![Scenario Builder](docs/screenshots/scenario-builder.jpg)

---

## Concept

```
┌──────────────────────────────────────────────────────────────┐
│ AI SECURITY ARENA │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ RED TEAM │ ← sandbox → │ BLUE TEAM │ │
│ │ (Attacker) │ │ (Defender) │ │
│ │ │ live battle │ │ │
│ │ Model: ... │ ←──────────────→ │ Model: ... │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ WEB INTERFACE │ │
│ │ - Model selection per team │ │
│ │ - Custom prompts + example library │ │
│ │ - Live battle log (WebSocket) │ │
│ │ - Scoreboard + leaderboard │ │
│ │ - Scenario builder │ │
│ │ - Round-by-round replay │ │
│ │ - AAHP evolution tracker │ │
│ │ - Match export (JSON/PDF) │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
```

## Features

### Core
- **Model Selection:** Choose any LLM for each team (Claude, GPT, Gemini, Grok, local models)
- **Custom Prompts:** Write your own attack/defense strategies or use curated examples
- **Live Battle Log:** Real-time WebSocket stream, Red (left) vs Blue (right), every action timestamped
- **Scenario Builder:** Pick environments (web-server, database, cloud-infra, IoT) or create custom ones

### Scoring & History
- **Scoreboard:** Per-match scoring (attacks landed, attacks blocked, time to detect, etc.)
- **Leaderboard:** Which model pairs perform best? Historical rankings across all matches
- **Round-by-Round Replay:** Step through completed matches forensic-style

### Intelligence
- **AAHP Evolution Tracker:** Both teams self-improve via GitHub Issues. Track the evolution over time.
- **Match Export:** Download reports as JSON or PDF for documentation

### Safety
- **Sandbox Isolation:** All battles run in isolated sandboxes. Clear visual indicator.
- **Budget Limiter:** Cap API costs per match (both teams make LLM calls simultaneously)
- **Prompt Sanitization:** Custom prompts are validated to prevent sandbox escape

## Tech Stack

| Component | Technology |
|-----------|-----------|
| Frontend | Next.js 15 (App Router, React Server Components) |
| Styling | Tailwind CSS |
| Realtime | WebSocket (ws) + Server-Sent Events fallback |
| Backend | Next.js API Routes + ai-red-team/ai-blue-team as libraries |
| Database | SQLite (via better-sqlite3) for match history |
| Testing | Vitest |

## Getting Started

### Prerequisites

- Node.js 22+
- pnpm

### Installation

```bash
git clone https://github.com/homeofe/ai-security-arena.git
cd ai-security-arena
pnpm install
pnpm dev
```

Open http://localhost:3000

### API Key Configuration

**Option A: Via the Settings UI (recommended)**

Navigate to http://localhost:3000/settings and enter your API keys directly.
Keys are stored locally in `.data/config.json` (gitignored) and never exposed to the browser.

**Option B: Via environment variables**

```env
# At least one LLM provider key required
ANTHROPIC_API_KEY=sk-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=AIza...

# Optional: budget limit per match (USD)
MATCH_BUDGET_LIMIT=1.00
```

Keys set via the Settings UI take priority over environment variables.

### CLI Tools (for CLI mode)

CLI mode requires the respective CLI tools installed on your system:

| Provider | CLI Tool | Install |
|----------|---------|---------|
| Anthropic (Claude) | `claude` | `npm install -g @anthropic-ai/claude-code` |
| Google (Gemini) | `gemini` | `npm install -g @anthropic-ai/gemini-cli` |
| OpenAI (GPT/Codex) | `codex` | `npm install -g @openai/codex` |

Check the **Status** page (http://localhost:3000/status) to verify all connectors are healthy.

Mock mode works without any API keys or CLI tools installed.

## Project Structure

```
ai-security-arena/
├── src/
│ ├── app/ # Next.js App Router pages
│ │ ├── page.tsx # Landing / dashboard
│ │ ├── arena/
│ │ │ └── page.tsx # Battle setup + live view
│ │ ├── history/
│ │ │ └── page.tsx # Match history + replays
│ │ ├── leaderboard/
│ │ │ └── page.tsx # Model rankings
│ │ ├── settings/
│ │ │ └── page.tsx # API key management
│ │ ├── status/
│ │ │ └── page.tsx # System health dashboard
│ │ └── api/
│ │ ├── battle/ # Start/stop battles
│ │ ├── settings/ # API key CRUD
│ │ ├── status/ # System health check
│ │ ├── ws/ # WebSocket endpoint
│ │ └── matches/ # Match CRUD
│ ├── components/
│ │ ├── BattleLog.tsx # Split-screen live log
│ │ ├── BattleHeader.tsx # Round counter, timer, cost
│ │ ├── ModelPicker.tsx # Model selection per team
│ │ ├── PromptEditor.tsx # Tabbed prompt editor (examples + custom)
│ │ ├── ScenarioSelector.tsx # Horizontal scroll scenario cards
│ │ ├── ScoreBar.tsx # Animated score comparison bar
│ │ ├── PhaseIcon.tsx # Color-coded phase badges
│ │ ├── ReportHeader.tsx # Classified intel briefing header
│ │ ├── ScoreOverview.tsx # Large score display + stats
│ │ ├── ScoreChart.tsx # Pure CSS/SVG bar + line charts
│ │ ├── DecisionTimeline.tsx # Vertical alternating timeline
│ │ ├── ReasoningViewer.tsx # Expandable LLM reasoning per round
│ │ ├── StrategyBreakdown.tsx # Side-by-side strategy analysis
│ │ ├── TurningPoints.tsx # Momentum shift highlights
│ │ └── ExportButtons.tsx # JSON/PDF/share export
│ ├── lib/
│ │ ├── arena.ts # Arena controller (mock/cli/api modes)
│ │ ├── cli-provider.ts # CLI-based LLM provider (claude/gemini/codex)
│ │ ├── config.ts # Server-side config store (API keys)
│ │ ├── prompt-builder.ts # Battle prompt construction per round
│ │ ├── response-parser.ts # Parse LLM responses into BattleEvents
│ │ ├── report-generator.ts # Post-match analysis and report generation
│ │ ├── mock-battle.ts # Realistic mock battle events
│ │ ├── models.ts # Available model registry
│ │ ├── prompts.ts # Example prompt library
│ │ ├── scenarios.ts # Built-in scenarios
│ │ └── scoring.ts # Scoring engine
│ └── types/
│ └── index.ts # Shared type definitions
├── .ai/handoff/ # AAHP protocol files
├── docs/screenshots/ # UI screenshots
├── package.json
├── next.config.ts
├── postcss.config.mjs
├── tsconfig.json
└── vitest.config.ts
```

## Roadmap

| # | Feature | Status |
|---|---------|--------|
| 1 | Project setup (Next.js + Tailwind + dependencies) | ✅ Done |
| 2 | Model picker + scenario selector UI | ✅ Done |
| 3 | Arena controller (mock + CLI + API modes) | ✅ Done |
| 4 | Split-screen battle view (Red vs Blue) | ✅ Done |
| 5 | Scoring engine | ✅ Done |
| 6 | Custom prompt editor + example library | ✅ Done |
| 7 | CLI provider integration (claude/gemini/codex) | ✅ Done |
| 8 | Battle Report page (timeline, reasoning, strategy, export) | ✅ Done |
| 9 | Export (JSON/PDF) | ✅ Done |
| 10 | SSE real-time battle events | ✅ Done |
| 11 | ai-red-team SDK integration | ✅ Done |
| 12 | ai-blue-team SDK integration | ✅ Done |
| 13 | Match history + SQLite persistence | ✅ Done |
| 14 | Round-by-round replay viewer | ✅ Done |
| 15 | Leaderboard with model rankings | ✅ Done |
| 16 | Scenario builder | ✅ Done |
| 17 | Deployment (Docker + Vercel) | ✅ Done |
| 18 | System status page (health monitoring) | ✅ Done |
| 19 | Settings page (API key management via UI) | ✅ Done |
| 20 | Windows compatibility (CLI spawning, native modules) | ✅ Done |

## License

## Related Projects

- [ai-red-team](https://github.com/homeofe/ai-red-team) - Offensive security agent
- [ai-blue-team](https://github.com/homeofe/ai-blue-team) - Defensive security agent
- [AAHP](https://github.com/homeofe/AAHP) - Agent Handoff Protocol (self-evolution)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/homeofe/ai-security-arena

Awesome Lists containing this project

README