https://github.com/tim101010101/arena
Multi-agent AI arena for debates, code reviews, and red-team challenges via Model Context Protocol (MCP)
adversarial-ai ai-agents ai-arena ai-debate claude code-review codex collective-intelligence gemini mcp mcp-server model-context-protocol multi-agent openai red-team
- Host: GitHub
- URL: https://github.com/tim101010101/arena
- Owner: tim101010101
- License: mit
- Created: 2026-03-10T08:50:33.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-10T13:51:00.000Z (3 months ago)
- Last Synced: 2026-03-10T20:17:16.409Z (3 months ago)
- Topics: adversarial-ai, ai-agents, ai-arena, ai-debate, claude, code-review, codex, collective-intelligence, gemini, mcp, mcp-server, model-context-protocol, multi-agent, openai, red-team
- Language: TypeScript
- Size: 58.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
# Arena
```
█████╗ ██████╗ ███████╗███╗ ██╗ █████╗
██╔══██╗██╔══██╗██╔════╝████╗ ██║██╔══██╗
███████║██████╔╝█████╗ ██╔██╗ ██║███████║
██╔══██║██╔══██╗██╔══╝ ██║╚██╗██║██╔══██║
██║ ██║██║ ██║███████╗██║ ╚████║██║ ██║
╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚═╝ ╚═══╝╚═╝ ╚═╝
```
**A position-driven adversarial arena for AI agents.** The host provides context and two or more opposing positions; Arena dispatches local CLI models (Claude, Codex, Gemini, OpenAI, Kimi) to argue each position over multiple rounds, then returns the transcript.
A standalone CLI — invoke it from your shell, scripts, or any agent that can run shell commands.
## Mental model
- **Host doesn't fight.** The caller (Claude Code, Codex CLI, scripts) just supplies what should be argued and which positions to argue.
- **Position is the unit, not the model.** Adversarial value comes from clashing stances, not from "which model wins". The same model with two different system prompts is a valid pair if no other CLI is available.
- **Arena owns model dispatch.** It picks distinct models when multiple CLIs are healthy and falls back to reusing one model otherwise.
## Subcommands
| Subcommand | Purpose |
|---|---|
| `arena challenge` | Core. Run N positions over R rounds against the supplied context. |
| `arena review` | Code-review preset over `arena challenge`. Spawns attacker positions (default: bug-hunter + security-auditor) on the supplied code/diff. |
| `arena health` | List agent CLIs and their availability. |
| `arena mcp` | Start arena as a stdio MCP server — exposes each scenario as a tool callable from any MCP client. |
## Install
```bash
# Required: at least one of these CLIs in $PATH
npm install -g @anthropic-ai/claude-code # for "claude"
npm install -g @codex-ai/cli # for "codex" / "openai" / "gemini"
uv tool install kimi-cli # for "kimi" (or: pipx install kimi-cli)
```
### Shell (no npm/node required)
Downloads a self-contained native binary from the latest GitHub release. Supports macOS (arm64/x64) and Linux (arm64/x64).
```bash
curl -fsSL https://raw.githubusercontent.com/tim101010101/arena/main/install.sh | bash
```
Installs to `~/.local/bin/arena`. Override the directory with `ARENA_INSTALL_DIR`, or pin a version with `ARENA_VERSION`:
```bash
# Set the variables on bash (the process that runs the script), not on curl
curl -fsSL https://raw.githubusercontent.com/tim101010101/arena/main/install.sh \
  | ARENA_INSTALL_DIR=/usr/local/bin ARENA_VERSION=v0.1.3 bash
```
### npm
```bash
npm install -g arena-debate # or: npx arena-debate
```
## CLI usage
```bash
# Adversarial debate: supply your own positions
arena challenge \
  --context "Should we use microservices or a monolith for a 10k-user product with 5 devs?" \
  --position "Pro-microservices: team boundaries justify the split" \
  --position "Pro-monolith: a 5-person team should not carry the ops burden" \
  --rounds 3

# Adversarial code review (positions auto-derived from --focus)
arena review --git-ref feature/auth --focus bugs,security
arena review --files src/login.ts,src/session.ts --focus security

# Override which models to use (must already be healthy)
arena challenge --context "..." --position a --position b --models claude,codex

# Diagnostics
arena health
arena --version
arena --help
```
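When driving `arena challenge` from a script, it can help to build the argument list programmatically instead of concatenating a shell string. The sketch below is one way to do that, using only the flags shown above (`--context`, `--position`, `--rounds`, `--models`). The `ChallengeOptions` interface and `buildChallengeArgv` function are hypothetical names for illustration and are not part of Arena.

```typescript
// Hypothetical helper: not part of Arena. It only assembles the flags
// documented in the CLI usage examples above.
interface ChallengeOptions {
  context: string;
  positions: string[];
  rounds?: number;
  models?: string[];
}

function buildChallengeArgv(opts: ChallengeOptions): string[] {
  const argv = ["challenge", "--context", opts.context];
  // One --position flag per stance, in the order supplied.
  for (const position of opts.positions) {
    argv.push("--position", position);
  }
  if (opts.rounds !== undefined) {
    argv.push("--rounds", String(opts.rounds));
  }
  // --models takes a comma-separated list, as in "--models claude,codex".
  if (opts.models && opts.models.length > 0) {
    argv.push("--models", opts.models.join(","));
  }
  return argv;
}

const argv = buildChallengeArgv({
  context: "Monolith or microservices for a 5-person team?",
  positions: ["Pro-monolith", "Pro-microservices"],
  rounds: 3,
});
console.log(JSON.stringify(argv));
```

The resulting array can be handed to a process spawner such as Node's `spawnSync("arena", argv)`. Passing arguments as an array avoids shell-quoting problems when the context or positions contain spaces or quotes.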
## MCP server
`arena mcp` starts a stdio MCP server. Each loaded scenario (`challenge`, `review`, and any user-defined ones) is exposed as an MCP tool; a `health` tool is also included.
Add it to your MCP client config (e.g. Claude Desktop or Claude Code `.mcp.json`):
```json
{
  "mcpServers": {
    "arena": {
      "command": "arena",
      "args": ["mcp"]
    }
  }
}
```
Once connected, your AI client can call:
- **`challenge`** — supply `context` (string) and `positions` (array of ≥2 strings); optional `rounds` and `models`.
- **`review`** — supply `sources` (array of source objects: `raw`, `git_ref`, `git_range`, `file_list`, or `patch_file`); optional `focus`, `rounds`, and `models`.
- **`health`** — returns availability of all local agent CLIs.
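As a rough illustration of the `challenge` tool's inputs, the arguments listed above can be written as a TypeScript type with a small pre-flight check. This is a sketch only: the `ChallengeArgs` and `validateChallengeArgs` names are hypothetical, and apart from the documented "at least two positions" rule, the checks on `context` and `rounds` are assumptions rather than Arena's actual validation.

```typescript
// Hypothetical type mirroring the documented `challenge` arguments.
interface ChallengeArgs {
  context: string;
  positions: string[];
  rounds?: number;
  models?: string[];
}

// Returns a list of problems; an empty list means the arguments look usable.
function validateChallengeArgs(args: ChallengeArgs): string[] {
  const problems: string[] = [];
  // Assumption: an empty context is not useful to argue over.
  if (args.context.trim() === "") {
    problems.push("context must be non-empty");
  }
  // Documented: positions is an array of at least 2 strings.
  if (args.positions.length < 2) {
    problems.push("positions needs at least 2 entries");
  }
  // Assumption: rounds, when given, should be a positive integer.
  if (args.rounds !== undefined && (!Number.isInteger(args.rounds) || args.rounds < 1)) {
    problems.push("rounds must be a positive integer");
  }
  return problems;
}

const problems = validateChallengeArgs({
  context: "Monolith or microservices?",
  positions: ["Pro-monolith"],
});
console.log(problems.join("; "));
// prints "positions needs at least 2 entries"
```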
## Configuration (env vars)
| Variable | Default | Notes |
|---|---|---|
| `ARENA_TIMEOUT_MS` | `120000` | Per-fighter execution timeout |
| `ARENA_DEFAULT_ROUNDS` | `3` | Default rounds when not specified |
| `ARENA_DEFAULT_MODE` | `parallel` | Reserved (challenge runs sequentially) |
| `ARENA_MAX_CONTEXT_SIZE` | `1000000` | Max bytes from `sources` |
| `ARENA_CLAUDE_MODEL` / `ARENA_CODEX_MODEL` / `ARENA_GEMINI_MODEL` / `ARENA_OPENAI_MODEL` / `ARENA_KIMI_MODEL` | CLI default | Per-adapter model override |
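To make the defaults concrete, here is a minimal sketch of how the numeric variables in the table could be resolved from an environment map. The `ArenaConfig` shape and the `readInt` and `loadConfig` helpers are hypothetical, and the fallback behavior for invalid values is an assumption; Arena's own parsing may differ.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ArenaConfig {
  timeoutMs: number;
  defaultRounds: number;
  maxContextSize: number;
}

type Env = Record<string, string | undefined>;

function readInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  // Assumption: invalid or non-positive values fall back to the default.
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function loadConfig(env: Env): ArenaConfig {
  return {
    // Defaults are taken from the table above.
    timeoutMs: readInt(env, "ARENA_TIMEOUT_MS", 120000),
    defaultRounds: readInt(env, "ARENA_DEFAULT_ROUNDS", 3),
    maxContextSize: readInt(env, "ARENA_MAX_CONTEXT_SIZE", 1000000),
  };
}

const config = loadConfig({ ARENA_TIMEOUT_MS: "30000" });
console.log(config.timeoutMs, config.defaultRounds);
// prints "30000 3"
```

Taking the environment as a parameter (rather than reading a global) keeps the helper easy to test; in Node or Bun it would typically be called with `process.env`.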
## Dispatch behavior
```
positions = ["A", "B"]
available = healthCheckAll().filter(ok)
override = caller-supplied --models / models[]
pool = override ?? available
fighter[i].model = pool[i % pool.length]
```
- Prefers distinct models when `len(positions) ≤ len(pool)`.
- Cycles when positions outnumber the pool — same model, different prompts.
- Each fighter gets a unique id (`#`) so transcripts stay disambiguated.
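The round-robin assignment in the pseudocode above can be sketched in TypeScript as follows. The `Fighter` interface and `assignModels` function are hypothetical names, the health check is replaced by a plain `available` array, and the empty-pool error is an assumption about how that case might be handled.

```typescript
// Hypothetical names throughout; this mirrors the pseudocode, not Arena's source.
interface Fighter {
  position: string;
  model: string;
}

function assignModels(
  positions: string[],
  available: string[],
  override?: string[],
): Fighter[] {
  // Caller-supplied models take precedence over the health-check result.
  const pool = override ?? available;
  if (pool.length === 0) {
    // Assumption: dispatch cannot proceed without at least one model.
    throw new Error("no models available to dispatch");
  }
  // Round-robin: distinct models while positions fit in the pool, then wrap.
  return positions.map((position, i) => ({
    position,
    model: pool[i % pool.length],
  }));
}

const fighters = assignModels(["A", "B", "C"], ["claude", "codex"]);
console.log(fighters.map((f) => `${f.position}:${f.model}`).join(" "));
// prints "A:claude B:codex C:claude"
```

With three positions and two healthy models, the third position wraps around to the first model, which matches the "same model, different prompts" case.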
## Development
```bash
bun install
bun test # full suite
bun run build # produces dist/index.js
```
## License
MIT