
https://github.com/terrordrummer/symposium

An opinionated protocol for structured, sequential, adversarial multi-agent deliberation. Normative specification + v1.0.0 JSON Schemas.

ai-agents deliberation json-schema llm multi-agent protocol reasoning specification




# Symposium


An opinionated protocol for structured, sequential, adversarial multi-agent deliberation.



---

## What is this?

Symposium is a **protocol specification** + a **reference Python
runtime** that orchestrates a small panel of LLM-backed agents through
a structured, turn-based deliberation, producing a single, replayable,
schema-validated artifact.

It is **not** a generic agent framework. It enforces exactly one
conversation topology — fixed panel, one primary turn per agent per
round, one structurally-separated coordinator, bounded forks — and
trades topology flexibility for **testable scheduler invariants** and
**byte-identical replay** of any past session.
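To illustrate what a "testable scheduler invariant" can look like, here is a minimal sketch that checks a flattened turn log against the topology above. It assumes every panel agent takes exactly one `primary_turn` per round and that each round carries exactly one `coordination_turn` from the coordinator. The `Turn` record and `check_round_invariants` are hypothetical, not the runtime's `Message` model. The sketch also ignores forks, panel contraction and `on_agent_failure` handling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    # Hypothetical turn record; the real runtime persists richer Message objects.
    round: int
    agent: str
    kind: str  # "primary_turn" or "coordination_turn"


def check_round_invariants(turns, panel, coordinator):
    """Return human-readable violations; an empty list means every invariant holds."""
    violations = []
    for r in sorted({t.round for t in turns}):
        in_round = [t for t in turns if t.round == r]
        primary = Counter(t.agent for t in in_round if t.kind == "primary_turn")
        coord = [t for t in in_round if t.kind == "coordination_turn"]
        # Invariant 1: each panel agent takes exactly one primary turn.
        for agent in panel:
            if primary[agent] != 1:
                violations.append(f"round {r}: {agent} has {primary[agent]} primary turns")
        # Invariant 2: nobody outside the panel (including the coordinator) takes one.
        for agent in primary:
            if agent not in panel:
                violations.append(f"round {r}: non-panel agent {agent} took a primary turn")
        # Invariant 3: exactly one coordination turn, issued by the coordinator.
        if len(coord) != 1 or coord[0].agent != coordinator:
            violations.append(f"round {r}: expected one coordination_turn by {coordinator}")
    return violations
```

Returning a list of violations instead of raising on the first one lets a test report every broken invariant in a single run.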

Two things ship together in this repo:

1. **`docs/specification.md`** — the normative protocol. Implementable
in any language. The spec is what conformance means.
2. **`symposium/`** — the reference Python runtime. Today: full
scheduler, persistence, replay, the deterministic `FakeProvider`
adapter, an OpenAI-shaped HTTP adapter (real OpenAI plus
self-hosted OpenAI-compatible endpoints), and an Anthropic-shaped
HTTP adapter (real Anthropic plus self-hosted Anthropic-compatible
endpoints).

---

## Why one more protocol?

Most multi-agent stacks expose enough flexibility (group chat,
arbitrary handoffs, nested supervisors) that any two implementations
diverge on the parts that matter — when does the conversation stop, what
exactly is replayed, what fails the run, how is delegation routed. Each
implementation invents its own answers, and operators end up debugging
the framework instead of the agents.

Symposium goes the opposite way: **one opinionated topology, sharp
boundaries, closed enums.** What you get in exchange:

| Aspect | Symposium |
|---|---|
| **Topology** | Fixed `deliberation_panel`, one `primary_turn` per agent per round, single `coordination_turn` from a structurally-separated `coordinator_agent`. |
| **Inter-agent routing** | Schema-validated `direct_request` only. Inline `@AgentName` in prose is never routing — prompt-injection resistant by construction. |
| **Roles** | Three-way separation: `Selector` chooses *who*, `CoordinatorAgent` recommends *what next* (LLM, no executive power), `OrchestratorRuntime` schedules and terminates (deterministic code, sole party that decides when a session stops). |
| **Failure surface** | Closed 7-value termination-reason enum; closed 12-value adapter `error.kind` enum; closed 3-value `on_agent_failure` policy. |
| **Replayability** | Four distinct contracts documented separately: `transcript_replay` (unconditional byte identity), `execution_replay` (conditional on ten pinning conditions), golden-test byte identity, `fake_provider` determinism. No "it should be deterministic" hand-waving. |
| **Persistence** | Canonical `Artifact` (§5.10) with RFC-8785 JCS-canonicalized `transcript_digest` (SHA-256). Tamper-evident. |
| **Execution mode** | MVP is **batch-only** (ADR-004). Interactive / event-stream / async are explicitly v1+. |

Full discussion in §10 *Competitive Positioning* of the spec.
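Here is a minimal sketch of the routing rule. A delegation target is read only from a structured `direct_request` field that passes validation, so an `@AgentName` mention in the prose never changes who speaks next. The message shape (`content`, `direct_request.target`) and `extract_route` are hypothetical stand-ins for the §5 schemas.

```python
def extract_route(message, panel):
    """Return the validated delegation target, or None when there is no request.

    Hypothetical message shape: {"content": str, "direct_request": {"target": str} | None}.
    The free-text `content` is never inspected, so "@critic" in prose cannot route.
    """
    request = message.get("direct_request")
    if not isinstance(request, dict):
        return None  # no structured request: prose alone never routes
    target = request.get("target")
    if not isinstance(target, str) or target not in panel:
        # A real runtime would surface this as a schema failure, not a silent reroute.
        raise ValueError(f"invalid direct_request target: {target!r}")
    return target
```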

---

## Quick start

The reference runtime ships three adapters out of the box: the
deterministic `FakeProvider` (for tests and reproducible demos), an
OpenAI-shaped HTTP adapter (for real-model sessions against
`api.openai.com` or any OpenAI-Chat-Completions-compatible endpoint),
and an Anthropic-shaped HTTP adapter (for real-model sessions against
`api.anthropic.com` or any Anthropic-Messages-compatible endpoint).
Every flow produces a persisted, byte-identically replayable artifact.

The distribution name is `symposium-protocol`; the import package is
`symposium` (cf. scikit-learn → sklearn).

```bash
# Stable install (PyPI)
pip install symposium-protocol # then: import symposium

# Released tag, straight from GitHub (works without PyPI)
pip install "git+https://github.com/terrordrummer/symposium@v1.5.0"

# Development install (editable, from a clone)
git clone https://github.com/terrordrummer/symposium
cd symposium
pip install -e ".[test]"
```

### Fake-driven session (no API key, no network)

```bash
symposium run \
  --config examples/configs/walking-skeleton.yaml \
  --script examples/scripts/walking-skeleton.json \
  --output runs/ \
  examples/problem.md

# Replay (byte-identity check on the stored canonical_transcript)
symposium replay runs/demo-walking-skeleton-001

# Validate the artifact against the v1.0.0 JSON Schemas
symposium validate runs/demo-walking-skeleton-001/artifact.json
```
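Replay compares a `transcript_digest`: a SHA-256 over an RFC 8785 (JCS) canonicalization. The sketch below approximates JCS with the standard library by sorting keys, dropping insignificant whitespace and emitting UTF-8. This matches JCS for strings, booleans, null and integers within ±2^53. It does not match for floats, or for key orderings that involve characters outside the Basic Multilingual Plane. The `rfc8785` package installed in the conformance step implements the exact algorithm. The spec, not this sketch, defines which transcript fields are hashed, and both function names are hypothetical.

```python
import hashlib
import json


def canonical_bytes(value):
    """Approximate RFC 8785 (JCS) canonical form for JSON values without floats."""
    return json.dumps(
        value,
        sort_keys=True,          # JCS orders object members by key
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # JCS emits raw UTF-8, not \uXXXX escapes
        allow_nan=False,         # NaN/Infinity are not valid JSON
    ).encode("utf-8")


def transcript_digest(transcript):
    """Hex SHA-256 of the canonical bytes: 64 lowercase hex characters."""
    return hashlib.sha256(canonical_bytes(transcript)).hexdigest()
```

Canonicalization removes differences in member order and whitespace, so two serializations of the same transcript hash identically. Any change to content changes the digest.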

### OpenAI-driven session

```bash
export OPENAI_API_KEY=sk-...
# Optional: point at a self-hosted OpenAI-compatible endpoint
# export OPENAI_BASE_URL=https://my-llm-proxy.internal/v1

symposium run \
  --config examples/configs/openai.yaml \
  --output runs/ \
  examples/problem.md
```

### Anthropic-driven session

```bash
export ANTHROPIC_API_KEY=sk-ant-...
# Optional: point at a self-hosted Anthropic-compatible endpoint
# export ANTHROPIC_BASE_URL=https://my-llm-proxy.internal/v1

symposium run \
  --config examples/configs/anthropic.yaml \
  --output runs/ \
  examples/problem.md
```

### Selecting the panel

Before round 1 the §4.1 **selector** chooses the active deliberation
panel and binds the coordinator. `Config.selector.strategy` picks one of
three strategies, each emitting a schema-valid `SelectorOutput`
(§5.11) written to `/selector_output.json` on every run:

- **`fixed`** (default, MVP/R3) — degenerate: the panel is the declared
`default_deliberation_panel` and the coordinator is the declared
`coordinator_agent`. Makes **no** provider call.
- **`rules`** — pure, deterministic. Matches each agent's persona
metadata (`reasoning_scope` / `domain_scope`) against the
`problem_statement` via a transparent keyword table; records dropped
agents in `excluded_agents`. No provider call, so the same `(config)`
yields a byte-identical decision (and stays replayable under §7.6).
- **`llm`** — one bounded provider invocation (the §6.2
`expected_output_schema = null` free-text path, driven by the
coordinator agent's `provider`/`model`) parsed into a `SelectorOutput`.
Requires a `selector_budget` (§5.2); its usage is budgeted separately
and never enters `Artifact.cumulative_usage` or the `transcript_digest`.
For fake sessions, script the single selector call with
`--selector-script` (mirrors `--script`).
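Here is a minimal sketch of a `rules`-style selector. It assumes each persona carries a list of scope tags (standing in for `reasoning_scope` / `domain_scope`) and that the keyword table is a plain dict. The names `select_rules` and `EmptySelectionError`, the output dict and the tokenization are all hypothetical. The sketch shows that the decision is a pure function of its inputs, and that an empty selection is rejected before any round starts.

```python
import re


class EmptySelectionError(Exception):
    """Hypothetical signal for an empty selection (the spec terminates with schema_error)."""


def select_rules(problem_statement, personas, keyword_table, coordinator):
    """Pure keyword matching: identical inputs always yield the identical decision.

    personas: {agent_id: [scope tags]}; keyword_table: {scope tag: [keywords]}.
    """
    words = set(re.findall(r"[a-z0-9]+", problem_statement.lower()))
    selected, excluded = [], []
    # Sorted so the result does not depend on the order personas were declared in.
    for agent_id in sorted(personas):
        scopes = personas[agent_id]
        hit = any(kw in words for scope in scopes for kw in keyword_table.get(scope, []))
        (selected if hit else excluded).append(agent_id)
    if not selected:
        raise EmptySelectionError("no persona matched the problem statement")
    return {"selected_agents": selected, "excluded_agents": excluded,
            "coordinator": coordinator}
```

The function makes no provider call and reads no clock or random source, so rerunning it on the same inputs reproduces the same decision byte for byte.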

```bash
# rules: deterministic, no model call
symposium run \
  --config examples/configs/rules-selector.yaml \
  --script examples/scripts/walking-skeleton.json \
  --output runs/

# llm: one bounded selector call (separate fake script) + deliberation
symposium run \
  --config examples/configs/llm-selector.yaml \
  --selector-script examples/scripts/llm-selector.json \
  --script examples/scripts/walking-skeleton.json \
  --output runs/
# → stdout: selector_strategy=… / selected_agents=…
# → /selector_output.json
```

The selector is a distinct ADR-005 role: it chooses *who* deliberates,
emits no `canonical_transcript` message, and an empty/malformed selection
terminates the session with `reason = schema_error` before round 1.

### Inspecting metrics

Every persisted run directory can be analysed offline with `symposium
metrics`, which computes the §7.9 MVP observability set (token / cost
usage per agent and per `(provider, model)`, latency per invocation,
participation per round, branch depth, deferred-queue length, panel
contractions, schema-failure counts, termination reason, the
`usage_estimated` flag) and writes `metrics.json` next to the
artifact:

```bash
symposium metrics runs/demo-walking-skeleton-001
# → runs/demo-walking-skeleton-001/metrics.json (full breakdown)
# → stdout: one-screen human-readable summary
```

The §7.9 set is deliberately MVP — `role_purity_score`,
`disagreement_frequency`, `interaction_graph`,
`delegation_frequency`, per-invocation provider-retry counts and a
live `observability_event` stream are §7.10 v1+ extensions and
formally deferred. The MVP set is fully derivable from the persisted
`artifact.json` alone; no live event bus required.
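The MVP metrics come from `artifact.json` alone, so they reduce to folds over the persisted transcript. The sketch below computes two of them: token usage per agent and participation per round. The field names (`canonical_transcript`, `agent_id`, `round`, `usage.total_tokens`) and both function names are assumptions for illustration, not the §5.10 schema.

```python
import json
from collections import Counter, defaultdict


def offline_metrics(artifact):
    """Derive two §7.9-style metrics from an already-loaded artifact dict."""
    tokens_per_agent = Counter()
    participants_per_round = defaultdict(set)
    for msg in artifact["canonical_transcript"]:  # hypothetical field layout
        agent = msg["agent_id"]
        tokens_per_agent[agent] += msg.get("usage", {}).get("total_tokens", 0)
        participants_per_round[msg["round"]].add(agent)
    return {
        "tokens_per_agent": dict(tokens_per_agent),
        "participation_per_round": {
            r: len(agents) for r, agents in sorted(participants_per_round.items())
        },
    }


def metrics_from_file(path):
    # Read-only: no provider call, no event bus, just the persisted artifact.
    with open(path, encoding="utf-8") as fh:
        return offline_metrics(json.load(fh))
```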

The CLI resolves each agent's `provider` string through the adapter
registry (§6.11). Built-in registrations: `openai`, `anthropic`,
`claude-cli`, `codex-cli`, and (when `--script` is given) `fake`. To plug
in your own adapter, register a factory before the run.
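Here is a sketch of the general name-to-factory pattern behind such a registry. Every name in it (`AdapterRegistry`, `register`, `build`, the agent dict shape, `EchoAdapter`) is hypothetical and only shows the resolution shape. The real registration API lives in `symposium/providers/`.

```python
from typing import Callable, Dict


class AdapterRegistry:
    """Hypothetical name -> factory map, resolved once per agent before the run."""

    def __init__(self):
        self._factories: Dict[str, Callable[[dict], object]] = {}

    def register(self, name: str, factory: Callable[[dict], object]) -> None:
        if name in self._factories:
            raise ValueError(f"provider {name!r} already registered")
        self._factories[name] = factory

    def build(self, agents: Dict[str, dict]) -> Dict[str, object]:
        """Map each agent id to an adapter built from its `provider` string."""
        providers = {}
        for agent_id, spec in agents.items():
            try:
                factory = self._factories[spec["provider"]]
            except KeyError:
                raise ValueError(f"unknown provider {spec['provider']!r} for {agent_id}") from None
            providers[agent_id] = factory(spec)
        return providers


class EchoAdapter:
    """Toy adapter used only to exercise the registry."""

    def __init__(self, spec: dict):
        self.model = spec.get("model", "echo-1")
```

Resolving all agents up front means an unknown `provider` string fails before any model call is made.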

### Re-running a session

`symposium replay` (above) is the §7.5 **`transcript_replay`** — it
re-renders the *stored* `canonical_transcript` and is byte-identical
unconditionally (no model call). `symposium execution-replay` is the
§7.6 **`execution_replay`** — it *re-runs the orchestrator* against the
original `problem_statement` / `Config` to regenerate a fresh transcript,
and is reproducible only when every non-deterministic source is pinned
(the ten **pinning conditions** of §7.6: runtime, adapter, provider,
model, sampling, cache, tool_env, wallclock, persona, transcript_prefix).

```bash
symposium execution-replay runs/demo-walking-skeleton-001 \
  --script examples/scripts/walking-skeleton.json \
  --output runs/
# → runs/demo-walking-skeleton-001-replay/ (fresh run, distinct session id)
# → digest=match | digest=MISMATCH (first_divergence=…)
```

Before touching the runtime it checks every pinning condition decidable
offline and **aborts** with a `pinning_violation` diagnostic (naming the
exact condition) on the first one that cannot be satisfied — §7.6
forbids silent best-effort replay. Exit codes: `0` digest match, `3`
pinning violation, `4` digest mismatch, `1` any other error.
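The sketch below covers the abort-first pre-flight and the exit-code mapping above. The ten condition names come from §7.6. The `checks` mapping, `PinningError`, `preflight`, `exit_code` and the `replay_digest` callable are hypothetical placeholders for the runtime's own offline checks and re-execution. A condition mapped to `None` stands for one that cannot be decided offline and is therefore assumed.

```python
PINNING_CONDITIONS = (
    "runtime", "adapter", "provider", "model", "sampling",
    "cache", "tool_env", "wallclock", "persona", "transcript_prefix",
)

EXIT_MATCH, EXIT_ERROR, EXIT_PINNING, EXIT_MISMATCH = 0, 1, 3, 4


class PinningError(Exception):
    def __init__(self, condition):
        super().__init__(f"pinning_violation: {condition}")
        self.condition = condition


def preflight(checks):
    """Run offline-decidable checks in §7.6 order and abort on the first failure.

    checks maps a condition name to a zero-argument callable returning bool,
    or to None (or omits it) when the condition is not decidable offline.
    """
    checked, assumed = [], []
    for name in PINNING_CONDITIONS:
        check = checks.get(name)
        if check is None:
            assumed.append(name)
        elif check():
            checked.append(name)
        else:
            raise PinningError(name)  # no silent best-effort replay
    return checked, assumed


def exit_code(checks, recorded_digest, replay_digest):
    """Map one replay attempt to the documented exit codes."""
    try:
        preflight(checks)
        fresh = replay_digest()  # re-execute only after every check passed
    except PinningError:
        return EXIT_PINNING
    except Exception:
        return EXIT_ERROR
    return EXIT_MATCH if fresh == recorded_digest else EXIT_MISMATCH
```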

Reproducibility is conditional, not free (§7.8: *replayable ≠
reproducible*). Two runtime-allocated fields feed the digest but aren't
produced by the provider — `Message.id` (`uuid4`) and `Message.timestamp`
(wall-clock). `execution-replay` pins both to the values recorded in the
original transcript (§7.6 condition #8's *fixed clock source* + §9.4.1's
*deterministic id allocator*), so a deterministic `FakeProvider` run
reproduces its digest exactly — no special recording step required. A
re-execution that genuinely diverges (different content, count, or
routing) desyncs from the recorded sequence and reports a mismatch with
the first diverging message id, never a spurious match. A caller can
override the timestamp source with `fixed_clock` (a library knob).
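The id/timestamp pinning works as follows. The re-execution draws `Message.id` and `Message.timestamp` from the recorded sequence instead of `uuid4()` and the wall clock. It then reports the first recorded message the fresh run fails to reproduce. If every content matches, each rebuilt message equals its recorded counterpart, so the digests match too. In the sketch, `RecordedAllocator` and `compare_reexecution` are hypothetical, and comparing only `content` simplifies the real check, which also covers routing.

```python
class RecordedAllocator:
    """Hands out recorded (id, timestamp) pairs in order, replacing uuid4() / wall clock."""

    def __init__(self, recorded):
        self._pairs = [(m["id"], m["timestamp"]) for m in recorded]
        self._next = 0

    def allocate(self):
        if self._next >= len(self._pairs):
            raise LookupError("fresh run produced more messages than the recording")
        pair = self._pairs[self._next]
        self._next += 1
        return pair


def compare_reexecution(recorded, fresh_contents):
    """Return (matches, first_diverging_id).

    first_diverging_id is the recorded id of the first message the fresh run does
    not reproduce. It is None on a full match, and also when the fresh run goes past
    the end of the recording (there is no recorded id to point at).
    """
    alloc = RecordedAllocator(recorded)
    for position, content in enumerate(fresh_contents):
        try:
            msg_id, _timestamp = alloc.allocate()
        except LookupError:
            return False, None
        if content != recorded[position]["content"]:
            return False, msg_id  # desync is reported, never a spurious match
    if len(fresh_contents) < len(recorded):
        return False, recorded[len(fresh_contents)]["id"]
    return True, None
```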

### Library use

```python
from symposium import Config, FakeProviderScript
from symposium.providers import FakeProvider, default_registry
from symposium.replay import PinningViolation, execution_replay
from symposium.scheduler import run_session

# `config` is a symposium.Config and `script` a FakeProviderScript; how they
# are built or loaded (e.g. from the example config/script files) is elided.

# Fake-driven: pass an explicit per-agent map
artifact = run_session(config, {"default": FakeProvider(script=script)},
                       runs_root="runs/")

# Registry-driven (e.g. OpenAI): build providers from each agent's `provider`
providers = default_registry().build_session_providers(config)
artifact = run_session(config, providers, runs_root="runs/")

print(artifact.transcript_digest)  # 64-hex JCS-SHA-256 digest
print(artifact.outcome.kind)       # "synthesis" or "termination"

# §7.6 execution_replay: re-execute under the ten pinning conditions and
# compare the fresh digest. ids/timestamps are replayed from the recording,
# so a deterministic run reproduces its digest with no extra setup.
try:
    result = execution_replay(f"runs/{config.session_id}",
                              providers={"default": FakeProvider(script=script)})
    print(result.digest_matches)  # True: every pinning condition satisfied
    print(result.conditions_checked, result.conditions_assumed)
except PinningViolation as exc:
    print("aborted on §7.6 condition:", exc.condition)
```

---

## Use in Claude Code (MCP server)

Symposium ships an optional **MCP server** that exposes the runtime as
tools, so a Claude client (Claude Code, Claude Desktop, claude.ai) can
launch a structured deliberation and read back its result, replay status,
and metrics — over the same `run_session(...)` API, with no changes to the
runtime or the protocol.

```bash
# Install with the optional MCP extra
pip install "symposium-protocol[mcp]"
# …or from the released tag:
pip install "symposium-protocol[mcp] @ git+https://github.com/terrordrummer/symposium@v1.6.0"

# Register the stdio server with Claude Code
claude mcp add symposium -- symposium-mcp
```

For **Claude Desktop**, add the server to your `mcpServers` config
(`claude_desktop_config.json`). Set `ANTHROPIC_API_KEY` (or
`OPENAI_API_KEY`) in `env` when you want real-provider deliberations; omit
it for fake-driven, deterministic runs:

```json
{
  "mcpServers": {
    "symposium": {
      "command": "symposium-mcp",
      "env": { "ANTHROPIC_API_KEY": "sk-ant-..." }
    }
  }
}
```

The server exposes six tools:

- **`deliberate(problem, …)`** — build a `Config` from arguments (panel
persona ids resolved into inline personas exactly as the CLI does), run a
session, and return `{outcome, synthesis_answer | termination_reason,
selected_agents, transcript_digest, cumulative_usage, run_dir, rounds}`.
- **`deliberate_streaming(problem, …)`** — same arguments and same final
result as `deliberate`, but streams each turn **live as the panel
produces it** (every agent turn, each coordinator verdict, the final
synthesis) via MCP progress + log notifications, so you can follow the
discussion as it evolves instead of waiting for the whole session.
- **`deliberate_adaptive(problem, *, experts=None, max_expansions=2, …)`** —
deliberate with **dynamic agent generation**. *Early-start*: each
capability in `experts` (free-text needs) becomes a generated domain
persona added to the panel before the first session. *Runtime*: if a
session terminates asking for help (`user_input_required` /
`external_research_required`), a persona is generated for that need and
the deliberation continues in a fresh session with the augmented panel
(up to `max_expansions`). Returns `{final, sessions, generated_agents,
expansions, panel_final}`. Host-orchestrated over the frozen runtime.
- **`generate_persona(need, …)`** — design one new expert `Persona` for a
capability gap (constrained to the `Persona` JSON Schema, validated) and
return it, to use as a `panel` member.
- **`get_run_summary(run_dir)`** — load a persisted run, recompute the §7.9
metrics, verify the §7.5 transcript replay, and return a compact summary.
- **`list_personas()`** — the six built-in personas (R3 default panel +
coordinator) to use as `panel` / `coordinator` arguments.
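`deliberate_adaptive` is host-orchestrated: a loop around an unchanged session runner. Here is a minimal sketch that assumes a `run(problem, panel)` callable returning a dict with `outcome`, `termination_reason` and, on a help request, a `need`. A `generate(need)` callable stands in for `generate_persona`. These shapes and the function itself are hypothetical; only the two expandable termination reasons and the returned keys come from the description above.

```python
EXPANDABLE = {"user_input_required", "external_research_required"}


def deliberate_adaptive(problem, panel, run, generate, experts=None, max_expansions=2):
    """Host-side expansion loop; run() and generate() are hypothetical callables."""
    generated = [generate(need) for need in (experts or ())]  # early-start personas
    panel = list(panel) + generated
    sessions, expansions = [], 0
    while True:
        result = run(problem, panel)
        sessions.append(result)
        if result["outcome"] != "termination" or result.get("termination_reason") not in EXPANDABLE:
            break  # synthesis reached, or a reason a new persona cannot fix
        if expansions >= max_expansions:
            break  # expansion budget spent
        persona = generate(result["need"])
        generated.append(persona)
        panel.append(persona)  # the next session starts fresh with the augmented panel
        expansions += 1
    return {"final": sessions[-1], "sessions": sessions, "generated_agents": generated,
            "expansions": expansions, "panel_final": panel}
```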

A typical `deliberate` call from a Claude client:

```jsonc
// default: route each persona across the installed terminal CLIs — NO API
// key (provider="cli-auto"): visionary → codex, the rest → claude, with
// fallback to whichever CLI is installed
deliberate(problem="Should we adopt a structured deliberation protocol?")

// force a single terminal CLI for all agents
deliberate(problem="…", provider="claude-cli") // or "codex-cli"

// real HTTP API instead (reads ANTHROPIC_API_KEY from the env)
deliberate(problem="…", provider="anthropic")

// deterministic, network-free (used by the tests and demos)
deliberate(
  problem="demo",
  provider="fake",
  fake_script_path="examples/scripts/walking-skeleton.json"
)
```

**No API key needed.** The default `provider="cli-auto"` runs each panel
turn through a locally-installed terminal CLI, reusing its existing login
(OAuth/keychain) — no `ANTHROPIC_API_KEY`/`OPENAI_API_KEY`. It routes by
persona — the lateral/creative **visionary** to `codex-cli`
(`codex exec --output-schema …`, model **`gpt-5.5`** with reasoning effort
`max`), the technical/systematic personas (logician, engineer, researcher,
critic, coordinator) to `claude-cli` (`claude -p --output-format json
--json-schema …`, model **`opus`** — alias for the latest Opus on the
local CLI, currently 4.7) — and **falls back** to whichever CLI is
actually installed (only `claude` installed → the whole panel runs on
claude, and vice-versa). Force one CLI with `provider="claude-cli"` /
`"codex-cli"`. The per-call timeout is **600s**. Through v1.10.2 it was
180s, which empirically cut turns off mid-way on multi-paragraph technical
prompts that trigger 10+ internal iterations. The session wallclock
defaults to **1800s** (30 min), so a full 5-agent × 4-round panel has room
to complete.
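The `cli-auto` rule comes down to a per-persona preference plus a fallback to whatever is on `PATH`. Here is a minimal sketch using `shutil.which`. The function name and persona ids are illustrative, and the real adapter may detect installed CLIs differently.

```python
import shutil

PREFERRED = {"visionary": "codex-cli"}  # every other persona prefers claude-cli
BINARY = {"claude-cli": "claude", "codex-cli": "codex"}


def route_persona(persona_id, which=shutil.which):
    """Pick a CLI provider for one persona, falling back to whichever CLI is installed."""
    preferred = PREFERRED.get(persona_id, "claude-cli")
    fallback = "claude-cli" if preferred == "codex-cli" else "codex-cli"
    for provider in (preferred, fallback):
        if which(BINARY[provider]) is not None:
            return provider
    raise RuntimeError("neither `claude` nor `codex` is on PATH")
```

Injecting `which` keeps the routing decision testable without installing either CLI.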

**Hosted-inside-Claude-Code safety.** When the Symposium runtime is itself
hosted inside a Claude Code session (e.g. via the `symposium-mcp` server
launched as an MCP child), the CLI adapters spawn each turn with a
**headless child environment**: (1) nested-Claude-Code markers
(`CLAUDECODE`, `CLAUDE_CODE_ENTRYPOINT`/`EXECPATH`/`SESSION_ID`/
`PROVIDER_MANAGED_BY_HOST`), effort overrides
(`CLAUDE_CODE_EFFORT_LEVEL` / `CLAUDE_EFFORT`), and bare-mode markers
(`CLAUDE_CODE_SIMPLE`) are stripped before each spawn; (2)
`CLAUDE_CODE_DISABLE_CLAUDE_MDS`, `CLAUDE_CODE_DISABLE_AUTO_MEMORY`,
`CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC`, and
`CLAUDE_CODE_DISABLE_BACKGROUND_TASKS` are set to `1` to suppress the
child's *own* auto-loads (the CLAUDE.md auto-discovery walk alone can
turn a sub-second deliberation turn into a multi-minute hang against a
populated `~/.claude/` and Workspace tree). The child CLI then takes
its normal headless `-p` / `exec --json` path and does not inherit the
parent's session state or extended-thinking effort. `ANTHROPIC_*`,
`CODEX_HOME`, `PATH`, locale, and proxy / cert vars are preserved. The
codex adapter also passes `--ignore-user-config --ignore-rules` by default
(opt-out via `isolated=False`; requires codex CLI ≥ 0.122.0). The claude
adapter offers an opt-in `bare=True` for full headless mode — off by
default, because `--bare` disables OAuth/keychain and requires an
`ANTHROPIC_API_KEY`.
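The environment scrubbing above is a copy-filter-override of the parent environment. Here is a minimal sketch. It reads the slash-separated list `CLAUDE_CODE_ENTRYPOINT`/`EXECPATH`/`SESSION_ID`/`PROVIDER_MANAGED_BY_HOST` as four variables that share the `CLAUDE_CODE_` prefix, which is an interpretation rather than something the text confirms. `headless_child_env` is a hypothetical name.

```python
import os

STRIP = {
    # nested-Claude-Code markers (prefix expansion is an assumption)
    "CLAUDECODE", "CLAUDE_CODE_ENTRYPOINT", "CLAUDE_CODE_EXECPATH",
    "CLAUDE_CODE_SESSION_ID", "CLAUDE_CODE_PROVIDER_MANAGED_BY_HOST",
    # effort overrides and the bare-mode marker
    "CLAUDE_CODE_EFFORT_LEVEL", "CLAUDE_EFFORT", "CLAUDE_CODE_SIMPLE",
}
FORCE_ON = (
    "CLAUDE_CODE_DISABLE_CLAUDE_MDS",
    "CLAUDE_CODE_DISABLE_AUTO_MEMORY",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
    "CLAUDE_CODE_DISABLE_BACKGROUND_TASKS",
)


def headless_child_env(parent=None):
    """Return a new env dict for a child CLI; every unlisted variable passes through."""
    source = os.environ if parent is None else parent
    env = {k: v for k, v in source.items() if k not in STRIP}
    env.update({name: "1" for name in FORCE_ON})  # suppress the child's own auto-loads
    return env
```

The result is a fresh dict suitable for the `env=` argument of `subprocess.run`. The parent environment is never mutated, and variables such as `PATH` and `ANTHROPIC_*` pass through unchanged.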

**Billing.** When a CLI is logged in with a **subscription** (Claude
Pro/Max for `claude`, a ChatGPT plan for `codex`), turns run against that
subscription's usage and **rate limits — not metered, per-token API
billing**. There is no separate dollar charge to an API account; you are
spending subscription quota, so a full panel (≈ one call per turn) and
especially `deliberate_adaptive` (multiple linked sessions) consume that
quota faster and can hit plan limits. The `cost_usd` Symposium records for
a CLI turn is an **API-equivalent reference** (what the tokens would cost
at API rates), reported as estimated — not a bill. (Only if a CLI is
authenticated via an **API key** instead of a subscription login is the
usage metered.) Use `provider="fake"` for free, deterministic, offline
demos. The HTTP adapters (`anthropic`, `openai`) call the metered API and
do read an API key. Both CLI providers also work from the plain CLI:
`provider: claude-cli` / `codex-cli` in a config's agents.

The `mcp` dependency is optional: `import symposium` and the `symposium`
CLI work without it. See `symposium/integrations/mcp_server.py`.

---

## What's in this repo

```
.
├── docs/
│   ├── specification.md        # The protocol (normative, ~6440 lines)
│   ├── repository-strategy.md  # Reference-impl conventions (non-normative)
│   └── schemas/v1.0.0/         # 16 JSON Schemas (Draft 2020-12)
│       └── examples/           # 28 positive + 36 negative fixtures + validators
├── symposium/                  # Reference Python runtime
│   ├── models.py               # Pydantic models mirroring the JSON Schemas
│   ├── providers/              # ProviderAdapter + registry + Fake/OpenAI/Anthropic/Claude-CLI/Codex-CLI adapters
│   ├── selector/               # §4.1 selector: fixed / rules / llm strategies
│   ├── scheduler/              # §4.11 pseudocode → executable loop
│   ├── storage/                # Run directory layout + JCS digest
│   ├── replay/                 # transcript_replay (§7.5) + execution_replay (§7.6)
│   ├── observability/          # §7.9 MVP metric set (offline)
│   ├── personas/               # MVP default panel (R3)
│   ├── integrations/           # Host integrations: MCP server (`symposium-mcp`)
│   └── cli/                    # `symposium` command
├── examples/                   # Walking-skeleton + rules/llm selector configs + scripts
├── tests/                      # pytest suite (FakeProvider determinism,
│                               # scheduler invariants, e2e schema
│                               # validation, replay byte-identity)
├── pyproject.toml
├── .github/workflows/          # validate (CI) + release (publish on tag)
├── CONTRIBUTING.md
├── ROADMAP.md                  # thin pointer to spec §12 (normative roadmap)
├── LICENSE                     # Apache 2.0
└── README.md
```

**What's normative**: `docs/specification.md` §1–§9 + the JSON Schemas
under `docs/schemas/v1.0.0/`. A conformant Symposium runtime satisfies
every MUST / MUST NOT there and validates against the schemas. Sections
§10–§13 are positioning, integration, roadmap, and vision (non-binding).
§14 is a thin pointer to the non-normative companion.

**What's reference, not normative**: everything under `symposium/`,
`examples/`, and `tests/`. The Python package is one valid implementation
of the protocol; a different runtime in a different language is equally
valid as long as it conforms to the spec.

---

## Conformance check

Two validators ship with the schemas. Any contributor or implementor
can re-run them locally:

```bash
cd docs/schemas/v1.0.0/examples
pip install "jsonschema==4.26.0" "referencing>=0.35" "rfc8785>=0.1.4"
python3 validate.py # 28/28
python3 validate_negative.py # 36/36
```

The reference runtime's own test suite (pytest) cross-checks the
artifact it emits against those same schemas:

```bash
pip install -e ".[test]"
pytest -q
```

CI runs both on every push and every pull request.

---

## Reading order

If you only want the gist, the first 200 lines of the spec are enough:
§1 (conformance surface), §2 (vocabulary), §3 (overview + non-goals).

If you intend to implement: §1 → §2 → §4 (runtime + scheduler) →
§5 (schemas) → §6 (provider/tool adapter contract) → §7
(persistence + replay) → §8 (budget + failure + security) →
§9 (testing harness). §4.11 is the canonical pseudocode.

If you want to compare against existing frameworks: §10 covers
AutoGen, CrewAI, LangGraph, and OpenAI Agents SDK.

---

## Status

**v1.0 — specification frozen 2026-05-26.** Ratified by joint adversarial
review (10 passes, bilateral sign-off). The 16 JSON Schemas under
`docs/schemas/v1.0.0/` are pinned at this version. Forward-compatible
changes will publish under `docs/schemas/v1.1.0/` etc., per the
versioning policy in §5.1.

Issues, errata, and discussion: use the GitHub issue tracker.

## License

[Apache 2.0](LICENSE).