An open API service indexing awesome lists of open source software.

https://github.com/cumbof/team

Orchestrate a cluster of containerized local LLMs — each with its own persona, role, and goal — that collaborate until the work is done.
https://github.com/cumbof/team

agent agent-orchestration agent-team agentic-ai ai ai-agents ai-team ai-workflow cli containerized docker llm multiagent ollama

Last synced: 4 days ago
JSON representation

Orchestrate a cluster of containerized local LLMs — each with its own persona, role, and goal — that collaborate until the work is done.

Awesome Lists containing this project

README

          

# team

Orchestrate a cluster of containerized local LLMs — each with its own
persona, role, and goal — that collaborate until the work is done.

![PyPI - Version](https://img.shields.io/pypi/v/team-core)
![Build Status](https://img.shields.io/github/actions/workflow/status/cumbof/team/tests.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/cumbof/team/blob/main/LICENSE)

![team](https://raw.githubusercontent.com/cumbof/team/refs/heads/main/assets/logo.png)

⭐ Star this repository to stay updated with new releases ⭐





`team` lets you describe a small "organisation" of LLMs in a single YAML
file and then bring it to life: every member runs in **its own isolated
Docker container** with its own [Ollama](https://ollama.com/) daemon and
its own model, the orchestrator drives a turn-based conversation between
them, and the members produce real artifacts (code, manuscripts, reports,
…) in a shared workspace.

You can mix and match model sizes per role — e.g. a 70B generalist as a
Principal Investigator, a 7B coder as a Data Scientist, an 8B model as a
Reviewer — and pick a workflow that matches how the work should flow:
**round-robin**, **manager-driven**, or **review-loop until consensus**.

> [!WARNING]
>
> **Work in Progress:** This repository is currently under active development.
> While the core functionality is present, some features may be incomplete or
> not fully work as expected, and you may encounter unexpected bugs. Please
> test thoroughly before using this in any critical pipelines.

> [!NOTE]
>
> A significant portion of the code and documentation in this repository
> was written **with the assistance of a Large Language Model (LLM)**.
> All LLM-generated contributions have been reviewed, tested, and curated
> by the human maintainers, but — as with any software — bugs may exist.
> Please review the code critically, run the test suite, and open an issue
> if you find something unexpected.
>
> **Pull requests are very welcome**, including those written or
> co-authored with the help of an LLM. We only ask that you review and
> test your changes before submitting, and disclose AI assistance in your
> PR description (e.g. *"co-authored with GitHub Copilot"*) so reviewers
> can calibrate their review accordingly.

---

## Feature overview

| Feature | Description |
| --- | --- |
| **Containerised members** | Every LLM runs in its own Docker + Ollama container with configurable CPU, RAM, and GPU limits. |
| **Flexible workflows** | `round_robin`, `manager`, `review_loop`, `sequential_chain`, `debate`, `parallel_review` — pick or combine. |
| **Shared workspace** | Members read and write real files (code, reports, data) to a host directory. |
| **Agent tool use** | 19 built-in tools (Python, Bash, web search, file I/O, memory, beliefs, decisions, delegation); `tool_mode: text` (fenced blocks) or `tool_mode: native` (OpenAI/Ollama function-calling API with JSON Schema); extend with custom skills. |
| **Predefined persona library** | 16 ready-made personas (`@pi`, `@engineer`, `@reviewer` …) stored as individual YAML files in `personas/`; extend with your own via `TEAM_PERSONA_DIR`. |
| **Per-agent persistent memory** | SQLite-backed memory that survives between runs; agents `remember` and `recall` across sessions. |
| **Shared team belief board** | Structured collective knowledge with confidence scores, voting, and consensus tracking. |
| **Cross-team federation (bridge)** | Two independent `team` clusters can delegate tasks to each other over HTTP — academic-lab-style collaboration. |
| **Shared institutional context** | Drop a `context.md` in the workspace root and every member sees it on every turn — no per-member config needed. |
| **Decision log** | Members call `log_decision` to append timestamped, rationale-rich entries to `decisions.md`; any member can `read_decisions` at any time. |
| **Workspace time-travel** | `team rollback` restores the workspace to any past checkpoint and lets you resume from there. |
| **Human-in-the-loop** | Interrupt a live run, read the transcript, inject a message, and let the team continue. |
| **OpenAI-compatible backends** | Swap Ollama for any OpenAI-compatible API (GPT-4o, Mistral, Together AI, …) per member. |
| **Context window management** | `sliding_window`, `truncate`, or `summarize` strategies keep long runs within token budgets. |
| **Workspace checkpoints** | Automatic snapshots before every member turn; `team restore` rolls back to any point. |
| **Run statistics & reports** | Per-member token usage, turn counts, elapsed time — exportable as a Markdown report. |
| **Interactive wizard** | `team new` walks you through YAML creation. |
| **Structured JSON output** | Force a member to reply with valid JSON; optionally validate against a JSON Schema with automatic retry. |
| **Per-turn timeout** | Hard wall-clock deadline per member turn; raises `TurnTimeoutError` if the LLM doesn't respond in time. |
| **`team test`** | Define assertions in the YAML and run them automatically after a team workflow to verify outputs in CI. |
| **Parallel member execution** | `workflow: type: parallel` — all members run simultaneously in each round, bounded by the slowest rather than the sum. |
| **`team replay`** | Step through a saved transcript turn-by-turn in an interactive terminal viewer; navigate, search by speaker, and view stats. |
| **Token budget** | Hard-cap total tokens per member per run; gracefully stops with `TokenBudgetError` when exhausted. |
| **Conditional routing** | Members declare the next speaker via simple YAML rules (`if_contains`, `if_match`, `default`), enabling dynamic branching and state-machine-like workflows. |
| **LLM retry with backoff** | Automatic retry with exponential backoff on transient errors (5xx, connection refused, timeout); configurable per member. Raises `LLMRetryExhaustedError` when all attempts fail. |
| **Cost estimation** | Estimated USD cost displayed in the token-usage table after every run (`team run`, `team stats`). Built-in pricing for OpenAI, Anthropic, Google, and Mistral; local Ollama models show `$0.00 (local)`. |
| **Multi-team pipelines** | Chain multiple team runs with `team pipeline`; upstream artifacts and transcript summaries are automatically injected into downstream stages via `inject_files`, `inject_context`, and `goal_override` templates. |
| **Team registry (service discovery)** | A lightweight HTTP directory where running team clusters advertise their capabilities (tags, models, tools). Other teams discover and delegate to specialist clusters via `query_registry` or the CLI. |
| **Federated belief board** | Independent team clusters share their collective knowledge across the bridge. Pull accepted beliefs from a partner team (they arrive as pending for local consensus), push local beliefs outward, or bidirectional sync — via the `sync_beliefs` tool or `team beliefs-sync` CLI. |

---

## Table of contents

- [Why?](#why)
- [How it works](#how-it-works)
- [Requirements](#requirements)
- [Installation](#installation)
- [Quick start](#quick-start)
- [Defining a team](#defining-a-team)
- [Top-level fields](#top-level-fields)
- [`defaults`](#defaults)
- [`workflow`](#workflow)
- [`members`](#members)
- [The collaboration protocol](#the-collaboration-protocol)
- [Predefined persona library](#predefined-persona-library)
- [How personas are stored](#how-personas-are-stored)
- [Available personas](#available-personas)
- [Using a persona in YAML](#using-a-persona-in-yaml)
- [Adding your own personas](#adding-your-own-personas)
- [Workflows](#workflows)
- [Workspaces and artifacts](#workspaces-and-artifacts)
- [Containers, isolation, and root](#containers-isolation-and-root)
- [GPU support](#gpu-support)
- [Apple Silicon / no-Docker Ollama](#apple-silicon--no-docker-ollama)
- [OpenAI-compatible backends](#openai-compatible-backends)
- [Remote / no-Docker Ollama](#remote--no-docker-ollama)
- [Custom Ollama image](#custom-ollama-image)
- [Context window management](#context-window-management)
- [Model retention (`keep_alive`)](#model-retention-keep_alive)
- [CLI reference](#cli-reference)
- [Interactive wizard](#interactive-wizard)
- [Pre-flight checks](#pre-flight-checks)
- [Streaming output](#streaming-output)
- [Per-turn timeout](#per-turn-timeout)
- [LLM retry with backoff](#llm-retry-with-backoff)
- [Resuming an interrupted run](#resuming-an-interrupted-run)
- [Human-in-the-loop intervention](#human-in-the-loop-intervention)
- [Agent mode and tool use](#agent-mode-and-tool-use)
- [Available built-in tools](#available-built-in-tools)
- [Custom skill plugins](#custom-skill-plugins)
- [Shared institutional context](#shared-institutional-context)
- [Decision log](#decision-log)
- [Structured JSON output](#structured-json-output)
- [Conditional routing](#conditional-routing)
- [Token budget](#token-budget)
- [Per-agent persistent memory](#per-agent-persistent-memory)
- [Enabling memory](#enabling-memory)
- [Memory tools](#memory-tools)
- [Memory config reference](#memory-config-reference)
- [Shared team belief board](#shared-team-belief-board)
- [Enabling the belief board](#enabling-the-belief-board)
- [Belief tools](#belief-tools)
- [Inspecting beliefs with team beliefs](#inspecting-beliefs-with-team-beliefs)
- [Belief config reference](#belief-config-reference)
- [Workspace checkpoints](#workspace-checkpoints)
- [Workspace time-travel (`team rollback`)](#workspace-time-travel-team-rollback)
- [Token usage tracking](#token-usage-tracking)
- [Cost estimation](#cost-estimation)
- [Run statistics](#run-statistics)
- [Exporting a run report](#exporting-a-run-report)
- [`team replay` — interactive transcript browser](#team-replay--interactive-transcript-browser)
- [Automated testing with `team test`](#automated-testing-with-team-test)
- [Multi-team pipelines](#multi-team-pipelines)
- [Cross-team collaboration (bridge)](#cross-team-collaboration-bridge)
- [How it works](#how-it-works-1)
- [Exposing a team as a bridge server](#exposing-a-team-as-a-bridge-server)
- [Delegating work from another team](#delegating-work-from-another-team)
- [Named peer registry](#named-peer-registry)
- [Broadcasting to multiple teams](#broadcasting-to-multiple-teams)
- [Cancelling a remote task](#cancelling-a-remote-task)
- [Server HTTP API reference](#server-http-api-reference)
- [Bridge config reference](#bridge-config-reference)
- [Security — HMAC-SHA256 shared secret](#security--hmac-sha256-shared-secret)
- [Additional security considerations](#additional-security-considerations)
- [Team registry (service discovery)](#team-registry-service-discovery)
- [Federated belief board](#federated-belief-board)
- [Examples](#examples)
- [Architecture overview](#architecture-overview)
- [Development](#development)
- [Troubleshooting](#troubleshooting)
- [License](#license)

---

## Why?

A single LLM is a generalist. Real work — research, engineering, writing —
is usually done by **several specialists** that disagree, revise, and
converge. `team` makes it easy to assemble such a group locally:

* **Heterogeneous models, one per role.** Use a small, fast model for
routine tasks and a large model only where it matters.
* **Strong isolation.** Every member is a separate `ollama serve`
process in a separate container, on a private Docker network, with its
own model cache. A misbehaving member cannot reach into another's
filesystem, network namespace, or model store.
* **Real deliverables.** Members write actual files (code, prose, data)
into a shared workspace; you keep them after the run.
* **Pluggable workflows.** Pick how the team coordinates — and add your
own in a few lines of Python.

---

## How it works

```
┌────────────────── orchestrator (host) ───────────────────┐
│ │
│ transcript.jsonl shared workspace (./runs/) │
│ ▲ ▲ │
│ │ append every turn │ files written by members│
└────┬───┴────────────┬──────────┴─────────────┬───────────┘
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌───────────────────┐ ┌──────────────────┐
│ container: pi │ │ container: postdoc│ │ container: ... │
│ ollama serve │ │ ollama serve │ │ │
│ model: 70B │ │ model: 8B │ │ │
│ /workspace (ro+) │ │ /workspace (ro+) │ │ /workspace (ro+) │
│ /private │ │ /private │ │ /private │
└──────────────────┘ └───────────────────┘ └──────────────────┘
\\ | //
\\ | //
team--net (private bridge network)
```

For each member, the orchestrator:

1. Starts a dedicated Ollama container, on a per-team Docker network, with
the team's shared workspace bind-mounted at `/workspace` and a
per-member private workspace at `/private`.
2. Pulls the model the member is configured to use (cached in the
member's own named Docker volume).
3. Builds a system prompt from the member's persona, the team goal, the
list of teammates, and the [collaboration protocol](#the-collaboration-protocol).
4. Asks the chosen [workflow](#workflows) to drive the conversation.

At every turn the orchestrator hands the speaking member the **full
shared transcript** plus a snapshot of the workspace; the member's reply
is parsed for fenced `file:` blocks (which become real files on disk) and
for control tokens (`[[TEAM_DONE]]`, `NEXT: @`, `APPROVED`, …).

---

## Requirements

* **Linux** host (tested) — macOS works if Docker Desktop has enough
resources for your models.
* **Docker** (engine ≥ 20.10) reachable by the host user.
* **Python 3.9+**.
* For GPU acceleration: NVIDIA GPU + the
[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
* **Disk and RAM/VRAM** sized for your largest model — Ollama itself is
small but model weights aren't.

---

## Installation

Install from PyPI:

```bash
pip install team-core
```

Or clone the repository for the latest development version:

```bash
git clone https://github.com/cumbof/team.git
cd team
python -m venv .venv
. .venv/bin/activate
pip install -e .
```

Installs the `team` CLI into your virtualenv. Verify:

```bash
team --version
team --help
```

For development extras (pytest):

```bash
pip install -e ".[dev]"
pytest -q
```

---

## Quick start

1. Generate a starter spec:

```bash
team init my-team.yaml
```

2. Edit `my-team.yaml`: pick model names that exist in Ollama, write a
real `goal`, and tweak the personas.

3. Run it end-to-end (containers come up, models get pulled if needed,
workflow runs, containers come down):

```bash
team run my-team.yaml
```

4. Inspect the deliverables:

```bash
ls runs/my-team/shared/
team transcript my-team.yaml
```

5. Or manage the lifecycle by hand:

```bash
team up my-team.yaml # start all member containers
team status my-team.yaml # show container state
team logs my-team.yaml # tail Ollama logs per member
team run my-team.yaml --no-up --keep-up # run more rounds
team run my-team.yaml --resume # resume after a crash
team down my-team.yaml --purge # tear down + delete model caches
```

---

## Defining a team

A team is a single YAML file. Annotated minimal example:

```yaml
name: my-team # [a-z][a-z0-9_-]{0,30}
goal: |
Plain-English statement of what the team must accomplish.

workspace: ./runs/my-team # host directory; created on demand

workflow:
type: round_robin # round_robin | manager | review_loop
max_rounds: 6

defaults:
ollama_image: ollama/ollama:latest
context_window: 8192
temperature: 0.4
gpus: none # "all" | "none" | [0, 1, ...]
memory_limit: "16g" # optional Docker memory cap per member
cpu_limit: 4 # optional Docker CPU cap per member (cores)
pull_timeout: 1800
request_timeout: 600

members:
- name: lead
role: Project Lead
model: llama3.1:8b
persona: |
You coordinate the team.
- name: worker
role: Engineer
model: qwen2.5-coder:7b
persona: |
You implement code and produce concrete artifacts.
```

### Top-level fields

| field | required | description |
| --- | --- | --- |
| `name` | yes | DNS-safe team name; used in container/volume/network names. |
| `goal` | yes | The shared objective every member sees in its system prompt. |
| `workspace` | no | Host directory for shared/private workspaces and the transcript. Defaults to `./runs/`. |
| `workflow` | no | See below. Defaults to `round_robin` with 6 rounds. |
| `defaults` | no | Defaults inherited by every member that doesn't override them. |
| `members` | yes | Non-empty list of member specs (see below). |

### `defaults`

| key | type | default | meaning |
| --- | --- | --- | --- |
| `ollama_image` | string | `ollama/ollama:latest` | Image used for member containers. |
| `context_window` | int | `8192` | `num_ctx` passed to Ollama (`/api/chat` `options`). |
| `temperature` | float | `0.4` | Sampling temperature. |
| `top_p` | float | `0.9` | Top-p sampling. |
| `memory_limit` | string | unset | Docker `mem_limit` per member (e.g. `"12g"`). |
| `cpu_limit` | float | unset | Docker CPU cap per member (cores; e.g. `4`). |
| `gpus` | str / list | `none` | `"all"`, `"none"`, or list of GPU indices. |
| `pull_timeout` | int | `1800` | Seconds allowed for a model pull. |
| `request_timeout` | int | `600` | HTTP timeout per chat call. |
| `backend` | string | `ollama` | LLM backend: `"ollama"` or `"openai_compat"`. |
| `api_key` | string | unset | API key for `openai_compat` backend; supports `"env:VAR"`. |
| `context_strategy` | string | `none` | Context management: `"none"`, `"sliding_window"`, `"truncate"`, `"summarize"`. |
| `context_budget` | int | `0` | Budget for context management: max turns (`sliding_window`) or approx token count (`truncate`/`summarize`). |
| `tools` | list | `[]` | Built-in tools enabled for all members by default. |
| `max_tool_rounds` | int | `10` | Maximum agentic tool-call rounds per member turn. |
| `tool_timeout` | int | `300` | Seconds budget per individual tool execution (generous default to allow package installs). |
| `tool_mode` | string | `"text"` | Tool invocation mode: `"text"` (fenced blocks) or `"native"` (LLM function-calling API). |
| `skills` | list | `[]` | Skill plugin sources (local paths or remote URLs) available to all members. |
| `ollama_url` | string | unset | Route **all** members to an existing Ollama instance at this URL instead of starting Docker containers. Per-member `ollama_url` overrides this. See [Apple Silicon / no-Docker](#apple-silicon--no-docker-ollama). |
| `keep_alive` | string | `"-1"` | How long Ollama keeps a model loaded in RAM after a request. `"-1"` (default) means keep forever — models stay resident between turns. Accepts any Ollama duration string (`"5m"`, `"1h"`) or `"0"` to unload immediately after each call. |

### `workflow`

```yaml
workflow:
type: review_loop
max_rounds: 4
producer: postdoc
reviewer: reviewer
approve_token: APPROVED # only review_loop; default "APPROVED"
manager: tech_lead # only when type=manager
prompt_template: | # only sequential_chain; {prev_speaker} and {prev_content} available
@{prev_speaker} produced the following. Refine it:
{prev_content}
```

| `type` | extra options |
| --- | --- |
| `round_robin` | none |
| `manager` | `manager: ` |
| `review_loop` | `producer: `, `reviewer: `, optional `approve_token` |
| `sequential_chain` | optional `prompt_template` (supports `{prev_speaker}`, `{prev_content}`) |
| `debate` | `pro: `, `con: `, `judge: `, optional `rounds` |
| `parallel_review` | `producer: `, `reviewers: [m1, m2, …]` (≥2), `synthesizer: `, optional `approve_token` |

### `members`

| key | required | notes |
| --- | --- | --- |
| `name` | yes | DNS-safe; used as `@handle` in the protocol. |
| `role` | yes | Free-text role label. |
| `model` | yes | Any tag known to Ollama (`llama3.1:8b`, `qwen2.5-coder:7b`, …). |
| `persona` | yes | Free-text persona prompt; quoted block. |
| `temperature`, `top_p`, `context_window` | no | Per-member overrides of `defaults`. |
| `memory_limit`, `cpu_limit`, `gpus` | no | Per-member resource overrides. |
| `can_write_files` | no | Default `true`; set to `false` to forbid this member from creating files. |
| `extra_system` | no | Free-form text appended to the rendered system prompt. |
| `ollama_url` | no | Connect to an existing Ollama instance directly; skips Docker. |
| `backend` | no | `"ollama"` (default) or `"openai_compat"` — overrides `defaults.backend`. |
| `api_base` | no | Base URL for the OpenAI-compat API (required when `backend: openai_compat`). |
| `api_key` | no | API key; supports `"env:VAR"` to read from an environment variable. |
| `context_strategy` | no | Per-member override of context management strategy. |
| `context_budget` | no | Per-member override of context budget. |
| `tools` | no | List of tool names this member may use (e.g. `[web_search, run_python]`). |
| `max_tool_rounds` | no | Per-member override of the tool-round limit. |
| `tool_timeout` | no | Per-member override of the per-tool execution timeout (seconds, default 300). |
| `tool_mode` | no | Per-member override: `"text"` or `"native"` (default inherits from `defaults.tool_mode`). |
| `skills` | no | Member-specific skill sources merged with `defaults.skills`. |
| `keep_alive` | no | Per-member override for Ollama model retention (e.g. `"5m"`, `"-1"`). Inherits from `defaults.keep_alive` when absent. |

---

## The collaboration protocol

Every member receives a system prompt that includes a small,
deterministic protocol so the orchestrator can parse replies reliably:

* **Address a teammate**: prefix a section with `@:`.
* **Write or overwrite a file in the shared workspace**: emit a fenced
block with an `file:` info-string, e.g.

````
```file:manuscript/manuscript.md
# Title
...
```
````

The orchestrator atomically writes the body to that path under
`/shared/`. Path-traversal attempts (`..`) are rejected.
* **Private workspace**: each member has `/private` inside its container
(mapped to `runs//members//` on the host) for personal
scratch files, drafts, and notes that are not shared with the team.
The list of files currently in `/private` is shown at the top of each
of the member's turn prompts.
* **Declare the goal achieved**: end the reply with a line containing
exactly `[[TEAM_DONE]]`. Workflows interpret this as "stop now".
* **Manager workflow**: end the reply with `NEXT: @` to nominate
who speaks next.
* **Review-loop workflow**: the reviewer emits `APPROVED` (configurable)
when the deliverable is ready.

---

## Predefined persona library

Writing a good persona from scratch takes time. `team` ships with
**16 ready-made personas** spanning academic research, software engineering,
and general-purpose roles. Each persona lives in its own YAML file under
`personas/` at the root of this repository — making them easy to read,
edit, and contribute back to the project.

### How personas are stored

```
personas/
├── pi.yaml # Principal Investigator
├── postdoc.yaml # Postdoctoral Researcher
├── phd.yaml # PhD Student
├── reviewer.yaml # Critical Reviewer
├── statistician.yaml # Statistician
├── bioinformatician.yaml
├── ml_researcher.yaml
├── architect.yaml
├── engineer.yaml
├── qa.yaml
├── devops.yaml
├── tech_writer.yaml
├── analyst.yaml
├── writer.yaml
├── manager.yaml
└── ethicist.yaml
```

Each file follows the same simple format:

```yaml
role: Principal Investigator
description: Lab director — sets research direction, evaluates results, writes grants.
persona: |
You are a tenured Principal Investigator at a research university.
Your role is to set and guard the scientific direction of the project.
...
```

The filename stem (e.g. `pi` from `pi.yaml`) becomes the `@`-key used in team
YAML files.

### Available personas

| Key | Role | Description |
| --- | --- | --- |
| `@pi` | Principal Investigator | Lab director — sets research direction, evaluates results, writes grants. |
| `@postdoc` | Postdoctoral Researcher | Senior researcher — deep expertise, drives experiments and analysis. |
| `@phd` | PhD Student | Junior researcher — literature review, baseline experiments, drafting. |
| `@reviewer` | Critical Reviewer | Peer-review skeptic — challenges assumptions, finds weaknesses. |
| `@statistician` | Statistician | Statistical methodologist — study design, power, inference correctness. |
| `@bioinformatician` | Bioinformatician | Omics data specialist — pipelines, databases, variant/sequence analysis. |
| `@ml_researcher` | Machine Learning Researcher | ML specialist — model design, training, evaluation, ablations. |
| `@architect` | Software Architect | System designer — API contracts, scalability, tech decisions. |
| `@engineer` | Software Engineer | Implementer — writes production-quality code, debugs, reviews PRs. |
| `@qa` | QA Engineer | Quality assurance — test strategy, edge cases, regression detection. |
| `@devops` | DevOps / SRE | Infrastructure and reliability — CI/CD, monitoring, deployment. |
| `@tech_writer` | Technical Writer | Documentation specialist — clarity, structure, audience-appropriate prose. |
| `@analyst` | Data Analyst | Data explorer — EDA, visualisation, dashboards, business insights. |
| `@writer` | Science Writer | Communicator — translates technical findings into compelling narratives. |
| `@manager` | Project Manager | Coordinator — milestones, blockers, stakeholder communication. |
| `@ethicist` | AI / Research Ethicist | Ethics and compliance — bias, fairness, privacy, responsible use. |

Browse the library from the terminal:

```bash
team personas # list all personas with key, role, description
team personas pi # print the full persona text for @pi
team personas engineer # print the full persona text for @engineer
```

### Using a persona in YAML

Set `persona` to `@` instead of writing a persona block:

```yaml
members:
- name: alice
model: llama3.1:70b
persona: "@pi" # role is set to "Principal Investigator" automatically
- name: bob
model: llama3.1:8b
persona: "@phd" # role is "PhD Student"
- name: carol
model: qwen2.5:7b
persona: "@reviewer" # role is "Critical Reviewer"
```

You can override the default role while keeping the library persona text:

```yaml
- name: alice
model: llama3.1:70b
persona: "@pi"
role: "Lab Director" # custom title; persona text stays the same
```

You can also mix library personas with fully custom ones in the same team:

```yaml
members:
- name: alice
model: llama3.1:70b
persona: "@pi"
- name: custom
role: Domain Expert
model: llama3.1:8b
persona: |
You are a specialist in protein crystallography with 20 years of
experimental experience. You validate all structural claims against
PDB data.
```

### Adding your own personas

**Option 1 — contribute to the built-in library** (share with everyone):

Drop a `.yaml` file into the `personas/` directory at the repo root and submit
a pull request. The file name becomes the `@`-key.

**Option 2 — project-local personas** (private to your setup):

Point `TEAM_PERSONA_DIR` at any directory; files there are loaded *in addition
to* the built-in library and take precedence over built-in keys with the same
name:

```bash
export TEAM_PERSONA_DIR=~/.team/personas
```

Then add files like `~/.team/personas/clinician.yaml`:

```yaml
role: Clinical Research Collaborator
description: Translates findings into clinical context and regulatory language.
persona: |
You are a physician-scientist with expertise in clinical trial design.
You translate pre-clinical findings into clinical hypotheses, identify
regulatory hurdles (FDA, EMA) early, and ensure the team's outputs are
framed for a clinical audience.
```

Any team YAML can now use `persona: "@clinician"` once the env var is set.

---

## Workflows

### `round_robin`

Every member speaks in declaration order. Repeat for `max_rounds` full
rounds, or until a member emits `[[TEAM_DONE]]`. Useful for brainstorms
and small symmetric teams.

### `manager`

A designated `manager` member opens the work, then after every other
member's turn the manager is asked again to evaluate progress and
nominate the next speaker via `NEXT: @`. The manager can also
take the floor itself, or end the run with `[[TEAM_DONE]]`.

### `review_loop`

A `producer` writes the first draft. A `reviewer` critiques it; the
producer revises; repeat until the reviewer emits `APPROVED` (or
`max_rounds` revisions are reached). When approved, the producer is
given one final turn to finalise and is expected to end with
`[[TEAM_DONE]]`. Ideal for any "make a deliverable, then iterate until
acceptable" workflow (papers, design docs, code).

### `sequential_chain`

Members form a **pipeline**: the first member runs with the default
prompt, then each subsequent member receives the previous member's full
reply as its explicit prompt. At the end of a round the chain wraps
around, so the first member of round N+1 receives the last member of
round N's output.

Use this when the work is a transformation series — for example:

* drafter → editor → translator → formatter
* researcher → summariser → chart-generator

Optional `prompt_template` controls how the handoff is framed; it can
use the `{prev_speaker}` and `{prev_content}` placeholders:

```yaml
workflow:
type: sequential_chain
max_rounds: 2
prompt_template: |
@{prev_speaker} produced the following output.
Your task is to refine and improve it:

{prev_content}
```

### `debate`

Two opposing members argue a proposition for N rounds, then a judge
member delivers a verdict.

```yaml
workflow:
type: debate
rounds: 3 # pro/con exchange rounds before the judge speaks (default: 3)
pro: alice # member arguing in favour
con: bob # member arguing against
judge: carol # member delivering the final verdict
```

1. The **pro** member makes an opening statement.
2. The **con** member rebuts.
3. Steps 1–2 repeat for `rounds` rounds.
4. The **judge** receives the full exchange and delivers a verdict.
5. Any member can end early by emitting `[[TEAM_DONE]]`.

### `parallel_review`

Like `review_loop` but all reviewers read the deliverable **at the same time**
(using a thread pool), so the total review wall-time is bounded by the
*slowest* reviewer, not the sum of all reviewers. A designated **synthesizer**
then consolidates the parallel reviews into one prioritised verdict, and the
**producer** revises.

```yaml
workflow:
type: parallel_review
max_rounds: 4 # max revision cycles before stopping
producer: writer # who creates and revises the deliverable
reviewers: # 2 or more members who review in parallel
- methods_reviewer
- stats_reviewer
- clarity_reviewer
synthesizer: editor # consolidates the parallel reviews (may equal producer)
approve_token: APPROVED # optional; default is "APPROVED"
```

**Flow per revision cycle:**

1. All reviewers are dispatched simultaneously; each receives the same
transcript snapshot and produces its review independently.
2. Reviews are appended to the transcript in declaration order.
3. The **synthesizer** reads all reviews and emits a consolidated verdict
(or `APPROVED` when no further changes are needed).
4. If approved, the producer finalises and emits `[[TEAM_DONE]]`.
5. Otherwise the producer addresses the feedback and the cycle repeats.

> **Thread-safety note:** Reviewer turns are truly parallel LLM calls.
> Each reviewer reads the transcript (read-only during the parallel window)
> and calls its own model. Reviewers should not use file-writing tools
> during their review turns to avoid concurrent workspace writes.

---

### `parallel`

All members speak **simultaneously** in every round. Unlike `parallel_review`
(which has a fixed producer → reviewers → synthesizer structure), `parallel`
is fully symmetric: every declared member runs at the same time, every round.

Each member receives the same transcript snapshot at the start of the round —
it cannot see what another member wrote *in the current round*, only in
previous rounds. After all threads complete, turns are appended in member
declaration order so the transcript is deterministic and `--resume` works.

```yaml
workflow:
type: parallel
max_rounds: 4
```

**When to use `parallel`**

- Independent expert panels — each member evaluates the problem from its own
perspective and writes its findings simultaneously.
- Embarrassingly parallel tasks — member A generates candidate A, member B
generates candidate B; a later sequential step (or `sequential_chain`) picks
the best.
- Speed-critical brainstorming where sequential dialogue would be too slow.

**Rendering**

The CLI shows a `⚡ parallel` separator banner before the round starts, then
renders each member's completed panel (with full content, file-write list, and
colour) when the round finishes — no token-by-token streaming during the
parallel window.

> **Thread-safety note:** Members read the transcript concurrently (safe) and
> write to the shared workspace. Concurrent writes to the *same file path*
> are a race condition. Design your team so that parallel members produce
> output in disjoint paths (e.g. `member_a/output.txt` vs `member_b/output.txt`).

---

## Workspaces and artifacts

For team `` with `workspace: ./runs/` you get:

```
runs//
├── transcript.jsonl # one JSON object per turn
├── shared/ # mounted as /workspace inside every container
│ └──
├── checkpoints/ # automatic point-in-time snapshots (one per live turn)
│ ├── 0001_alice_20240501T120000/
│ ├── 0002_bob_20240501T120145/
│ └── ...
└── members/
├── pi/ # mounted as /private inside the pi container
├── postdoc/
└── ...
```

* `shared/` is the canonical place for deliverables and is visible to
every member at every turn.
* `members//` is the **private workspace** for that member. Its
contents are listed in the member's turn prompt under *"Files in your
private workspace (/private)"*, so the member can reference its own
previous work, intermediate files, or notes across turns. Other members
cannot see these files.
* `transcript.jsonl` is appended to as the run progresses; one record per
turn, with `speaker`, `role`, `content`, `files_written`, and
`timestamp` fields.

`team transcript ` renders the transcript human-readably.

---

## Containers, isolation, and root

Each member runs in **its own container** with the following properties:

| property | value | rationale |
| --- | --- | --- |
| Image | `ollama/ollama:latest` (overridable) | Standard Ollama runtime. |
| User inside | **root** | Members have full root *inside their own filesystem*, satisfying "root inside the container" without granting host root. |
| Network | per-team Docker bridge `team--net`, isolated from other teams and from your host services | Members can only reach each other through the orchestrator, not directly. |
| Port exposure | `127.0.0.1::11434` | Each member's Ollama API is reachable only from the host loopback by the orchestrator. |
| Model cache | per-member named volume `team---models` | Members do *not* share model storage. |
| Mounts | shared workspace at `/workspace`, private workspace at `/private` | Conventional file-exchange surface. |
| Restart policy | `unless-stopped` | Survives daemon restarts during long runs. |
| Resource caps | `memory_limit`, `cpu_limit` honoured if set | Keep large models from starving the host. |

Containers are **not** run with `--privileged` and do not get any host
device access by default; root is confined to the container's mount and
PID namespaces. You can pass GPUs explicitly via `gpus` (see below).

---

## GPU support

Set `gpus` either globally (under `defaults`) or per-member:

```yaml
defaults:
gpus: all # all visible GPUs

members:
- name: pi
gpus: [0] # only GPU 0
- name: postdoc
gpus: none # CPU only
```

Requires the NVIDIA Container Toolkit on the host. Passed through to
Docker via device requests; non-NVIDIA setups can leave `gpus: none`.

### Apple Silicon / no-Docker Ollama

Docker Desktop on **macOS** runs a Linux VM that cannot access the host's
GPU (neither NVIDIA nor Apple Metal). Using `gpus: all` there produces:

```
could not select device driver "nvidia" with capabilities [[gpu]]
```

There are two escape hatches:

#### Option A — CPU-only containers (`--no-gpu`)

Pass `--no-gpu` to `team up` or `team run`. All containers are started
without GPU device requests and fall back to CPU inference inside Docker.
No YAML change required, but inference will be slow on large models.

```bash
team run myteam.yaml --no-gpu
team up myteam.yaml --no-gpu
```

#### Option B — Native host Ollama with Metal (recommended for Apple Silicon)

Install [Ollama for macOS](https://ollama.com) natively. The native app
uses **Apple Metal** for GPU acceleration and is dramatically faster than
CPU-only Docker containers. Then tell `team` to bypass Docker entirely and
connect all members to it:

**Via CLI flag** (no YAML change):

```bash
# Default URL is http://localhost:11434
team run myteam.yaml --host-ollama http://localhost:11434
team up myteam.yaml --host-ollama http://localhost:11434
```

**Via YAML** (permanent):

```yaml
defaults:
ollama_url: http://localhost:11434 # all members skip Docker
```

When `defaults.ollama_url` is set (or `--host-ollama` is passed), no Ollama
containers are started; the orchestrator connects directly to the given URL.
Per-member `ollama_url` overrides the default for individual members.

> **`team check` will report a `FAIL`** on macOS when GPU is requested
> without an `ollama_url` configured, and will guide you to one of the two
> options above.

---

## OpenAI-compatible backends

By default every member runs Ollama in a Docker container. You can instead
point any member at any **OpenAI-compatible API** — LM Studio, vLLM, llama.cpp
server, the real OpenAI API, Anthropic (via a LiteLLM proxy), etc. — without
Docker.

```yaml
defaults:
backend: openai_compat
api_base: http://localhost:1234/v1 # LM Studio
api_key: env:OPENAI_API_KEY # or a literal key

members:
- name: lead
role: Tech Lead
model: gpt-4o # model name sent to the API
persona: ...
- name: worker
role: Engineer
model: llama-3.1-8b-instruct
backend: ollama # this member still uses Docker
persona: ...
```

The `backend` and `api_base` fields can be set globally in `defaults` or
overridden per-member.

| field | meaning |
| --- | --- |
| `backend` | `"ollama"` (default) or `"openai_compat"` |
| `api_base` | Base URL of the OpenAI-compat API (e.g. `https://api.openai.com/v1`) |
| `api_key` | API key; use `"env:VAR"` to read from environment at runtime |

When `backend: openai_compat` is set, no Docker container is started for
that member — the orchestrator calls the remote API directly. The `model`
field is passed as-is to the API.

---

## Remote / no-Docker Ollama

If you already have an Ollama server running (locally or on a remote
machine), you can skip Docker for individual members by setting `ollama_url`:

```yaml
members:
- name: researcher
role: Researcher
model: llama3.1:70b
ollama_url: http://192.168.1.10:11434 # existing Ollama instance
persona: ...
```

To route **all** members to the same Ollama instance, set it in `defaults`
or pass `--host-ollama` on the command line (see
[Apple Silicon / no-Docker](#apple-silicon--no-docker-ollama)):

```yaml
defaults:
ollama_url: http://localhost:11434
```

No container is started for any member that has an effective `ollama_url`
(per-member or from `defaults`); the orchestrator connects directly to the
given URL. The model must already be pulled on that server (or Ollama's
automatic pull will fetch it on first use).

---

## Custom Ollama image

`docker/Dockerfile.ollama` is an optional, slightly-augmented image that
adds `python3`, `git`, `jq`, `curl`, and friends on top of
`ollama/ollama:latest` for members that want richer in-container
tooling. Build it once and reference it from any team:

```bash
docker build -f docker/Dockerfile.ollama -t team/ollama:latest docker/
```

```yaml
defaults:
ollama_image: team/ollama:latest
```

The default `ollama/ollama:latest` is fine for most uses.

---

## Context window management

By default the orchestrator passes the full transcript to every member
every turn. For long-running teams this can exceed a model's context
window, causing silent truncation or errors. Configure a strategy to
keep the context manageable:

```yaml
defaults:
context_strategy: sliding_window # none | sliding_window | truncate | summarize
context_budget: 20 # max turns (sliding_window) or ~token budget (truncate/summarize)
```

| strategy | behaviour |
| --- | --- |
| `none` (default) | Full transcript always sent. |
| `sliding_window` | Only the last `context_budget` turns are sent. |
| `truncate` | Oldest turns are dropped until the estimated token count fits within `context_budget`. A note is prepended explaining that earlier turns were omitted. |
| `summarize` | The oldest turns are compressed into a concise bullet-point digest by calling the member's own LLM (at temperature 0.2). The digest is prepended under a *"Summary of N earlier turn(s)"* heading; the most-recent turns are kept verbatim. 80 % of `context_budget` is reserved for recent turns, 20 % for the digest. Falls back to a plain omission notice if the summarization call fails. |

Override per member:

```yaml
members:
- name: reviewer
context_strategy: sliding_window
context_budget: 10 # this member sees only the last 10 turns
```

---

## Model retention (`keep_alive`)

By default, `team` sets Ollama's `keep_alive` to `"-1"` on every chat request, which tells Ollama to keep the model loaded in RAM indefinitely. Without this, Ollama's built-in default evicts a model after 5 minutes of inactivity — a problem for large models (tens of gigabytes) that must repeatedly load and unload between turns.

```yaml
defaults:
keep_alive: "-1" # keep every model loaded for the duration of the run (default)

members:
- name: summarizer
model: llama3.2:3b
keep_alive: "5m" # lightweight model — OK to evict after 5 minutes of idle
...
```

| Value | Behaviour |
| --- | --- |
| `"-1"` | Keep the model loaded until Ollama stops or another model claim evicts it. **Recommended for team runs.** |
| `"5m"`, `"1h"`, … | Evict after the given idle period (Ollama duration string). |
| `"0"` | Unload immediately after each request (maximises GPU headroom at the cost of reload latency). |

`keep_alive` is an Ollama-only parameter. When the `openai_compat` backend is used it is silently ignored.

---

## CLI reference

```text
team init [PATH] Write a starter team YAML.
team new [PATH] Interactive wizard to create a new team YAML.
team validate Parse and validate the YAML.
team check Run preflight checks (no Docker started).
team up Start containers, pull models.
[--no-gpu] [--host-ollama URL]
team status Show container status per member.
team logs Tail per-member Ollama logs.
[--member NAME] [--tail N]
team run Up + run workflow + (down).
[--no-up] [--keep-up] [--resume] [--no-stream] [--interactive]
[--no-gpu] [--host-ollama URL]
team transcript Render the persisted transcript.
team export Export transcript + artifacts to a report.
[--format markdown|html|json] [--output PATH] [--no-artifacts]
team checkpoints List all workspace checkpoints.
team restore Restore the shared workspace to a checkpoint.
team down Stop & remove containers (and volumes).
[--purge]
```

Common flags:

* `-v / --verbose` — debug-level logging.
* `--prepare-timeout SECONDS` (on `up`/`run`) — how long to wait for each
member's Ollama daemon to become ready and its model to finish pulling
(default 600).

---

## Interactive wizard

`team new` launches a guided wizard that asks you a series of questions
and writes a validated YAML:

```bash
team new my-team.yaml
```

The wizard prompts for:

* Team name and goal
* Number of members, and for each: name, role, model, persona
* Workflow type and max rounds
* Workspace path

The output is a fully-formed, validated YAML ready to use with `team run`.

---

## Pre-flight checks

Before starting containers, verify that the environment is ready with
`team check`:

```bash
team check my-team.yaml
```

The command checks:

| Check | What it tests |
|---|---|
| Workspace writable | Can create the workspace directory and write files to it |
| Disk space | Reports available GB; warns if below **5 GB** |
| Docker daemon | Docker daemon reachable, version ≥ 20.10, Ollama image present |
| GPU availability | Runs `nvidia-smi` when any member requests GPUs; warns if not found |

Exit code is `0` when all checks pass (warnings allowed), `1` when any
check fails. Failures are shown with a red ✗ and warnings with a yellow ⚠.

---

## Streaming output

By default `team run` streams each member's reply **token-by-token** to the
terminal as it is generated. You see a header like `@alice (Lead)` followed
by the reply appearing live — no waiting for the full response.

To disable streaming (e.g. for CI or when redirecting output to a file):

```bash
team run my-team.yaml --no-stream
```

With `--no-stream` the full reply is printed at once after each turn
completes.

---

## Per-turn timeout

Set a hard wall-clock deadline (seconds) on how long any single member turn
may take. If the LLM doesn't finish within the limit, a `TurnTimeoutError`
is raised and the workflow stops.

```yaml
defaults:
turn_timeout: 120 # 2 minutes for every member by default

members:
- name: fast_reviewer
role: Reviewer
model: qwen2.5:3b
persona: You review code quickly.
turn_timeout: 30 # override — this member gets only 30 s
```

Set `turn_timeout: 0` (or leave it absent) to disable timeouts entirely.

**Implementation details**

The member's `take_turn()` is executed in a `ThreadPoolExecutor` thread and
`future.result(timeout=…)` enforces the deadline. If the timeout fires the
thread is abandoned (it will eventually finish and be garbage-collected), but
the calling workflow raises `TurnTimeoutError` immediately.

---

## LLM retry with backoff

`team` automatically retries LLM calls that fail due to transient infrastructure errors — connection refused, timeouts, and HTTP 5xx responses from the server — using **exponential backoff**.

```yaml
defaults:
max_retries: 3 # attempts per call (default: 3; 0 = no retries)
retry_backoff: 2.0 # backoff base in seconds (wait = backoff ** attempt)

members:
- name: alice
max_retries: 5 # per-member override
retry_backoff: 1.5
```

### How it works

| Scenario | Behaviour |
| --- | --- |
| Connection refused / timeout | Retried up to `max_retries` times. |
| HTTP 5xx (server error) | Retried — the server never processed the request. |
| HTTP 4xx (client error) | **Not retried** — a bad model name or malformed request won't self-heal. |
| Partial streaming response | **Not retried** — the caller already received tokens; replaying would produce duplicates. |

The wait between attempts is `retry_backoff ** attempt` seconds (attempt 0 → 1 s, attempt 1 → 2 s, attempt 2 → 4 s for the default `retry_backoff=2.0`).

### When all retries are exhausted

`LLMRetryExhaustedError` (a subclass of `OllamaError`) is raised. The CLI catches it and prints a red error panel instead of crashing, preserving any transcript written so far.

---

## Resuming an interrupted run

If a run is interrupted (crash, timeout, Ctrl-C) you can pick up exactly
where it left off without re-running the turns that already completed:

```bash
team run my-team.yaml --resume
```

`--resume` loads the existing `transcript.jsonl`, replays every already-
completed turn instantly (no LLM call), and then continues the workflow
live from the first missing turn.

* Containers are restarted (or re-used) as normal; models are not re-pulled
if their cache volumes still exist.
* Combine with `--no-up` if your containers are already running from a
previous `team up`.
* If the transcript doesn't exist or is empty, `--resume` is a no-op and
the run starts fresh.
* If the previous run completed, resuming is a harmless no-op: the workflow
will detect `[[TEAM_DONE]]` in the first replayed turn and exit immediately.

---

## Human-in-the-loop intervention

You can inject new directives into a running team at any time without
stopping or restarting. Two mechanisms are available:

### Interactive mode (foreground runs)

Pass `--interactive` to `team run`. After every workflow round completes
you are prompted for an optional directive. Press **Enter** with no text to
let the run continue, or type instructions and press **Enter** to have them
injected before the next round:

```bash
team run my-team.yaml --interactive
```

```text
── round 1/4 complete ──
Enter a directive for the team (or press Enter to continue): Focus only on the auth module for now.
↳ directive injected
```

### File-based injection (background / CI runs)

At any point during a run you can write a plain-text file called
`inject.txt` into the workspace directory:

```bash
echo "Switch to Python 3.12 syntax only." > ./runs/my-team/inject.txt
```

Before the **next member turn** begins, the orchestrator checks for this
file. If it exists, the content is read, the file is deleted, and the
directive is appended to the transcript as a `@human (director)` turn.
All members see it in their next turn's conversation context.

The file is consumed once and automatically removed. Drop a new file to
inject again at any later point.

### What the team sees

Both mechanisms produce the same type of transcript entry:

```text
--- Turn N | @human | director ---

```

The entry is visible to every member in their next turn prompt, just like
any other speaker's turn.

---

## Agent mode and tool use

Members can act as **agents**: they may call external tools, then receive
the tool's output and continue reasoning — all within the same logical turn.
Two invocation modes are supported:

| Mode | How it works |
| --- | --- |
| `text` (default) | Member emits fenced `tool:` blocks in its reply; orchestrator parses and executes them. Works with any model. |
| `native` | Uses the LLM's **function-calling API** (Ollama `tools` parameter / OpenAI function calling). Requires a compatible model (Llama 3.1+, Qwen 2.5, GPT-4 family, etc.). |

### Enabling tools

```yaml
defaults:
tools: [web_search, run_python] # enable globally
max_tool_rounds: 10 # max tool-call rounds per turn (default: 10)
tool_timeout: 300 # seconds per tool execution (default: 300)
tool_mode: text # "text" (default) or "native"

members:
- name: researcher
tools: [web_search, read_url] # per-member override
tool_mode: native # this member uses function-calling API
- name: data_scientist
tools: [run_python, run_bash, read_file, write_file, append_file, list_files]
```

### Tool invocation syntax — `text` mode

A member invokes a tool by emitting a fenced block with a `tool:`
info-string:

````
```tool:web_search
query: IPCC AR6 key findings 2024
```
````

### Tool invocation — `native` mode

In native mode the model receives **JSON Schema** definitions for all
enabled tools and returns structured `tool_calls` objects (OpenAI/Ollama
function-calling format) instead of text fenced blocks. The orchestrator
executes the tools and passes results back via `tool` role messages — no
text parsing required.

Every built-in tool has a corresponding JSON Schema automatically provided
to the model. Custom skill tools that lack a schema receive a minimal
`input: string` schema.

> **Model requirements**: native mode requires a model that supports
> function calling. For Ollama, use `llama3.1:8b` or newer, `qwen2.5:7b`,
> `mistral-nemo`, etc. For OpenAI-compat backends, any GPT-4 / Claude
> model works. If you pass native mode to a model that ignores the `tools`
> parameter, it will fall back to producing a text reply (no tool calls).

````
```tool:run_python
import pandas as pd
df = pd.read_csv('/workspace/shared/data.csv')
print(df.describe())
```
````

````
```tool:read_file
path: analysis/results.json
```
````

````
```tool:write_file
path: output/summary.md
---
# Summary

This file was written by the agent.
```
````

````
```tool:append_file
path: logs/run.log
---
[step 3] analysis complete.
```
````

````
```tool:list_files
pattern: *.py
```
````

After each tool block the orchestrator executes the tool, injects the result
back into the conversation, and asks the member to continue. Once the member
produces a reply with no tool blocks, that reply is recorded in the
transcript as usual.

### Available built-in tools

| tool | description |
| --- | --- |
| `run_python` | Execute Python code; cwd is the shared workspace directory. |
| `run_bash` | Execute a bash command; cwd is the shared workspace directory. |
| `web_search` | Search the web via the DuckDuckGo instant-answer API (no key required). |
| `read_url` | Fetch and return the plain-text content of a URL. |
| `read_file` | Read a file from the shared workspace by relative path. |
| `write_file` | Write (create or overwrite) a file in the shared workspace. |
| `append_file` | Append text to a file in the shared workspace. |
| `list_files` | List files in the shared workspace with an optional glob filter. |
| `remember` | Store a memory in the member's **persistent cross-session** memory store. |
| `recall` | Search the member's persistent memory by keyword. |
| `forget` | Delete a memory by key from the persistent store. |
| `list_memories` | List stored memories (optionally filtered by tag). |
| `assert_belief` | Add a claim to the team's **shared belief board** with confidence score. |
| `contest_belief` | Contest an existing belief (moves it to contested status). |
| `accept_belief` | Cast an accept vote for an existing belief. |
| `list_beliefs` | List the shared belief board (optionally filtered by status). |
| `delegate_task` | Delegate a sub-task to a remote bridge server and wait for results. Use `peer:` for named peers or `url:` for direct addressing. |
| `list_peers` | List all configured peer teams and their live health status (pending/running counts). |
| `broadcast_task` | Fan out the same goal to multiple peer teams concurrently and collect all results. |
| `cancel_remote_task` | Cancel a queued or running task on a remote bridge server by task ID. |
| `delegate_to_expert` | Send a prompt to an external cloud LLM (OpenAI, Anthropic, Google) for expert assistance when the task exceeds local capabilities. |
| `log_decision` | Append a timestamped decision entry to `decisions.md` in the shared workspace. |
| `read_decisions` | Read the full decision log (`decisions.md`) from the shared workspace. |
| `query_registry` | Query a team registry to discover teams matching capability tags or a keyword; returns names, URLs, and tags. |
| `sync_beliefs` | Synchronize the team belief board with a remote team cluster (pull, push, or both directions). |

**`write_file` and `append_file` body format**

Both tools use a two-part body separated by a `---` line:

```
path: relative/path/to/file.txt
---
File content goes here.
Multiple lines are fine.
```

The path is relative to the shared workspace root. Parent directories are
created automatically. `write_file` replaces any existing content;
`append_file` adds to the end of the file (creating it if it does not exist).

**`list_files` body format**

The body is optional. If omitted, all workspace files are listed. Use a
`pattern:` key to filter by glob pattern:

```
pattern: **/*.py
```

### Security note

`run_python` and `run_bash` execute code on the **host machine** with the
privileges of the `team` process. Only enable these tools for members whose
prompts you trust.

### Expert delegation — `delegate_to_expert`

When a task is too complex for the local model assigned to a member, that
member can **delegate the sub-problem** to a subscription-based cloud LLM
(ChatGPT, Claude, Gemini) and receive the answer as a tool result. The
member remains responsible for the turn — it incorporates the expert's reply
into its own response, so the team structure and role assignments are preserved.

The cloud model is **not** a team member. It has no access to the
transcript, the shared workspace, or any other team state — only the prompt
text you explicitly send.

#### Setup

Export the API key for the provider(s) you want to use **on the host** before
running `team`:

```bash
export OPENAI_API_KEY=sk-… # for provider: openai
export ANTHROPIC_API_KEY=sk-ant-… # for provider: anthropic
export GOOGLE_API_KEY=AIza… # for provider: google
```

Enable the tool for a member in the YAML:

```yaml
members:
- name: analyst
model: llama3.2:3b
tools: [delegate_to_expert, read_file, write_file]
```

#### Usage

**Multi-line prompt (recommended for complex requests)**:

````
```tool:delegate_to_expert
provider: openai
model: gpt-4o
max_tokens: 4096
temperature: 0.2
---
You are a statistics expert.
Given the following regression output, identify any violations
of linear regression assumptions and suggest remedies.

Residuals: …
```
````

**Single-line prompt**:

````
```tool:delegate_to_expert
provider: anthropic
model: claude-opus-4-5
prompt: What is the time complexity of Dijkstra's algorithm with a binary heap?
```
````

| field | required | default | description |
| --- | --- | --- | --- |
| `provider` | ✓ | — | `openai`, `anthropic`, or `google` |
| `model` | | provider default | Model name accepted by the provider API |
| `prompt` | ✓* | — | Prompt text (single-line form; ignored when `---` body is present) |
| `max_tokens` | | `2048` | Maximum tokens in the response |
| `temperature` | | `0.2` | Sampling temperature 0–2 |

\* Required unless a `---` body separator is used.

**Provider defaults**: `gpt-4o` (OpenAI), `claude-opus-4-5` (Anthropic),
`gemini-1.5-pro` (Google).

> **Privacy**: the prompt text is sent to the external API. Do not include
> sensitive data unless your data-handling agreement with the provider permits it.
> Only enable `delegate_to_expert` for members that may handle the data appropriately.

### Full system access and package installation

Agents have **full, unrestricted access to the host system** — the same
privileges as the user who runs the `team` process. This is intentional:
agents should be able to do anything a human researcher or engineer can do.

In particular, agents can install software at will:

````
```tool:run_bash
pip install scikit-learn seaborn --quiet
```
````

````
```tool:run_bash
apt-get install -y ffmpeg
```
````

````
```tool:run_python
import subprocess, sys
subprocess.run([sys.executable, "-m", "pip", "install", "biopython"], check=True)
import Bio
print(Bio.__version__)
```
````

When a tool invocation takes longer than expected (e.g. downloading a large
package), increase the `tool_timeout` in your YAML:

```yaml
defaults:
tool_timeout: 600 # 10 minutes — safe for most installs
```

The default `tool_timeout` is **300 seconds** (5 minutes), which covers the
vast majority of `pip install` and `apt-get` operations on a normal network
connection.

### How it works

**Text mode** (`tool_mode: text`):
```
member turn:
1. LLM called with system prompt + conversation context
2. If reply contains tool: fenced blocks → execute each tool
3. Tool results injected as a follow-up user message
4. LLM called again (no streaming; repeats up to max_tool_rounds)
5. If no tool blocks in reply → reply recorded in transcript
```

**Native mode** (`tool_mode: native`):
```
member turn:
1. LLM called with JSON Schema tool definitions in the "tools" parameter
2. If response contains tool_calls → execute each named tool using args_to_body()
3. Each result injected as a "tool" role message
4. LLM called again (repeats up to max_tool_rounds)
5. When LLM returns text (no tool_calls) → reply recorded in transcript
```

Token usage from all tool-call rounds is accumulated and reported in the
[token usage summary](#token-usage-tracking).

### Streaming display

When streaming is enabled (`team run` without `--no-stream`), tool calls
are displayed inline:

```text
@researcher (Research Lead)
I'll search for recent data on this topic.

🔧 tool: web_search query: climate change 2024 report
↳ **Climate Change** A programming language. - Flooding in coastal…
Based on the search, the key findings are…
```

---

### Custom skill plugins

The built-in tool set is a starting point. You can extend it with any
Python file — local or fetched from a URL — and make those tools
available to any member. This gives agents effectively **unlimited**
capabilities depending on what skills you provide.

#### Skill file format

A skill file must expose tools in one of two formats:

**Single-tool format** (`TOOL_NAME` + `execute`):

```python
# skills/my_calculator.py
TOOL_NAME = "my_calculator"
TOOL_DESCRIPTION = "Evaluate a Python arithmetic expression."

def execute(body, *, workspace_path=None, timeout=30, **kwargs):
try:
return str(eval(body.strip(), {"__builtins__": {}}, {}))
except Exception as exc:
return f"ERROR: {exc}"
```

**Multi-tool format** (`TOOLS` dict + optional `TOOL_DESCRIPTIONS`):

```python
# skills/db_tools.py
import sqlite3

def _query(body, *, workspace_path=None, **kwargs):
db_path = workspace_path / "data.sqlite"
conn = sqlite3.connect(db_path)
rows = conn.execute(body.strip()).fetchall()
conn.close()
return "\n".join(str(r) for r in rows)

def _schema(body, *, workspace_path=None, **kwargs):
db_path = workspace_path / "data.sqlite"
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT name, sql FROM sqlite_master WHERE type='table'").fetchall()
conn.close()
return "\n".join(f"{r[0]}: {r[1]}" for r in rows)

TOOLS = {"sql_query": _query, "sql_schema": _schema}
TOOL_DESCRIPTIONS = {
"sql_query": "Run an SQL SELECT on the shared SQLite database.",
"sql_schema": "Return the schema of all tables in the shared SQLite database.",
}
```

Both formats can coexist in the same file.

#### Configuring skills

Add skill sources under `defaults.skills` (inherited by all members) or
`members[*].skills` (member-specific, merged with defaults on top):

```yaml
defaults:
skills:
- path: ./skills/my_calculator.py # local path (relative to CWD)
- path: ./skills/db_tools.py
- url: https://example.com/skill.py # remote URL (see security note below)
checksum: sha256:e3b0c44298fc… # optional integrity check
- ./skills/shorthand.py # plain string = auto-detect local/remote

tools: [web_search, my_calculator, sql_query, sql_schema] # opt-in by name

members:
- name: analyst
tools: [sql_query, sql_schema, run_python] # member-specific tool set
skills:
- ./skills/analyst_helpers.py # member-specific extra skill
```

Tool names from skills are used exactly like built-in tool names everywhere
(`tools:` lists, `tool:` fenced blocks, system prompts).

#### Checksum verification

For any skill (local or remote) you can supply a checksum to verify
integrity before execution:

```yaml
skills:
- url: https://example.com/skill.py
checksum: sha256:
- path: ./skills/local.py
checksum: sha256:
```

Supported algorithms: any name accepted by Python's `hashlib` (e.g.
`sha256`, `sha512`, `md5`). `team` raises an error and refuses to load
the skill if the digest does not match.

#### Markdown skills — context injection

Skills do not have to be executable code. A Markdown file (`.md`) loaded
as a skill has its content injected verbatim into the member's **system
prompt** at startup — no tool call required. Use this for guidelines,
checklists, templates, and domain rules that should always be visible.

```yaml
defaults:
skills:
- path: ./skills/review_checklist.md # injected into system prompt
- path: ./skills/task_board.py # callable tool as usual
```

A Python skill can also inject context by setting the `INJECT_INTO_CONTEXT`
variable to a non-empty string — the text is injected *and* the tool
remains callable:

```python
TOOL_NAME = "style_guide"
INJECT_INTO_CONTEXT = "## Style guide\n- Use snake_case for all variables.\n..."

def execute(body, **kwargs):
return INJECT_INTO_CONTEXT # also callable on demand
```

#### Bundled team-specific skills

The `skills/` directory in this repository contains a set of skills designed
for multi-agent collaboration — things that have no use outside a team run
and would never appear in a general-purpose skill library.

| File | Type | Description |
|---|---|---|
| `review_checklist.md` | Markdown | Structured peer-review checklist injected into reviewer personas. |
| `escalation_rules.md` | Markdown | When to proceed, flag a risk, or escalate to the manager. |
| `decision_record_format.md` | Markdown | ADR-style template for writing `log_decision` entries. |
| `task_board.py` | Python | `task_add` / `task_done` / `task_list` — shared TASKS.md board. |
| `search_transcript.py` | Python | `search_transcript` — keyword search over the run transcript. |
| `critique_request.py` | Python | `request_critique` / `pick_critique` / `list_critiques` — async peer-review queue. |
| `progress_snapshot.py` | Python | `progress_snapshot` — write (or read) PROGRESS.md in the workspace. |

Reference them by path in your team YAML:

```yaml
defaults:
skills:
- path: ./skills/review_checklist.md
- path: ./skills/escalation_rules.md
- path: ./skills/task_board.py
- path: ./skills/search_transcript.py
tools: [task_add, task_done, task_list, search_transcript]
```

## Shared institutional context

When a workspace contains a `context.md` file at its root, `team` injects its
content into **every** member's turn context automatically — no per-member
configuration required.

This is the right place for knowledge that applies to all members equally:
lab conventions, dataset descriptions, domain terminology, naming standards,
relevant prior work, or any background a new team member would need to read
on day one.

**Creating the context file:**

```bash
cat > ./runs/my-team/context.md << 'EOF'
# Lab context

This project analyses the TCGA-BRCA cohort (1,142 samples, 38 features).

## Naming conventions
- All feature files use `snake_case` column names.
- Model outputs go in `results/`.

## Domain notes
- Use log2 CPM normalisation for expression data.
- Primary endpoint is 5-year overall survival (OS5).
EOF
```

The file is read from disk **on every turn** so you can update it while a
run is in progress (e.g. to correct a mistake or add a new constraint).
If the file is absent, the section is silently omitted.
The content is truncated at 8 192 characters if the file is very large.

---

## Decision log

Members with the `log_decision` tool enabled can record structured, timestamped
decisions in a shared `decisions.md` file inside the workspace. Any member
can later call `read_decisions` to review the accumulated rationale before
making related choices.

**Enabling the tools:**

```yaml
defaults:
tools: [log_decision, read_decisions] # add to any existing tool list
```

**Logging a decision:**

````
```tool:log_decision
title: Chose pandas over polars for data wrangling
rationale: Polars ecosystem is too immature; pandas is already a project dependency.
alternatives: polars, dask, vaex
```
````

The entry is appended to `decisions.md` in the shared workspace:

```markdown
## Decision: Chose pandas over polars for data wrangling
**Date:** 2024-07-15T10:32:44Z
**By:** @data_scientist

**Rationale:** Polars ecosystem is too immature; pandas is already a project dependency.

**Alternatives considered:** polars, dask, vaex

---
```

**Reading the decision log:**

````
```tool:read_decisions
```
````

Returns the full `decisions.md` content so members can consult previous
decisions when facing related choices.

---

## Structured JSON output

By default members reply in free-form text. When you need machine-readable
output — e.g. an extractor member whose results are consumed by downstream
code — set `output_format: json` on that member.

```yaml
members:
- name: extractor
role: Data extractor
model: llama3.1:8b
persona: You extract structured data from documents.
output_format: json
output_schema: # optional — validates the reply
type: object
required: [entities, summary]
properties:
entities:
type: array
items: {type: string}
summary:
type: string
```

**What happens**

1. The system prompt gains an `## Output format` section instructing the model
to reply with valid JSON only.
2. After the LLM replies, `team` calls `json.loads()` on the content.
3. If parsing fails (or schema validation fails when `output_schema` is set),
the orchestrator sends a correction prompt and retries up to **3 times**.
4. The parsed object is stored in `TurnResult.json_output` and is accessible
from custom workflows or post-run code.
5. Schema validation requires `pip install jsonschema`; without it the schema
check is skipped silently.

> **Note:** `output_format` is per-member only — it is not available as a
> team-wide `defaults` key.

---

## Conditional routing

Enable dynamic, branching conversations where each member's output determines who speaks next — building state-machine-like workflows without any code.

```yaml
workflow:
type: conditional
start: writer # optional; defaults to the first listed member
max_rounds: 20

members:
- name: writer
model: llama3
persona: You are a technical writer.
role: Writer
routes:
- if_contains: "NEEDS_REVISION"
next: editor
- if_match: "APPROVED|LGTM"
next: publisher
- default: reviewer # fallback when nothing else matches

- name: editor
model: llama3
persona: You are an editor.
role: Editor
routes:
- if_contains: "DONE"
next: publisher
- default: writer # loop back for another draft

- name: reviewer
model: llama3
persona: You are a reviewer.
role: Reviewer
routes:
- default: writer

- name: publisher # terminal node — no routes needed
model: llama3
persona: You are a publisher.
role: Publisher
```

### Route rules

Rules are evaluated **top-to-bottom**; the first match wins.

| Key | Behaviour |
| --- | --- |
| `if_contains: "TEXT"` | Case-insensitive substring search in the member's last reply. |
| `if_match: "REGEX"` | Case-insensitive `re.search` against the member's last reply. |
| `default: member` | Unconditional fallback; fires when no other rule matches. |

A member with **no `routes`** falls back to the standard round-robin next-speaker logic.

### Workflow end conditions

The workflow stops when:
* any member outputs `[[TEAM_DONE]]`, or
* the total turn count reaches `max_rounds`.

---

## Token budget

Prevent runaway costs by capping how many tokens a member may consume across all turns in a single run.

```yaml
defaults:
token_budget: 5000 # max prompt+completion tokens per member per run

members:
- name: alice
token_budget: 10000 # per-member override
```

When a member's cumulative token usage reaches the budget before their next turn, `TokenBudgetError` is raised and the run stops gracefully. The transcript and any workspace files written so far are preserved, and `team run --resume` with a higher budget can continue from where it left off.

> **Note:** Replayed turns (from `--resume`) do **not** count toward the budget.

### Budget resolution

| Setting | Effective budget |
| --- | --- |
| `token_budget` in `defaults` only | Applied to every member. |
| `token_budget` in a specific member | Overrides the `defaults` value for that member only. |
| Neither set | No limit — member runs until the workflow ends. |

---

## Per-agent persistent memory

In a real research lab, scientists remember what worked and what failed —
across months of experiments. `team` gives each agent a **private,
persistent memory store** backed by SQLite that survives between completely
separate `team run` invocations.

```
Session 1 (January): alice uses remember to store "AlphaFold3 RMSD 1.2 Å"
Session 2 (February): alice uses recall to surface that result and build on it
```

This is what separates `team` from all other orchestration frameworks: your
agents actually **accumulate knowledge over time**.

### Enabling memory

Add a `memory:` section to your team YAML:

```yaml
memory:
enabled: true
inject_recent: 5 # memories injected into each turn's context (default: 5)
store: ~/.team/memory # optional; defaults to /memory/
```

Enable memory tools for each member:

```yaml
members:
- name: alice
tools: [run_python, remember, recall, forget, list_memories]
```

### Memory tools

All memory tools use a `key:` / header + `---` / value body format:

**`remember`** — store a cross-session memory:

````
```tool:remember
key: protein_folding_baseline_2025
tags: results, methods
importance: 0.9
---
AlphaFold3 outperforms RoseTTAFold on monomers (RMSD 1.2 vs 2.1 Å, n=1 000).
Dataset: PDB validation set, tested January 2025.
```
````

**`recall`** — full-text search across all memories:

````
```tool:recall
query: protein folding
limit: 5
```
````

Returns a ranked list of matching memories (by importance then recency).

**`forget`** — delete a memory by key:

````
```tool:forget
key: protein_folding_baseline_2025
```
````

**`list_memories`** — browse all memories (optionally by tag):

````
```tool:list_memories
tag: results
limit: 20
```
````

At the start of every turn, the *n* most recent memories are automatically
injected into the member's context under `## Your persistent memories`.

### Memory config reference

| key | type | default | description |
| --- | --- | --- | --- |
| `enabled` | bool | `false` | Enable persistent memory for all members. |
| `inject_recent` | int | `5` | Number of recent memories to inject into each turn's context. |
| `store` | path | `/memory` | Directory that holds the per-member SQLite databases. |

---

## Shared team belief board

In collaborative science, a team's most important output is not files — it is
**what the team collectively knows**. The `team` belief board formalises this
as a living, structured record of claims with provenance, confidence scores,
and consensus voting.

```
alice asserts: "RNA Pol II is rate-limiting in elongation" (confidence: 85%)
bob accepts → 2/3 votes ≥ threshold → status: ACCEPTED
carol contests with reason: "only tested in HEK293" → status: CONTESTED
```

After a run: `team beliefs myteam.yaml` shows everything the team concluded.

### Enabling the belief board

```yaml
beliefs:
enabled: true
consensus_threshold: 0.5 # fraction of members required for acceptance
inject_limit: 10 # beliefs shown in each member's turn context
```

Enable belief tools for each member:

```yaml
members:
- name: alice
tools: [run_python, assert_belief, contest_belief, accept_belief, list_beliefs]
```

### Belief tools

**`assert_belief`** — propose a claim with optional evidence:

````
```tool:assert_belief
confidence: 0.85
evidence: RMSD analysis, PDB validation set, n=1 000, January 2025
---
AlphaFold3 is the best available method for monomer structure prediction.
```
````

The member who asserts a belief automatically casts an *accept* vote. The
returned belief ID (e.g. `a3f2b1c9`) is used in subsequent votes.

**`accept_belief`** — vote to accept:

````
```tool:accept_belief
id: a3f2b1c9
```
````

**`contest_belief`** — move a belief to `contested` status:

````
```tool:contest_belief
id: a3f2b1c9
reason: Dataset is limited to well-studied proteins; may not generalise.
```
````

**`list_beliefs`** — browse the board:

````
```tool:list_beliefs
status: contested
```
````

Valid status values: `pending`, `accepted`, `contested`, `rejected`. Omit to
list all beliefs.

Beliefs are injected into every member's turn context under
`## Shared team belief board` so the whole team sees the current state before
each turn.

### Inspecting beliefs with team beliefs

```bash
team beliefs myteam.yaml # all beliefs
team beliefs myteam.yaml --status accepted # accepted only
team beliefs myteam.yaml --status contested # contested — needs attention
```

Output example:

```
Belief board — team 'my-team'
┏━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━┳━━━━━━━━━┓
┃ ID ┃ Status ┃ Claim ┃ Confidence ┃ By ┃ For ┃ Against ┃
┡━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━╇━━━━━━━━━┩
│ a3f2b1 │ ✓ accepted │ AlphaFold3 is best for monomer structure prediction. │ 85% │ @alice│ 2 │ 0 │
│ 9c1d33 │ ⚡ contested│ The dataset generalises to all protein families. │ 60% │ @bob │ 1 │ 1 │
└────────┴─────────────┴─────────────────────────────────────────────────────────┴────────────┴───────┴─────┴─────────┘
⚡ Some beliefs are contested — review and resolve via accept_belief / contest_belief tools.
```

### Belief config reference

| key | type | default | description |
| --- | --- | --- | --- |
| `enabled` | bool | `false` | Enable the shared belief board. |
| `consensus_threshold` | float | `0.5` | Fraction of members who must accept a belief for it to become `accepted`. |
| `inject_limit` | int | `10` | Maximum number of beliefs injected into each member's turn context. |

---

## Workspace checkpoints

Every time a live member turn is about to execute, the orchestrator
automatically snapshots the current state of the **shared workspace** before
any files are written. Snapshots are stored under
`/checkpoints/` with names that encode the turn index, the
member about to speak, and the timestamp:

```
checkpoints/
├── 0001_alice_20240501T120000/ # state before alice's 1st turn
├── 0003_bob_20240501T120145/ # state before bob's 2nd turn
└── ...
```

If the shared workspace is empty (no files have been produced yet), the
snapshot is silently skipped — there is nothing to back up.

### Listing checkpoints

```bash
team checkpoints my-team.yaml
```

```
┌──────────────────────────────┬──────┬──────────────────────┬─────────────────────┬───────┐
│ ID │ Turn │ Before member's turn │ Timestamp │ Files │
├──────────────────────────────┼──────┼──────────────────────┼─────────────────────┼───────┤
│ 0001_alice_20240501T120000 │ 1 │ @alice │ 2024-05-01 12:00:00 │ 3 │
│ 0003_bob_20240501T120145 │ 3 │ @bob │ 2024-05-01 12:01:45 │ 5 │
└──────────────────────────────┴──────┴──────────────────────┴─────────────────────┴───────┘
```

### Restoring a checkpoint

Copy the checkpoint ID from the table and pass it to `team restore`:

```bash
team restore my-team.yaml 0001_alice_20240501T120000
```

```
restored checkpoint 0001_alice_20240501T120000 — 3 file(s) now in the shared workspace.
```

The current contents of `shared/` are replaced with the snapshot.
**This cannot be undone** unless a later checkpoint already captured the
state you are overwriting, so check `team checkpoints` before restoring.

### Use cases

* **Undo a bad turn** — a member produced unwanted file changes; restore the
checkpoint taken just before that turn.
* **Branch from a known-good state** — restore an earlier checkpoint, edit
`team.yaml` (e.g. change the goal or persona), and re-run from there.
* **Audit the evolution of the workspace** — inspect any checkpoint
directory directly; it is a plain copy of `shared/` at that point in time.

---

## Workspace time-travel (`team rollback`)

Every live member turn is preceded by an automatic workspace snapshot (see
[Workspace checkpoints](#workspace-checkpoints)). When things go wrong you
can roll back the shared workspace to *any prior point in time* and resume
from there — effectively forking the timeline:

```bash
# 1. List all available snapshots
team rollback myteam.yaml

# 2. Restore to a specific checkpoint (with confirmation prompt)
team rollback myteam.yaml --to 0005_alice_20250510T183000

# 3. Skip the confirmation prompt (useful in scripts)
team rollback myteam.yaml --to 0005_alice_20250510T183000 --yes
```

After rolling back, resume the run from the restored state:

```bash
team run myteam.yaml --resume
```

Because the transcript also persists, `--resume` skips all turns already
recorded in it. To *re-run* from turn 5 with a different approach, truncate
the transcript manually (or delete it and rely entirely on the restored
workspace files).

> `team rollback` is a thin wrapper around the existing
> `CheckpointManager.restore()` logic. The underlying `team restore`
> command (which requires an exact checkpoint ID argument) remains available
> for scripting.

---

## Token usage tracking

After every `team run` a token usage summary is printed:

```text
┌────────────────────────────────────────────────────┐
│ Token usage (live turns) │
├──────────┬─────────┬───────────┬───────────────────┤
│ member │ prompt │ completion│ total │
├──────────┼─────────┼───────────┼───────────────────┤
│ @lead │ 12 450 │ 3 210 │ 15 660 │
│ @worker │ 8 120 │ 5 890 │ 14 010 │
├──────────┼─────────┼───────────┼───────────────────┤
│ total │ 20 570 │ 9 100 │ 29 670 │
└──────────┴─────────┴───────────┴───────────────────┘
```

Token counts come from the Ollama `/api/chat` `eval_count` /
`prompt_eval_count` fields (for the `ollama` backend) or the OpenAI
`usage` object (for `openai_compat`). The summary is omitted when all
counts are zero (e.g. pure replay runs or backends that don't report
token usage).

---

## Cost estimation

After every `team run` and `team stats` command, the token-usage table includes an **Est. cost** column with a USD estimate based on the model used by each member.

Local Ollama models always show **$0.00 (local)** since they run on your hardware. Cloud models (`backend: openai_compat`) are looked up in the built-in pricing table.

### Built-in pricing table

| Provider | Models |
| --- | --- |
| **OpenAI** | `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-4`, `gpt-3.5-turbo`, `o1`, `o1-mini`, `o3`, `o3-mini` |
| **Anthropic** | `claude-opus-4`, `claude-sonnet-4`, `claude-3-5-sonnet`, `claude-3-5-haiku`, `claude-3-opus`, `claude-3-sonnet`, `claude-3-haiku` |
| **Google** | `gemini-2.0-flash`, `gemini-1.5-pro`, `gemini-1.5-flash` |
| **Mistral** | `mistral-large`, `mistral-medium`, `mistral-small`, `codestral` |
| **Meta (cloud-hosted)** | `llama-3.1-405b`, `llama-3.1-70b`, `llama-3.1-8b`, `llama-3-70b`, `llama-3-8b` |

Model names are matched by prefix/substring so versioned names like `gpt-4o-2024-08-06` automatically map to `gpt-4o` pricing. If a model is not recognised, the cost column shows **?**.

> **Prices are estimates only.** Provider pricing changes over time — update `team/pricing.py` with the latest figures from your provider's pricing page.

---

## Run statistics

`team stats` shows a detailed breakdown of a completed run — turn counts,
token usage per speaker, total duration, and files written — without
needing to start any containers:

```bash
team stats my-team.yaml
```

Example output:

```text
Team: my-team 18 turns · 29 670 tokens · duration 142.3s · 5 file(s) written

┌─────────────────────────────────────────────────────────────────────┐
│ Turns & token usage by speaker │
├──────────────┬───────┬───────────────┬──────────────────┬───────────┤
│ Speaker │ Turns │ Prompt tokens │ Completion tokens│ Total │
├──────────────┼───────┼───────────────┼──────────────────┼───────────┤
│ @lead │ 5 │ 12 450 │ 3 210 │ 15 660 │
│ @orchestrator│ 1 │ 0 │ 0 │ 0 │
│ @worker │ 12 │ 8 120 │ 5 890 │ 14 010 │
├──────────────┼───────┼───────────────┼──────────────────┼───────────┤
│ total │ 18 │ 20 570 │ 9 100 │ 29 670 │
└──────────────┴───────┴───────────────┴──────────────────┴───────────┘
```

The `Transcript.stats()` method in `team/bus.py` is also part of the
public Python API:

```python
from team.bus import Transcript
from team.config import load_team

cfg = load_team("my-team.yaml")
t = Transcript(persist_path=cfg.workspace / "transcript.jsonl", resume=True)
s = t.stats()
print(s["total_turns"], s["duration_seconds"])
```

---

## Exporting a run report

After a run you can bundle the full transcript and every produced artifact
into a single shareable document:

```bash
team export my-team.yaml # Markdown (default)
team export my-team.yaml --format html # self-contained HTML (dark-mode aware)
team export my-team.yaml --format json # machine-readable JSON
team export my-team.yaml --output ~/Desktop/run.md
team export my-team.yaml --no-artifacts # omit workspace files (faster, smaller)
```

The report includes:
* Team name, goal, members, and workflow settings.
* Every member turn with speaker, role, content, and files written.
* **Token usage & estimated cost table** — per member and totals.
* Full contents of all files produced in the shared workspace (omit with `--no-artifacts`).

Output path defaults to `/report.md` / `.html` / `.json`.

**Format details:**

| Format | Description |
| --- | --- |
| `markdown` | Single `.md` file with transcript, token table, and fenced artifact blocks. |
| `html` | Self-contained `.html` — embedded CSS, no external deps, respects `prefers-color-scheme: dark`. |
| `json` | Structured JSON (`format_version: 1`) with `team`, `stats`, `token_usage`, `turns`, and `artifacts` keys — suitable for post-processing. |

---

## `team replay` — interactive transcript browser

After a run completes, `team replay` lets you step through the saved
transcript turn-by-turn in an interactive terminal viewer — like a
debugger for a past run. No LLM calls, no Docker, no network — it
works entirely from the persisted `transcript.jsonl` file.

```
team replay myteam.yaml # start at turn 0
team replay myteam.yaml --from 5 # start at turn 5
team replay myteam.yaml --speaker alice # jump to alice's first turn
```

### Navigation keybindings

| Key | Action |
| --- | --- |
| `→` / `n` / Space / Enter | Advance to the next turn |
| `←` / `p` / `b` | Go back one turn |
| `g` | Prompt for a turn number and jump directly to it |
| `f` | Prompt for a speaker name and jump to their next turn |
| `s` | Toggle the stats summary panel (token totals, turn counts) |
| `q` / Esc | Quit |

### Non-interactive mode

When stdin is not a TTY (e.g. a CI pipeline or a pipe), `team replay`
prints all turns sequentially — the same rich panel rendering used by
`team transcript` — and exits immediately. This makes it safe to use
in scripts:

```bash
team replay myteam.yaml | head -100
```

### Options

| Option | Default | Description |
| --- | --- | --- |
| `--from N` | `0` | Start at turn N (0-based). |
| `--speaker NAME` | — | Jump to the first turn by NAME at startup. |

---

## Automated testing with `team test`

`team test` runs the team and then validates a set of assertions defined in the
`tests:` section of the team YAML. This makes it easy to build a repeatable
test suite for your team in CI.

```yaml
tests:
- name: creates hello.py
type: file_exists
path: hello.py

- name: script contains print
type: file_contains
path: hello.py
text: "print"

- name: no error messages
type: file_not_contains
path: report.txt
text: "ERROR"

- name: results is valid JSON
type: json_valid
path: results.json

- name: results matches schema
type: json_schema
path: results.json
schema:
type: object
required: [entities, summary]

- name: any member mentioned Python
type: transcript_contains
text: "Python"

- name: developer specifically mentioned Python
type: transcript_contains
speaker: developer
text: "Python"

- name: exactly 4 member turns
type: transcript_count
count: 4
```

```
team test myteam.yaml # run the team, then assert
team test myteam.yaml --no-run # assert against an existing run
team test myteam.yaml --max-rounds 2 --goal "quick smoke test"
```

Exits with code **0** if all assertions pass, **1** if any fail (suitable for
CI gates).

### Assertion reference

| Type | Required fields | Description |
| --- | --- | --- |
| `file_exists` | `path` | File must exist in the shared workspace. |
| `file_not_exists` | `path` | File must *not* exist. |
| `file_contains` | `path`, `text` | File content must contain the substring. |
| `file_not_contains` | `path`, `text` | File content must *not* contain the substring. |
| `json_valid` | `path` | File must be parseable JSON. |
| `json_schema` | `path`, `schema` | File must be valid JSON matching the JSON Schema. |
| `transcript_contains` | `text` | At least one turn must contain the text. Add `speaker` to restrict to one member. |
| `transcript_count` | `count` | Exact number of member turns (excludes `orchestrator`/`human`). |

All `path` values are relative to the **shared workspace** directory
(`/shared/`).

---

## Multi-team pipelines

A *pipeline* lets you chain multiple team runs together so that the output of one team — its shared workspace files and a transcript summary — is automatically injected into the next team's context.

### Pipeline YAML

Create a `pipeline.yaml` alongside your team files:

```yaml
name: research-and-write
description: Research a topic, then write a publication-ready paper.
workspace: ./runs/research-and-write # optional; default is ./runs/

stages:
- id: research
team: ./teams/researcher.yaml

- id: writing
team: ./teams/writer.yaml
depends_on: [research] # wait for research to complete
inject_files: true # copy research's shared/ files here
inject_context: true # write context.md from research output
goal_override: | # {stage_id.summary} templates available
Write a publication-ready paper based on the research below.

{research.summary}
```

### Running a pipeline

```bash
team pipeline pipeline.yaml
```

Preview the execution plan without running anything:

```bash
team pipeline pipeline.yaml --dry-run
```

### Stage fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `id` | string | *(required)* | Unique stage identifier used in `depends_on` and goal templates. |
| `team` | path | *(required)* | Path to the team YAML file (relative to the pipeline file). |
| `depends_on` | list of IDs | `[]` | Stages that must complete before this stage runs. |
| `inject_files` | bool | `false` | Copy every file from upstream stages' `shared/` directories into this stage's `shared/` directory before the team starts. |
| `inject_context` | bool | `false` | Write a `context.md` file into this stage's workspace summarising upstream stages' output. Members pick it up automatically. |
| `goal_override` | string | — | Replace the team YAML's `goal` for this pipeline run. Supports `{stage_id.summary}` template substitution. |

### How data flows

Each stage runs inside its own sub-workspace: `//`. At the end of every stage the runner extracts:

- **Summary** — the last five member turns from the transcript, concatenated.
- **Artifacts** — all files in `shared/`, keyed by relative path.

When the next stage has `inject_files: true`, artifact files are copied verbatim into the destination stage's `shared/` directory before its team starts. When `inject_context: true`, a `context.md` is written at the stage workspace root with the summaries and file lists from all upstream stages.

### Goal templates

`goal_override` is a Python `str.format()` template. Each upstream stage result is available as `{stage_id.summary}`:

```yaml
goal_override: |
Review the following research and identify gaps.

Research output:
{research.summary}

Initial draft:
{writing.summary}
```

---

## Cross-team collaboration (bridge)

`team` clusters running on **different machines**, operated by **different
people or organisations**, can collaborate on common goals through the bridge
protocol. One cluster delegates a sub-task to a remote cluster; the remote
cluster runs its full team workflow and returns the results — including all
files it produced. The exchange can repeat over multiple turns, just like a
real inter-laboratory collaboration.

### How it works

```
Lab A cluster (local) Lab B cluster (remote)
┌─────────────────────────────────────┐ ┌──────────────────────────────────┐
│ Orchestrator A │ │ team serve lab-b.yaml │
│ members: pi, analyst │ │ BridgeServer (port 7001) │
│ │ │ │
│ @pi uses delegate_task tool ───────┼─────┼──► POST /tasks │
│ │ │ ┌──────────────────────────┐ │
│ │ │ │ Orchestrator B