https://github.com/cumbof/team
Orchestrate a cluster of containerized local LLMs — each with its own persona, role, and goal — that collaborate until the work is done.
- Host: GitHub
- URL: https://github.com/cumbof/team
- Owner: cumbof
- License: mit
- Created: 2026-05-10T01:23:17.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-06-22T15:55:48.000Z (13 days ago)
- Last Synced: 2026-06-22T17:25:04.784Z (13 days ago)
- Topics: agent, agent-orchestration, agent-team, agentic-ai, ai, ai-agents, ai-team, ai-workflow, cli, containerized, docker, llm, multiagent, ollama
- Language: Python
- Homepage:
- Size: 4.24 MB
- Stars: 6
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md
# team
Orchestrate a cluster of containerized local LLMs — each with its own
persona, role, and goal — that collaborate until the work is done.



⭐ Star this repository to stay updated with new releases ⭐
`team` lets you describe a small "organisation" of LLMs in a single YAML
file and then bring it to life: every member runs in **its own isolated
Docker container** with its own [Ollama](https://ollama.com/) daemon and
its own model, the orchestrator drives a turn-based conversation between
them, and the members produce real artifacts (code, manuscripts, reports,
…) in a shared workspace.
You can mix and match model sizes per role — e.g. a 70B generalist as a
Principal Investigator, a 7B coder as a Data Scientist, an 8B model as a
Reviewer — and pick a workflow that matches how the work should flow:
**round-robin**, **manager-driven**, or **review-loop until consensus**.
> [!WARNING]
>
> **Work in Progress:** This repository is currently under active development.
> While the core functionality is present, some features may be incomplete or
> may not work fully as expected, and you may encounter unexpected bugs. Please
> test thoroughly before using this in any critical pipelines.
> [!NOTE]
>
> A significant portion of the code and documentation in this repository
> was written **with the assistance of a Large Language Model (LLM)**.
> All LLM-generated contributions have been reviewed, tested, and curated
> by the human maintainers, but — as with any software — bugs may exist.
> Please review the code critically, run the test suite, and open an issue
> if you find something unexpected.
>
> **Pull requests are very welcome**, including those written or
> co-authored with the help of an LLM. We only ask that you review and
> test your changes before submitting, and disclose AI assistance in your
> PR description (e.g. *"co-authored with GitHub Copilot"*) so reviewers
> can calibrate their review accordingly.
---
## Feature overview
| Feature | Description |
| --- | --- |
| **Containerised members** | Every LLM runs in its own Docker + Ollama container with configurable CPU, RAM, and GPU limits. |
| **Flexible workflows** | `round_robin`, `manager`, `review_loop`, `sequential_chain`, `debate`, `parallel_review` — pick or combine. |
| **Shared workspace** | Members read and write real files (code, reports, data) to a host directory. |
| **Agent tool use** | 19 built-in tools (Python, Bash, web search, file I/O, memory, beliefs, decisions, delegation); `tool_mode: text` (fenced blocks) or `tool_mode: native` (OpenAI/Ollama function-calling API with JSON Schema); extend with custom skills. |
| **Predefined persona library** | 16 ready-made personas (`@pi`, `@engineer`, `@reviewer` …) stored as individual YAML files in `personas/`; extend with your own via `TEAM_PERSONA_DIR`. |
| **Per-agent persistent memory** | SQLite-backed memory that survives between runs; agents `remember` and `recall` across sessions. |
| **Shared team belief board** | Structured collective knowledge with confidence scores, voting, and consensus tracking. |
| **Cross-team federation (bridge)** | Two independent `team` clusters can delegate tasks to each other over HTTP — academic-lab-style collaboration. |
| **Shared institutional context** | Drop a `context.md` in the workspace root and every member sees it on every turn — no per-member config needed. |
| **Decision log** | Members call `log_decision` to append timestamped, rationale-rich entries to `decisions.md`; any member can `read_decisions` at any time. |
| **Workspace time-travel** | `team rollback` restores the workspace to any past checkpoint and lets you resume from there. |
| **Human-in-the-loop** | Interrupt a live run, read the transcript, inject a message, and let the team continue. |
| **OpenAI-compatible backends** | Swap Ollama for any OpenAI-compatible API (GPT-4o, Mistral, Together AI, …) per member. |
| **Context window management** | `sliding_window`, `truncate`, or `summarize` strategies keep long runs within token budgets. |
| **Workspace checkpoints** | Automatic snapshots before every member turn; `team restore` rolls back to any point. |
| **Run statistics & reports** | Per-member token usage, turn counts, elapsed time — exportable as a Markdown report. |
| **Interactive wizard** | `team new` walks you through YAML creation. |
| **Structured JSON output** | Force a member to reply with valid JSON; optionally validate against a JSON Schema with automatic retry. |
| **Per-turn timeout** | Hard wall-clock deadline per member turn; raises `TurnTimeoutError` if the LLM doesn't respond in time. |
| **`team test`** | Define assertions in the YAML and run them automatically after a team workflow to verify outputs in CI. |
| **Parallel member execution** | `workflow: type: parallel` — all members run simultaneously in each round, bounded by the slowest rather than the sum. |
| **`team replay`** | Step through a saved transcript turn-by-turn in an interactive terminal viewer; navigate, search by speaker, and view stats. |
| **Token budget** | Hard-cap total tokens per member per run; gracefully stops with `TokenBudgetError` when exhausted. |
| **Conditional routing** | Members declare the next speaker via simple YAML rules (`if_contains`, `if_match`, `default`), enabling dynamic branching and state-machine-like workflows. |
| **LLM retry with backoff** | Automatic retry with exponential backoff on transient errors (5xx, connection refused, timeout); configurable per member. Raises `LLMRetryExhaustedError` when all attempts fail. |
| **Cost estimation** | Estimated USD cost displayed in the token-usage table after every run (`team run`, `team stats`). Built-in pricing for OpenAI, Anthropic, Google, and Mistral; local Ollama models show `$0.00 (local)`. |
| **Multi-team pipelines** | Chain multiple team runs with `team pipeline`; upstream artifacts and transcript summaries are automatically injected into downstream stages via `inject_files`, `inject_context`, and `goal_override` templates. |
| **Team registry (service discovery)** | A lightweight HTTP directory where running team clusters advertise their capabilities (tags, models, tools). Other teams discover and delegate to specialist clusters via `query_registry` or the CLI. |
| **Federated belief board** | Independent team clusters share their collective knowledge across the bridge. Pull accepted beliefs from a partner team (they arrive as pending for local consensus), push local beliefs outward, or bidirectional sync — via the `sync_beliefs` tool or `team beliefs-sync` CLI. |
---
## Table of contents
- [Why?](#why)
- [How it works](#how-it-works)
- [Requirements](#requirements)
- [Installation](#installation)
- [Quick start](#quick-start)
- [Defining a team](#defining-a-team)
- [Top-level fields](#top-level-fields)
- [`defaults`](#defaults)
- [`workflow`](#workflow)
- [`members`](#members)
- [The collaboration protocol](#the-collaboration-protocol)
- [Predefined persona library](#predefined-persona-library)
- [How personas are stored](#how-personas-are-stored)
- [Available personas](#available-personas)
- [Using a persona in YAML](#using-a-persona-in-yaml)
- [Adding your own personas](#adding-your-own-personas)
- [Workflows](#workflows)
- [Workspaces and artifacts](#workspaces-and-artifacts)
- [Containers, isolation, and root](#containers-isolation-and-root)
- [GPU support](#gpu-support)
- [Apple Silicon / no-Docker Ollama](#apple-silicon--no-docker-ollama)
- [OpenAI-compatible backends](#openai-compatible-backends)
- [Remote / no-Docker Ollama](#remote--no-docker-ollama)
- [Custom Ollama image](#custom-ollama-image)
- [Context window management](#context-window-management)
- [Model retention (`keep_alive`)](#model-retention-keep_alive)
- [CLI reference](#cli-reference)
- [Interactive wizard](#interactive-wizard)
- [Pre-flight checks](#pre-flight-checks)
- [Streaming output](#streaming-output)
- [Per-turn timeout](#per-turn-timeout)
- [LLM retry with backoff](#llm-retry-with-backoff)
- [Resuming an interrupted run](#resuming-an-interrupted-run)
- [Human-in-the-loop intervention](#human-in-the-loop-intervention)
- [Agent mode and tool use](#agent-mode-and-tool-use)
- [Available built-in tools](#available-built-in-tools)
- [Custom skill plugins](#custom-skill-plugins)
- [Shared institutional context](#shared-institutional-context)
- [Decision log](#decision-log)
- [Structured JSON output](#structured-json-output)
- [Conditional routing](#conditional-routing)
- [Token budget](#token-budget)
- [Per-agent persistent memory](#per-agent-persistent-memory)
- [Enabling memory](#enabling-memory)
- [Memory tools](#memory-tools)
- [Memory config reference](#memory-config-reference)
- [Shared team belief board](#shared-team-belief-board)
- [Enabling the belief board](#enabling-the-belief-board)
- [Belief tools](#belief-tools)
- [Inspecting beliefs with team beliefs](#inspecting-beliefs-with-team-beliefs)
- [Belief config reference](#belief-config-reference)
- [Workspace checkpoints](#workspace-checkpoints)
- [Workspace time-travel (`team rollback`)](#workspace-time-travel-team-rollback)
- [Token usage tracking](#token-usage-tracking)
- [Cost estimation](#cost-estimation)
- [Run statistics](#run-statistics)
- [Exporting a run report](#exporting-a-run-report)
- [`team replay` — interactive transcript browser](#team-replay--interactive-transcript-browser)
- [Automated testing with `team test`](#automated-testing-with-team-test)
- [Multi-team pipelines](#multi-team-pipelines)
- [Cross-team collaboration (bridge)](#cross-team-collaboration-bridge)
- [How it works](#how-it-works-1)
- [Exposing a team as a bridge server](#exposing-a-team-as-a-bridge-server)
- [Delegating work from another team](#delegating-work-from-another-team)
- [Named peer registry](#named-peer-registry)
- [Broadcasting to multiple teams](#broadcasting-to-multiple-teams)
- [Cancelling a remote task](#cancelling-a-remote-task)
- [Server HTTP API reference](#server-http-api-reference)
- [Bridge config reference](#bridge-config-reference)
- [Security — HMAC-SHA256 shared secret](#security--hmac-sha256-shared-secret)
- [Additional security considerations](#additional-security-considerations)
- [Team registry (service discovery)](#team-registry-service-discovery)
- [Federated belief board](#federated-belief-board)
- [Examples](#examples)
- [Architecture overview](#architecture-overview)
- [Development](#development)
- [Troubleshooting](#troubleshooting)
- [License](#license)
---
## Why?
A single LLM is a generalist. Real work — research, engineering, writing —
is usually done by **several specialists** that disagree, revise, and
converge. `team` makes it easy to assemble such a group locally:
* **Heterogeneous models, one per role.** Use a small, fast model for
routine tasks and a large model only where it matters.
* **Strong isolation.** Every member is a separate `ollama serve`
process in a separate container, on a private Docker network, with its
own model cache. A misbehaving member cannot reach into another's
filesystem, network namespace, or model store.
* **Real deliverables.** Members write actual files (code, prose, data)
into a shared workspace; you keep them after the run.
* **Pluggable workflows.** Pick how the team coordinates — and add your
own in a few lines of Python.
---
## How it works
```
┌────────────────── orchestrator (host) ───────────────────┐
│                                                          │
│  transcript.jsonl         shared workspace (./runs/)     │
│        ▲                       ▲                         │
│        │ append every turn     │ files written by members│
└────┬───┴────────────┬──────────┴─────────────┬───────────┘
     │                │                        │
     ▼                ▼                        ▼
┌──────────────────┐ ┌───────────────────┐ ┌──────────────────┐
│ container: pi    │ │ container: postdoc│ │ container: ...   │
│ ollama serve     │ │ ollama serve      │ │                  │
│ model: 70B       │ │ model: 8B         │ │                  │
│ /workspace (ro+) │ │ /workspace (ro+)  │ │ /workspace (ro+) │
│ /private         │ │ /private          │ │ /private         │
└──────────────────┘ └───────────────────┘ └──────────────────┘
            \\                 |                 //
              \\               |               //
              team--net (private bridge network)
```
For each member, the orchestrator:
1. Starts a dedicated Ollama container, on a per-team Docker network, with
the team's shared workspace bind-mounted at `/workspace` and a
per-member private workspace at `/private`.
2. Pulls the model the member is configured to use (cached in the
member's own named Docker volume).
3. Builds a system prompt from the member's persona, the team goal, the
list of teammates, and the [collaboration protocol](#the-collaboration-protocol).
4. Asks the chosen [workflow](#workflows) to drive the conversation.
At every turn the orchestrator hands the speaking member the **full
shared transcript** plus a snapshot of the workspace; the member's reply
is parsed for fenced `file:` blocks (which become real files on disk) and
for control tokens (`[[TEAM_DONE]]`, `NEXT: @<name>`, `APPROVED`, …).
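The append-only `transcript.jsonl` shown in the diagram can be pictured as one JSON object per line. The sketch below is only an illustration of that file shape: the field names `speaker` and `content` and both helper functions are assumptions, not `team`'s actual schema or API.

```python
import json
from pathlib import Path
from typing import List


def append_turn(path: Path, speaker: str, content: str) -> None:
    """Append one turn as a single JSON line (field names are hypothetical)."""
    record = {"speaker": speaker, "content": content}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_transcript(path: Path) -> List[dict]:
    """Load every turn back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each turn is a self-contained line, a run that stops midway leaves every earlier turn readable.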
---
## Requirements
* **Linux** host (tested) — macOS works if Docker Desktop has enough
resources for your models.
* **Docker** (engine ≥ 20.10) reachable by the host user.
* **Python 3.9+**.
* For GPU acceleration: NVIDIA GPU + the
[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
* **Disk and RAM/VRAM** sized for your largest model — Ollama itself is
small but model weights aren't.
---
## Installation
Install from PyPI:
```bash
pip install team-core
```
Or clone the repository for the latest development version:
```bash
git clone https://github.com/cumbof/team.git
cd team
python -m venv .venv
. .venv/bin/activate
pip install -e .
```
This installs the `team` CLI into your virtualenv. Verify the installation:
```bash
team --version
team --help
```
For development extras (pytest):
```bash
pip install -e ".[dev]"
pytest -q
```
---
## Quick start
1. Generate a starter spec:
```bash
team init my-team.yaml
```
2. Edit `my-team.yaml`: pick model names that exist in Ollama, write a
real `goal`, and tweak the personas.
3. Run it end-to-end (containers come up, models get pulled if needed,
workflow runs, containers come down):
```bash
team run my-team.yaml
```
4. Inspect the deliverables:
```bash
ls runs/my-team/shared/
team transcript my-team.yaml
```
5. Or manage the lifecycle by hand:
```bash
team up my-team.yaml # start all member containers
team status my-team.yaml # show container state
team logs my-team.yaml # tail Ollama logs per member
team run my-team.yaml --no-up --keep-up # run more rounds
team run my-team.yaml --resume # resume after a crash
team down my-team.yaml --purge # tear down + delete model caches
```
---
## Defining a team
A team is a single YAML file. Annotated minimal example:
```yaml
name: my-team                   # [a-z][a-z0-9_-]{0,30}
goal: |
  Plain-English statement of what the team must accomplish.
workspace: ./runs/my-team       # host directory; created on demand
workflow:
  type: round_robin             # round_robin | manager | review_loop
  max_rounds: 6
defaults:
  ollama_image: ollama/ollama:latest
  context_window: 8192
  temperature: 0.4
  gpus: none                    # "all" | "none" | [0, 1, ...]
  memory_limit: "16g"           # optional Docker memory cap per member
  cpu_limit: 4                  # optional Docker CPU cap per member (cores)
  pull_timeout: 1800
  request_timeout: 600
members:
  - name: lead
    role: Project Lead
    model: llama3.1:8b
    persona: |
      You coordinate the team.
  - name: worker
    role: Engineer
    model: qwen2.5-coder:7b
    persona: |
      You implement code and produce concrete artifacts.
```
### Top-level fields
| field | required | description |
| --- | --- | --- |
| `name` | yes | DNS-safe team name; used in container/volume/network names. |
| `goal` | yes | The shared objective every member sees in its system prompt. |
| `workspace` | no | Host directory for shared/private workspaces and the transcript. Defaults to `./runs/`. |
| `workflow` | no | See below. Defaults to `round_robin` with 6 rounds. |
| `defaults` | no | Defaults inherited by every member that doesn't override them. |
| `members` | yes | Non-empty list of member specs (see below). |
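The `name` pattern given in the annotated example (`[a-z][a-z0-9_-]{0,30}`) can be checked before a spec is used. A minimal sketch, assuming the pattern must match the whole name; the helper `is_valid_team_name` is hypothetical and not part of `team`'s API:

```python
import re

# Pattern copied from the comment in the annotated example.
TEAM_NAME_RE = re.compile(r"[a-z][a-z0-9_-]{0,30}")


def is_valid_team_name(name: str) -> bool:
    """Return True when the whole string matches the documented pattern."""
    return TEAM_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("my-team", "My Team", "9lives"):
        print(candidate, is_valid_team_name(candidate))
```

Under this pattern a name starts with a lowercase letter and is at most 31 characters long.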
### `defaults`
| key | type | default | meaning |
| --- | --- | --- | --- |
| `ollama_image` | string | `ollama/ollama:latest` | Image used for member containers. |
| `context_window` | int | `8192` | `num_ctx` passed to Ollama (`/api/chat` `options`). |
| `temperature` | float | `0.4` | Sampling temperature. |
| `top_p` | float | `0.9` | Top-p sampling. |
| `memory_limit` | string | unset | Docker `mem_limit` per member (e.g. `"12g"`). |
| `cpu_limit` | float | unset | Docker CPU cap per member (cores; e.g. `4`). |
| `gpus` | str / list | `none` | `"all"`, `"none"`, or list of GPU indices. |
| `pull_timeout` | int | `1800` | Seconds allowed for a model pull. |
| `request_timeout` | int | `600` | HTTP timeout per chat call. |
| `backend` | string | `ollama` | LLM backend: `"ollama"` or `"openai_compat"`. |
| `api_key` | string | unset | API key for `openai_compat` backend; supports `"env:VAR"`. |
| `context_strategy` | string | `none` | Context management: `"none"`, `"sliding_window"`, `"truncate"`, `"summarize"`. |
| `context_budget` | int | `0` | Budget for context management: max turns (`sliding_window`) or approx token count (`truncate`/`summarize`). |
| `tools` | list | `[]` | Built-in tools enabled for all members by default. |
| `max_tool_rounds` | int | `10` | Maximum agentic tool-call rounds per member turn. |
| `tool_timeout` | int | `300` | Seconds budget per individual tool execution (generous default to allow package installs). |
| `tool_mode` | string | `"text"` | Tool invocation mode: `"text"` (fenced blocks) or `"native"` (LLM function-calling API). |
| `skills` | list | `[]` | Skill plugin sources (local paths or remote URLs) available to all members. |
| `ollama_url` | string | unset | Route **all** members to an existing Ollama instance at this URL instead of starting Docker containers. Per-member `ollama_url` overrides this. See [Apple Silicon / no-Docker](#apple-silicon--no-docker-ollama). |
| `keep_alive` | string | `"-1"` | How long Ollama keeps a model loaded in RAM after a request. `"-1"` (default) means keep forever — models stay resident between turns. Accepts any Ollama duration string (`"5m"`, `"1h"`) or `"0"` to unload immediately after each call. |
### `workflow`
```yaml
workflow:
  type: review_loop
  max_rounds: 4
  producer: postdoc
  reviewer: reviewer
  approve_token: APPROVED   # only review_loop; default "APPROVED"
  manager: tech_lead        # only when type=manager
  prompt_template: |        # only sequential_chain; {prev_speaker} and {prev_content} available
    @{prev_speaker} produced the following. Refine it:
    {prev_content}
```
| `type` | extra options |
| --- | --- |
| `round_robin` | none |
| `manager` | `manager: <member>` |
| `review_loop` | `producer: <member>`, `reviewer: <member>`, optional `approve_token` |
| `sequential_chain` | optional `prompt_template` (supports `{prev_speaker}`, `{prev_content}`) |
| `debate` | `pro: <member>`, `con: <member>`, `judge: <member>`, optional `rounds` |
| `parallel_review` | `producer: <member>`, `reviewers: [m1, m2, …]` (≥2), `synthesizer: <member>`, optional `approve_token` |
| `parallel` | none |
### `members`
| key | required | notes |
| --- | --- | --- |
| `name` | yes | DNS-safe; used as `@handle` in the protocol. |
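| `role` | yes | Free-text role label. May be omitted when `persona` is a library key such as `"@pi"`, which supplies the role. |
| `model` | yes | Any tag known to Ollama (`llama3.1:8b`, `qwen2.5-coder:7b`, …). |
| `persona` | yes | Free-text persona prompt (a YAML block scalar), or a library key such as `"@pi"`. |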
| `temperature`, `top_p`, `context_window` | no | Per-member overrides of `defaults`. |
| `memory_limit`, `cpu_limit`, `gpus` | no | Per-member resource overrides. |
| `can_write_files` | no | Default `true`; set to `false` to forbid this member from creating files. |
| `extra_system` | no | Free-form text appended to the rendered system prompt. |
| `ollama_url` | no | Connect to an existing Ollama instance directly; skips Docker. |
| `backend` | no | `"ollama"` (default) or `"openai_compat"` — overrides `defaults.backend`. |
| `api_base` | no | Base URL for the OpenAI-compat API (required when `backend: openai_compat`). |
| `api_key` | no | API key; supports `"env:VAR"` to read from an environment variable. |
| `context_strategy` | no | Per-member override of context management strategy. |
| `context_budget` | no | Per-member override of context budget. |
| `tools` | no | List of tool names this member may use (e.g. `[web_search, run_python]`). |
| `max_tool_rounds` | no | Per-member override of the tool-round limit. |
| `tool_timeout` | no | Per-member override of the per-tool execution timeout (seconds, default 300). |
| `tool_mode` | no | Per-member override: `"text"` or `"native"` (default inherits from `defaults.tool_mode`). |
| `skills` | no | Member-specific skill sources merged with `defaults.skills`. |
| `keep_alive` | no | Per-member override for Ollama model retention (e.g. `"5m"`, `"-1"`). Inherits from `defaults.keep_alive` when absent. |
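The override rule in the two tables above (a member key wins, otherwise the `defaults` value applies) amounts to a dictionary overlay. A minimal sketch under that reading; `effective_settings` is a hypothetical helper, and only a few default values from the `defaults` table are reproduced:

```python
# A subset of the documented defaults, copied from the `defaults` table.
DEFAULTS = {
    "temperature": 0.4,
    "top_p": 0.9,
    "context_window": 8192,
    "keep_alive": "-1",
}


def effective_settings(defaults: dict, member: dict) -> dict:
    """Overlay the keys a member sets explicitly on top of the team defaults."""
    merged = dict(defaults)  # copy so the shared defaults are never mutated
    merged.update({key: value for key, value in member.items() if value is not None})
    return merged


if __name__ == "__main__":
    worker = {"name": "worker", "model": "qwen2.5-coder:7b", "temperature": 0.1}
    print(effective_settings(DEFAULTS, worker))
```

A plain overlay does not model `skills`, which the table describes as merged with `defaults.skills` rather than replaced.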
---
## The collaboration protocol
Every member receives a system prompt that includes a small,
deterministic protocol so the orchestrator can parse replies reliably:
* **Address a teammate**: prefix a section with `@<name>:`, where `<name>` is
the teammate's member `name`.
* **Write or overwrite a file in the shared workspace**: emit a fenced
block with a `file:` info-string, e.g.
````
```file:manuscript/manuscript.md
# Title
...
```
````
The orchestrator atomically writes the body to that path under the
workspace's `shared/` directory. Path-traversal attempts (`..`) are rejected.
* **Private workspace**: each member has `/private` inside its container
(mapped to `members/<member>/` under the team's workspace directory on the
host) for personal scratch files, drafts, and notes that are not shared with
the team. The list of files currently in `/private` is shown at the top of
each of that member's turn prompts.
* **Declare the goal achieved**: end the reply with a line containing
exactly `[[TEAM_DONE]]`. Workflows interpret this as "stop now".
* **Manager workflow**: end the reply with `NEXT: @<name>` to nominate
who speaks next.
* **Review-loop workflow**: the reviewer emits `APPROVED` (configurable)
when the deliverable is ready.
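The protocol above is simple enough to parse with regular expressions. The following is a rough sketch of such a parser, not the orchestrator's actual code: the function names, the returned dictionary, the substring check for the approval token, and the choice to test only the last non-blank line for `[[TEAM_DONE]]` are all assumptions.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block

# Opening fence with a file:<path> info-string, the body, then a bare closing fence.
FILE_BLOCK_RE = re.compile(
    r"^" + re.escape(FENCE) + r"file:(?P<path>[^\n]+)\n(?P<body>.*?)^"
    + re.escape(FENCE) + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)
NEXT_RE = re.compile(r"^NEXT:[ \t]*@(?P<name>[A-Za-z0-9_-]+)[ \t]*$", re.MULTILINE)


def safe_relative_path(raw: str) -> Optional[str]:
    """Return the path as a string, or None if it is empty, absolute, or uses `..`."""
    path = PurePosixPath(raw.strip())
    if not path.parts or path.is_absolute() or ".." in path.parts:
        return None
    return str(path)


def parse_reply(reply: str, approve_token: str = "APPROVED") -> dict:
    """Extract file blocks and control tokens from one member reply."""
    files, rejected = [], []
    for match in FILE_BLOCK_RE.finditer(reply):
        path = safe_relative_path(match.group("path"))
        if path is None:
            rejected.append(match.group("path").strip())
        else:
            files.append((path, match.group("body")))
    non_blank = [line.strip() for line in reply.splitlines() if line.strip()]
    nxt = NEXT_RE.search(reply)
    return {
        "files": files,        # (relative path, body) pairs that are safe to write
        "rejected": rejected,  # paths refused as traversal attempts
        "done": bool(non_blank) and non_blank[-1] == "[[TEAM_DONE]]",
        "next_speaker": nxt.group("name") if nxt else None,
        "approved": approve_token in reply,
    }
```

Rejecting `..` by inspecting path components, rather than by searching the raw string, keeps legitimate names such as `notes..md` writable.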
---
## Predefined persona library
Writing a good persona from scratch takes time. `team` ships with
**16 ready-made personas** spanning academic research, software engineering,
and general-purpose roles. Each persona lives in its own YAML file under
`personas/` at the root of this repository — making them easy to read,
edit, and contribute back to the project.
### How personas are stored
```
personas/
├── pi.yaml # Principal Investigator
├── postdoc.yaml # Postdoctoral Researcher
├── phd.yaml # PhD Student
├── reviewer.yaml # Critical Reviewer
├── statistician.yaml # Statistician
├── bioinformatician.yaml
├── ml_researcher.yaml
├── architect.yaml
├── engineer.yaml
├── qa.yaml
├── devops.yaml
├── tech_writer.yaml
├── analyst.yaml
├── writer.yaml
├── manager.yaml
└── ethicist.yaml
```
Each file follows the same simple format:
```yaml
role: Principal Investigator
description: Lab director — sets research direction, evaluates results, writes grants.
persona: |
  You are a tenured Principal Investigator at a research university.
  Your role is to set and guard the scientific direction of the project.
  ...
```
The filename stem (e.g. `pi` from `pi.yaml`) becomes the `@`-key used in team
YAML files.
### Available personas
| Key | Role | Description |
| --- | --- | --- |
| `@pi` | Principal Investigator | Lab director — sets research direction, evaluates results, writes grants. |
| `@postdoc` | Postdoctoral Researcher | Senior researcher — deep expertise, drives experiments and analysis. |
| `@phd` | PhD Student | Junior researcher — literature review, baseline experiments, drafting. |
| `@reviewer` | Critical Reviewer | Peer-review skeptic — challenges assumptions, finds weaknesses. |
| `@statistician` | Statistician | Statistical methodologist — study design, power, inference correctness. |
| `@bioinformatician` | Bioinformatician | Omics data specialist — pipelines, databases, variant/sequence analysis. |
| `@ml_researcher` | Machine Learning Researcher | ML specialist — model design, training, evaluation, ablations. |
| `@architect` | Software Architect | System designer — API contracts, scalability, tech decisions. |
| `@engineer` | Software Engineer | Implementer — writes production-quality code, debugs, reviews PRs. |
| `@qa` | QA Engineer | Quality assurance — test strategy, edge cases, regression detection. |
| `@devops` | DevOps / SRE | Infrastructure and reliability — CI/CD, monitoring, deployment. |
| `@tech_writer` | Technical Writer | Documentation specialist — clarity, structure, audience-appropriate prose. |
| `@analyst` | Data Analyst | Data explorer — EDA, visualisation, dashboards, business insights. |
| `@writer` | Science Writer | Communicator — translates technical findings into compelling narratives. |
| `@manager` | Project Manager | Coordinator — milestones, blockers, stakeholder communication. |
| `@ethicist` | AI / Research Ethicist | Ethics and compliance — bias, fairness, privacy, responsible use. |
Browse the library from the terminal:
```bash
team personas # list all personas with key, role, description
team personas pi # print the full persona text for @pi
team personas engineer # print the full persona text for @engineer
```
### Using a persona in YAML
Set `persona` to `@<key>` (the filename stem of a library persona) instead of writing a persona block:
```yaml
members:
  - name: alice
    model: llama3.1:70b
    persona: "@pi"        # role is set to "Principal Investigator" automatically
  - name: bob
    model: llama3.1:8b
    persona: "@phd"       # role is "PhD Student"
  - name: carol
    model: qwen2.5:7b
    persona: "@reviewer"  # role is "Critical Reviewer"
```
You can override the default role while keeping the library persona text:
```yaml
- name: alice
  model: llama3.1:70b
  persona: "@pi"
  role: "Lab Director"  # custom title; persona text stays the same
```
You can also mix library personas with fully custom ones in the same team:
```yaml
members:
  - name: alice
    model: llama3.1:70b
    persona: "@pi"
  - name: custom
    role: Domain Expert
    model: llama3.1:8b
    persona: |
      You are a specialist in protein crystallography with 20 years of
      experimental experience. You validate all structural claims against
      PDB data.
```
### Adding your own personas
**Option 1 — contribute to the built-in library** (share with everyone):
Drop a `.yaml` file into the `personas/` directory at the repo root and submit
a pull request. The file name becomes the `@`-key.
**Option 2 — project-local personas** (private to your setup):
Point `TEAM_PERSONA_DIR` at any directory; files there are loaded *in addition
to* the built-in library and take precedence over built-in keys with the same
name:
```bash
export TEAM_PERSONA_DIR=~/.team/personas
```
Then add files like `~/.team/personas/clinician.yaml`:
```yaml
role: Clinical Research Collaborator
description: Translates findings into clinical context and regulatory language.
persona: |
  You are a physician-scientist with expertise in clinical trial design.
  You translate pre-clinical findings into clinical hypotheses, identify
  regulatory hurdles (FDA, EMA) early, and ensure the team's outputs are
  framed for a clinical audience.
```
Any team YAML can now use `persona: "@clinician"` once the env var is set.
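The lookup rule described above (filename stem becomes the `@`-key, and `TEAM_PERSONA_DIR` entries win over built-in ones with the same name) can be sketched as a two-pass directory scan. This is an illustration only: `discover_personas` is a hypothetical helper, it maps keys to file paths without parsing the YAML, and the real loader may behave differently.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def discover_personas(
    builtin_dir: Path, env: Optional[Mapping[str, str]] = None
) -> Dict[str, Path]:
    """Map each @-key to its persona file, letting TEAM_PERSONA_DIR override built-ins."""
    env = os.environ if env is None else env
    directories = [builtin_dir]
    extra = env.get("TEAM_PERSONA_DIR")
    if extra:
        directories.append(Path(extra).expanduser())

    found: Dict[str, Path] = {}
    for directory in directories:  # scanned in order, so later directories win
        if directory.is_dir():
            for path in sorted(directory.glob("*.yaml")):
                found["@" + path.stem] = path
    return found


if __name__ == "__main__":
    for key, path in sorted(discover_personas(Path("personas")).items()):
        print(key, "->", path)
```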
---
## Workflows
### `round_robin`
Every member speaks in declaration order. Repeat for `max_rounds` full
rounds, or until a member emits `[[TEAM_DONE]]`. Useful for brainstorms
and small symmetric teams.
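As a rough illustration of the loop just described, the sketch below cycles through members in declaration order and stops on `[[TEAM_DONE]]`. The `speak` callback stands in for an LLM call and, like `run_round_robin` itself, is an assumption rather than `team`'s API.

```python
from typing import Callable, List, Sequence, Tuple

DONE_TOKEN = "[[TEAM_DONE]]"
Turn = Tuple[str, str]  # (speaker, reply)


def run_round_robin(
    members: Sequence[str],
    speak: Callable[[str, List[Turn]], str],
    max_rounds: int,
) -> List[Turn]:
    """Let every member speak in order for up to max_rounds full rounds."""
    transcript: List[Turn] = []
    for _ in range(max_rounds):
        for name in members:
            reply = speak(name, transcript)
            transcript.append((name, reply))
            if DONE_TOKEN in reply:  # any member may end the run early
                return transcript
    return transcript
```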
### `manager`
A designated `manager` member opens the work, then after every other
member's turn the manager is asked again to evaluate progress and
nominate the next speaker via `NEXT: @<name>`. The manager can also
take the floor itself, or end the run with `[[TEAM_DONE]]`.
### `review_loop`
A `producer` writes the first draft. A `reviewer` critiques it; the
producer revises; repeat until the reviewer emits `APPROVED` (or
`max_rounds` revisions are reached). When approved, the producer is
given one final turn to finalise and is expected to end with
`[[TEAM_DONE]]`. Ideal for any "make a deliverable, then iterate until
acceptable" workflow (papers, design docs, code).
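One way to read the review cycle above is as the loop below. It is a sketch under stated assumptions: `produce` and `review` are stand-ins for LLM calls, the approval check is a plain substring test, and the producer's turn after approval is treated as its final one.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, reply)


def run_review_loop(
    produce: Callable[[List[Turn]], str],
    review: Callable[[List[Turn]], str],
    max_rounds: int,
    approve_token: str = "APPROVED",
) -> List[Turn]:
    """Draft, then alternate review and revision until approval or max_rounds."""
    transcript: List[Turn] = [("producer", produce([]))]  # first draft
    for _ in range(max_rounds):
        verdict = review(transcript)
        transcript.append(("reviewer", verdict))
        approved = approve_token in verdict
        # The producer either revises or, once approved, takes its final turn.
        transcript.append(("producer", produce(transcript)))
        if approved:
            break
    return transcript
```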
### `sequential_chain`
Members form a **pipeline**: the first member runs with the default
prompt, then each subsequent member receives the previous member's full
reply as its explicit prompt. At the end of a round the chain wraps
around, so the first member of round N+1 receives the last member of
round N's output.
Use this when the work is a transformation series — for example:
* drafter → editor → translator → formatter
* researcher → summariser → chart-generator
Optional `prompt_template` controls how the handoff is framed; it can
use the `{prev_speaker}` and `{prev_content}` placeholders:
```yaml
workflow:
  type: sequential_chain
  max_rounds: 2
  prompt_template: |
    @{prev_speaker} produced the following output.
    Your task is to refine and improve it:
    {prev_content}
```
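To see how the two placeholders expand, the template can be rendered with Python's `str.format`. Whether `team` uses `str.format` internally is an assumption; `render_handoff` is a hypothetical helper.

```python
# Same text as the prompt_template in the YAML example.
TEMPLATE = (
    "@{prev_speaker} produced the following output.\n"
    "Your task is to refine and improve it:\n"
    "{prev_content}\n"
)


def render_handoff(template: str, prev_speaker: str, prev_content: str) -> str:
    """Fill in the two documented placeholders."""
    return template.format(prev_speaker=prev_speaker, prev_content=prev_content)


if __name__ == "__main__":
    print(render_handoff(TEMPLATE, "drafter", "First draft of the abstract."))
```

With `str.format`, braces inside `prev_content` are left untouched, but literal braces in the template itself would need to be doubled (`{{` and `}}`).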
### `debate`
Two opposing members argue a proposition for N rounds, then a judge
member delivers a verdict.
```yaml
workflow:
  type: debate
  rounds: 3      # pro/con exchange rounds before the judge speaks (default: 3)
  pro: alice     # member arguing in favour
  con: bob       # member arguing against
  judge: carol   # member delivering the final verdict
```
1. The **pro** member makes an opening statement.
2. The **con** member rebuts.
3. Steps 1–2 repeat for `rounds` rounds.
4. The **judge** receives the full exchange and delivers a verdict.
5. Any member can end early by emitting `[[TEAM_DONE]]`.
### `parallel_review`
Like `review_loop` but all reviewers read the deliverable **at the same time**
(using a thread pool), so the total review wall-time is bounded by the
*slowest* reviewer, not the sum of all reviewers. A designated **synthesizer**
then consolidates the parallel reviews into one prioritised verdict, and the
**producer** revises.
```yaml
workflow:
  type: parallel_review
  max_rounds: 4              # max revision cycles before stopping
  producer: writer           # who creates and revises the deliverable
  reviewers:                 # 2 or more members who review in parallel
    - methods_reviewer
    - stats_reviewer
    - clarity_reviewer
  synthesizer: editor        # consolidates the parallel reviews (may equal producer)
  approve_token: APPROVED    # optional; default is "APPROVED"
```
**Flow per revision cycle:**
1. All reviewers are dispatched simultaneously; each receives the same
transcript snapshot and produces its review independently.
2. Reviews are appended to the transcript in declaration order.
3. The **synthesizer** reads all reviews and emits a consolidated verdict
(or `APPROVED` when no further changes are needed).
4. If approved, the producer finalises and emits `[[TEAM_DONE]]`.
5. Otherwise the producer addresses the feedback and the cycle repeats.
> **Thread-safety note:** Reviewer turns are truly parallel LLM calls.
> Each reviewer reads the transcript (read-only during the parallel window)
> and calls its own model. Reviewers should not use file-writing tools
> during their review turns to avoid concurrent workspace writes.
---
### `parallel`
All members speak **simultaneously** in every round. Unlike `parallel_review`
(which has a fixed producer → reviewers → synthesizer structure), `parallel`
is fully symmetric: every declared member runs at the same time, every round.
Each member receives the same transcript snapshot at the start of the round —
it cannot see what another member wrote *in the current round*, only in
previous rounds. After all threads complete, turns are appended in member
declaration order so the transcript is deterministic and `--resume` works.
```yaml
workflow:
  type: parallel
  max_rounds: 4
```
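The snapshot-then-append behaviour described above can be sketched with a thread pool. This is a simplified illustration under assumptions: `run_parallel_round` and `fake_turn` are hypothetical names, and the stand-in turn function replaces a real LLM call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel_round(members, transcript, take_turn):
    """Run one round: every member sees the same snapshot; results keep declaration order."""
    snapshot = tuple(transcript)  # frozen copy, so current-round turns stay invisible
    with ThreadPoolExecutor(max_workers=len(members)) as pool:
        futures = [pool.submit(take_turn, m, snapshot) for m in members]
        replies = [f.result() for f in futures]  # collected in declaration order
    for member, reply in zip(members, replies):
        transcript.append({"speaker": member, "content": reply})
    return transcript


def fake_turn(member, snapshot):
    """Stand-in for an LLM call: reports how many turns the member could see."""
    return f"{member} saw {len(snapshot)} turn(s)"


log = run_parallel_round(["alice", "bob"], [], fake_turn)
log = run_parallel_round(["alice", "bob"], log, fake_turn)
print([t["content"] for t in log])
# ['alice saw 0 turn(s)', 'bob saw 0 turn(s)', 'alice saw 2 turn(s)', 'bob saw 2 turn(s)']
```

Appending only after every future has resolved is what keeps the transcript order independent of which thread finishes first.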
**When to use `parallel`**
- Independent expert panels — each member evaluates the problem from its own
perspective and writes its findings simultaneously.
- Embarrassingly parallel tasks — member A generates candidate A, member B
generates candidate B; a later sequential step (or `sequential_chain`) picks
the best.
- Speed-critical brainstorming where sequential dialogue would be too slow.
**Rendering**
The CLI shows a `⚡ parallel` separator banner before the round starts, then
renders each member's completed panel (with full content, file-write list, and
colour) when the round finishes — no token-by-token streaming during the
parallel window.
> **Thread-safety note:** Members read the transcript concurrently (safe) and
> write to the shared workspace. Concurrent writes to the *same file path*
> are a race condition. Design your team so that parallel members produce
> output in disjoint paths (e.g. `member_a/output.txt` vs `member_b/output.txt`).
---
## Workspaces and artifacts
For team `` with `workspace: ./runs/` you get:
```
runs//
├── transcript.jsonl       # one JSON object per turn
├── shared/                # mounted as /workspace inside every container
│   └──
├── checkpoints/           # automatic point-in-time snapshots (one per live turn)
│   ├── 0001_alice_20240501T120000/
│   ├── 0002_bob_20240501T120145/
│   └── ...
└── members/
    ├── pi/                # mounted as /private inside the pi container
    ├── postdoc/
    └── ...
```
* `shared/` is the canonical place for deliverables and is visible to
every member at every turn.
* `members//` is the **private workspace** for that member. Its
contents are listed in the member's turn prompt under *"Files in your
private workspace (/private)"*, so the member can reference its own
previous work, intermediate files, or notes across turns. Other members
cannot see these files.
* `transcript.jsonl` is appended to as the run progresses; one record per
turn, with `speaker`, `role`, `content`, `files_written`, and
`timestamp` fields.
`team transcript ` renders the transcript human-readably.
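Because `transcript.jsonl` holds one JSON object per line, it is also easy to post-process outside the CLI. A minimal sketch, assuming only the five documented fields; `load_transcript` is a hypothetical helper and the sample record is invented for the demo.

```python
import json
import tempfile
from pathlib import Path


def load_transcript(path):
    """Parse a JSON Lines transcript into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Illustrative record with the documented fields (values are made up).
sample = {"speaker": "alice", "role": "Lead", "content": "Draft ready.",
          "files_written": ["report.md"], "timestamp": "2024-05-01T12:00:00"}
demo = Path(tempfile.mkdtemp()) / "transcript.jsonl"
demo.write_text(json.dumps(sample) + "\n", encoding="utf-8")

turns = load_transcript(demo)
print(turns[0]["speaker"], turns[0]["files_written"])
```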
---
## Containers, isolation, and root
Each member runs in **its own container** with the following properties:
| property | value | rationale |
| --- | --- | --- |
| Image | `ollama/ollama:latest` (overridable) | Standard Ollama runtime. |
| User inside | **root** | Members have full root *inside their own filesystem*, satisfying "root inside the container" without granting host root. |
| Network | per-team Docker bridge `team--net`, isolated from other teams and from your host services | Members can only reach each other through the orchestrator, not directly. |
| Port exposure | `127.0.0.1::11434` | Each member's Ollama API is reachable only from the host loopback by the orchestrator. |
| Model cache | per-member named volume `team---models` | Members do *not* share model storage. |
| Mounts | shared workspace at `/workspace`, private workspace at `/private` | Conventional file-exchange surface. |
| Restart policy | `unless-stopped` | Survives daemon restarts during long runs. |
| Resource caps | `memory_limit`, `cpu_limit` honoured if set | Keep large models from starving the host. |
Containers are **not** run with `--privileged` and do not get any host
device access by default; root is confined to the container's mount and
PID namespaces. You can pass GPUs explicitly via `gpus` (see below).
---
## GPU support
Set `gpus` either globally (under `defaults`) or per-member:
```yaml
defaults:
  gpus: all          # all visible GPUs
members:
  - name: pi
    gpus: [0]        # only GPU 0
  - name: postdoc
    gpus: none       # CPU only
```
GPU access requires the NVIDIA Container Toolkit on the host; the `gpus`
value is passed through to Docker as device requests. Non-NVIDIA setups can
leave `gpus: none`.
### Apple Silicon / no-Docker Ollama
Docker Desktop on **macOS** runs a Linux VM that cannot access the host's
GPU (neither NVIDIA nor Apple Metal). Using `gpus: all` there produces:
```
could not select device driver "nvidia" with capabilities [[gpu]]
```
There are two escape hatches:
#### Option A — CPU-only containers (`--no-gpu`)
Pass `--no-gpu` to `team up` or `team run`. All containers are started
without GPU device requests and fall back to CPU inference inside Docker.
No YAML change required, but inference will be slow on large models.
```bash
team run myteam.yaml --no-gpu
team up myteam.yaml --no-gpu
```
#### Option B — Native host Ollama with Metal (recommended for Apple Silicon)
Install [Ollama for macOS](https://ollama.com) natively. The native app
uses **Apple Metal** for GPU acceleration and is dramatically faster than
CPU-only Docker containers. Then tell `team` to bypass Docker entirely and
connect all members to it:
**Via CLI flag** (no YAML change):
```bash
# Default URL is http://localhost:11434
team run myteam.yaml --host-ollama http://localhost:11434
team up myteam.yaml --host-ollama http://localhost:11434
```
**Via YAML** (permanent):
```yaml
defaults:
  ollama_url: http://localhost:11434   # all members skip Docker
```
When `defaults.ollama_url` is set (or `--host-ollama` is passed), no Ollama
containers are started; the orchestrator connects directly to the given URL.
Per-member `ollama_url` overrides the default for individual members.
> **`team check` will report a `FAIL`** on macOS when GPU is requested
> without an `ollama_url` configured, and will guide you to one of the two
> options above.
---
## OpenAI-compatible backends
By default every member runs Ollama in a Docker container. You can instead
point any member at any **OpenAI-compatible API** — LM Studio, vLLM, llama.cpp
server, the real OpenAI API, Anthropic (via a LiteLLM proxy), etc. — without
Docker.
```yaml
defaults:
  backend: openai_compat
  api_base: http://localhost:1234/v1   # LM Studio
  api_key: env:OPENAI_API_KEY          # or a literal key
members:
  - name: lead
    role: Tech Lead
    model: gpt-4o                      # model name sent to the API
    persona: ...
  - name: worker
    role: Engineer
    model: llama-3.1-8b-instruct
    backend: ollama                    # this member still uses Docker
    persona: ...
```
The `backend` and `api_base` fields can be set globally in `defaults` or
overridden per-member.
| field | meaning |
| --- | --- |
| `backend` | `"ollama"` (default) or `"openai_compat"` |
| `api_base` | Base URL of the OpenAI-compat API (e.g. `https://api.openai.com/v1`) |
| `api_key` | API key; use `"env:VAR"` to read from environment at runtime |
When `backend: openai_compat` is set, no Docker container is started for
that member — the orchestrator calls the remote API directly. The `model`
field is passed as-is to the API.
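The `env:VAR` convention for `api_key` can be resolved in a few lines. This is a sketch of one way to do it: `resolve_api_key` is a hypothetical helper, and raising an error for a missing variable is an assumption, since the behaviour of `team` in that case is not documented here.

```python
import os


def resolve_api_key(value: str) -> str:
    """Return a literal key unchanged; read "env:VAR" from the environment."""
    prefix = "env:"
    if not value.startswith(prefix):
        return value
    var = value[len(prefix):]
    try:
        return os.environ[var]
    except KeyError:
        # Assumption: fail loudly rather than send an empty key.
        raise ValueError(f"environment variable {var!r} is not set") from None


os.environ["DEMO_API_KEY"] = "sk-demo"       # demo value only
print(resolve_api_key("env:DEMO_API_KEY"))   # sk-demo
print(resolve_api_key("literal-key"))        # literal-key
```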
---
## Remote / no-Docker Ollama
If you already have an Ollama server running (locally or on a remote
machine), you can skip Docker for individual members by setting `ollama_url`:
```yaml
members:
  - name: researcher
    role: Researcher
    model: llama3.1:70b
    ollama_url: http://192.168.1.10:11434   # existing Ollama instance
    persona: ...
```
To route **all** members to the same Ollama instance, set it in `defaults`
or pass `--host-ollama` on the command line (see
[Apple Silicon / no-Docker](#apple-silicon--no-docker-ollama)):
```yaml
defaults:
  ollama_url: http://localhost:11434
```
No container is started for any member that has an effective `ollama_url`
(per-member or from `defaults`); the orchestrator connects directly to the
given URL. The model must already be pulled on that server (or Ollama's
automatic pull will fetch it on first use).
---
## Custom Ollama image
`docker/Dockerfile.ollama` is an optional, slightly-augmented image that
adds `python3`, `git`, `jq`, `curl`, and friends on top of
`ollama/ollama:latest` for members that want richer in-container
tooling. Build it once and reference it from any team:
```bash
docker build -f docker/Dockerfile.ollama -t team/ollama:latest docker/
```
```yaml
defaults:
  ollama_image: team/ollama:latest
```
The default `ollama/ollama:latest` is fine for most uses.
---
## Context window management
By default the orchestrator passes the full transcript to every member
every turn. For long-running teams this can exceed a model's context
window, causing silent truncation or errors. Configure a strategy to
keep the context manageable:
```yaml
defaults:
  context_strategy: sliding_window   # none | sliding_window | truncate | summarize
  context_budget: 20                 # max turns (sliding_window) or ~token budget (truncate/summarize)
```
| strategy | behaviour |
| --- | --- |
| `none` (default) | Full transcript always sent. |
| `sliding_window` | Only the last `context_budget` turns are sent. |
| `truncate` | Oldest turns are dropped until the estimated token count fits within `context_budget`. A note is prepended explaining that earlier turns were omitted. |
| `summarize` | The oldest turns are compressed into a concise bullet-point digest by calling the member's own LLM (at temperature 0.2). The digest is prepended under a *"Summary of N earlier turn(s)"* heading; the most-recent turns are kept verbatim. 80 % of `context_budget` is reserved for recent turns, 20 % for the digest. Falls back to a plain omission notice if the summarization call fails. |
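
The two simpler strategies in the table can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the 4-characters-per-token estimate is a common rough heuristic, not necessarily the estimator `team` uses.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); an assumption for this sketch."""
    return max(1, len(text) // 4)


def sliding_window(turns: list[str], budget: int) -> list[str]:
    """Keep only the last `budget` turns."""
    return turns[-budget:] if budget > 0 else []


def truncate(turns: list[str], budget: int) -> list[str]:
    """Drop the oldest turns until the estimated token total fits the budget."""
    kept = list(turns)
    while kept and sum(estimate_tokens(t) for t in kept) > budget:
        kept.pop(0)
    omitted = len(turns) - len(kept)
    if omitted:
        # Prepend a note so the model knows that history is missing.
        kept.insert(0, f"[{omitted} earlier turn(s) omitted]")
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]            # about 10 tokens each
print(sliding_window(history, 2) == history[1:])    # True
print(truncate(history, 20)[0])                     # [1 earlier turn(s) omitted]
```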
Override per member:
```yaml
members:
  - name: reviewer
    context_strategy: sliding_window
    context_budget: 10   # this member sees only the last 10 turns
```
---
## Model retention (`keep_alive`)
By default, `team` sets Ollama's `keep_alive` to `"-1"` on every chat request, which tells Ollama to keep the model loaded in RAM indefinitely. Without this, Ollama's built-in default evicts a model after 5 minutes of inactivity — a problem for large models (tens of gigabytes) that must repeatedly load and unload between turns.
```yaml
defaults:
  keep_alive: "-1"      # keep every model loaded for the duration of the run (default)
members:
  - name: summarizer
    model: llama3.2:3b
    keep_alive: "5m"    # lightweight model; OK to evict after 5 minutes of idle
    # ...
```
| Value | Behaviour |
| --- | --- |
| `"-1"` | Keep the model loaded until Ollama stops or loading another model evicts it. **Recommended for team runs.** |
| `"5m"`, `"1h"`, … | Evict after the given idle period (Ollama duration string). |
| `"0"` | Unload immediately after each request (maximises GPU headroom at the cost of reload latency). |
`keep_alive` is an Ollama-only parameter. When the `openai_compat` backend is used it is silently ignored.
---
## CLI reference
```text
team init [PATH] Write a starter team YAML.
team new [PATH] Interactive wizard to create a new team YAML.
team validate Parse and validate the YAML.
team check Run preflight checks (no Docker started).
team up Start containers, pull models.
[--no-gpu] [--host-ollama URL]
team status Show container status per member.
team logs Tail per-member Ollama logs.
[--member NAME] [--tail N]
team run Up + run workflow + (down).
[--no-up] [--keep-up] [--resume] [--no-stream] [--interactive]
[--no-gpu] [--host-ollama URL]
team transcript Render the persisted transcript.
team export Export transcript + artifacts to a report.
[--format markdown|html|json] [--output PATH] [--no-artifacts]
team checkpoints List all workspace checkpoints.
team restore Restore the shared workspace to a checkpoint.
team down Stop & remove containers (and volumes).
[--purge]
```
Common flags:
* `-v / --verbose` — debug-level logging.
* `--prepare-timeout SECONDS` (on `up`/`run`) — how long to wait for each
member's Ollama daemon to become ready and its model to finish pulling
(default 600).
---
## Interactive wizard
`team new` launches a guided wizard that asks you a series of questions
and writes a validated YAML:
```bash
team new my-team.yaml
```
The wizard prompts for:
* Team name and goal
* Number of members, and for each: name, role, model, persona
* Workflow type and max rounds
* Workspace path
The output is a fully-formed, validated YAML ready to use with `team run`.
---
## Pre-flight checks
Before starting containers, verify that the environment is ready with
`team check`:
```bash
team check my-team.yaml
```
The command checks:
| Check | What it tests |
|---|---|
| Workspace writable | Can create the workspace directory and write files to it |
| Disk space | Reports available GB; warns if below **5 GB** |
| Docker daemon | Docker daemon reachable, version ≥ 20.10, Ollama image present |
| GPU availability | Runs `nvidia-smi` when any member requests GPUs; warns if not found |
Exit code is `0` when all checks pass (warnings allowed), `1` when any
check fails. Failures are shown with a red ✗ and warnings with a yellow ⚠.
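The disk-space check and the exit-code rule can be approximated with the standard library. A sketch under assumptions: `disk_check` and `exit_code` are hypothetical names, and treating "GB" as GiB (1024³ bytes) is a choice made for this example.

```python
import shutil


def disk_check(path=".", min_free_gb=5.0):
    """Return ("ok" or "warn", free GiB) for the filesystem holding `path`."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return ("ok" if free_gb >= min_free_gb else "warn", free_gb)


def exit_code(statuses):
    """Warnings are allowed; any failure makes the command exit non-zero."""
    return 1 if "fail" in statuses else 0


status, free_gb = disk_check(".")
print(f"disk: {status} ({free_gb:.1f} GiB free)")
print(exit_code(["ok", "warn"]))   # 0
print(exit_code(["ok", "fail"]))   # 1
```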
---
## Streaming output
By default `team run` streams each member's reply **token-by-token** to the
terminal as it is generated. You see a header like `@alice (Lead)` followed
by the reply appearing live — no waiting for the full response.
To disable streaming (e.g. for CI or when redirecting output to a file):
```bash
team run my-team.yaml --no-stream
```
With `--no-stream` the full reply is printed at once after each turn
completes.
---
## Per-turn timeout
Set a hard wall-clock deadline (seconds) on how long any single member turn
may take. If the LLM doesn't finish within the limit, a `TurnTimeoutError`
is raised and the workflow stops.
```yaml
defaults:
  turn_timeout: 120    # 2 minutes for every member by default
members:
  - name: fast_reviewer
    role: Reviewer
    model: qwen2.5:3b
    persona: You review code quickly.
    turn_timeout: 30   # override: this member gets only 30 s
```
Set `turn_timeout: 0` (or leave it absent) to disable timeouts entirely.
**Implementation details**
The member's `take_turn()` is executed in a `ThreadPoolExecutor` thread and
`future.result(timeout=…)` enforces the deadline. If the timeout fires the
thread is abandoned (it will eventually finish and be garbage-collected), but
the calling workflow raises `TurnTimeoutError` immediately.
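The pattern described above looks roughly like this. It is a minimal sketch, not the project's code: `run_with_deadline` is a hypothetical helper and the `TurnTimeoutError` class here is a local stand-in for the real one.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class TurnTimeoutError(RuntimeError):
    """Stand-in for the error named above; the real class may differ."""


def run_with_deadline(fn, timeout):
    """Run fn() in a worker thread; raise TurnTimeoutError if it overruns."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        # A timeout of 0 or None disables the deadline.
        return future.result(timeout=timeout or None)
    except FutureTimeout:
        raise TurnTimeoutError(f"turn exceeded {timeout}s") from None
    finally:
        # Do not block on an abandoned worker; it finishes in the background.
        pool.shutdown(wait=False)


print(run_with_deadline(lambda: "done", timeout=1))   # done
try:
    run_with_deadline(lambda: time.sleep(0.5), timeout=0.1)
except TurnTimeoutError as exc:
    print("timed out:", exc)
```

Python threads cannot be killed from outside, which is why the overrunning call is abandoned rather than cancelled.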
---
## LLM retry with backoff
`team` automatically retries LLM calls that fail due to transient infrastructure errors — connection refused, timeouts, and HTTP 5xx responses from the server — using **exponential backoff**.
```yaml
defaults:
  max_retries: 3       # attempts per call (default: 3; 0 = no retries)
  retry_backoff: 2.0   # backoff base in seconds (wait = backoff ** attempt)
members:
  - name: alice
    max_retries: 5     # per-member override
    retry_backoff: 1.5
```
### How it works
| Scenario | Behaviour |
| --- | --- |
| Connection refused / timeout | Retried up to `max_retries` times. |
| HTTP 5xx (server error) | Retried: the failure is on the server side and is often transient. |
| HTTP 4xx (client error) | **Not retried** — a bad model name or malformed request won't self-heal. |
| Partial streaming response | **Not retried** — the caller already received tokens; replaying would produce duplicates. |
The wait between attempts is `retry_backoff ** attempt` seconds (attempt 0 → 1 s, attempt 1 → 2 s, attempt 2 → 4 s for the default `retry_backoff=2.0`).
### When all retries are exhausted
`LLMRetryExhaustedError` (a subclass of `OllamaError`) is raised. The CLI catches it and prints a red error panel instead of crashing, preserving any transcript written so far.
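The retry schedule can be sketched as a small wrapper. This is an illustration, not the project's implementation: `call_with_retry` is a hypothetical name, the error class is a local stand-in, and only connection and timeout errors are modelled (HTTP status handling is omitted).

```python
import time


class LLMRetryExhaustedError(RuntimeError):
    """Stand-in for the error named above (really a subclass of OllamaError)."""


def call_with_retry(fn, max_retries=3, retry_backoff=2.0, sleep=time.sleep):
    """Call fn(); on a transient error wait retry_backoff ** attempt, then retry."""
    attempt = 0
    while True:
        try:
            return fn()
        except (ConnectionError, TimeoutError) as exc:
            if attempt >= max_retries:
                raise LLMRetryExhaustedError(f"gave up after {attempt} retries") from exc
            sleep(retry_backoff ** attempt)   # 1 s, 2 s, 4 s for the defaults
            attempt += 1


waits = []
calls = {"n": 0}


def flaky():
    """Fail twice with a transient error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection refused")
    return "ok"


# Record the waits instead of actually sleeping.
print(call_with_retry(flaky, sleep=waits.append), waits)   # ok [1.0, 2.0]
```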
---
## Resuming an interrupted run
If a run is interrupted (crash, timeout, Ctrl-C) you can pick up exactly
where it left off without re-running the turns that already completed:
```bash
team run my-team.yaml --resume
```
`--resume` loads the existing `transcript.jsonl`, replays every already-
completed turn instantly (no LLM call), and then continues the workflow
live from the first missing turn.
* Containers are restarted (or re-used) as normal; models are not re-pulled
if their cache volumes still exist.
* Combine with `--no-up` if your containers are already running from a
previous `team up`.
* If the transcript doesn't exist or is empty, `--resume` is a no-op and
the run starts fresh.
* If the previous run completed, resuming is a harmless no-op: the workflow
will detect `[[TEAM_DONE]]` in the replayed turns and exit immediately.
---
## Human-in-the-loop intervention
You can inject new directives into a running team at any time without
stopping or restarting. Two mechanisms are available:
### Interactive mode (foreground runs)
Pass `--interactive` to `team run`. After every workflow round completes
you are prompted for an optional directive. Press **Enter** with no text to
let the run continue, or type instructions and press **Enter** to have them
injected before the next round:
```bash
team run my-team.yaml --interactive
```
```text
── round 1/4 complete ──
Enter a directive for the team (or press Enter to continue): Focus only on the auth module for now.
↳ directive injected
```
### File-based injection (background / CI runs)
At any point during a run you can write a plain-text file called
`inject.txt` into the workspace directory:
```bash
echo "Switch to Python 3.12 syntax only." > ./runs/my-team/inject.txt
```
Before the **next member turn** begins, the orchestrator checks for this
file. If it exists, the content is read, the file is deleted, and the
directive is appended to the transcript as a `@human (director)` turn.
All members see it in their next turn's conversation context.
The file is consumed once and automatically removed. Drop a new file to
inject again at any later point.
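The read-then-delete behaviour can be sketched as follows. This is a simplified illustration: `consume_directive` is a hypothetical name, the returned dict only mirrors the `@human (director)` labelling, and ignoring an empty file is an assumption.

```python
import tempfile
from pathlib import Path


def consume_directive(workspace: Path):
    """Return a human turn if inject.txt exists, deleting the file; else None."""
    inject = workspace / "inject.txt"
    try:
        text = inject.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    inject.unlink()   # consumed exactly once
    if not text:
        return None   # assumption: an empty file carries no directive
    return {"speaker": "human", "role": "director", "content": text}


workspace = Path(tempfile.mkdtemp())
(workspace / "inject.txt").write_text("Switch to Python 3.12 syntax only.\n", encoding="utf-8")
print(consume_directive(workspace))   # the directive turn
print(consume_directive(workspace))   # None: the file was already consumed
```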
### What the team sees
Both mechanisms produce the same type of transcript entry:
```text
--- Turn N | @human | director ---
```
The entry is visible to every member in their next turn prompt, just like
any other speaker's turn.
---
## Agent mode and tool use
Members can act as **agents**: they may call external tools, then receive
the tool's output and continue reasoning — all within the same logical turn.
Two invocation modes are supported:
| Mode | How it works |
| --- | --- |
| `text` (default) | Member emits fenced `tool:` blocks in its reply; orchestrator parses and executes them. Works with any model. |
| `native` | Uses the LLM's **function-calling API** (Ollama `tools` parameter / OpenAI function calling). Requires a compatible model (Llama 3.1+, Qwen 2.5, GPT-4 family, etc.). |
### Enabling tools
```yaml
defaults:
  tools: [web_search, run_python]   # enable globally
  max_tool_rounds: 10               # max tool-call rounds per turn (default: 10)
  tool_timeout: 300                 # seconds per tool execution (default: 300)
  tool_mode: text                   # "text" (default) or "native"
members:
  - name: researcher
    tools: [web_search, read_url]   # per-member override
    tool_mode: native               # this member uses function-calling API
  - name: data_scientist
    tools: [run_python, run_bash, read_file, write_file, append_file, list_files]
```
### Tool invocation syntax — `text` mode
A member invokes a tool by emitting a fenced block with a `tool:`
info-string:
````
```tool:web_search
query: IPCC AR6 key findings 2024
```
````
### Tool invocation — `native` mode
In native mode the model receives **JSON Schema** definitions for all
enabled tools and returns structured `tool_calls` objects (OpenAI/Ollama
function-calling format) instead of text fenced blocks. The orchestrator
executes the tools and passes results back via `tool` role messages — no
text parsing required.
Every built-in tool has a corresponding JSON Schema automatically provided
to the model. Custom skill tools that lack a schema receive a minimal
`input: string` schema.
> **Model requirements**: native mode requires a model that supports
> function calling. For Ollama, use `llama3.1:8b` or newer, `qwen2.5:7b`,
> `mistral-nemo`, etc. For OpenAI-compat backends, any GPT-4 / Claude
> model works. If you pass native mode to a model that ignores the `tools`
> parameter, it will fall back to producing a text reply (no tool calls).
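
For orientation, a tool definition in the OpenAI-style function-calling format, using the minimal `input: string` schema mentioned above, might look like the following. This is a sketch: `minimal_tool_schema` is a hypothetical helper, and details such as marking `input` as required are assumptions, not a description of what `team` generates.

```python
import json


def minimal_tool_schema(name: str, description: str = "") -> dict:
    """Build an OpenAI-style function definition with a single string input."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {"input": {"type": "string"}},
                "required": ["input"],   # assumption for this sketch
            },
        },
    }


schema = minimal_tool_schema("my_calculator", "Evaluate a Python arithmetic expression.")
print(json.dumps(schema, indent=2))
```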
````
```tool:run_python
import pandas as pd
df = pd.read_csv('data.csv')  # relative to the shared workspace, which is the tool's cwd
print(df.describe())
```
````
````
```tool:read_file
path: analysis/results.json
```
````
````
```tool:write_file
path: output/summary.md
---
# Summary
This file was written by the agent.
```
````
````
```tool:append_file
path: logs/run.log
---
[step 3] analysis complete.
```
````
````
```tool:list_files
pattern: *.py
```
````
After each tool block the orchestrator executes the tool, injects the result
back into the conversation, and asks the member to continue. Once the member
produces a reply with no tool blocks, that reply is recorded in the
transcript as usual.
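Extracting `tool:` blocks from a reply can be done with a single regular expression. This is a sketch of one possible parser, not the orchestrator's: `extract_tool_calls` is a hypothetical name, and it assumes each closing fence sits on its own line.

```python
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown
TOOL_BLOCK = re.compile(
    rf"^{FENCE}tool:(\w+)[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_tool_calls(reply: str) -> list[tuple[str, str]]:
    """Return (tool_name, body) pairs for each fenced tool: block in a reply."""
    return [(m.group(1), m.group(2).rstrip("\n")) for m in TOOL_BLOCK.finditer(reply)]


reply = (
    "Let me look this up.\n"
    f"{FENCE}tool:web_search\nquery: IPCC AR6 key findings 2024\n{FENCE}\n"
    "Then I will list the scripts.\n"
    f"{FENCE}tool:list_files\npattern: *.py\n{FENCE}\n"
)
print(extract_tool_calls(reply))
# [('web_search', 'query: IPCC AR6 key findings 2024'), ('list_files', 'pattern: *.py')]
```

An empty result means the reply contains no tool blocks and can be recorded as the member's final answer for the turn.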
### Available built-in tools
| tool | description |
| --- | --- |
| `run_python` | Execute Python code; cwd is the shared workspace directory. |
| `run_bash` | Execute a bash command; cwd is the shared workspace directory. |
| `web_search` | Search the web via the DuckDuckGo instant-answer API (no key required). |
| `read_url` | Fetch and return the plain-text content of a URL. |
| `read_file` | Read a file from the shared workspace by relative path. |
| `write_file` | Write (create or overwrite) a file in the shared workspace. |
| `append_file` | Append text to a file in the shared workspace. |
| `list_files` | List files in the shared workspace with an optional glob filter. |
| `remember` | Store a memory in the member's **persistent cross-session** memory store. |
| `recall` | Search the member's persistent memory by keyword. |
| `forget` | Delete a memory by key from the persistent store. |
| `list_memories` | List stored memories (optionally filtered by tag). |
| `assert_belief` | Add a claim to the team's **shared belief board** with confidence score. |
| `contest_belief` | Contest an existing belief (moves it to contested status). |
| `accept_belief` | Cast an accept vote for an existing belief. |
| `list_beliefs` | List the shared belief board (optionally filtered by status). |
| `delegate_task` | Delegate a sub-task to a remote bridge server and wait for results. Use `peer:` for named peers or `url:` for direct addressing. |
| `list_peers` | List all configured peer teams and their live health status (pending/running counts). |
| `broadcast_task` | Fan out the same goal to multiple peer teams concurrently and collect all results. |
| `cancel_remote_task` | Cancel a queued or running task on a remote bridge server by task ID. |
| `delegate_to_expert` | Send a prompt to an external cloud LLM (OpenAI, Anthropic, Google) for expert assistance when the task exceeds local capabilities. |
| `log_decision` | Append a timestamped decision entry to `decisions.md` in the shared workspace. |
| `read_decisions` | Read the full decision log (`decisions.md`) from the shared workspace. |
| `query_registry` | Query a team registry to discover teams matching capability tags or a keyword; returns names, URLs, and tags. |
| `sync_beliefs` | Synchronize the team belief board with a remote team cluster (pull, push, or both directions). |
**`write_file` and `append_file` body format**
Both tools use a two-part body separated by a `---` line:
```
path: relative/path/to/file.txt
---
File content goes here.
Multiple lines are fine.
```
The path is relative to the shared workspace root. Parent directories are
created automatically. `write_file` replaces any existing content;
`append_file` adds to the end of the file (creating it if it does not exist).
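Splitting the two-part body is straightforward. A minimal sketch, assuming the `path:` line comes first and the `---` separator is the second line; `parse_file_body` is a hypothetical helper.

```python
def parse_file_body(body: str) -> tuple[str, str]:
    """Split a write_file/append_file body into (relative_path, content)."""
    lines = body.split("\n")
    if len(lines) < 2 or not lines[0].startswith("path:") or lines[1].strip() != "---":
        raise ValueError("expected 'path: ...' followed by a '---' line")
    path = lines[0][len("path:"):].strip()
    content = "\n".join(lines[2:])   # later '---' lines stay part of the content
    return path, content


body = "path: output/summary.md\n---\n# Summary\nThis file was written by the agent."
print(parse_file_body(body))
# ('output/summary.md', '# Summary\nThis file was written by the agent.')
```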
**`list_files` body format**
The body is optional. If omitted, all workspace files are listed. Use a
`pattern:` key to filter by glob pattern:
```
pattern: **/*.py
```
### Security note
`run_python` and `run_bash` execute code on the **host machine** with the
privileges of the `team` process. Only enable these tools for members whose
prompts you trust.
### Expert delegation — `delegate_to_expert`
When a task is too complex for the local model assigned to a member, that
member can **delegate the sub-problem** to a subscription-based cloud LLM
(ChatGPT, Claude, Gemini) and receive the answer as a tool result. The
member remains responsible for the turn — it incorporates the expert's reply
into its own response, so the team structure and role assignments are preserved.
The cloud model is **not** a team member. It has no access to the
transcript, the shared workspace, or any other team state — only the prompt
text you explicitly send.
#### Setup
Export the API key for the provider(s) you want to use **on the host** before
running `team`:
```bash
export OPENAI_API_KEY=sk-… # for provider: openai
export ANTHROPIC_API_KEY=sk-ant-… # for provider: anthropic
export GOOGLE_API_KEY=AIza… # for provider: google
```
Enable the tool for a member in the YAML:
```yaml
members:
  - name: analyst
    model: llama3.2:3b
    tools: [delegate_to_expert, read_file, write_file]
```
#### Usage
**Multi-line prompt (recommended for complex requests)**:
````
```tool:delegate_to_expert
provider: openai
model: gpt-4o
max_tokens: 4096
temperature: 0.2
---
You are a statistics expert.
Given the following regression output, identify any violations
of linear regression assumptions and suggest remedies.
Residuals: …
```
````
**Single-line prompt**:
````
```tool:delegate_to_expert
provider: anthropic
model: claude-opus-4-5
prompt: What is the time complexity of Dijkstra's algorithm with a binary heap?
```
````
| field | required | default | description |
| --- | --- | --- | --- |
| `provider` | ✓ | — | `openai`, `anthropic`, or `google` |
| `model` | | provider default | Model name accepted by the provider API |
| `prompt` | ✓* | — | Prompt text (single-line form; ignored when `---` body is present) |
| `max_tokens` | | `2048` | Maximum tokens in the response |
| `temperature` | | `0.2` | Sampling temperature 0–2 |
\* Required unless a `---` body separator is used.
**Provider defaults**: `gpt-4o` (OpenAI), `claude-opus-4-5` (Anthropic),
`gemini-1.5-pro` (Google).
> **Privacy**: the prompt text is sent to the external API. Do not include
> sensitive data unless your data-handling agreement with the provider permits it.
> Only enable `delegate_to_expert` for members that may handle the data appropriately.
### Full system access and package installation
Agents have **full, unrestricted access to the host system** — the same
privileges as the user who runs the `team` process. This is intentional:
agents should be able to do anything a human researcher or engineer can do.
In particular, agents can install software at will:
````
```tool:run_bash
pip install scikit-learn seaborn --quiet
```
````
````
```tool:run_bash
apt-get install -y ffmpeg
```
````
````
```tool:run_python
import subprocess, sys
subprocess.run([sys.executable, "-m", "pip", "install", "biopython"], check=True)
import Bio
print(Bio.__version__)
```
````
When a tool invocation takes longer than expected (e.g. downloading a large
package), increase the `tool_timeout` in your YAML:
```yaml
defaults:
  tool_timeout: 600   # 10 minutes; safe for most installs
```
The default `tool_timeout` is **300 seconds** (5 minutes), which covers the
vast majority of `pip install` and `apt-get` operations on a normal network
connection.
### How it works
**Text mode** (`tool_mode: text`):
```
member turn:
1. LLM called with system prompt + conversation context
2. If reply contains tool: fenced blocks → execute each tool
3. Tool results injected as a follow-up user message
4. LLM called again (no streaming; repeats up to max_tool_rounds)
5. If no tool blocks in reply → reply recorded in transcript
```
**Native mode** (`tool_mode: native`):
```
member turn:
1. LLM called with JSON Schema tool definitions in the "tools" parameter
2. If response contains tool_calls → execute each named tool using args_to_body()
3. Each result injected as a "tool" role message
4. LLM called again (repeats up to max_tool_rounds)
5. When LLM returns text (no tool_calls) → reply recorded in transcript
```
Token usage from all tool-call rounds is accumulated and reported in the
[token usage summary](#token-usage-tracking).
### Streaming display
When streaming is enabled (`team run` without `--no-stream`), tool calls
are displayed inline:
```text
@researcher (Research Lead)
I'll search for recent data on this topic.
🔧 tool: web_search query: climate change 2024 report
↳ **Climate Change** A programming language. - Flooding in coastal…
Based on the search, the key findings are…
```
---
### Custom skill plugins
The built-in tool set is a starting point. You can extend it with any
Python file — local or fetched from a URL — and make those tools
available to any member. This gives agents effectively **unlimited**
capabilities depending on what skills you provide.
#### Skill file format
A skill file must expose tools in one of two formats:
**Single-tool format** (`TOOL_NAME` + `execute`):
```python
# skills/my_calculator.py
TOOL_NAME = "my_calculator"
TOOL_DESCRIPTION = "Evaluate a Python arithmetic expression."


def execute(body, *, workspace_path=None, timeout=30, **kwargs):
    try:
        # Empty builtins restrict eval(), but this is not a full sandbox.
        return str(eval(body.strip(), {"__builtins__": {}}, {}))
    except Exception as exc:
        return f"ERROR: {exc}"
```
**Multi-tool format** (`TOOLS` dict + optional `TOOL_DESCRIPTIONS`):
```python
# skills/db_tools.py
import sqlite3


def _query(body, *, workspace_path=None, **kwargs):
    db_path = workspace_path / "data.sqlite"
    conn = sqlite3.connect(db_path)
    rows = conn.execute(body.strip()).fetchall()
    conn.close()
    return "\n".join(str(r) for r in rows)


def _schema(body, *, workspace_path=None, **kwargs):
    db_path = workspace_path / "data.sqlite"
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT name, sql FROM sqlite_master WHERE type='table'").fetchall()
    conn.close()
    return "\n".join(f"{r[0]}: {r[1]}" for r in rows)


TOOLS = {"sql_query": _query, "sql_schema": _schema}
TOOL_DESCRIPTIONS = {
    "sql_query": "Run an SQL SELECT on the shared SQLite database.",
    "sql_schema": "Return the schema of all tables in the shared SQLite database.",
}
```
Both formats can coexist in the same file.
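A loader that accepts either format, or both at once, could collect the callables like this. It is a sketch, not the project's loader: `collect_tools` is a hypothetical name, and letting the single-tool entry win on a name clash is an assumption.

```python
import types


def collect_tools(module) -> dict:
    """Gather callables from a skill module that uses either or both formats."""
    tools = dict(getattr(module, "TOOLS", {}))     # multi-tool format
    name = getattr(module, "TOOL_NAME", None)
    execute = getattr(module, "execute", None)
    if name and callable(execute):                 # single-tool format
        tools[name] = execute
    return tools


# Build a throwaway module that mixes both formats.
skill = types.ModuleType("demo_skill")
skill.TOOL_NAME = "echo"
skill.execute = lambda body, **kwargs: body
skill.TOOLS = {"shout": lambda body, **kwargs: body.upper()}

tools = collect_tools(skill)
print(sorted(tools))          # ['echo', 'shout']
print(tools["shout"]("hi"))   # HI
```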
#### Configuring skills
Add skill sources under `defaults.skills` (inherited by all members) or
`members[*].skills` (member-specific, merged with defaults on top):
```yaml
defaults:
  skills:
    - path: ./skills/my_calculator.py     # local path (relative to CWD)
    - path: ./skills/db_tools.py
    - url: https://example.com/skill.py   # remote URL (see security note below)
      checksum: sha256:e3b0c44298fc…      # optional integrity check
    - ./skills/shorthand.py               # plain string = auto-detect local/remote
  tools: [web_search, my_calculator, sql_query, sql_schema]   # opt-in by name
members:
  - name: analyst
    tools: [sql_query, sql_schema, run_python]   # member-specific tool set
    skills:
      - ./skills/analyst_helpers.py       # member-specific extra skill
```
Tool names from skills are used exactly like built-in tool names everywhere
(`tools:` lists, `tool:` fenced blocks, system prompts).
#### Checksum verification
For any skill (local or remote) you can supply a checksum to verify
integrity before execution:
```yaml
skills:
  - url: https://example.com/skill.py
    checksum: sha256:
  - path: ./skills/local.py
    checksum: sha256:
```
Supported algorithms: any name accepted by Python's `hashlib` (e.g.
`sha256`, `sha512`, `md5`). `team` raises an error and refuses to load
the skill if the digest does not match.
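Verifying an `algo:hexdigest` string takes only a few lines with `hashlib`. A sketch under assumptions: `verify_checksum` is a hypothetical helper and the error type is chosen for the example.

```python
import hashlib


def verify_checksum(data: bytes, checksum: str) -> None:
    """Raise ValueError unless data matches an "algo:hexdigest" checksum string."""
    algo, _, expected = checksum.partition(":")
    actual = hashlib.new(algo, data).hexdigest()
    if actual != expected.strip().lower():
        raise ValueError(f"{algo} mismatch: expected {expected}, got {actual}")


source = b"TOOL_NAME = 'demo'\n"
good = "sha256:" + hashlib.sha256(source).hexdigest()
verify_checksum(source, good)            # passes silently

try:
    verify_checksum(source + b"# tampered", good)
except ValueError:
    print("refusing to load: digest does not match")
```

To produce the value for a local file, something like `hashlib.sha256(Path("skill.py").read_bytes()).hexdigest()` gives the hex digest to paste after `sha256:`.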
#### Markdown skills — context injection
Skills do not have to be executable code. A Markdown file (`.md`) loaded
as a skill has its content injected verbatim into the member's **system
prompt** at startup — no tool call required. Use this for guidelines,
checklists, templates, and domain rules that should always be visible.
```yaml
defaults:
  skills:
    - path: ./skills/review_checklist.md   # injected into system prompt
    - path: ./skills/task_board.py         # callable tool as usual
```
A Python skill can also inject context by setting the `INJECT_INTO_CONTEXT`
variable to a non-empty string — the text is injected *and* the tool
remains callable:
```python
TOOL_NAME = "style_guide"
INJECT_INTO_CONTEXT = "## Style guide\n- Use snake_case for all variables.\n..."


def execute(body, **kwargs):
    return INJECT_INTO_CONTEXT  # also callable on demand
```
#### Bundled team-specific skills
The `skills/` directory in this repository contains a set of skills designed
for multi-agent collaboration — things that have no use outside a team run
and would never appear in a general-purpose skill library.
| File | Type | Description |
|---|---|---|
| `review_checklist.md` | Markdown | Structured peer-review checklist injected into reviewer personas. |
| `escalation_rules.md` | Markdown | When to proceed, flag a risk, or escalate to the manager. |
| `decision_record_format.md` | Markdown | ADR-style template for writing `log_decision` entries. |
| `task_board.py` | Python | `task_add` / `task_done` / `task_list` — shared TASKS.md board. |
| `search_transcript.py` | Python | `search_transcript` — keyword search over the run transcript. |
| `critique_request.py` | Python | `request_critique` / `pick_critique` / `list_critiques` — async peer-review queue. |
| `progress_snapshot.py` | Python | `progress_snapshot` — write (or read) PROGRESS.md in the workspace. |
Reference them by path in your team YAML:
```yaml
defaults:
  skills:
    - path: ./skills/review_checklist.md
    - path: ./skills/escalation_rules.md
    - path: ./skills/task_board.py
    - path: ./skills/search_transcript.py
  tools: [task_add, task_done, task_list, search_transcript]
```
## Shared institutional context
When a workspace contains a `context.md` file at its root, `team` injects its
content into **every** member's turn context automatically — no per-member
configuration required.
This is the right place for knowledge that applies to all members equally:
lab conventions, dataset descriptions, domain terminology, naming standards,
relevant prior work, or any background a new team member would need to read
on day one.
**Creating the context file:**
```bash
cat > ./runs/my-team/context.md << 'EOF'
# Lab context
This project analyses the TCGA-BRCA cohort (1,142 samples, 38 features).
## Naming conventions
- All feature files use `snake_case` column names.
- Model outputs go in `results/`.
## Domain notes
- Use log2 CPM normalisation for expression data.
- Primary endpoint is 5-year overall survival (OS5).
EOF
```
The file is read from disk **on every turn** so you can update it while a
run is in progress (e.g. to correct a mistake or add a new constraint).
If the file is absent, the section is silently omitted.
The content is truncated at 8 192 characters if the file is very large.
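The behaviour described above (re-read on every turn, omit when absent, truncate at 8 192 characters) can be sketched as follows. The function name `load_shared_context` and the constant name are hypothetical; only the file name and the limit come from the text.

```python
import tempfile
from pathlib import Path
from typing import Optional

MAX_CONTEXT_CHARS = 8192  # truncation limit quoted above


def load_shared_context(workspace: Path) -> Optional[str]:
    """Return the (possibly truncated) context text, or None if there is no file.

    Hypothetical helper: calling it at the start of each turn means edits
    made to context.md during a run are picked up on the next turn.
    """
    context_file = workspace / "context.md"
    if not context_file.is_file():
        return None  # section is omitted
    text = context_file.read_text(encoding="utf-8")
    return text[:MAX_CONTEXT_CHARS]


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    print(load_shared_context(workspace))  # no file yet
    (workspace / "context.md").write_text("# Lab context\n", encoding="utf-8")
    print(load_shared_context(workspace))  # file content
```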
---
## Decision log
Members with the `log_decision` tool enabled can record structured, timestamped
decisions in a shared `decisions.md` file inside the workspace. Any member
can later call `read_decisions` to review the accumulated rationale before
making related choices.
**Enabling the tools:**
```yaml
defaults:
  tools: [log_decision, read_decisions]  # add to any existing tool list
```
**Logging a decision:**
````
```tool:log_decision
title: Chose pandas over polars for data wrangling
rationale: Polars ecosystem is too immature; pandas is already a project dependency.
alternatives: polars, dask, vaex
```
````
The entry is appended to `decisions.md` in the shared workspace:
```markdown
## Decision: Chose pandas over polars for data wrangling
**Date:** 2024-07-15T10:32:44Z
**By:** @data_scientist
**Rationale:** Polars ecosystem is too immature; pandas is already a project dependency.
**Alternatives considered:** polars, dask, vaex
---
```
**Reading the decision log:**
````
```tool:read_decisions
```
````
Returns the full `decisions.md` content so members can consult previous
decisions when facing related choices.
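For post-processing or testing outside a run, an entry in the layout shown above can be produced with a few lines of Python. This is a sketch only: `format_decision` is a hypothetical helper, and the blank lines between fields are an assumption about the exact on-disk layout.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def format_decision(title: str, rationale: str, alternatives: Sequence[str],
                    author: str, when: Optional[datetime] = None) -> str:
    """Render one decision entry in the Markdown layout shown above."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    fields = [
        f"## Decision: {title}",
        f"**Date:** {stamp}",
        f"**By:** @{author}",
        f"**Rationale:** {rationale}",
        f"**Alternatives considered:** {', '.join(alternatives)}",
        "---",
    ]
    # Blank lines between fields are an assumption, chosen so that the
    # trailing '---' renders as a horizontal rule rather than a heading.
    return "\n\n".join(fields) + "\n"


entry = format_decision(
    "Chose pandas over polars for data wrangling",
    "pandas is already a project dependency.",
    ["polars", "dask", "vaex"],
    author="data_scientist",
    when=datetime(2024, 7, 15, 10, 32, 44, tzinfo=timezone.utc),
)
print(entry)
```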
---
## Structured JSON output
By default members reply in free-form text. When you need machine-readable
output — e.g. an extractor member whose results are consumed by downstream
code — set `output_format: json` on that member.
```yaml
members:
  - name: extractor
    role: Data extractor
    model: llama3.1:8b
    persona: You extract structured data from documents.
    output_format: json
    output_schema:            # optional — validates the reply
      type: object
      required: [entities, summary]
      properties:
        entities:
          type: array
          items: {type: string}
        summary:
          type: string
```
**What happens**
1. The system prompt gains an `## Output format` section instructing the model
to reply with valid JSON only.
2. After the LLM replies, `team` calls `json.loads()` on the content.
3. If parsing fails (or schema validation fails when `output_schema` is set),
the orchestrator sends a correction prompt and retries up to **3 times**.
4. The parsed object is stored in `TurnResult.json_output` and is accessible
from custom workflows or post-run code.
5. Schema validation requires `pip install jsonschema`; without it the schema
check is skipped silently.
> **Note:** `output_format` is per-member only — it is not available as a
> team-wide `defaults` key.
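The parse, validate, and retry steps listed above can be sketched as a small loop. Everything named here is hypothetical (`get_json_reply`, the `ask` callable standing in for an LLM call, the wording of the correction prompt), and the sketch assumes "retries up to 3 times" means one initial attempt plus three retries.

```python
import json
from typing import Any, Callable, Optional

try:  # schema validation is optional, as noted in step 5 above
    import jsonschema
except ImportError:
    jsonschema = None

MAX_RETRIES = 3
# Errors that trigger a correction prompt.
_RETRYABLE = (json.JSONDecodeError,) + (
    (jsonschema.ValidationError,) if jsonschema else ()
)


def get_json_reply(ask: Callable[[str], str], prompt: str,
                   schema: Optional[dict] = None) -> Any:
    """Call `ask` until it returns valid JSON (hypothetical helper)."""
    message = prompt
    for _ in range(1 + MAX_RETRIES):  # first attempt plus retries
        reply = ask(message)
        try:
            parsed = json.loads(reply)
            if schema is not None and jsonschema is not None:
                jsonschema.validate(parsed, schema)
            return parsed
        except _RETRYABLE as exc:
            message = f"Your reply was invalid ({exc}). Reply with valid JSON only."
    raise ValueError("no valid JSON reply after retries")


# A fake model that fails once, then answers correctly.
replies = iter(["Sure! Here you go:", '{"entities": ["TP53"], "summary": "ok"}'])
result = get_json_reply(lambda _msg: next(replies), "Extract entities.")
print(result["entities"])
```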
---
## Conditional routing
Enable dynamic, branching conversations where each member's output determines who speaks next — building state-machine-like workflows without any code.
```yaml
workflow:
  type: conditional
  start: writer          # optional; defaults to the first listed member
  max_rounds: 20
members:
  - name: writer
    model: llama3
    persona: You are a technical writer.
    role: Writer
    routes:
      - if_contains: "NEEDS_REVISION"
        next: editor
      - if_match: "APPROVED|LGTM"
        next: publisher
      - default: reviewer      # fallback when nothing else matches
  - name: editor
    model: llama3
    persona: You are an editor.
    role: Editor
    routes:
      - if_contains: "DONE"
        next: publisher
      - default: writer        # loop back for another draft
  - name: reviewer
    model: llama3
    persona: You are a reviewer.
    role: Reviewer
    routes:
      - default: writer
  - name: publisher            # terminal node — no routes needed
    model: llama3
    persona: You are a publisher.
    role: Publisher
```
### Route rules
Rules are evaluated **top-to-bottom**; the first match wins.
| Key | Behaviour |
| --- | --- |
| `if_contains: "TEXT"` | Case-insensitive substring search in the member's last reply. |
| `if_match: "REGEX"` | Case-insensitive `re.search` against the member's last reply. |
| `default: member` | Unconditional fallback; fires when no other rule matches. |
A member with **no `routes`** falls back to the standard round-robin next-speaker logic.
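The rule semantics in the table can be sketched in a few lines of Python. The function name `next_speaker` is hypothetical and the code is an illustration of the documented behaviour (top-to-bottom, first match wins, case-insensitive), not the orchestrator's actual implementation.

```python
import re
from typing import List, Optional


def next_speaker(routes: List[dict], reply: str) -> Optional[str]:
    """Return the member named by the first matching rule, or None.

    None means "no route matched", in which case the caller would fall
    back to its round-robin logic.
    """
    for rule in routes:
        if "if_contains" in rule:
            if rule["if_contains"].lower() in reply.lower():
                return rule["next"]
        elif "if_match" in rule:
            if re.search(rule["if_match"], reply, re.IGNORECASE):
                return rule["next"]
        elif "default" in rule:
            return rule["default"]
    return None


writer_routes = [
    {"if_contains": "NEEDS_REVISION", "next": "editor"},
    {"if_match": "APPROVED|LGTM", "next": "publisher"},
    {"default": "reviewer"},
]
print(next_speaker(writer_routes, "Draft attached. needs_revision in section 2."))
print(next_speaker(writer_routes, "lgtm"))
print(next_speaker(writer_routes, "Here is the first draft."))
```

Because the first match wins, a reply containing both `NEEDS_REVISION` and `LGTM` routes to `editor` in this example.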
### Workflow end conditions
The workflow stops when:
* any member outputs `[[TEAM_DONE]]`, or
* the total turn count reaches `max_rounds`.
---
## Token budget
Prevent runaway costs by capping how many tokens a member may consume across all turns in a single run.
```yaml
defaults:
  token_budget: 5000     # max prompt+completion tokens per member per run
members:
  - name: alice
    token_budget: 10000  # per-member override
```
When a member's cumulative token usage reaches the budget before their next turn, `TokenBudgetError` is raised and the run stops gracefully. The transcript and any workspace files written so far are preserved, and `team run --resume` with a higher budget can continue from where it left off.
> **Note:** Replayed turns (from `--resume`) do **not** count toward the budget.
### Budget resolution
| Setting | Effective budget |
| --- | --- |
| `token_budget` in `defaults` only | Applied to every member. |
| `token_budget` in a specific member | Overrides the `defaults` value for that member only. |
| Neither set | No limit — member runs until the workflow ends. |
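The resolution table amounts to a simple lookup with a fallback. A minimal sketch, using hypothetical helper names and plain dicts in place of the parsed YAML:

```python
from typing import Optional


def effective_budget(defaults: dict, member: dict) -> Optional[int]:
    """Member setting wins; otherwise use defaults; None means no limit."""
    if "token_budget" in member:
        return member["token_budget"]
    return defaults.get("token_budget")


def budget_exhausted(used_tokens: int, budget: Optional[int]) -> bool:
    """True once cumulative usage has reached the budget (if there is one)."""
    return budget is not None and used_tokens >= budget


defaults = {"token_budget": 5000}
alice = {"name": "alice", "token_budget": 10000}
bob = {"name": "bob"}

print(effective_budget(defaults, alice))  # member override
print(effective_budget(defaults, bob))    # inherited from defaults
print(effective_budget({}, bob))          # no limit
```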
---
## Per-agent persistent memory
In a real research lab, scientists remember what worked and what failed —
across months of experiments. `team` gives each agent a **private,
persistent memory store** backed by SQLite that survives between completely
separate `team run` invocations.
```
Session 1 (January): alice uses remember to store "AlphaFold3 RMSD 1.2 Å"
Session 2 (February): alice uses recall to surface that result and build on it
```
This persistence across runs is what lets your agents **accumulate
knowledge over time** instead of starting from scratch in every session.
### Enabling memory
Add a `memory:` section to your team YAML:
```yaml
memory:
  enabled: true
  inject_recent: 5         # memories injected into each turn's context (default: 5)
  store: ~/.team/memory    # optional; defaults to /memory/
```
Enable memory tools for each member:
```yaml
members:
  - name: alice
    tools: [run_python, remember, recall, forget, list_memories]
```
### Memory tools
Memory tools take `key: value` header lines; `remember` additionally takes a `---` separator followed by the value body:
**`remember`** — store a cross-session memory:
````
```tool:remember
key: protein_folding_baseline_2025
tags: results, methods
importance: 0.9
---
AlphaFold3 outperforms RoseTTAFold on monomers (RMSD 1.2 vs 2.1 Å, n=1 000).
Dataset: PDB validation set, tested January 2025.
```
````
**`recall`** — full-text search across all memories:
````
```tool:recall
query: protein folding
limit: 5
```
````
Returns a ranked list of matching memories (by importance then recency).
**`forget`** — delete a memory by key:
````
```tool:forget
key: protein_folding_baseline_2025
```
````
**`list_memories`** — browse all memories (optionally by tag):
````
```tool:list_memories
tag: results
limit: 20
```
````
At the start of every turn, the *n* most recent memories (where *n* is
`inject_recent`) are automatically injected into the member's context under
`## Your persistent memories`.
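To make the store-and-rank idea concrete, here is a minimal SQLite sketch. The table layout and function names are hypothetical, and a plain `LIKE` substring match stands in for the full-text search that `recall` performs; only the ordering (importance, then recency) follows the description above.

```python
import sqlite3
import time


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a memory database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "key TEXT PRIMARY KEY, value TEXT, tags TEXT, "
        "importance REAL, created REAL)"
    )
    return conn


def remember(conn, key, value, tags="", importance=0.5):
    with conn:  # commits on success
        conn.execute(
            "INSERT OR REPLACE INTO memories VALUES (?, ?, ?, ?, ?)",
            (key, value, tags, importance, time.time()),
        )


def recall(conn, query, limit=5):
    """Substring search, ranked by importance and then recency."""
    pattern = f"%{query}%"
    rows = conn.execute(
        "SELECT key, value FROM memories "
        "WHERE key LIKE ? OR value LIKE ? "
        "ORDER BY importance DESC, created DESC LIMIT ?",
        (pattern, pattern, limit),
    )
    return rows.fetchall()


def forget(conn, key):
    with conn:
        conn.execute("DELETE FROM memories WHERE key = ?", (key,))


store = open_store()
remember(store, "baseline_old", "RoseTTAFold RMSD 2.1", importance=0.3)
remember(store, "baseline_new", "AlphaFold3 RMSD 1.2", importance=0.9)
print(recall(store, "rmsd"))
```

Passing a file path instead of `":memory:"` is what makes the data survive between separate processes.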
### Memory config reference
| key | type | default | description |
| --- | --- | --- | --- |
| `enabled` | bool | `false` | Enable persistent memory for all members. |
| `inject_recent` | int | `5` | Number of recent memories to inject into each turn's context. |
| `store` | path | `/memory` | Directory that holds the per-member SQLite databases. |
---
## Shared team belief board
In collaborative science, a team's most important output is not files — it is
**what the team collectively knows**. The `team` belief board formalises this
as a living, structured record of claims with provenance, confidence scores,
and consensus voting.
```
alice asserts: "RNA Pol II is rate-limiting in elongation" (confidence: 85%)
bob accepts → 2/3 votes ≥ threshold → status: ACCEPTED
carol contests with reason: "only tested in HEK293" → status: CONTESTED
```
After a run: `team beliefs myteam.yaml` shows everything the team concluded.
### Enabling the belief board
```yaml
beliefs:
  enabled: true
  consensus_threshold: 0.5   # fraction of members required for acceptance
  inject_limit: 10           # beliefs shown in each member's turn context
```
Enable belief tools for each member:
```yaml
members:
  - name: alice
    tools: [run_python, assert_belief, contest_belief, accept_belief, list_beliefs]
```
### Belief tools
**`assert_belief`** — propose a claim with optional evidence:
````
```tool:assert_belief
confidence: 0.85
evidence: RMSD analysis, PDB validation set, n=1 000, January 2025
---
AlphaFold3 is the best available method for monomer structure prediction.
```
````
The member who asserts a belief automatically casts an *accept* vote. The
returned belief ID (e.g. `a3f2b1c9`) is used in subsequent votes.
**`accept_belief`** — vote to accept:
````
```tool:accept_belief
id: a3f2b1c9
```
````
**`contest_belief`** — move a belief to `contested` status:
````
```tool:contest_belief
id: a3f2b1c9
reason: Dataset is limited to well-studied proteins; may not generalise.
```
````
**`list_beliefs`** — browse the board:
````
```tool:list_beliefs
status: contested
```
````
Valid status values: `pending`, `accepted`, `contested`, `rejected`. Omit to
list all beliefs.
Beliefs are injected into every member's turn context under
`## Shared team belief board` so the whole team sees the current state before
each turn.
### Inspecting beliefs with `team beliefs`
```bash
team beliefs myteam.yaml # all beliefs
team beliefs myteam.yaml --status accepted # accepted only
team beliefs myteam.yaml --status contested # contested — needs attention
```
Output example:
```
Belief board — team 'my-team'
┏━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━┳━━━━━━━━━┓
┃ ID ┃ Status ┃ Claim ┃ Confidence ┃ By ┃ For ┃ Against ┃
┡━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━╇━━━━━━━━━┩
│ a3f2b1 │ ✓ accepted │ AlphaFold3 is best for monomer structure prediction. │ 85% │ @alice│ 2 │ 0 │
│ 9c1d33 │ ⚡ contested│ The dataset generalises to all protein families. │ 60% │ @bob │ 1 │ 1 │
└────────┴─────────────┴─────────────────────────────────────────────────────────┴────────────┴───────┴─────┴─────────┘
⚡ Some beliefs are contested — review and resolve via accept_belief / contest_belief tools.
```
### Belief config reference
| key | type | default | description |
| --- | --- | --- | --- |
| `enabled` | bool | `false` | Enable the shared belief board. |
| `consensus_threshold` | float | `0.5` | Fraction of members who must accept a belief for it to become `accepted`. |
| `inject_limit` | int | `10` | Maximum number of beliefs injected into each member's turn context. |
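A minimal sketch of how `consensus_threshold` could translate votes into a status, following the alice/bob/carol example earlier in this section. The function name is hypothetical, and two points are assumptions rather than documented behaviour: that any contest vote takes precedence over acceptance, and that the `rejected` status (not modelled here) is reached some other way.

```python
from typing import Collection


def belief_status(accepts: Collection[str], contests: Collection[str],
                  members: Collection[str], threshold: float = 0.5) -> str:
    """Derive a belief's status from its votes (illustrative only)."""
    if contests:  # assumption: a single contest flags the belief
        return "contested"
    if len(accepts) / len(members) >= threshold:
        return "accepted"
    return "pending"


team = ["alice", "bob", "carol"]
print(belief_status({"alice"}, set(), team))                  # asserter's own vote only
print(belief_status({"alice", "bob"}, set(), team))           # 2/3 meets the threshold
print(belief_status({"alice", "bob"}, {"carol"}, team))       # carol contests
```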
---
## Workspace checkpoints
Every time a live member turn is about to execute, the orchestrator
automatically snapshots the current state of the **shared workspace** before
any files are written. Snapshots are stored under
`/checkpoints/` with names that encode the turn index, the
member about to speak, and the timestamp:
```
checkpoints/
├── 0001_alice_20240501T120000/ # state before alice's 1st turn
├── 0003_bob_20240501T120145/ # state before bob's 2nd turn
└── ...
```
If the shared workspace is empty (no files have been produced yet), the
snapshot is silently skipped — there is nothing to back up.
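Because the directory names encode turn index, member, and timestamp, they are easy to parse in scripts. The helper below is a hypothetical sketch based only on the naming pattern shown above; it splits from both ends so that member names containing underscores are kept intact.

```python
from datetime import datetime
from typing import Tuple


def parse_checkpoint_id(checkpoint_id: str) -> Tuple[int, str, datetime]:
    """Split '<turn>_<member>_<YYYYmmddTHHMMSS>' into its three parts."""
    turn, rest = checkpoint_id.split("_", 1)      # leading turn index
    member, stamp = rest.rsplit("_", 1)           # trailing timestamp
    return int(turn), member, datetime.strptime(stamp, "%Y%m%dT%H%M%S")


turn, member, taken_at = parse_checkpoint_id("0003_bob_20240501T120145")
print(turn, member, taken_at.isoformat())
```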
### Listing checkpoints
```bash
team checkpoints my-team.yaml
```
```
┌──────────────────────────────┬──────┬──────────────────────┬─────────────────────┬───────┐
│ ID │ Turn │ Before member's turn │ Timestamp │ Files │
├──────────────────────────────┼──────┼──────────────────────┼─────────────────────┼───────┤
│ 0001_alice_20240501T120000 │ 1 │ @alice │ 2024-05-01 12:00:00 │ 3 │
│ 0003_bob_20240501T120145 │ 3 │ @bob │ 2024-05-01 12:01:45 │ 5 │
└──────────────────────────────┴──────┴──────────────────────┴─────────────────────┴───────┘
```
### Restoring a checkpoint
Copy the checkpoint ID from the table and pass it to `team restore`:
```bash
team restore my-team.yaml 0001_alice_20240501T120000
```
```
restored checkpoint 0001_alice_20240501T120000 — 3 file(s) now in the shared workspace.
```
The current contents of `shared/` are replaced with the snapshot.
**This cannot be undone** unless a later checkpoint already captured the
state you are overwriting, so check `team checkpoints` before restoring.
### Use cases
* **Undo a bad turn** — a member produced unwanted file changes; restore the
checkpoint taken just before that turn.
* **Branch from a known-good state** — restore an earlier checkpoint, edit
`team.yaml` (e.g. change the goal or persona), and re-run from there.
* **Audit the evolution of the workspace** — inspect any checkpoint
directory directly; it is a plain copy of `shared/` at that point in time.
---
## Workspace time-travel (`team rollback`)
Every live member turn is preceded by an automatic workspace snapshot (see
[Workspace checkpoints](#workspace-checkpoints)). When things go wrong you
can roll back the shared workspace to *any prior point in time* and resume
from there — effectively forking the timeline:
```bash
# 1. List all available snapshots
team rollback myteam.yaml
# 2. Restore to a specific checkpoint (with confirmation prompt)
team rollback myteam.yaml --to 0005_alice_20250510T183000
# 3. Skip the confirmation prompt (useful in scripts)
team rollback myteam.yaml --to 0005_alice_20250510T183000 --yes
```
After rolling back, resume the run from the restored state:
```bash
team run myteam.yaml --resume
```
Because the transcript also persists, `--resume` skips all turns already
recorded in it. To *re-run* from turn 5 with a different approach, truncate
the transcript manually (or delete it and rely entirely on the restored
workspace files).
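Manual truncation can be scripted. The sketch below keeps the first `keep` lines of a JSONL file and writes a `.bak` copy first. It assumes the transcript stores one record per line and that line order matches turn order, which this section does not confirm; check your own `transcript.jsonl` before relying on it. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def truncate_jsonl(path: Path, keep: int) -> int:
    """Keep the first `keep` lines of `path`; return how many were dropped."""
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    # Back up the original next to it, e.g. transcript.jsonl.bak
    shutil.copy2(path, path.with_suffix(path.suffix + ".bak"))
    path.write_text("".join(lines[:keep]), encoding="utf-8")
    return max(len(lines) - keep, 0)
```

For example, `truncate_jsonl(Path("transcript.jsonl"), 5)` would leave the first five records in place, after which `team run myteam.yaml --resume` replays only those.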
> `team rollback` is a thin wrapper around the existing
> `CheckpointManager.restore()` logic. The underlying `team restore`
> command (which requires an exact checkpoint ID argument) remains available
> for scripting.
---
## Token usage tracking
After every `team run` a token usage summary is printed:
```text
┌────────────────────────────────────────────────────┐
│ Token usage (live turns) │
├──────────┬─────────┬───────────┬───────────────────┤
│ member │ prompt │ completion│ total │
├──────────┼─────────┼───────────┼───────────────────┤
│ @lead │ 12 450 │ 3 210 │ 15 660 │
│ @worker │ 8 120 │ 5 890 │ 14 010 │
├──────────┼─────────┼───────────┼───────────────────┤
│ total │ 20 570 │ 9 100 │ 29 670 │
└──────────┴─────────┴───────────┴───────────────────┘
```
Token counts come from the Ollama `/api/chat` `eval_count` /
`prompt_eval_count` fields (for the `ollama` backend) or the OpenAI
`usage` object (for `openai_compat`). The summary is omitted when all
counts are zero (e.g. pure replay runs or backends that don't report
token usage).
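As an illustration of how such a summary can be accumulated, the sketch below sums the two Ollama fields named above per member. The function name and the shape of the `totals` dict are hypothetical; the response dicts are hand-written stand-ins for `/api/chat` replies.

```python
from typing import Dict


def add_usage(totals: Dict[str, Dict[str, int]], member: str, response: dict) -> None:
    """Add one response's token counts to the running per-member totals."""
    entry = totals.setdefault(member, {"prompt": 0, "completion": 0})
    entry["prompt"] += response.get("prompt_eval_count", 0)
    entry["completion"] += response.get("eval_count", 0)


totals: Dict[str, Dict[str, int]] = {}
add_usage(totals, "lead", {"prompt_eval_count": 400, "eval_count": 120})
add_usage(totals, "lead", {"prompt_eval_count": 650, "eval_count": 90})
add_usage(totals, "worker", {})  # backend reported nothing

for member, usage in totals.items():
    print(member, usage["prompt"], usage["completion"],
          usage["prompt"] + usage["completion"])
```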
---
## Cost estimation
After every `team run` and `team stats` command, the token-usage table includes an **Est. cost** column with a USD estimate based on the model used by each member.
Local Ollama models always show **$0.00 (local)** since they run on your hardware. Cloud models (`backend: openai_compat`) are looked up in the built-in pricing table.
### Built-in pricing table
| Provider | Models |
| --- | --- |
| **OpenAI** | `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-4`, `gpt-3.5-turbo`, `o1`, `o1-mini`, `o3`, `o3-mini` |
| **Anthropic** | `claude-opus-4`, `claude-sonnet-4`, `claude-3-5-sonnet`, `claude-3-5-haiku`, `claude-3-opus`, `claude-3-sonnet`, `claude-3-haiku` |
| **Google** | `gemini-2.0-flash`, `gemini-1.5-pro`, `gemini-1.5-flash` |
| **Mistral** | `mistral-large`, `mistral-medium`, `mistral-small`, `codestral` |
| **Meta (cloud-hosted)** | `llama-3.1-405b`, `llama-3.1-70b`, `llama-3.1-8b`, `llama-3-70b`, `llama-3-8b` |
Model names are matched by prefix/substring so versioned names like `gpt-4o-2024-08-06` automatically map to `gpt-4o` pricing. If a model is not recognised, the cost column shows **?**.
> **Prices are estimates only.** Provider pricing changes over time — update `team/pricing.py` with the latest figures from your provider's pricing page.
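The sketch below shows one way substring matching against a pricing table can work. The numbers are placeholders, not real prices, and the names (`PRICES`, `match_model`, `estimate_cost`) are hypothetical rather than the contents of `team/pricing.py`. Preferring the longest matching key is an assumption made here so that `gpt-4o-mini-…` does not fall through to `gpt-4o` or `gpt-4`.

```python
from typing import Optional

# Placeholder USD prices per million (prompt, completion) tokens. NOT real figures.
PRICES = {
    "gpt-4": (3.0, 6.0),
    "gpt-4o": (2.0, 4.0),
    "gpt-4o-mini": (1.0, 2.0),
}


def match_model(model: str) -> Optional[str]:
    """Return the longest pricing key contained in the model name, if any."""
    candidates = [key for key in PRICES if key in model.lower()]
    return max(candidates, key=len) if candidates else None


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Optional[float]:
    """Estimated USD cost, or None when the model is not recognised."""
    key = match_model(model)
    if key is None:
        return None
    prompt_price, completion_price = PRICES[key]
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1_000_000


print(match_model("gpt-4o-2024-08-06"))
print(match_model("gpt-4o-mini-2024-07-18"))
print(estimate_cost("some-unknown-model", 1000, 1000))
```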
---
## Run statistics
`team stats` shows a detailed breakdown of a completed run — turn counts,
token usage per speaker, total duration, and files written — without
needing to start any containers:
```bash
team stats my-team.yaml
```
Example output:
```text
Team: my-team 18 turns · 29 670 tokens · duration 142.3s · 5 file(s) written
┌─────────────────────────────────────────────────────────────────────┐
│ Turns & token usage by speaker │
├──────────────┬───────┬───────────────┬──────────────────┬───────────┤
│ Speaker │ Turns │ Prompt tokens │ Completion tokens│ Total │
├──────────────┼───────┼───────────────┼──────────────────┼───────────┤
│ @lead │ 5 │ 12 450 │ 3 210 │ 15 660 │
│ @orchestrator│ 1 │ 0 │ 0 │ 0 │
│ @worker │ 12 │ 8 120 │ 5 890 │ 14 010 │
├──────────────┼───────┼───────────────┼──────────────────┼───────────┤
│ total │ 18 │ 20 570 │ 9 100 │ 29 670 │
└──────────────┴───────┴───────────────┴──────────────────┴───────────┘
```
The `Transcript.stats()` method in `team/bus.py` is also part of the
public Python API:
```python
from team.bus import Transcript
from team.config import load_team
cfg = load_team("my-team.yaml")
t = Transcript(persist_path=cfg.workspace / "transcript.jsonl", resume=True)
s = t.stats()
print(s["total_turns"], s["duration_seconds"])
```
---
## Exporting a run report
After a run you can bundle the full transcript and every produced artifact
into a single shareable document:
```bash
team export my-team.yaml # Markdown (default)
team export my-team.yaml --format html # self-contained HTML (dark-mode aware)
team export my-team.yaml --format json # machine-readable JSON
team export my-team.yaml --output ~/Desktop/run.md
team export my-team.yaml --no-artifacts # omit workspace files (faster, smaller)
```
The report includes:
* Team name, goal, members, and workflow settings.
* Every member turn with speaker, role, content, and files written.
* **Token usage & estimated cost table** — per member and totals.
* Full contents of all files produced in the shared workspace (omit with `--no-artifacts`).
Output path defaults to `/report.md` / `.html` / `.json`.
**Format details:**
| Format | Description |
| --- | --- |
| `markdown` | Single `.md` file with transcript, token table, and fenced artifact blocks. |
| `html` | Self-contained `.html` — embedded CSS, no external deps, respects `prefers-color-scheme: dark`. |
| `json` | Structured JSON (`format_version: 1`) with `team`, `stats`, `token_usage`, `turns`, and `artifacts` keys — suitable for post-processing. |
---
## `team replay` — interactive transcript browser
After a run completes, `team replay` lets you step through the saved
transcript turn-by-turn in an interactive terminal viewer — like a
debugger for a past run. No LLM calls, no Docker, no network — it
works entirely from the persisted `transcript.jsonl` file.
```bash
team replay myteam.yaml # start at turn 0
team replay myteam.yaml --from 5 # start at turn 5
team replay myteam.yaml --speaker alice # jump to alice's first turn
```
### Navigation keybindings
| Key | Action |
| --- | --- |
| `→` / `n` / Space / Enter | Advance to the next turn |
| `←` / `p` / `b` | Go back one turn |
| `g` | Prompt for a turn number and jump directly to it |
| `f` | Prompt for a speaker name and jump to their next turn |
| `s` | Toggle the stats summary panel (token totals, turn counts) |
| `q` / Esc | Quit |
### Non-interactive mode
When stdin is not a TTY (e.g. a CI pipeline or a pipe), `team replay`
prints all turns sequentially — the same rich panel rendering used by
`team transcript` — and exits immediately. This makes it safe to use
in scripts:
```bash
team replay myteam.yaml | head -100
```
### Options
| Option | Default | Description |
| --- | --- | --- |
| `--from N` | `0` | Start at turn N (0-based). |
| `--speaker NAME` | — | Jump to the first turn by NAME at startup. |
---
## Automated testing with `team test`
`team test` runs the team and then validates a set of assertions defined in the
`tests:` section of the team YAML. This makes it easy to build a repeatable
test suite for your team in CI.
```yaml
tests:
  - name: creates hello.py
    type: file_exists
    path: hello.py
  - name: script contains print
    type: file_contains
    path: hello.py
    text: "print"
  - name: no error messages
    type: file_not_contains
    path: report.txt
    text: "ERROR"
  - name: results is valid JSON
    type: json_valid
    path: results.json
  - name: results matches schema
    type: json_schema
    path: results.json
    schema:
      type: object
      required: [entities, summary]
  - name: any member mentioned Python
    type: transcript_contains
    text: "Python"
  - name: developer specifically mentioned Python
    type: transcript_contains
    speaker: developer
    text: "Python"
  - name: exactly 4 member turns
    type: transcript_count
    count: 4
```
```bash
team test myteam.yaml # run the team, then assert
team test myteam.yaml --no-run # assert against an existing run
team test myteam.yaml --max-rounds 2 --goal "quick smoke test"
```
Exits with code **0** if all assertions pass, **1** if any fail (suitable for
CI gates).
### Assertion reference
| Type | Required fields | Description |
| --- | --- | --- |
| `file_exists` | `path` | File must exist in the shared workspace. |
| `file_not_exists` | `path` | File must *not* exist. |
| `file_contains` | `path`, `text` | File content must contain the substring. |
| `file_not_contains` | `path`, `text` | File content must *not* contain the substring. |
| `json_valid` | `path` | File must be parseable JSON. |
| `json_schema` | `path`, `schema` | File must be valid JSON matching the JSON Schema. |
| `transcript_contains` | `text` | At least one turn must contain the text. Add `speaker` to restrict to one member. |
| `transcript_count` | `count` | Exact number of member turns (excludes `orchestrator`/`human`). |
All `path` values are relative to the **shared workspace** directory
(`/shared/`).
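To show what the file-based assertion types check, here is a small evaluator sketch covering five of them. `check_assertion` is a hypothetical name, the transcript and schema types are left out, and treating a missing file as a failure for `file_not_contains` is an assumption rather than documented behaviour.

```python
import json
import tempfile
from pathlib import Path


def check_assertion(assertion: dict, shared: Path) -> bool:
    """Evaluate one file-based assertion against the shared workspace."""
    kind = assertion["type"]
    target = shared / assertion["path"]
    if kind == "file_exists":
        return target.is_file()
    if kind == "file_not_exists":
        return not target.exists()
    if kind in ("file_contains", "file_not_contains"):
        if not target.is_file():
            return False  # assumption: a missing file fails both checks
        found = assertion["text"] in target.read_text(encoding="utf-8")
        return found if kind == "file_contains" else not found
    if kind == "json_valid":
        try:
            json.loads(target.read_text(encoding="utf-8"))
            return True
        except (OSError, ValueError):
            return False
    raise ValueError(f"unsupported assertion type in this sketch: {kind}")


with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)
    (shared / "hello.py").write_text('print("hello")\n', encoding="utf-8")
    tests = [
        {"name": "creates hello.py", "type": "file_exists", "path": "hello.py"},
        {"name": "script contains print", "type": "file_contains",
         "path": "hello.py", "text": "print"},
    ]
    results = [check_assertion(t, shared) for t in tests]
    print(results, "exit code:", 0 if all(results) else 1)
```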
---
## Multi-team pipelines
A *pipeline* lets you chain multiple team runs together so that the output of one team — its shared workspace files and a transcript summary — is automatically injected into the next team's context.
### Pipeline YAML
Create a `pipeline.yaml` alongside your team files:
```yaml
name: research-and-write
description: Research a topic, then write a publication-ready paper.
workspace: ./runs/research-and-write   # optional; default is ./runs/
stages:
  - id: research
    team: ./teams/researcher.yaml
  - id: writing
    team: ./teams/writer.yaml
    depends_on: [research]     # wait for research to complete
    inject_files: true         # copy research's shared/ files here
    inject_context: true       # write context.md from research output
    goal_override: |           # {stage_id.summary} templates available
      Write a publication-ready paper based on the research below.
      {research.summary}
```
### Running a pipeline
```bash
team pipeline pipeline.yaml
```
Preview the execution plan without running anything:
```bash
team pipeline pipeline.yaml --dry-run
```
### Stage fields
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `id` | string | *(required)* | Unique stage identifier used in `depends_on` and goal templates. |
| `team` | path | *(required)* | Path to the team YAML file (relative to the pipeline file). |
| `depends_on` | list of IDs | `[]` | Stages that must complete before this stage runs. |
| `inject_files` | bool | `false` | Copy every file from upstream stages' `shared/` directories into this stage's `shared/` directory before the team starts. |
| `inject_context` | bool | `false` | Write a `context.md` file into this stage's workspace summarising upstream stages' output. Members pick it up automatically. |
| `goal_override` | string | — | Replace the team YAML's `goal` for this pipeline run. Supports `{stage_id.summary}` template substitution. |
### How data flows
Each stage runs inside its own sub-workspace: `//`. At the end of every stage the runner extracts:
- **Summary** — the last five member turns from the transcript, concatenated.
- **Artifacts** — all files in `shared/`, keyed by relative path.
When the next stage has `inject_files: true`, artifact files are copied verbatim into the destination stage's `shared/` directory before its team starts. When `inject_context: true`, a `context.md` is written at the stage workspace root with the summaries and file lists from all upstream stages.
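The `depends_on` relationships define a dependency graph, so a valid execution order is a topological sort of the stages. The sketch below uses the standard-library `graphlib` module (Python 3.9+) to illustrate the idea; `execution_order` is a hypothetical name and this is not necessarily how the pipeline runner schedules stages.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_order(stages: List[dict]) -> List[str]:
    """Return stage IDs ordered so every stage follows its dependencies.

    Raises graphlib.CycleError if the depends_on entries form a cycle.
    """
    graph: Dict[str, Set[str]] = {
        stage["id"]: set(stage.get("depends_on", [])) for stage in stages
    }
    return list(TopologicalSorter(graph).static_order())


stages = [
    {"id": "writing", "depends_on": ["research"]},
    {"id": "research"},
    {"id": "review", "depends_on": ["research", "writing"]},
]
print(execution_order(stages))
```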
### Goal templates
`goal_override` is a Python `str.format()` template. Each upstream stage result is available as `{stage_id.summary}`:
```yaml
goal_override: |
  Review the following research and identify gaps.
  Research output:
  {research.summary}
  Initial draft:
  {writing.summary}
```
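The `{stage_id.summary}` syntax works because `str.format()` supports attribute access on its arguments. The sketch below demonstrates the mechanism with `SimpleNamespace` objects standing in for stage results; the variable names and summary text are made up for illustration.

```python
from types import SimpleNamespace

template = (
    "Review the following research and identify gaps.\n"
    "Research output:\n"
    "{research.summary}\n"
)

# Stand-ins for upstream stage results, keyed by stage id.
results = {
    "research": SimpleNamespace(summary="Expression of gene X predicts OS5."),
    "writing": SimpleNamespace(summary="(unused in this template)"),
}

goal = template.format(**results)
print(goal)
```

One consequence of using `str.format()`: literal braces in a goal (for example a JSON snippet) need to be doubled as `{{` and `}}`, otherwise they are parsed as template fields.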
---
## Cross-team collaboration (bridge)
`team` clusters running on **different machines**, operated by **different
people or organisations**, can collaborate on common goals through the bridge
protocol. One cluster delegates a sub-task to a remote cluster; the remote
cluster runs its full team workflow and returns the results — including all
files it produced. The exchange can repeat over multiple turns, just like a
real inter-laboratory collaboration.
### How it works
```
Lab A cluster (local) Lab B cluster (remote)
┌─────────────────────────────────────┐ ┌──────────────────────────────────┐
│ Orchestrator A │ │ team serve lab-b.yaml │
│ members: pi, analyst │ │ BridgeServer (port 7001) │
│ │ │ │
│ @pi uses delegate_task tool ───────┼─────┼──► POST /tasks │
│ │ │ ┌──────────────────────────┐ │
│ │ │ │ Orchestrator B