{"id":51402484,"url":"https://github.com/heggria/taskflow","last_synced_at":"2026-07-04T08:00:52.457Z","repository":{"id":362481592,"uuid":"1259147774","full_name":"heggria/taskflow","owner":"heggria","description":"A declarative, verifiable graph of task nodes for the Pi coding agent — not a workflow you script, but a DAG you declare: statically verified before it runs, with dynamic fan-out, gates, isolated subagent context, and resumable runs. Zero deps.","archived":false,"fork":false,"pushed_at":"2026-07-01T09:45:12.000Z","size":7227,"stargazers_count":18,"open_issues_count":2,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-07-01T10:27:30.984Z","etag":null,"topics":["agent-orchestration","ai-agents","dag","declarative-workflow","pi-coding-agent","pi-extension","subagents","task-graph","taskflow","verifiable-orchestration","workflow-orchestration","zero-dependencies"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/heggria.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-06-04T08:31:05.000Z","updated_at":"2026-07-01T07:37:03.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/heggria/taskflow","commit_stats":null,"previous_names":["heggria/pi-taskflow","heggria/taskflow"],"tags_count":31,"template":false,"template_full_name":null,"purl":"pkg:github/h
eggria/taskflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heggria%2Ftaskflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heggria%2Ftaskflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heggria%2Ftaskflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heggria%2Ftaskflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/heggria","download_url":"https://codeload.github.com/heggria/taskflow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heggria%2Ftaskflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35032202,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-02T02:00:06.368Z","response_time":173,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-orchestration","ai-agents","dag","declarative-workflow","pi-coding-agent","pi-extension","subagents","task-graph","taskflow","verifiable-orchestration","workflow-orchestration","zero-dependencies"],"created_at":"2026-07-04T08:00:36.237Z","updated_at":"2026-07-04T08:00:52.447Z","avatar_url":"https://github.com/heggria.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cimg 
src=\"./assets/hero.png\" alt=\"taskflow — a declarative, verifiable graph of task nodes for coding-agent subagents: stateful, resumable, context-isolated\" width=\"900\"\u003e\n\n\u003cp\u003e\n  \u003ca href=\"https://www.npmjs.com/package/pi-taskflow\"\u003e\u003cimg src=\"https://img.shields.io/npm/v/pi-taskflow?style=flat-square\u0026color=4B4ACF\u0026label=npm\" alt=\"npm version\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://www.npmjs.com/package/pi-taskflow\"\u003e\u003cimg src=\"https://img.shields.io/npm/dm/pi-taskflow?style=flat-square\u0026color=5A5D63\u0026label=downloads\" alt=\"npm downloads\"\u003e\u003c/a\u003e\n  \u003ca href=\"./LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-MIT-0E8A66?style=flat-square\" alt=\"MIT license\"\u003e\u003c/a\u003e\n  \u003ca href=\"#whats-inside\"\u003e\u003cimg src=\"https://img.shields.io/badge/runtime%20deps-0-0E8A66?style=flat-square\" alt=\"zero runtime dependencies\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/heggria/taskflow/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/heggria/taskflow/ci.yml?branch=main\u0026style=flat-square\u0026label=CI\" alt=\"CI status\"\u003e\u003c/a\u003e\n  \u003ca href=\"#whats-inside\"\u003e\u003cimg src=\"https://img.shields.io/badge/tests-918-4B4ACF?style=flat-square\" alt=\"918 tests\"\u003e\u003c/a\u003e\n  \u003ca href=\"#whats-inside\"\u003e\u003cimg src=\"https://img.shields.io/badge/dogfooded-%E2%9C%93-0E8A66?style=flat-square\" alt=\"dogfooded\"\u003e\u003c/a\u003e\n  \u003ca href=\"#run-it-on-your-agent\"\u003e\u003cimg src=\"https://img.shields.io/badge/runs%20on-Pi%20%2B%20Codex-4B4ACF?style=flat-square\" alt=\"runs on Pi and Codex\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cb\u003eEnglish\u003c/b\u003e ·\n  \u003ca href=\"./README.zh-CN.md\"\u003e简体中文\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eA declarative, verifiable 
\u003cem\u003egraph of tasks\u003c/em\u003e for coding-agent subagents.\u003c/strong\u003e\u003cbr/\u003e\nNot a workflow you script — a DAG you declare. Fan out · gate · loop · tournament · resume · save as a command — intermediate results stay out of your context.\u003cbr/\u003e\nRuns on the \u003ca href=\"https://pi.dev\"\u003ePi\u003c/a\u003e coding agent and on \u003ca href=\"https://github.com/openai/codex\"\u003eOpenAI Codex\u003c/a\u003e.\u003c/p\u003e\n\n\u003c/div\u003e\n\n```bash\n# Pi\npi install npm:pi-taskflow\n\n# Codex\ncodex plugin marketplace add heggria/taskflow\ncodex plugin add taskflow@taskflow\n```\n\n---\n\n**A `workflow` flows. A `taskflow` is a *graph*.** Other orchestrators let the model *script* the work — imperative code that flows step by step, with the graph hidden inside control flow. `taskflow` does the opposite: you **declare** the work as a graph of discrete, named **task** nodes connected by `dependsOn` edges — and the runtime *verifies that graph before it spends a single token.*\n\nYou already know your agent's built-in subagent shorthand — `task` / `tasks` / `chain`. `taskflow` speaks the *same* shorthand — so your existing delegations instantly become **tracked, resumable, and saveable by name** (on Pi, a saved flow becomes a one-word `/tf:\u003cname\u003e` command; on Codex you run it by name through `taskflow_run`). When you outgrow the shorthand, the full DSL gives you a real DAG: dynamic fan-out over dozens of items, conditional routing, quality gates, human approvals, retries, loops, tournaments, and a hard spend ceiling.\n\nAnd the whole time, **only the final phase reaches your conversation.** Every intermediate transcript stays in the runtime, never your context window.\n\n## Why \"taskflow\" and not \"workflow\"?\n\nThe name is the thesis. In engineering, a **task** is a *discrete, declared unit of work* — the node of a task graph (the same `task` a build system, scheduler, or compiler wires into a DAG). 
**Work**, by contrast, is *fluid and unbounded* — the continuous, imperative act of doing.\n\nThat distinction is exactly the design split playing out across coding-agent ecosystems:\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"./assets/task-vs-work.png\" alt=\"work is a fluid imperative script whose graph hides in control flow and can't be verified before it runs; a taskflow is a declarative graph of discrete task nodes that is statically verified before any token is spent\" width=\"900\"\u003e\n\u003c/div\u003e\n\n- A **`workflow`** (the dynamic, code-mode kind) is the model writing an **imperative script** that *flows*: `await agent(...)`, an `if`, a `for`, another `await`. Expressive — it's Turing-complete — but the graph only exists *as the code runs*. You can't see it, diff it, or prove it terminates before you pay for it.\n- A **`taskflow`** moves the plan **out of code and into a declarative graph of `task` nodes.** Because the graph is *data*, the runtime can do what an imperative script structurally cannot: **statically verify it** (no cycles, no dead ends, no budget overflow, no dangling refs) before a single subagent spawns, **render it** (the live progress *is* the DAG), **resume it** phase-by-phase, and **save it** as a one-word command.\n\n\u003e **The trade we make on purpose:** we give up the raw expressivity of arbitrary code to gain something an imperative script can't have — a graph that is **verifiable, observable, replayable, and safe to generate with an LLM.** When a job needs twelve steps with branching fan-out and a review gate, you want a graph you can *check* — not a script you *hope* runs right.\n\n## Why this exists\n\nHere's the wall you hit with raw subagents: you describe a multi-step plan in prose, the model re-derives it every single run, the intermediate transcripts flood your context, and the moment one model call fails you start over from zero. 
There's no reuse, no recovery, no structure — and no way to *check* the plan before it burns tokens.\n\n`taskflow` moves the plan **out of the prompt and into a declarative graph of task nodes.** The runtime owns the DAG, the loops, the retries, and the intermediate state. You declare a pipeline once and run it a hundred times — by name. Because the plan is data, not prose and not code, it can be **validated, visualized, and replayed.**\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"./assets/context-isolation.png\" alt=\"With raw subagents every transcript floods your context; with taskflow transcripts stay in the runtime and only the final result returns\" width=\"900\"\u003e\n\u003c/div\u003e\n\n\u003e Twelve steps, branching fan-out, a review gate, a spend cap — that's a graph, and you want to *see and check* it, not re-prompt it every run.\n\n| | subagent (built-in) | **taskflow** |\n|---|---|---|\n| **Who drives** | the model, turn by turn | the runtime, from a definition |\n| **Topology** | chain / flat parallel | **DAG with layered concurrency + routing** |\n| **Intermediate results** | in your context window | **in the runtime — not your context** |\n| **Scale** | a handful of tasks | **dynamic `map` fan-out over dozens of items** |\n| **Reusable** | re-described every time | **saved by name (`/tf:\u003cname\u003e` on Pi; `taskflow_run` by name on Codex)** |\n| **Resumable** | ✗ | **✓ cross-session — cached phases auto-skip** |\n| **Quality gates** | ✗ | **`gate` phases that halt on `VERDICT: BLOCK`** |\n| **Conditional routing** | ✗ | **`when` guards + `join: any` OR-joins** |\n| **Fault tolerance** | ✗ | **per-phase `retry` + auto-retry on transient errors** |\n| **Human-in-the-loop** | ✗ | **`approval` phases (approve / reject / edit)** |\n| **Cost control** | ✗ | **run-wide `budget` (USD / token caps)** |\n| **Composition** | ✗ | **`flow` phases run saved *or runtime-generated* sub-flows** |\n| **Iterative loops** | ✗ | **`loop` phases — repeat 
until condition, convergence, or cap** |\n| **Competitive selection** | ✗ | **`tournament` phases — N variants + judge** |\n| **Live progress** | opaque while running | **live DAG render with timing + cost (Pi `/tf`); one streaming tool call on Codex** |\n| **Ergonomics** | inline JSON each time | **shorthand (`task`/`tasks`/`chain`) *or* DSL** |\n\nIt doesn't replace the subagent tool. It gives your subagents a **graph**, a memory, and a name.\n\n## Declarative graph vs. imperative script\n\nThe closest thing to `taskflow` in spirit is the **dynamic / code-mode workflow** — where the model writes a JavaScript orchestration script. It's powerful and genuinely expressive. But it sits at the *opposite* end of one fundamental axis: **expressivity vs. verifiability.**\n\n| | dynamic `workflow` (code-mode) | **`taskflow`** (declarative graph) |\n|---|---|---|\n| **The plan is** | imperative JS the model writes \u0026 runs | **declarative JSON data the runtime executes** |\n| **The graph** | implicit — hidden in `if`/`for`/`await` control flow | **explicit — `phases[]` + `dependsOn` edges, a first-class object** |\n| **Verify before running** | ✗ Turing-complete; can't prove it terminates | **✓ static checks: no cycles, dead-ends, budget overflow, dangling refs** |\n| **See it** | ✗ the graph only exists as the code runs | **✓ the live progress render *is* the DAG** |\n| **Resume** | coarse (call-cache dedup) | **✓ phase-by-phase input-hash resume, cross-session** |\n| **Safe to LLM-generate** | risky — it's executable code | **✓ it's just data — no JavaScript `eval`; and a runtime-generated sub-flow is *structurally validated* (cycles / dangling refs / duplicate ids) before it runs** |\n| **Expressivity ceiling** | **higher** — arbitrary control flow | bounded by the DSL, but `map`/`when`/`loop`/`gate`/`tournament` — plus **runtime-generated sub-flows (`flow {def}`)** for plan-then-execute and iterative replanning — cover most jobs |\n\nWe chose the **verifiable** side 
on purpose. The expressivity you give up is real; what you get back — a plan you can check, watch, replay, and safely let a model author — is what turns one-off prompting into durable orchestration.\n\n## Compared to other Pi extensions\n\n\u003e This section is **Pi-specific** — it maps `pi-taskflow` against other packages in the Pi ecosystem. If you're on Codex, skip to [Phase types](#phase-types); the engine and DSL are identical.\n\nThe Pi ecosystem now has **20+ delegation, workflow, and orchestration extensions** — each great at what it's for. Here's an honest map of where `pi-taskflow` sits (verified against each package's latest npm release, June 2026). For the full breakdown — every package, strengths *and* weaknesses — see [`docs/internal/PI-ECOSYSTEM.md`](./docs/internal/PI-ECOSYSTEM.md). For the broader, non-Pi landscape (LangGraph, Temporal, CrewAI, Mastra…) see [`docs/internal/COMPETITORS.md`](./docs/internal/COMPETITORS.md).\n\n| Extension | Model | Custom DSL | DAG | Dynamic fan-out | Cross-session resume | Quality gate | Human approval | Save as command | Zero deps |\n|---|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n| **taskflow** | **declarative multi-phase taskflows** | **✓** | **✓** | **✓ `map`** | **✓ phase-hash** | **✓** | **✓** | **✓ `/tf:\u003cname\u003e`** | **✓** |\n| [`@pi-agents/orchid`](https://www.npmjs.com/package/@pi-agents/orchid) | opinionated 9-phase pipeline + Ralph loop | fixed | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✕ (2) |\n| [`pi-crew`](https://www.npmjs.com/package/pi-crew) | role teams + git worktrees + async | partial | ✓ | ✓ | ✓ | ✓ | ✓ | – | ✕ (7) |\n| [`ultimate-pi`](https://www.npmjs.com/package/ultimate-pi) | governed plan→execute→review harness | YAML contracts | ✓ (plan-time) | ✕ | ✓ | ✓ (3-tier) | ✓ | ✓ | ✕ (16) |\n| [`@zhushanwen/pi-workflow`](https://www.npmjs.com/package/@zhushanwen/pi-workflow) | JS scripts (`agent`/`parallel`/`pipeline`) | yes (JS) | ✕ (linear) | ✓ | ✓ | ✕ | ✕ | ✓ (call cache) | ✓ |\n| 
[`@fiale-plus/pi-rogue-orchestration`](https://www.npmjs.com/package/@fiale-plus/pi-rogue-orchestration) | timer loop + goal resolution | ✕ | ✕ | ✕ | ✓ | ✓ (goal-check) | ✕ | ✕ | ✓ |\n| [`pi-subagents`](https://www.npmjs.com/package/pi-subagents) | single / parallel / chain delegation | ✕ | ✕ | static | – | ✕ | clarify | named workflows | ✕ (3) |\n| [`@gotgenes/pi-subagents`](https://www.npmjs.com/package/@gotgenes/pi-subagents) | Claude-Code-style subagents + worktrees | ✕ | ✕ | ✕ | ✓ (by id) | ✕ | per-agent | ✕ | ✕ (1) |\n| [`pi-pipeline`](https://www.npmjs.com/package/pi-pipeline) | fixed SPEC→PLAN→TASKS→VERIFY | ✕ | fixed | ✕ | session planning | ✓ | clarify | ✕ | ✕ (2) |\n| [`pi-agent-flow`](https://www.npmjs.com/package/pi-agent-flow) | one-shot parallel specialist `fork` | yes | ✕ | ✕ | – | ✕ | ✕ | – | ✕ (2) |\n\n*(Representative slice of the 20+ — see [`docs/internal/PI-ECOSYSTEM.md`](./docs/internal/PI-ECOSYSTEM.md) for all of them, plus `@0xkobold/pi-orchestration`, `@melihmucuk/pi-crew`, `@mediadatafusion/pi-workflow-suite`, `gentle-pi`, `@dreki-gg/pi-subagent`, and more.)*\n\n**How to choose:**\n\n- **`@pi-agents/orchid`** is the most feature-complete orchestrator in the ecosystem (DAG + worktrees + Ralph loop + agent mailbox) — but its DSL is a *fixed* 9-phase pipeline, it carries runtime deps + jiti, and it's beta. Reach for `taskflow` when you want to **define your own graph** (not adopt an opinionated one) with **zero dependencies** and a one-command install.\n- **`pi-crew` / `ultimate-pi`** go heavier — worktree isolation, durable async teams, multi-tier governance. If you want lightweight, declarative, and zero-dependency, that's this project.\n- **`@zhushanwen/pi-workflow`** is the closest in spirit and also zero-dep, but it's the **imperative** side of the split above: you author workflows as **JavaScript scripts** the model writes and runs. 
`taskflow`'s **declarative JSON DAG** is the verifiable side — statically checkable, visualizable, safe to LLM-generate, and resumable at phase granularity rather than call-cache dedup.\n- **`@fiale-plus/pi-rogue-orchestration`** has a real **loop-until-done** (goal-driven iteration). `taskflow` now ships its own `loop` phase (v0.0.13+) plus `tournament` for competitive selection — and unlike rogue-orchestration, `taskflow` has a full DAG with gates, compositional sub-flows, and cross-session resume. For raw \"keep going until the goal is met\" with minimal structure, rogue-orchestration is still lighter; for structured, branching pipelines, `taskflow` covers the same ground and more.\n- **`pi-subagents` / `@gotgenes/pi-subagents`** are the mature picks for ad-hoc \"use reviewer on this diff\" delegation and background jobs. `taskflow` is for when those delegations need to become a *repeatable, resumable pipeline*.\n- **`pi-pipeline` / `pi-agent-flow`** ship *opinionated, fixed* flows. `taskflow` ships an *empty canvas*: you (or the model) declare the graph that fits the job.\n\n\u003e The honest one-liner: **`pi-taskflow` is the only Pi extension that gives you a *declarative, verifiable, resumable* DAG of task nodes — saved as a one-word `/tf:\u003cname\u003e` command, with zero runtime dependencies and context isolation by design** (and the same engine runs on Codex via the `taskflow_*` MCP tools). Where code-mode workflows let the model *script* the work, `taskflow` lets it *declare a graph the runtime can prove correct before running.* Recently shipped from the roadmap: the Shared Context Tree (blackboard + supervision) and worktree isolation (see [`docs/internal/STRATEGY.md`](./docs/internal/STRATEGY.md)).\n\n## 30-second start\n\n### On Pi\n\n**1. 
Install** — one command:\n\n```bash\npi install npm:pi-taskflow\n```\n\n\u003e **Optional:** run `/tf init` once to map the 18 built-in agents' model roles\n\u003e (`fast`, `strong`, `thinker`, …) to your own models — an interactive picker.\n\u003e Skip it and agents just use Pi's default model. See [Model roles](#model-roles).\n\n**2. Run** — just ask the model in a Pi session:\n\n\u003e *Run a chain: first explore the auth flow, then summarize the findings.*\n\nThe model calls the `taskflow` tool automatically. You get live progress, per-step timing, token cost, and a saved run record — **same effort as the built-in tool, now tracked and resumable.**\n\n**3. Save** — say *\"save it\"* and you have `/tf:\u003cname\u003e` forever.\n\nThat's it. You can be running your first workflow before your coffee cools — without writing a single phase definition.\n\n\u003ca id=\"run-it-on-your-agent\"\u003e\u003c/a\u003e\n### On Codex\n\ntaskflow ships as a Codex **plugin** — install it once and the `taskflow_*` MCP tools plus a routing skill light up automatically, no manual `mcp add` and no config editing:\n\n```bash\ncodex plugin marketplace add heggria/taskflow\ncodex plugin add taskflow@taskflow\n```\n\nThe plugin's MCP server runs via `npx` (a version-pinned `codex-taskflow`), so there's nothing else to install globally and the plugin version binds the exact code that runs. Then just ask Codex to run a multi-phase or fan-out job and it calls the tools. 
See the [Codex guide](./docs/codex-mcp.md).\n\n### The shorthand (same shape as the built-in tool)\n\n```jsonc\n// Single — one agent, one job\n{ \"task\": \"Summarize the architecture of src/\", \"agent\": \"explorer\" }\n\n// Parallel — fire several at once, outputs merge\n{ \"tasks\": [\n  { \"task\": \"Audit auth in src/api\",             \"agent\": \"analyst\" },\n  { \"task\": \"Audit input validation in src/api\", \"agent\": \"analyst\" }\n] }\n\n// Chain — sequential; each step sees the previous output\n{ \"chain\": [\n  { \"task\": \"List the public API of src/lib\", \"agent\": \"scout\" },\n  { \"task\": \"Write docs for:\\n{previous.output}\", \"agent\": \"writer\" }\n] }\n```\n\n`agent` is optional (defaults to the first discovered agent). Add a `name` to label the run and unlock saving it as a command.\n\nShorthand modes also support per-step **context pre-reading** — pass `context` (file paths) and optionally `contextLimit` (max chars per file, default 8000) at the step level:\n\n```jsonc\n// Chain with context files injected into the first step only\n{ \"chain\": [\n  { \"task\": \"List the public API\", \"agent\": \"scout\", \"context\": [\"src/lib/**/*.ts\"] },\n  { \"task\": \"Write docs for:\\n{previous.output}\", \"agent\": \"writer\" }\n] }\n```\n\n## Watch it run\n\nThis is not a mockup. 
**This is stdout from a real run** (the Pi TUI) — the `self-improve` flow that writes and verifies its own test suites, caught mid-flight by a quality gate:\n\n```\n⊗ taskflow self-improve  6/7 · blocked · $0.095\n    ✓ discover            agent   deepseek-v4-flash  10t ↑38k ↓6.7k $0.011\n  ┌ ✓ write-runner-tests  agent   claude-sonnet-4-6  10t ↑13 ↓6.6k $0.020\n  ├ ✓ write-store-tests   agent   claude-sonnet-4-6  10t ↑11 ↓10k $0.018\n  ├ ✓ write-agents-tests  agent   claude-sonnet-4-6  10t ↑28 ↓13k $0.030\n  └ ✓ fix-stability       agent   claude-sonnet-4-6  10t ↑13 ↓3.9k $0.012\n    ✓ verify              gate    BLOCK 3 type errors in test files  deepseek-v4-flash\n    ⊘ report              reduce  skipped · Gate blocked  ↳ fix-stability\n```\n\n**The layout *is* the DAG.** No dashboard, no logs to grep — you read the progress bar and you understand the whole pipeline:\n\n- **Header** — `⊗` = blocked (a gate halted it); `6/7` phases processed; aggregate cost `$0.095`.\n- **Status icons** — `✓` done · `◐` running · `✗` failed · `⊘` skipped · `○` pending.\n- **Rail `┌ ├ └`** — phases in the same DAG layer, running concurrently. The four `write-*`/`fix-stability` tasks fan out from `discover`. A blank gutter = a single-phase layer.\n- **`↳`** — a long, layer-skipping dependency. `report` depends on the adjacent `verify` *and* on `fix-stability` two layers back, so only that skip edge is annotated.\n- **Gate** — `verify` emitted `VERDICT: BLOCK`, so the runtime skipped `report` and ended the run as `blocked`, surfacing the reason inline.\n- **Detail** — per phase: model, token counts (`↑`in `↓`out), cost, timing. Fan-out phases also show sub-task progress (`3/15 2✗ 8▸`).\n\n## Go declarative\n\nThe shorthand is your onramp. 
The DSL is where `taskflow` earns its keep — dynamic fan-out, structured routing, and quality gates.\n\n### Fan out and reduce\n\n```jsonc\n{\n  \"name\": \"summarize-files\",\n  \"description\": \"Discover files, summarize each, produce one report\",\n  \"args\": { \"dir\": { \"default\": \".\" } },\n  \"concurrency\": 8,\n  \"phases\": [\n    { \"id\": \"discover\", \"type\": \"agent\", \"agent\": \"scout\",\n      \"task\": \"List source files under {args.dir} (non-recursive).\\nOutput ONLY a JSON array [{\\\"file\\\":\\\"\\\"}]. No prose.\",\n      \"output\": \"json\" },\n    { \"id\": \"summarize\", \"type\": \"map\",\n      \"over\": \"{steps.discover.json}\", \"as\": \"item\", \"agent\": \"scout\",\n      \"task\": \"Read {item.file} and give a one-sentence summary.\",\n      \"dependsOn\": [\"discover\"] },\n    { \"id\": \"report\", \"type\": \"reduce\", \"from\": [\"summarize\"], \"agent\": \"writer\",\n      \"task\": \"Combine into a short overview:\\n{steps.summarize.output}\",\n      \"dependsOn\": [\"summarize\"], \"final\": true }\n  ]\n}\n```\n\n1. **`discover`** lists every file and emits a JSON array.\n2. **`summarize`** is a `map` — it fans out one subagent per file, throttled to 8 concurrent, with `{item.file}` bound to each path.\n3. **`report`** is a `reduce` — it merges every summary into one clean overview.\n\nThe intermediate summaries never enter your context. The runtime owns them; you get the report. **Save it once → `/tf:summarize-files dir=src` forever.**\n\n### Route, gate, retry, approve, and cap the spend\n\n```jsonc\n{\n  \"name\": \"triage-and-fix\",\n  \"budget\": { \"maxUSD\": 1.5 },\n  \"phases\": [\n    { \"id\": \"triage\", \"type\": \"agent\", \"agent\": \"analyst\", \"output\": \"json\",\n      \"task\": \"Classify the bug. 
Output ONLY {\\\"severity\\\":\\\"high\\\"} or {\\\"severity\\\":\\\"low\\\"}.\" },\n    { \"id\": \"deep\",  \"when\": \"{steps.triage.json.severity} == high\", \"dependsOn\": [\"triage\"],\n      \"agent\": \"executor-code\", \"task\": \"Root-cause and patch it.\",\n      \"retry\": { \"max\": 2, \"backoffMs\": 500 } },\n    { \"id\": \"quick\", \"when\": \"{steps.triage.json.severity} == low\",  \"dependsOn\": [\"triage\"],\n      \"agent\": \"executor-fast\", \"task\": \"Apply the quick fix.\" },\n    { \"id\": \"approve\", \"type\": \"approval\", \"join\": \"any\", \"dependsOn\": [\"deep\", \"quick\"],\n      \"task\": \"Review the fix before it ships.\" },\n    { \"id\": \"ship\", \"type\": \"agent\", \"dependsOn\": [\"approve\"],\n      \"task\": \"Open a PR with the change.\", \"final\": true }\n  ]\n}\n```\n\n- **`when`** routes to `deep` *or* `quick` from the triage JSON — the other branch is skipped.\n- **`join: \"any\"`** lets `approve` fire the moment whichever branch ran completes (an OR-join).\n- **`retry`** re-runs a flaky patch with backoff; **`budget`** halts the whole run if it gets too expensive.\n- **`approval`** pauses for a human (approve / reject / edit) before the final `ship`.\n\nNo scripting. No JavaScript `eval`. Just data the runtime executes — safe enough to run LLM-generated definitions directly.\n\n### Loop until done\n\nSome work is inherently iterative — refine a draft until a reviewer is satisfied, retry-and-improve until tests pass, converge on an answer:\n\n```jsonc\n{\n  \"id\": \"refine\",\n  \"type\": \"loop\",\n  \"task\": \"Improve this draft (iteration {loop.iteration}). 
Previous attempt:\\n{loop.lastOutput}\\n\\nReturn JSON {\\\"draft\\\":\\\"…\\\",\\\"done\\\":true|false}.\",\n  \"until\": \"{steps.refine.json.done} == true\",\n  \"output\": \"json\",\n  \"maxIterations\": 6,\n  \"convergence\": true\n}\n```\n\nSee [Loop phases](#loop-until-done-loop) for the full reference.\n\n### Plan, then execute (runtime sub-flows)\n\nA planner decides *at runtime* what work to spawn — each iteration's plan depends on the previous result:\n\n```jsonc\n{\n  \"name\": \"iterative-replan\",\n  \"phases\": [\n    { \"id\": \"plan\", \"type\": \"agent\", \"agent\": \"planner\",\n      \"task\": \"Given the current state, output a JSON taskflow definition (with phases[]).\",\n      \"output\": \"json\" },\n    { \"id\": \"execute\", \"type\": \"flow\", \"def\": \"{steps.plan.json}\",\n      \"dependsOn\": [\"plan\"] }\n  ]\n}\n```\n\nThe generated sub-flow is **validated** (no cycles, no dangling refs, no duplicate IDs) before a single token is spent. See [`examples/dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) and [`examples/iterative-replan.json`](./examples/iterative-replan.json).\n\n### Tournament (compete and judge)\n\nFor open-ended creative or subjective work, spawn several competing variants and let a judge pick the best:\n\n```jsonc\n{\n  \"id\": \"headline\",\n  \"type\": \"tournament\",\n  \"task\": \"Write a punchy headline for this launch post.\",\n  \"variants\": 4,\n  \"judge\": \"Pick the headline with the strongest hook and clearest promise.\",\n  \"mode\": \"best\"\n}\n```\n\nSee [Tournament phases](#tournament-tournament) for the full reference.\n\n## Phase types\n\n| type | what it does | required fields |\n|------|--------------|-----------------|\n| `agent` | one subagent runs a single task | `task` |\n| `parallel` | run `branches[]` concurrently | `branches` (array of `{task, agent?}`) |\n| `map` | **fan out** over an array — one subagent per item, `{item}` bound | `over`, `task` |\n| `gate` | 
quality/review step that can **halt the flow** | `task` |\n| `reduce` | aggregate `from[]` phase outputs into one | `from`, `task` |\n| `approval` | **human-in-the-loop** pause — approve / reject / edit | — |\n| `flow` | run a **sub-flow** as one phase — a **saved** flow (`use`) or a **runtime-generated** one (`def`) | `use` \\| `def` |\n| `loop` | **iterate a task until done** — re-run a body until a condition, convergence, or a cap | `task`, `until` |\n| `tournament` | **N variants compete**, a judge picks the best (or aggregates) | `task` \\| `branches` |\n| `script` | run a **shell command** — no LLM, zero tokens — capturing stdout as the phase output | `run` |\n\n### Common phase fields\n\nEvery phase needs a unique `id` and a `type` (defaults to `agent`). On top of the per-type fields:\n\n| Field | Meaning |\n|---|---|\n| `agent` | Agent to run (defaults to the first discovered agent) |\n| `dependsOn` | Phase ids this phase waits for — builds the DAG |\n| `join` | `\"all\"` (default) waits for every dep; `\"any\"` is an OR-join |\n| `when` | Conditional guard — skip unless the expression is truthy |\n| `retry` | `{ max, backoffMs?, factor? }` — retry a failing subagent |\n| `output` | `\"text\"` (default) or `\"json\"` (exposes `{steps.ID.json}`) |\n| `model` / `thinking` / `tools` | Per-phase overrides for the subagent |\n| `cwd` | Working directory for the subagent. A literal path, or a reserved keyword for **workspace isolation** — `\"temp\"` (ephemeral dir, removed after), `\"dedicated\"` (persistent dir under the run state, kept), `\"worktree\"` (a git worktree on a throwaway branch, removed after). Fail-open; rejected in LLM-authored sub-flows. 
|\n| `context` | File paths to pre-read and inject into the agent prompt |\n| `contextLimit` | Max chars per context file (default 8000) |\n| `concurrency` | Fan-out cap for `map` / `parallel` (overrides the flow default) |\n| `final` | Marks the result-bearing phase (else the last phase wins) |\n| `optional` | A failure here does **not** abort the run |\n| `shareContext` | Opt this phase's subagent into the **Shared Context Tree** (see below). Set `contextSharing: true` at the flow level to enable it for every phase |\n| `cache` | `{ scope, ttl?, fingerprint? }` — cross-run memoization (see below) |\n| `onBlock` | `\"halt\"` (default) or `\"retry\"` — what happens when a gate blocks |\n| `eval` | Zero-token machine-checkable criteria that run *before* the LLM gate |\n\nFlow-level keys: `name`, `description`, `args`, `concurrency` (default 8), `agentScope`, `contextSharing`, `strictInterpolation`, and `budget: { maxUSD?, maxTokens? }`.\n\n### Shared Context Tree (blackboard + supervision)\n\nBy default subagents are fully isolated — they share nothing and only return a\nfinal string. Opt a phase in with `shareContext: true` (or `contextSharing: true`\nflow-wide) to give its subagent four extra tools backed by a per-run, file-based\nblackboard:\n\n| tool | direction | use |\n|------|-----------|-----|\n| `ctx_write(key, value)` | horizontal | publish a finding so siblings/descendants reuse it (stop re-reading the same files) |\n| `ctx_read(key?)` | horizontal | read findings visible to this node: its own + ancestors' + **completed** others' |\n| `ctx_report(summary, structured?)` | vertical ↑ | report a result up to the parent |\n| `ctx_spawn(assignments[])` | vertical ↓ | delegate child work at runtime; each assignment is a flat `{task}` **or** a `{subflow}` (a dependency-bearing DAG the runtime validates and runs nested). 
Child reports fold back into this phase's output |\n\nThe first two are a **horizontal blackboard** (siblings reuse expensive context);\nthe last two are a **vertical supervision tree** (a node delegates work and its\nchildren report up). Everything is opt-in, fail-open, depth-capped (5 levels), size-bounded\n(256KB per value, 256 keys per node, 16 spawn assignments max), and cleaned up\nwith the run — flows that don't opt in behave exactly as before.\n\n```jsonc\n{ \"id\": \"survey\", \"type\": \"agent\", \"agent\": \"scout\", \"shareContext\": true,\n  \"task\": \"Map the API surface. ctx_write key 'endpoints' so the auditors don't re-scan.\" },\n{ \"id\": \"audit\", \"type\": \"map\", \"over\": \"{steps.survey.json}\", \"shareContext\": true,\n  \"dependsOn\": [\"survey\"], \"agent\": \"analyst\",\n  \"task\": \"ctx_read 'endpoints' for shared context, then audit {item} for missing auth.\" }\n```\n\n### Control flow \u0026 reliability\n\n- **`when`** — skip a phase unless an expression is truthy. Supports `{refs}`, `== != \u003c \u003e \u003c= \u003e=`, `\u0026\u0026 || !`, parentheses, and quoted strings/numbers. Pair with `join: \"any\"` on the merge phase for real if/else routing. Parse errors **fail open** (the phase runs — never silently dropped).\n- **`join: \"any\"`** — an OR-join: the phase runs as soon as *one* dependency completes (default `\"all\"` waits for all).\n- **`retry`** — `{ \"max\": 2, \"backoffMs\": 500, \"factor\": 2 }` retries a failing subagent with fixed or exponential backoff; usage is summed and the attempt count shows as `↻N` in the TUI. Transient provider errors (rate-limit / 5xx / timeout) **auto-retry even without an explicit policy**; hard errors don't.\n- **`onBlock`** — `\"halt\"` (default) stops the run when a gate blocks. 
`\"retry\"` retries upstream phases when a gate blocks, instead of halting — a self-healing rework loop with budget and idle-watchdog guards and a nested recursion depth cap.\n- **`eval`** — zero-token machine-checkable criteria that run *before* the LLM gate. If the eval check fails, the gate blocks without spawning an agent.\n- **`score`** — graded, composable quality gates: deterministic scorers (`exact-match`, `contains`, `regex`, `json-schema`, `length-range`, `code-compiles`) run against a target string at **zero tokens** and combine via `all`/`any`/`weighted` against a `threshold`. Deterministic pass → auto-PASS with no LLM call **when the judge cannot veto** — no judge configured, or `weighted` where the deterministic score is a *lower bound* already clearing the threshold. With `all`/`any` + a judge, the judge always runs (its verdict is authoritative — it may check what scorers cannot, e.g. factuality). Deterministic fail → the optional LLM `judge` decides (fail-open on unparseable output), or the gate `task` runs with the scorer report appended, or — with no fallback — the gate **blocks explicitly**. The structured result is the gate's `.json` (`{steps.\u003cgate\u003e.json.combined}`, `.json.results`), so downstream phases can route on quality, not just pass/fail. LLM-generated dynamic sub-flows may not use `code-compiles` (compiler execution) or `regex` (ReDoS) scorers — same hardening class as the `script` block.\n- **`idempotent: false`** — side-effect classification for phases with **irreversible effects** (webhook POSTs, deploys, DB writes): the implicit transient auto-retry is suppressed (an explicit `retry{}` is still honored — it's the author's declaration that repeats are acceptable) and the result is **never cached** in any scope (within-run resume, cross-run, `incremental`) — the phase re-runs every time. The phase state records `sideEffect: true` (rendered as ⚡). 
Default `true` — existing flows are unchanged.\n- **`approval`** — pause for a human (Approve / Reject / Edit). Reject halts the flow; Edit injects the typed note as the phase output for downstream steps. Non-interactive runs (detached / CI) **auto-reject** (safety: approval gates are never bypassed).\n- **`flow`** — `{ \"type\": \"flow\", \"use\": \"deep-research\", \"with\": { \"topic\": \"{item}\" } }` runs a **saved** flow as a phase (recursion is detected and rejected). Or **generate the sub-flow at runtime**: `{ \"type\": \"flow\", \"def\": \"{steps.plan.json}\" }` resolves an upstream phase's JSON output into a sub-flow, **validates it (cycles / dangling refs / duplicate ids / dead-ends), then runs it** — the number and shape of the generated phases is decided at runtime, not authored in advance. A malformed plan fails *open* (the phase is skipped with a `defError`, the run continues). This is how a planner decides *at runtime* what work to spawn — the declarative answer to a code-mode `for` loop, with each generated plan checked before it spends a token. Security hardening for LLM-generated sub-flows: breadth caps (100 phases, 200 map items, 16 concurrency), `cwd` containment, budget clamped to `min(child, parent)`, nesting cap (5 levels), and prototype-pollution defense (deep-cloned, `__proto__`/`constructor`/`prototype` stripped). Pair it with `loop` for **data-dependent iterative replanning** (round N's plan depends on round N-1's result). See [`examples/dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) and [`examples/iterative-replan.json`](./examples/iterative-replan.json).\n\n### Loop-until-done (`loop`)\n\nSome work is inherently iterative — refine a draft until a reviewer is satisfied, retry-and-improve until tests pass, converge on an answer. A `loop` phase re-runs one task body until a stop condition holds:\n\n```jsonc\n{\n  \"id\": \"refine\",\n  \"type\": \"loop\",\n  \"task\": \"Improve this draft (iteration {loop.iteration}). 
Previous attempt:\\n{loop.lastOutput}\\n\\nReturn JSON {\\\"draft\\\":\\\"…\\\",\\\"done\\\":true|false}.\",\n  \"until\": \"{steps.refine.json.done} == true\",   // the iteration's own output is exposed here\n  \"output\": \"json\",\n  \"maxIterations\": 6,        // default 10, hard cap 100 — the loop ALWAYS terminates\n  \"convergence\": true        // default: stop early if an iteration's output is identical to the last\n}\n```\n\n- **Body locals** — the task can read `{loop.iteration}` (1-based), `{loop.lastOutput}` (the prior iteration's output), and `{loop.maxIterations}` to build on its own previous work; all three are also available to the `until` condition.\n- **`until`** — evaluated after each iteration with the iteration's output exposed as `{steps.\u003cthisId\u003e.output}` / `.json`. Same operators as `when`. The loop stops the moment it's truthy.\n- **Always terminates.** Four independent stops: `until` truthy, **convergence** (a fixed point — output identical to the previous iteration), **`maxIterations`** (hard-capped at 100), or a **failing iteration** (the phase fails with the partial output preserved). A malformed `until` **stops** the loop rather than spinning forever (fail-safe) and surfaces a warning on the phase.\n- **Reflexion memory (`reflexion: true`)** — by default each iteration only sees the prior *output*; the **reason** it wasn't good enough (an `expect` contract violation, an error, the unmet `until`) is discarded, so models repeat mistakes. With `reflexion: true` every iteration after the first receives a structured failure summary of the prior one via the `{reflexion}` placeholder (auto-appended when absent, capped at 2000 chars): contract diagnostics like `$.done: required key is missing`, the error message, or the unmet stop condition, plus a truncated output snippet. 
Semantics shift to enable self-correction: **body failures become feedback instead of terminating the loop** — timeout/abort still hard-stop, and exhausting `maxIterations` on a failure still fails the phase (reflexion defers failure, never erases it). The last injected summary is persisted on the phase state for audit.\n- The TUI shows `↻N` with the stop reason (`done` / `converged` / `max` / `failed`); usage is summed across iterations. Like `gate`/`approval`, `loop` is **excluded from `cross-run` cache** (each run must iterate fresh).\n\n### Tournament (`tournament`)\n\nFor open-ended work, the best result often comes from generating several candidates and picking the strongest — best-of-N with a judge, in one declarative phase:\n\n```jsonc\n{\n  \"id\": \"headline\",\n  \"type\": \"tournament\",\n  \"task\": \"Write a punchy headline for this launch post.\",\n  \"variants\": 4,                    // spawn 4 competitors of the SAME task (default 3, max 20)\n  \"judge\": \"Pick the headline with the strongest hook and clearest promise.\",\n  \"judgeAgent\": \"reviewer\",          // optional; defaults to the phase agent\n  \"mode\": \"best\"                     // \"best\" (default) | \"aggregate\"\n}\n```\n\n- **Competitors** — either `variants: N` copies of one `task` (diversity comes from model nondeterminism), or distinct `branches: [{task, agent?}, …]` when you want to pit *different approaches* against each other.\n- **Judge** — after the fan-out, one judge agent sees every variant (numbered) plus your `judge` rubric and picks a winner via a `WINNER: \u003cn\u003e` line or `{\"winner\": n}`. An unreadable verdict **fails open** to variant 1; a failed judge falls back too — the work is never lost.\n- **`mode`** — `best` returns the winning variant **verbatim**; `aggregate` returns the judge's **synthesized** answer combining the strongest parts.\n- **Short-circuits:** if only one competitor survives, it wins with no judge call; if all fail, the phase fails. 
The TUI shows `⚑ N→#k`; usage sums variants + judge. Like `gate`, it's **excluded from `cross-run` cache**.\n\n### Shell steps (`script`)\n\nNot every step needs a model. A `script` phase runs a **shell command** directly — zero tokens, no subagent — and captures its stdout as the phase output. Use it to glue LLM work to real tools: run a build or test suite, a formatter, `git`, `curl` a webhook, or pipe a previous phase's output through a script.\n\n```jsonc\n{\n  \"id\": \"build\",\n  \"type\": \"script\",\n  \"run\": \"npm run build\",              // string → runs in a shell\n  \"timeout\": 120000                     // optional ms cap (1000–300000, default 60000)\n},\n{\n  \"id\": \"score\",\n  \"type\": \"script\",\n  \"run\": [\"python\", \"score.py\"],        // array → direct exec, no shell (injection-safe)\n  \"input\": \"{steps.analyze.output}\",    // optional — piped to stdin (interpolation-enabled)\n  \"dependsOn\": [\"analyze\"]\n}\n```\n\n- **`run`** — the command. A **string** runs through a shell (`sh -c` / `cmd`); an **array** is spawned directly (execvp-style, no shell). Prefer the array form for anything containing interpolated values: a string `run` that contains an interpolation placeholder is **rejected at validation** (a shell-injection guard) — pass dynamic values via the array form or `input` instead.\n- **`input`** — optional text piped to the command's stdin; supports interpolation (`{steps.X.output}`, `{args.X}`). If omitted, stdin is closed.\n- **`timeout`** — optional millisecond cap (1000–300000, default 60000). On timeout the child gets `SIGTERM`, then `SIGKILL` after a grace period, and the phase fails.\n- A non-zero exit **fails** the phase (stderr is captured); stdout is capped at 1 MB. `script` phases spend **zero tokens**, do not support `retry` or `output: \"json\"`, and are **excluded from `cross-run` cache** (a shell step may have side effects). 
The `compile` diagram renders them as `⚡ script`.\n\n### Cross-run memoization (`cache`)\n\nEvery phase is already content-addressed: within a single run's **resume**, a phase whose resolved inputs are unchanged is skipped. `cache` extends that reuse **across independent runs** — if any prior run computed a phase with an identical input hash, its result is reused for **$0.00**.\n\n```jsonc\n{\n  \"id\": \"analyze-auth\",\n  \"task\": \"Summarize how the auth module works.\",\n  \"context\": [\"src/auth/**/*.ts\"],\n  \"cache\": {\n    \"scope\": \"cross-run\",                 // \"run-only\" (default) | \"cross-run\" | \"off\"\n    \"ttl\": \"6h\",                          // optional max age before a hit is treated as a miss\n    \"fingerprint\": [\"git:HEAD\", \"glob:src/auth/**/*.ts\"]  // fold world-state into the key\n  }\n}\n```\n\n- **`scope`** — `\"run-only\"` (default) is exactly the historical behavior (within-run resume only). `\"cross-run\"` opts the phase into the persistent store. `\"off\"` disables reuse entirely (even within a run), for debugging.\n- **Freshness is the whole game.** The cache key already includes the prompt, the `over` items, and any `context` files (pre-read into the task). `fingerprint` folds *implicit* inputs into the key so \"the world changed\" becomes a cache miss: `git:HEAD`, `glob:\u003cpat\u003e` (size+mtime), `glob!:\u003cpat\u003e` (content hash), `file:\u003cpath\u003e`, `env:\u003cNAME\u003e`. `ttl` (`30m`/`6h`/`7d`) is a time backstop.\n- **Honest limit:** a subagent that reads a file it didn't declare in `context`/`fingerprint` can still serve a stale `cross-run` hit. That's why the default is `run-only` and why `gate`/`approval` phases are **forbidden** from `cross-run` (they must produce a fresh result each run). Opt in only for phases whose output is a function of declared inputs.\n- Cache lives in `.pi/taskflows/cache/` (gitignored). Clear it with `action: \"cache-clear\"` on the tool. 
Full rationale: [`docs/internal/rfc-cross-run-memoization.md`](./docs/internal/rfc-cross-run-memoization.md).\n\n### Gate phases (quality control)\n\nA `gate` runs an agent to review upstream output and can **block the rest of the workflow.** End the gate task by asking for a verdict the runtime can read:\n\n- a final line `VERDICT: PASS` or `VERDICT: BLOCK` (also accepts `OK`, `FAIL`, `STOP`, `REJECT`, `HALT` — last occurrence wins), or\n- JSON like `{\"continue\": false, \"reason\": \"missing auth checks\"}` / `{\"verdict\": \"block\", \"reason\": \"...\"}`.\n\nOn **BLOCK**, downstream phases skip and the run ends as `blocked` with the reason surfaced. **Ambiguous output fails open** (treated as PASS) — a gate never halts your flow by accident.\n\n```\nReview the audit below. If any endpoint is missing auth, end with\n\"VERDICT: BLOCK\" and a one-line reason; otherwise end with \"VERDICT: PASS\".\n\n{steps.audit.output}\n```\n\n## Interpolation \u0026 expressions\n\n| placeholder | resolves to |\n|---|---|\n| `{args.X}` | invocation argument |\n| `{steps.ID.output}` | a prior phase's text output |\n| `{steps.ID.json}` | prior output parsed as JSON (or `{steps.ID.json.field}`) |\n| `{item}` / `{item.field}` | current item inside a `map` phase |\n| `{previous.output}` | the immediately-upstream phase output |\n| `{loop.iteration}` | current iteration number inside a `loop` phase |\n| `{loop.lastOutput}` | previous iteration's output inside a `loop` phase |\n| `{loop.maxIterations}` | the iteration cap inside a `loop` phase |\n\nCondition grammar (for `when`): `== != \u003c \u003e \u003c= \u003e=`, `\u0026\u0026 || !`, parentheses, quoted strings/numbers, and any `{...}` reference — e.g. 
`\"when\": \"{steps.triage.json.route} == deep \u0026\u0026 {args.force} != true\"`.\n\n\u003e Referencing `{steps.X}` that isn't declared in `dependsOn` is a **hard validation error** — the runtime catches the most common pipeline bug before a single agent runs.\n\n\u003e Unresolved interpolation refs (e.g. `{args.typo}` or a missing `dependsOn`) are surfaced as **phase warnings** (`PhaseState.warnings`) in the run record and `/tf runs` — no more silent intact placeholders.\n\n## Commands\n\nSaved flows become CLI shortcuts. **These `/tf` commands are Pi-only** (they run in the Pi session). On Codex, use the `taskflow_*` MCP tools instead — `taskflow_list` / `taskflow_show` / `taskflow_run` (by `name`) / `taskflow_verify` / `taskflow_compile`.\n\n| Command | What it does |\n|---|---|\n| `/tf list` | List all saved flows |\n| `/tf run \u003cname\u003e [args]` | Run a saved flow (e.g. `/tf run summarize-files dir=src`) |\n| `/tf show \u003cname\u003e` | Print a flow's definition |\n| `/tf compile \u003cname\u003e [lr\\|td]` | **Render the flow as a Mermaid diagram + verification overlay** — 0 tokens, no LLM; paste into a README/issue/PR |\n| `/tf runs` | Browse recent run history (interactive TUI — **live auto-refreshes** while any run is active) |\n| `/tf resume \u003crunId\u003e` | Continue a paused/failed run — cached phases skip automatically |\n| `/tf init` | **Interactively map model roles** to your enabled models (writes `~/.pi/agent/settings.json`) |\n| `/tf:\u003cname\u003e [args]` | Shortcut — runs the flow in one tap |\n\nTool actions (used by the model on Pi): `run` (inline `define` or saved `name`), `save`, `resume`, `list`, `agents`, `init`, `verify`, `compile`, `ir`, `provenance`, `why-stale`, `recompute`, `cache-clear`. 
On Codex the exposed MCP tools are `taskflow_run` / `taskflow_list` / `taskflow_show` / `taskflow_verify` / `taskflow_compile`.\n\n## Background (detached) execution\n\nPass `detach: true` to run a taskflow in a detached child process — the tool returns immediately with the `runId` and the flow continues running even if the host session exits:\n\n```jsonc\n{\n  \"action\": \"run\",\n  \"name\": \"nightly-audit\",\n  \"detach\": true\n}\n```\n\n- The child process reads serialized context, calls the orchestration engine, and persists terminal state to the store.\n- Status is polled via `/tf runs` (which now **auto-refreshes live** when any run is running) or `action: \"resume\"`.\n- Stale PID detection via signal-0 probe; the idle watchdog kills stalled children.\n- **Approval phases auto-reject** in detached mode — human gates are never silently bypassed.\n- `resume` works normally after a detached run completes or fails.\n\n## Resume across sessions\n\nA taskflow run isn't tied to your session. Every completed phase is written to disk, so a run that fails (or that you stop) can be continued later with `/tf resume \u003crunId\u003e` — **cached phases skip automatically** and only the remaining work spends tokens.\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"./assets/resume.png\" alt=\"A run fails midway in session 1; in session 2 /tf resume skips the cached phases and only re-runs the failed phase and what follows\" width=\"900\"\u003e\n\u003c/div\u003e\n\nResume is keyed on each phase's input hash — if an upstream output changed, dependent phases re-run; if nothing changed, they're reused. 
No competing Pi extension does this across sessions.\n\n## Storage\n\n```\n.pi/taskflows/\u003cname\u003e.json              # project-scope definitions (commit to share)\n~/.pi/agent/taskflows/\u003cname\u003e.json      # user-scope definitions\n.pi/taskflows/runs/\u003cflowName\u003e/\u003crunId\u003e.json  # run state for resume (gitignore this)\n.pi/taskflows/cache/                   # cross-run memoization cache (gitignored)\n```\n\n\u003e Commit `.pi/taskflows/` and your whole team shares the pipelines — no config sync, no onboarding doc. Run state is written atomically via `writeFileAtomic()` (temp file + `renameSync`) and guarded by a zero-dependency file lock (`O_CREAT|O_EXCL` with stale-lock steal via atomic rename), so concurrent runs never corrupt the index.\n\nAgent discovery scope (via `agentScope` in the flow definition):\n\n| value | discovers agents from |\n|---|---|\n| `\"user\"` (default) | `~/.pi/agent/agents/*.md` |\n| `\"project\"` | `.pi/agents/*.md` (walks up the tree) |\n| `\"both\"` | user + project; project wins on name collision |\n\nRun cleanup is configurable via `maxKeptRuns` and `maxRunAgeDays` in settings.\n\n## Agents\n\nTaskflow ships **18 built-in agents** — each a `.md` file with a tuned system prompt, thinking level, and tool set. You can reference them by `name` in any phase or shorthand, right after install. 
No setup required.\n\n### Built-in agent roster\n\n| Agent | Role | Thinking | Default role |\n|---|---|---:|---|\n| `executor` | Implement planned code changes | high | `{{fast}}` |\n| `executor-fast` | Trivial fixes (≤2 files, ≤50 lines) | off | `{{fast}}` |\n| `executor-code` | Complex multi-file implementation | high | `{{strong}}` |\n| `executor-ui` | Frontend / styling / visual changes | high | `{{vision}}` |\n| `scout` | Fast codebase recon \u0026 file mapping | off | `{{fast}}` |\n| `planner` | Implementation plan creation | high | `{{strong}}` |\n| `analyst` | Requirements analysis, ambiguity detection | high | `{{thinker}}` |\n| `critic` | Inline self-doubt during reasoning | xhigh | `{{thinker}}` |\n| `reviewer` | General code / architecture review | high | `{{strong}}` |\n| `risk-reviewer` | Backend / infra / DB / API risk | high | `{{reasoner}}` |\n| `security-reviewer` | Security vulns, auth/crypto | xhigh | `{{reasoner}}` |\n| `plan-arbiter` | Plan quality gate (complex tasks) | high | `{{arbiter}}` |\n| `final-arbiter` | Tiebreaker when critics disagree | xhigh | `{{arbiter}}` |\n| `test-engineer` | Design \u0026 implement tests | high | `{{fast}}` |\n| `doc-writer` | Documentation authoring | off | `{{fast}}` |\n| `recover` | Session recovery after compaction | low | `{{fast}}` |\n| `verifier` | Run tests, validate outcomes | off | `{{fast}}` |\n| `visual-explorer` | Figma design metadata analysis | high | `{{vision}}` |\n\nAgents are layered: **built-in → user (`~/.pi/agent/agents/`) → project (`.pi/agents/`)**. A user or project agent with the same `name` overrides the built-in — so you can customize any agent without touching the package.\n\n### Model roles\n\nEach built-in agent's `model` field uses a **role placeholder** (e.g. `{{fast}}`) instead of a hardcoded provider string. 
This decouples *intent* from *implementation* — you map roles to models once, and every agent adapts.\n\n| Role | Intent | Typical model |\n|---|---|---|\n| `{{fast}}` | Cheap \u0026 quick — high-volume, low-stakes | DeepSeek V4 Flash |\n| `{{strong}}` | Balanced — planning, review, moderate complexity | MiMo v2.5 Pro |\n| `{{thinker}}` | Deep analysis — requirements, critique | DeepSeek V4 Pro |\n| `{{arbiter}}` | Final judgment — tiebreak, plan quality gates | Qwen 3.7 Max |\n| `{{vision}}` | Multimodal — UI work, design reading | MiniMax M3 |\n| `{{reasoner}}` | Cautious reasoning — security, risk | GLM 5.1 |\n\nWithout configuration, agents fall back to Pi's default model. To map roles to real models, run the interactive setup:\n\n```bash\n/tf init\n```\n\n`/tf init` starts with an **action menu**. First-time users get a 2-option shortcut (\"Use recommended defaults\" / \"Configure each role\"). Returning users see the full 5-option menu:\n\n```\n? What do you want to do with model roles?\n  ❯ Use recommended defaults\n    Configure each role\n    Edit one role\n    Show current roles\n    Cancel\n```\n\nThe picker shows model **display names** with capability flags and current/recommended markers:\n\n```\n? Model for 'vision' — Multimodal (executor-ui, visual-explorer)\n  Current: openrouter/anthropic/claude-sonnet-4-6\n  Recommended: minimax/MiniMax-M3\n  ───────────────\n  ❯ MiniMax M3 (minimax/MiniMax-M3) · image ✓ · reasoning ✓ · (recommended)\n    Claude Sonnet 4.6 (openrouter/anthropic/...) · image ✓ · reasoning ✓ · (current)\n    GPT-5 (openrouter/openai/gpt-5) · image ✓\n    DeepSeek V4 Flash (openrouter/deepseek/v4-flash)\n    ───────────────\n    Custom (type your own)\n    Keep current\n    Back to action menu\n```\n\nBefore saving, a **preview screen** shows the diff of your changes:\n\n```\n? 
Review changes:\n  fast       openrouter/deepseek/deepseek-v4-flash   (unchanged)\n  strong     openrouter/xiaomi/mimo-v2.5-pro         (unchanged)\n  thinker    openrouter/qwen/qwen3.7-max             (changed ← was: openrouter/deepseek/v4-pro)\n  arbiter    openrouter/qwen/qwen3.7-max             (unchanged)\n  vision     minimax/MiniMax-M3                      (unchanged)\n  reasoner   z-ai/glm-5.1                            (unchanged)\n  ───────────────\n  ❯ Save these changes\n    Edit a role\n    Cancel\n```\n\nYour choices are written to `~/.pi/agent/settings.json`:\n\n```json\n{\n  \"modelRoles\": {\n    \"fast\":     \"openrouter/deepseek/deepseek-v4-flash\",\n    \"strong\":   \"openrouter/xiaomi/mimo-v2.5-pro\",\n    \"thinker\":  \"openrouter/deepseek/deepseek-v4-pro\",\n    \"arbiter\":  \"openrouter/qwen/qwen3.7-max\",\n    \"vision\":   \"minimax/MiniMax-M3\",\n    \"reasoner\": \"z-ai/glm-5.1\"\n  }\n}\n```\n\nEdit the values manually any time, or just re-run `/tf init`.\n\nTo customize a specific agent's model or thinking without changing `modelRoles`, create an agent file at `~/.pi/agent/agents/\u003cname\u003e.md` with the desired overrides in the YAML frontmatter.\n\n### Tool path (`action=\"init\"`)\n\nThe model can also configure roles via the `taskflow` tool:\n\n| Mode | Behavior |\n|---|---|\n| `mode: \"show\"` (default) | Read-only report of current `modelRoles`. Never overwrites. |\n| `mode: \"apply-defaults\"` + `force: true` | Writes `RECOMMENDED_DEFAULTS` to `settings.json`, preserving stale keys. |\n| `mode: \"interactive\"` | Launches the full action menu + picker flow (requires a UI session). |\n\n\n### Custom agents\n\nDrop a `.md` file into `~/.pi/agent/agents/` (user-level) or `.pi/agents/` (project-level, commit it) to add your own:\n\n```markdown\n---\nname: my-linter\n\ndescription: Run ESLint and report violations\n\ntools: read, bash\n\nmodel: \"{{fast}}\"\n\nthinking: off\n---\n\nYou are a linting agent. 
Run `npx eslint --format json` on the\nprovided files. Report violations grouped by file. No fixes.\n```\n\nThen reference it in any phase: `{ \"agent\": \"my-linter\", \"task\": \"Lint src/\" }`.\n\n## Examples\n\nReady-to-read definitions in [`examples/`](./examples):\n\n| File | Demonstrates |\n|---|---|\n| [`summarize-files.json`](./examples/summarize-files.json) | discover → `map` fan-out → `reduce` |\n| [`conditional-research.json`](./examples/conditional-research.json) | `when` routing + `join: any` + `gate` + `budget` |\n| [`guarded-refactor.json`](./examples/guarded-refactor.json) | `approval` (human-in-the-loop) + `retry` + `gate` |\n| [`dynamic-plan-execute.json`](./examples/dynamic-plan-execute.json) | `flow { def }` — plan then execute at runtime |\n| [`iterative-replan.json`](./examples/iterative-replan.json) | `loop` + `flow { def }` — iterative replanning |\n\nCopy one into `.pi/taskflows/\u003cname\u003e.json` (or `~/.pi/agent/taskflows/`) and it registers as `/tf:\u003cname\u003e` — or just point the model at it.\n\n## What's inside\n\n\u003cdiv align=\"center\"\u003e\n\n**0 runtime dependencies** · **918 tests** · **10 phase types** · **shared context tree** · **cross-session resume** · **cross-run memoization** · **per-item map caching** · **incremental recompute** · **FlowIR compile seam** · **detached execution** · **`compile` Mermaid renderer** · **~9k LOC runtime**\n\n\u003c/div\u003e\n\n- **Zero runtime dependencies.** No `dependencies` field — the runtime is built entirely on Node built-ins (`fs` / `path` / `os` / `child_process` / `crypto`). 
The file lock is `fs.openSync(\"wx\")`, not a third-party library.\n- **918 tests across 52 test files** covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), backward-compatible cache-key migration (4-tier legacy fallback), per-phase structural sub-fingerprint (v3:phasefp — editing one phase invalidates only it and its dependents), per-item map caching (one changed item re-executes, N−1 cache hits), the `incremental` flag (run-wide cross-run default), reuse reporting, the FlowIR compile seam (determinism, declared-plane synthesis), incremental recompute (early-cutoff propagation, partial cascade strictly \u003c full, observed ∪ declared union frontier), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, the shared context tree (blackboard reuse, supervision spawn, subflow validation/nesting), workspace isolation (temp/dedicated/worktree lifecycle, fail-open degrade, dynamic-flow rejection), dynamic sub-flow security hardening, detached execution (PID persistence, stale detection, crash→failed, resume after failure), live run-history refresh, callback isolation, the idle watchdog, model-role init config, parseModelFromLabel with parenthesized-model-name regression, and multi-fence `safeParse` recovery, plus the `compile` Mermaid renderer (id-collision disambiguation, markdown-injection hardening, and full verify-overlay category coverage).\n- **Hardened by design.** Path-traversal defense (lexical + `realpath` containment check), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via `rename`, and an idle watchdog that kills wedged subagents (SIGTERM → SIGKILL after 5 minutes of silence). 
Dynamic sub-flows additionally get breadth caps, `cwd` containment, budget clamping, nesting depth caps, and prototype-pollution defense.\n- **Dogfooded.** Every new feature has to survive the project's own `self-improve` taskflow before it ships.\n\n## 🍽️ We eat our own dog food\n\nEvery feature in `taskflow` ships **through `taskflow`.**\n\nOur `self-improve` flow is a 10-phase DAG — it audits the codebase, patches defects, verifies correctness, gates on quality, and surfaces the report — all declaratively. We run it (as a user-scope `/tf:self-improve` flow) before releases. No other agent orchestrator in the Pi ecosystem builds itself with itself.\n\n| Campaign | Scale | Phases | Outcome |\n|----------|-------|--------|---------|\n| [v0.0.8 dogfood](./docs/internal/dogfooding-v0.0.8-report.md) | Full codebase audit → triage → fix → verify | 10 phases, 234 tests | 13 fixes, all pass |\n| [v0.0.6 self-audit](./docs/internal/self-audit-report.md) | inventory → map audit → gate → approval → map fix → reduce | 9 phases | 11 critical defects fixed |\n| [Cross-run cache dogfood](./docs/internal/rfc-cross-run-memoization.md) | Real runtime + on-disk store | Dedicated test harness | Cache correctness under adversarial fingerprints |\n| [Adversarial cross-review](./docs/internal/brainstorm-adversarial-review-report.md) | Multi-agent adversarial review | `tournament` + `gate` | P0 cache-key fix shipped |\n| [Init redesign review](./docs/internal/issue-necessity-review-report.md) | Necessity audit → parallel checks → verdict | 7 phases | Full redesign plan validated |\n| [Round 2 adversarial audit](./docs/internal/dogfooding-report.md) | Integration layer + cross-module — 12 findings across runner/runtime/interpolate/verify | 14 phases | 10 fixes applied, 0 regressions |\n| [Round 3 adversarial audit](./docs/internal/dogfooding-report.md) | Integration layer + cross-module — 10 findings across index/agents/cache/render/runs-view | 9 phases | 10 fixes applied, 0 regressions 
|\n| [v0.0.23 Shared Context Tree](./docs/internal/dogfooding-report.md) | End-to-end validation: org-tree spawn, 5-way audit via loop+gate | 6 e2e runs | Spawn-drain bug fixed, 50 new tests |\n\n\u003e **Meta:** we used `taskflow`'s `map` fan-out, `gate` verdicts, `approval` human-in-the-loop, `tournament` best-of-N, `loop` until-done, and `cross-run` cache — to build `taskflow`.\n\n## Status \u0026 limits\n\n**v0.1.3** — the current release. See [CHANGELOG](./CHANGELOG.md) for the full history (incl. the v0.1.1 execution fix for issue #3). This release adds the Codex MCP `taskflow_compile` SVG diagram and a full malformed-input hardening pass across `validate`/`verify`/`compile`. Baseline: **multi-host monorepo** — the engine is split into the host-neutral `taskflow-core` plus `pi-taskflow` (Pi adapter) and `codex-taskflow` (Codex runner + MCP server + plug-and-play Codex plugin). **Shared Context Tree**: opt-in (`shareContext` / `contextSharing`) blackboard + supervision tools (`ctx_read`/`ctx_write` horizontal reuse, `ctx_report`/`ctx_spawn` vertical supervision); `ctx_spawn` accepts a flat task **or** a dependency-bearing `subflow` (a runtime-validated nested DAG), depth-capped on a unified nesting counter with budget accounting. **Workspace isolation**: a phase's `cwd` accepts reserved keywords `temp`/`dedicated`/`worktree` — the runtime allocates an isolated dir (or a git worktree on a throwaway branch) and tears it down after the phase, fail-open, rejected in LLM-authored sub-flows. **Detached execution**: runs can execute in the background, detached from the Pi session. Prior: loop-until-done (`loop`), tournament (best-of-N with a judge), cross-run memoization (content-addressed cache with git/file/glob/env fingerprints and TTL), interactive `/tf init`, configurable built-in agents, 18 built-in agents with 6 model roles. 
Full control-flow \u0026 reliability layer (`when` guards, `join: any`, `retry`/backoff, `approval`, `flow` composition, `budget` caps, `onBlock: \"retry\"`, `eval` machine gates, idle watchdog) on top of the DSL + DAG runtime (`agent`/`parallel`/`map`/`gate`/`reduce`). Inline + saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.\n\nKnown boundaries (tracked, bounded — no surprises mid-flow):\n\n- **Shared context is opt-in.** Subagents share nothing unless a phase sets `shareContext` (or the flow sets `contextSharing`). The blackboard is per-run, file-based, size-bounded, and cleaned up with the run. Spawn nesting is capped at `MAX_DYNAMIC_NESTING` (5). A spawned flat task is not individually checkpointed — on crash it re-runs on resume (spawned *subflows* resume their completed inner phases via the cache).\n- **Workspace isolation is fail-open.** `cwd: \"worktree\"` requires the base cwd to be a git work tree; otherwise it degrades to a `temp` dir (with a warning). `temp`/`worktree` dirs are removed when the phase ends — a hard crash mid-phase may leave a stray dir (cleaned on the next run for `dedicated`; `temp`/`worktree` are under the OS tmpdir). The reserved keywords are honoured only in author-written flows.\n- **No `output: \"file\"`.** Outputs are text/JSON only — write files via an agent's `write` tool call.\n- **`map` fans out over a JSON array from a string `over`.** The `over` field is a string that either interpolates to a JSON array (e.g. `{steps.ID.json}`) or is a literal JSON-array string. Wrap a plain text list in a single-agent `output: \"json\"` phase first, or pass `JSON.stringify([...])` for a fixed list. 
(A raw literal array is rejected — emit it from a phase and reference that.)\n- **The DAG must be acyclic.** Cycles are rejected at validation.\n- **Cross-run cache excludes `gate`, `approval`, `loop`, `tournament`, and `script`.** These must produce a fresh result each run (a `script` phase may also have side effects).\n- **Approval auto-rejects in detached mode.** This is a safety invariant — approval gates are never silently bypassed.\n\n## Development\n\n`taskflow` is an npm-workspaces monorepo of three published packages:\n\n| Package | Role |\n|---------|------|\n| [`taskflow-core`](./packages/taskflow-core) | Host-neutral orchestration engine (zero host-SDK deps; only `typebox`) |\n| [`pi-taskflow`](./packages/pi-taskflow) | Pi extension adapter — `taskflow` tool + `/tf` commands (what `pi install npm:pi-taskflow` gives you) |\n| [`codex-taskflow`](./packages/codex-taskflow) | Codex subagent runner + a dependency-free MCP server, plus the [Codex plugin](./packages/codex-taskflow/plugin) ([guide](./docs/codex-mcp.md)) |\n\n```bash\nnpm install\nnpm run typecheck     # tsc --noEmit across all packages (no build needed)\nnpm test              # unit tests — no network, no process spawning\nnpm run test:core     # engine tests only  (also: test:pi, test:codex)\nnpm run build         # emit dist/*.js + .d.ts for all three packages\nnpm run test:e2e-codex      # codex executor e2e (needs `codex` + model access)\nnpm run test:e2e-codex-mcp  # codex MCP server e2e\n```\n\nThe pi end-to-end suites spawn live `pi` subagents and are run directly (they use\nthe `.mts` extension so the unit-test glob skips them), e.g.:\n\n```bash\nnode --conditions=development --experimental-strip-types packages/pi-taskflow/test/e2e.mts\n# others: e2e-team, e2e-context, e2e-context-value, e2e-spawn-subflow,\n#         e2e-flowir, e2e-incremental-suite, dogfood-cache\n```\n\nEngine code lives in `packages/taskflow-core/src/`, the Pi adapter in `packages/pi-taskflow/src/`, tests in each 
package's `test/`, and runnable examples in `examples/`. Published packages ship compiled `dist/`; dev resolves the TypeScript sources directly via a `development` export condition — no build step needed to typecheck or test.\n\n## Contributing\n\nContributions welcome — this is a young, fast-moving project. Open an issue or PR on [GitHub](https://github.com/heggria/taskflow). Good first contributions: new example flows, phase-type ideas, and TUI polish. See [`CONTRIBUTING.md`](./CONTRIBUTING.md) and [`AGENTS.md`](./AGENTS.md) for coding conventions and common task recipes.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheggria%2Ftaskflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheggria%2Ftaskflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheggria%2Ftaskflow/lists"}