{"id":51117513,"url":"https://github.com/bes-dev/reharness","last_synced_at":"2026-06-24T23:00:56.168Z","repository":{"id":356520144,"uuid":"1224042670","full_name":"bes-dev/reharness","owner":"bes-dev","description":null,"archived":false,"fork":false,"pushed_at":"2026-06-13T19:38:48.000Z","size":414,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-13T21:24:33.570Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bes-dev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-28T22:52:02.000Z","updated_at":"2026-06-13T19:38:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/bes-dev/reharness","commit_stats":null,"previous_names":["bes-dev/reharness"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/bes-dev/reharness","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bes-dev%2Freharness","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bes-dev%2Freharness/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bes-dev%2Freharness/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bes-dev%2Freharness/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bes-dev","download_url":"https://codeload.github.com/bes-dev/reharness/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bes-dev%2Freharness/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34752465,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-24T02:00:07.484Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-24T23:00:55.503Z","updated_at":"2026-06-24T23:00:56.162Z","avatar_url":"https://github.com/bes-dev.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# reharness\n\n[![npm](https://img.shields.io/npm/v/reharness)](https://www.npmjs.com/package/reharness) [![license](https://img.shields.io/npm/l/reharness)](LICENSE)\n\n**A reasoning compiler.** Spend a model's intelligence *once*, at compile time, to turn a natural-language request — or a recorded agent trace — into a deterministic finite-state-machine pipeline. Most of the pipeline is ordinary code; only a few clearly-marked **agent** leaves call a model at runtime. The compiled artifact is a persistent, version-controllable directory you can read, test, and ship — and a fully-mechanical task compiles all the way down to **zero runtime model calls** (`0 agent runs · 0 tokens · $0.0000`).\n\nThe human approves the **intent** (a short PRD), never the generated graph. Inter-stage data flow is derived from the topology, not declared. The agent leaves run on the **Pi** backend (the runtime keeps a one-adapter seam for adding others).\n\n## Installation\n\n```bash\nnpm install -g reharness\n```\n\nPackage: **[npmjs.com/package/reharness](https://www.npmjs.com/package/reharness)**.\n\nreharness runs its agent leaves on the **Pi** backend — install it and put it on `PATH`:\n\n- **Pi** — the minimalist agent CLI (`pi`). See [pi-mono](https://github.com/badlogic/pi-mono).\n\nProvide model/API auth as `pi` expects. Node ≥ 18.\n\n## Quick Start\n\n```bash\n# Interactive: design + checkpoint + construct\nreharness compile \"Code review FSM for this project\"\n\n# Agent-driven: skip checkpoint, resolve via auto-event\nreharness compile --auto-approve \"FSM for generating React Native apps from a one-line idea\"\n\n# Run a compiled FSM\nreharness                  # Interactive TUI\nreharness \u003ccommand\u003e args   # Direct\n```\n\n## How `compile` works\n\n```\nresearch (agent)  — optional domain research (skipped with --fast)\nprd (agent)       — distil a human-readable PRD (spec) from request + research\nreview_prd        — APPROVAL CHECKPOINT (the ONLY thing the human approves)\n                    Approve  → design\n                    Revise   → discuss_prd (interactive) → re-approve\ndesign (agent)    — one pass: graph + per-node behavioural \u003ccontract\u003e\nconstruct (code)  — validate, derive inter-stage wiring from the graph, codegen\nfill_prompts      — agent fills agent prompts + code-state implementations\ncheck_dataflow    — deterministic use-before-def report (fed to polish)\npolish (agent)    — one pass: review vs PRD + fix leaves (prompts/code); topology issue → redesign (rare)\nverify (code)     — TS compile + structural checks → done\n```\n\nThe human approves the **PRD** — confirmation the compiler understood the intent — never the FSM graph. Everything downstream is generated from the approved PRD. One checkpoint, agent-friendly: `--auto-approve` resolves it via the state's `auto-event` and emits a warning, so the same workflow serves humans and agents.\n\n**Three fronts, one PRD.** The flow above is the natural-language front (`compile \u003cdescription\u003e`). A **demonstration** (`compile --from-session \u003cpath\u003e`) is grounded by the same evidence-adaptive `research` agent (here from the trace's observed facts, into `skills/`) and distilled into a PRD. An **amendment** (`amend \u003crequest\u003e`) folds a change into an existing PRD. All three converge at `review_prd` and share everything downstream; the human always approves the PRD, never the graph.\n\n## Writing a pipeline by hand\n\n```typescript\n// reharness/commands/build.ts\nimport { defineCommand, definePipeline } from 'reharness';\n\nexport default defineCommand({\n  description: 'Build something',\n  usage: '\u003cname\u003e',\n  run: (args, ctx) =\u003e definePipeline({\n    config: { name: args[0] },\n    initial: 'plan',\n    states: {\n      plan:   { entry: async (c) =\u003e { await c.agent('planner', 'Plan'); },  on: 'code' },\n      code:   { entry: async (c) =\u003e { await c.agent('coder', 'Build'); },   on: 'verify' },\n      verify: {\n        entry: async (c) =\u003e c.shell('npx tsc --noEmit') ? 'PASS' : 'FAIL',\n        on: {\n          PASS: 'done',\n          FAIL: [\n            { target: 'fix', guard: (c) =\u003e c.retries('v') \u003c 3 },\n            { target: 'error' },\n          ],\n        },\n      },\n      fix:    { entry: async (c) =\u003e { c.retry('v'); await c.agent('fixer', 'Fix'); }, on: 'verify' },\n      done:   { type: 'final', status: 'success' },\n      error:  { type: 'final', status: 'error' },\n    },\n  }),\n});\n```\n\n## State types\n\n| Type       | Behavior |\n|------------|----------|\n| `agent`    | LLM agent runs under the state's harness (prompt + tools + contract). |\n| `code`     | Deterministic TypeScript function. Returns an event string. |\n| `approval` | Runtime pauses, shows artifacts, awaits a chosen event. `auto-event` resolves it in auto-approve mode. |\n| `final`    | Terminal: `status: success | error`. |\n\nComposite/routing types — `parallel` (fan-out over an array), `loop` (bounded iteration, `max` required), `switch` (declarative routing), `wait`, `call`, `set`, `interactive` — are documented in [AGENTS.md](AGENTS.md).\n\n## CLI\n\n```bash\nreharness                                  # Interactive TUI\nreharness \u003ccommand\u003e [args...]              # Direct run\nreharness compile \u003cdescription\u003e            # Compile a new workflow (interactive checkpoint)\nreharness compile --auto-approve \u003cdesc\u003e    # Compile autonomously (for agent invocation)\nreharness compile --name \u003cid\u003e \u003cdesc\u003e       # Name the compiled command yourself\nreharness compile --from-session \u003cpath\u003e    # Distil a recorded session (any format) into a reusable workflow\nreharness amend [\u003ccommand\u003e] \u003crequest\u003e      # Amend a compiled command with a new feature\nreharness evolve [\u003ccommand\u003e]               # Learn from the last run: self-heal / amortize routines into tools / refine skills\nreharness graph \u003ccommand\u003e                  # Write the FSM as Mermaid → \u003ccommand\u003e.mmd (renders inline on GitHub)\nreharness graph \u003ccommand\u003e --html           # …or a self-contained interactive viewer → \u003ccommand\u003e.html (click a node)\nreharness \u003ccommand\u003e --dry-run              # Smoke-test routing \u0026 data flow with agents/shells stubbed (no tokens)\nreharness \u003ccommand\u003e --resume               # Resume an interrupted run\nreharness \u003ccommand\u003e --evolve               # After the run, auto-chain evolve on its verdict\n```\n\nOptions: `--model \u003cid\u003e`, `--provider \u003cid\u003e`, `--auto-approve`, `--resume`, `--fast`/`--no-research` (skip web research in\ncompile/amend), `--no-enhance` (skip the auto-chained harness/skill layer), `--name \u003cid\u003e` (compile only). After a\nsuccessful `compile`/`amend`, an **enhance** layer auto-runs (attaches domain-skills per leaf) and a dependency\n**manifest** + `setup.sh`/`Dockerfile` are emitted — the compiler derives and renders them, but never installs anything itself.\n\n### Visualizing a pipeline\n\nA compiled pipeline is a graph, so you can look at it. `reharness graph \u003ccommand\u003e` writes a Mermaid `flowchart` to\n`\u003ccommand\u003e.mmd` (a deterministic pass — no model call) that GitHub and most markdown viewers render inline; `--html`\nwrites a self-contained interactive viewer to `\u003ccommand\u003e.html` where clicking a node shows its contract, prompt,\ntransitions and derived data flow. (`--output \u003cfile\u003e` overrides the name; `--output -` streams to stdout.) Here is the\n`reviewer` pipeline — *clone a repo → review it → open a GitHub issue* — note that **only one node (`review_chunk`)\nis an agent**; everything else is ordinary code:\n\n```mermaid\nflowchart TD\n  START(( )):::st_start\n  START --\u003e n0\n  n0[\"validate\"]\n  n1[\"clone_repo\"]\n  n2[\"filter_files\"]\n  n3[[\"review\"]]\n  n4([\"review_chunk\"])\n  n5[\"aggregate\"]\n  n6[\"compile_issue\"]\n  n7[\"create_issue\"]\n  n8((\"done\"))\n  n9((\"error\"))\n  n0 --\u003e|\"PASS\"| n1\n  n0 --\u003e|\"FAIL\"| n9\n  n1 --\u003e|\"DONE\"| n2\n  n2 --\u003e|\"DONE\"| n3\n  n3 -.-\u003e|\"each item\"| n4\n  n3 --\u003e|\"fork-join\"| n5\n  n5 --\u003e|\"DONE\"| n6\n  n6 --\u003e|\"DONE\"| n7\n  n7 --\u003e|\"DONE\"| n8\n  n7 --\u003e|\"FAIL\"| n9\n  n0 -.-\u003e|\"ERROR\"| n9\n  n1 -.-\u003e|\"ERROR\"| n9\n  classDef st_agent fill:#dbeafe,stroke:#2563eb,color:#1e3a8a\n  classDef st_code fill:#f1f5f9,stroke:#64748b,color:#0f172a\n  classDef st_parallel fill:#dcfce7,stroke:#16a34a,color:#14532d\n  classDef st_final fill:#dcfce7,stroke:#16a34a,color:#14532d\n  classDef st_fail fill:#fee2e2,stroke:#dc2626,color:#7f1d1d\n  classDef st_start fill:#0f172a,stroke:#0f172a,color:#fff\n  class n0,n1,n2,n5,n6,n7 st_code\n  class n3 st_parallel\n  class n4 st_agent\n  class n8 st_final\n  class n9 st_fail\n```\n\n### Backends (providers)\n\nThe agent leaves run on the **Pi** backend; the FSM/compiler are provider-agnostic (a leaf is just \"someone runs it\").\nThe backend is a single adapter in `src/runtime/providers.ts` (argv lowering of the three harness axes + event-stream\nnormalization + RPC turn-framing + synthesized-tool rendering), so adding another backend is one Provider, not a\ncross-cutting change. Select with `--provider`, `def.provider`, or `REHARNESS_PROVIDER` (today: `pi`). `--model` /\n`def.piModel` choose the model within the backend.\n\n### Tuning hyperparameters\n\nTwo tiers, by whose knob it is:\n\n- **The compiler's \u0026 runtime's own knobs** — env vars, one place (`src/config.ts`): `REHARNESS_COMPILER_CONCURRENCY`,\n  `REHARNESS_CORRECTION_RETRIES`, `REHARNESS_SHELL_TIMEOUT_MS`, `REHARNESS_POLL_MS`, `REHARNESS_LIGHT_MODEL`,\n  `REHARNESS_SESSION_CHUNK_CHARS`, `REHARNESS_EVOLVE_GRACE`, `REHARNESS_AGENT_RETRIES` (transient-failure retries\n  per leaf), `REHARNESS_AGENT_BACKOFF_MS` (retry backoff base), `REHARNESS_RUN_RETENTION` (past runs to keep).\n  Set the env var to override the default.\n- **A pipeline's per-run structural knobs** — `--param state.knob=value` (repeatable), applied at the runtime layer\n  so it works the same for a compiled command and for the compiler's own pipelines. `knob ∈ max` (loop iterations),\n  `concurrency` (parallel fan-out), `timeoutMs` (any state). Validated up-front and fail-loud — a loop `max`\n  override must stay a finite integer ≥1 (the termination guarantee). A `--params \u003cfile.json\u003e` profile (a flat\n  `{ \"state.knob\": number }` map) is the base; individual `--param` flags win over it.\n\n```bash\nreharness my-flow --param refine.max=10 --param fanout.concurrency=8   # more loop turns, wider fan-out, this run only\nreharness my-flow --params ./profiles/thorough.json                    # a saved profile of overrides\n```\n\n## Operating in production\n\n- **Backends fail loud, clearly.** If the `pi`/`claude` binary isn't on `PATH` you get an actionable\n  *\"Backend not found — install it / pass a path\"* (not a raw `ENOENT`).\n- **Transient resilience.** A leaf whose backend returns a rate-limit / 5xx / dropped connection backs off and\n  retries (`REHARNESS_AGENT_RETRIES`, default 2; exponential with jitter). A *content* error (bad request, auth)\n  fails fast. Once the budget is spent the leaf fails loud — the FSM never silently stalls.\n- **Secrets are redacted** from traces, terminal output and `state.json` (credentials in a URL, `Authorization`\n  headers, `sk-`/`ghp_`/`AKIA…` tokens, `key=value` secrets). Defence-in-depth, not a substitute for not putting\n  secrets in commands.\n- **Disk is bounded.** Each command keeps its newest `REHARNESS_RUN_RETENTION` runs (default 20); older run\n  records are pruned automatically. Everything regenerable lives under the gitignored `.cache/`.\n- **Termination is guaranteed structurally** — every `loop` carries a required `max`, and every state may set a\n  `timeoutMs`. A compiled pipeline is statically checked (reachability, definite-assignment data-flow, workspace\n  \u0026 substrate rules) before it ever runs.\n- **Smoke-test before you spend.** `reharness \u003ccommand\u003e --dry-run` runs the compiled FSM with every agent, `c.shell`\n  and `c.exec` stubbed — it exercises routing, guards and the derived data flow end-to-end for **0 tokens**, turning\n  *\"it compiled\"* into *\"it reaches a terminal without a crash or dead-end\"*. (It isolates the agent/shell/exec\n  seam; a code state that spawns a subprocess *directly* via `child_process` still runs — so generated code prefers\n  `c.shell` / `c.exec`, which are abortable, timeout-bounded and dry-run-aware.)\n\n**Known limitations (v0.1.0).** No global wall-clock run timeout (bound individual states with `timeoutMs`). A\ncommand's run records live next to their output target (`\u003ctarget\u003e/logs/`), so there is no single cross-target run\nbrowser yet. A lifted bundle needs `reharness` installed at its new location (`npm install` in the bundle, once it\nis published); a pipeline with **external** dependencies also needs its `setup.sh` (from the derived manifest), not\njust `npm install`. While `0.x`, the runtime/compiler API may change between minor versions — a bundle pins the\n`reharness` version it was compiled against, and there is no runtime version-compatibility check. A pipeline that\nmutates **external targets** (your real files) is not transactional: a re-run may re-apply. Linux/macOS are the\ntested platforms (Windows symlink/shell paths are untried). The compiler authors the leaf code; it is a readable,\nversion-controlled artifact you should review like any code — it is not sandboxed.\n\n## Project structure\n\nThe compiled pipeline is a **first-class, liftable bundle**: the `reharness/` directory IS the deliverable —\nversion it, ship it, `mv` it to another machine, run `npm install` once, and it runs (its `package.json` declares\n`reharness` as a dependency, so the generated commands' `import 'reharness'` resolves anywhere). It holds\n**several commands** (a workspace of isolated targets); agents, the PRD archive, and synthesized tools are\nnamespaced per command (`\u003ccmdId\u003e`). Everything regenerable — run logs/state, the evolve ledger, compiler\nscratch — is quarantined under a hidden, gitignored `.cache/`.\n\n```\nmy-project/\n└── reharness/                # ← the deliverable (versioned, shippable, liftable)\n    ├── skeletons/            # Source of truth — one \u003ccmdId\u003e.xml per command\n    ├── prds/                 # Approved intent archive — \u003ccmdId\u003e.md (the \"what\", human-approved)\n    ├── commands/             # Generated from skeletons — do not edit (\u003ccmdId\u003e.ts)\n    ├── lib/                  # Code-state implementations (\u003ccmdId\u003e-states.ts, edit freely)\n    ├── agents/\u003ccmdId\u003e/       # Per-command agent prompts: \u003cname\u003e/SYSTEM.md (+ optional harness.json)\n    ├── skills/               # Shared domain-skills (\u003ctopic\u003e.md, attached to leaves via harness.json)\n    ├── tools/\u003ccmdId\u003e/        # Synthesized tools amortized by evolve (per command)\n    ├── manifest.json         # Derived dependency manifest → setup.sh + Dockerfile (you install, the compiler never does)\n    ├── .gitignore            # ignores .cache/ + node_modules/ (so the deliverable versions clean)\n    └── .cache/               # run-exhaust — gitignored, safe to delete\n        ├── runs/             # Per run: run-*/{state.json, work/\u003cstage\u003e/…, trace/\u003cstage\u003e/NN-stage.md}\n        ├── evolve/           # The utility ledger (ledger.json) + .archive of retired tools\n        └── scratch/          # Transient compiler scratch (prd.md, draft-skeleton.xml, _compiled.md, errors, …)\n```\n\n## Imports\n\n- `reharness` — full public API\n- `reharness/runtime` — FSM runtime only (definePipeline, types, agent runner)\n- `reharness/compiler` — compilation primitives only (parse/serialize XML, codegen, verify)\n\n## License\n\nApache 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbes-dev%2Freharness","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbes-dev%2Freharness","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbes-dev%2Freharness/lists"}