{"id":51117514,"url":"https://github.com/bes-dev/reharness-bench","last_synced_at":"2026-06-24T23:00:56.377Z","repository":{"id":362775119,"uuid":"1260724768","full_name":"bes-dev/reharness-bench","owner":"bes-dev","description":"Execution-based benchmark for the reharness reasoning compiler","archived":false,"fork":false,"pushed_at":"2026-06-12T22:21:56.000Z","size":321,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-13T00:16:29.280Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bes-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-05T20:00:50.000Z","updated_at":"2026-06-05T21:48:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/bes-dev/reharness-bench","commit_stats":null,"previous_names":["bes-dev/reharness-bench"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bes-dev/reharness-bench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bes-dev%2Freharness-bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bes-dev%2Freharness-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bes-dev%2Freharness-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bes-dev%2Freharness-bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bes-dev","download_url":"https://codeload.github.com/bes-dev/reharness-bench/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bes-dev%2Freharness-bench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34752465,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-24T02:00:07.484Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-24T23:00:55.564Z","updated_at":"2026-06-24T23:00:56.372Z","avatar_url":"https://github.com/bes-dev.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# reharness-bench\n\nExecution-based benchmark for the [reharness](../pi-fsm) reasoning compiler. It compiles real and\nauthored agent sessions (and NL requests) into FSM pipelines, then **runs the compiled pipeline on a\nfixture and checks the real outcome** with a deterministic verifier — \"compiles green\" is not the bar.\n\n## Layers (gated, cheap → expensive)\n- **L0 ingest** — the session/request was read + staged (format-agnostic)\n- **L1 compile** — verify is green (tsc + structural + dataflow)\n- **L2 non-hollow** — the lib isn't a verify-passing stub\n- **L3 fidelity** — the PRD captures the task + the external target is parameterised (`\u003carg\u003e` / manifest)\n- **L4 execution** — the compiled command runs on a fixture; a deterministic verifier checks the outcome\n\n## Run\n```\nnpm install\nnpm link reharness            # put the compiler under test on PATH (or: REHARNESS_CLI=/path/to/dist/cli.js)\nnpm run bench                 # all cases\nnpm run bench -- \u003ccase-id\u003e    # one case, e.g. trace-fixbug\nnpm run bench -- corpus 3     # compile the 3 largest real synthtraces (L0–L2)\nnpm run mine                  # stratify the corpus by capability (heuristic, no LLM)\n```\nSet `REHARNESS_CLI` to a working-tree `dist/cli.js` to bench an unreleased build.\n\n## Layout\n- `cases/\u003cid\u003e/` — a case: `meta.json`, `session.jsonl|md` (or an NL `request` in meta), `fixture/`, optional `verify.mjs`\n- `run.mts` — the harness (compile → L0–L4)\n- `mine.mts` — corpus property-stratifier\n- `CAPABILITIES.md` — the compiler capability matrix this bench targets\n- `corpus/` — third-party trajectory data (gitignored): `synthtraces/` (julien-c/synthtraces), `nemotron/` (nvidia/Nemotron-Agentic-v1), `traces/` (captured subagent runs)\n\n## Provenance of cases\n- **mined real-trace** (`trace-*`): captured from real subagent runs on self-verifying tasks (unbiased trajectories; tasks authored)\n- **authored sessions**: hand-written demonstrations\n- **NL-request**: compiled from a natural-language request (no session)\n- **corpus**: third-party HF datasets — L0–L2 only (no fixtures/gold)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbes-dev%2Freharness-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbes-dev%2Freharness-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbes-dev%2Freharness-bench/lists"}