{"id":50100839,"url":"https://github.com/hop-top/ben","last_synced_at":"2026-05-23T07:16:05.832Z","repository":{"id":359148752,"uuid":"1194661414","full_name":"hop-top/ben","owner":"hop-top","description":null,"archived":false,"fork":false,"pushed_at":"2026-05-20T20:11:03.000Z","size":21726,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-20T20:35:55.995Z","etag":null,"topics":["bench","benchmark","benchmark-suite","benchmarking"],"latest_commit_sha":null,"homepage":"https://hop.top/ben","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hop-top.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"docs/contributing.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-03-28T16:47:04.000Z","updated_at":"2026-05-20T20:11:06.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/hop-top/ben","commit_stats":null,"previous_names":["hop-top/ben"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/hop-top/ben","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hop-top%2Fben","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hop-top%2Fben/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hop-top%2Fben/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hop-top%2Fben/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hop-top","download_url":"https://codeload.github.com/hop-top/ben/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hop-top%2Fben/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33386354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T04:15:53.637Z","status":"ssl_error","status_checked_at":"2026-05-23T04:15:53.242Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bench","benchmark","benchmark-suite","benchmarking"],"created_at":"2026-05-23T07:16:00.551Z","updated_at":"2026-05-23T07:16:05.822Z","avatar_url":"https://github.com/hop-top.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ben\n\n\u003e [!WARNING]\n\u003e **🚧 Do Not Use — History Will Be Rewritten 🚧**\n\u003e\n\u003e This repo is undergoing major restructuring as we selectively\n\u003e open-source internal tools built at\n\u003e [Idea Crafters LLC](https://ideacrafters.com). Git history **will be\n\u003e force-pushed and rewritten** multiple times. Do not fork, clone, or\n\u003e depend on this repo in any capacity until we tag a stable release.\n\nGeneral-purpose benchmarking tool — answers \"which approach is better, and by how much?\"\nfor any measurable task: tools, implementations, deps, LLM calls, agents.\n\n---\n\n## Install\n\n```\ngo install hop.top/ben/cmd/ben@latest\n```\n\nben depends on `hop.top/kit kit/v0.4.0-alpha.3`, pinned in `go.mod`\nwith no local override. Local development against unreleased kit\nrevisions uses a `replace` directive in `go.mod` (commented-out\nexample near the bottom of the file):\n\n```go\n// replace hop.top/kit =\u003e ../kit\n```\n\nUncomment, point at your kit checkout, and `go mod tidy`.\n\n---\n\n## Quick start\n\n```sh\n# Inline run: compare two CLI tools on a task\nben run --task \"Find HTTP handlers\" --candidates xray,grep --metric latency_ms,quality_score \\\n    --scorer weighted:latency_ms=0.3,quality_score=0.7 --input repo=.\n\n# Suite file: run a named, repeatable benchmark\nben run --suite .ben/suites/codebase-indexing.yaml\n\n# Compare two historical runs\nben compare 01HX...abc 01HX...def\n\n# List last 10 runs for a suite\nben list --suite codebase-indexing --last 10\n\n# Show one run by id\nben show 01HX...abc\n```\n\n---\n\n## Commands\n\n| Command                         | Description                                           |\n|---------------------------------|-------------------------------------------------------|\n| `ben run`                       | Run benchmark suite or inline task against candidates |\n| `ben list`                      | List recent runs from local storage                   |\n| `ben show \u003crun-id\u003e`             | Show details of one run                               |\n| `ben compare \u003crun-a\u003e \u003crun-b\u003e`   | Diff two run results side-by-side                     |\n| `ben suite list`                | List known suites (global + project-local)            |\n| `ben suite show \u003cname\u003e`         | Show suite spec details                               |\n| `ben registry push \u003crun-id\u003e`    | Push a run to the shared registry                     |\n| `ben registry pull`             | Pull community baselines for a suite                  |\n| `ben config path` / `paths`     | Inspect ben config file precedence                    |\n| `ben spec`                      | Emit machine-readable capability manifest             |\n\n---\n\n## Adapters\n\n| Adapter    | How ben runs the candidate                                               |\n|------------|--------------------------------------------------------------------------|\n| `cli`      | Spawns a shell command; captures stdout/stderr, exit code, latency       |\n| `llm`      | Calls an LLM via API; captures tokens, cost, output                      |\n| `eva`      | Wraps `eva run` as a ben candidate for standard eval suites              |\n| binary     | Any `ben-adapter-*` binary on PATH; communicates via stdio JSON protocol |\n\n---\n\n## Metrics\n\n| Metric           | Source   | Description                                   |\n|------------------|----------|-----------------------------------------------|\n| `latency_ms`     | built-in | Wall-clock execution time in milliseconds     |\n| `exit_code`      | built-in | Process exit code (cli adapter)               |\n| `output_size`    | built-in | Byte length of stdout output                  |\n| `tokens`         | llm      | Total tokens consumed (prompt + completion)   |\n| `cost_usd`       | llm      | Estimated cost in USD                         |\n| `quality_score`  | plugin   | 0–1 relevance score; requires llm_judge plugin|\n\n---\n\n## Scorers\n\n| Scorer                        | Description                                          |\n|-------------------------------|------------------------------------------------------|\n| `single:\u003cmetric\u003e`             | Rank by one metric; lowest wins for cost/latency     |\n| `weighted:\u003cm\u003e=\u003cw\u003e,...`        | Weighted sum across metrics; highest score wins      |\n| `raw`                         | No ranking; emit raw metrics only; winner=null       |\n\nExamples:\n\n```\n--scorer single:latency_ms\n--scorer weighted:latency_ms=0.3,cost_usd=0.2,quality_score=0.5\n--scorer raw\n```\n\n---\n\n## Spec file\n\n```yaml\nname: codebase-indexing\ndescription: Compare xray vs grep for initial codebase orientation\nversion: 1\n\ntask:\n  prompt: \"Find all HTTP handler functions in this repo\"\n  input:\n    repo: ./testdata/sample-repo\n\ncandidates:\n  - name: xray\n    adapter: cli\n    cmd: \"xray explore --search {{input.prompt}} --path {{input.repo}}\"\n  - name: grep\n    adapter: cli\n    cmd: \"grep -r 'func.*Handler' {{input.repo}}\"\n\nmetrics:\n  - latency_ms\n  - quality_score\n\nscorer:\n  strategy: weighted\n  weights:\n    latency_ms: 0.3\n    quality_score: 0.7\n```\n\n---\n\n## Plugin protocol\n\nBinary plugins are auto-discovered as `ben-adapter-\u003cname\u003e` or `ben-reporter-\u003cname\u003e` on PATH.\nBen communicates via newline-delimited JSON over stdio: it writes a request JSON object to the\nplugin's stdin and reads the response from stdout. Adapter plugins receive\n`{\"action\":\"run\",\"candidate\":{...},\"input\":{...}}` and must respond with\n`{\"metrics\":{...},\"output\":\"...\"}`. Reporter plugins receive `{\"run\":{...}}` and write\nformatted output to stdout. Naming convention: use the adapter/reporter name as the suffix,\ne.g. `ben-adapter-docker`, `ben-reporter-markdown`.\n\n---\n\n## Agent usage\n\nben is designed for programmatic use mid-task:\n\n```sh\n# Machine-readable output; all logs to stderr\nben run --suite my-suite --format json --quiet\n\n# Parse winner directly\nben run ... --format json | jq .winner\n```\n\n- `--format json` — emits valid JSON to stdout; diagnostics to stderr only\n- `--quiet` — suppresses stderr; clean for pipelines\n- Exit `0` — successful run (candidate failures are in the result, not exit code)\n- Exit `1` — ben error (bad config, missing adapter, etc.)\n- `winner` field — primary decision signal for agents; `null` when scorer is `raw`\n\n---\n\n## Storage\n\nGlobal (cross-project):\n\n```\n~/.local/share/ben/\n  runs/          # persisted run results\n  registry/      # local registry index + cache\n  suites/        # global suite specs\n```\n\nProject-local (detected automatically when `.ben/` exists in cwd):\n\n```\n.ben/\n  suites/        # project-scoped suite specs\n  runs/          # project-scoped run results\n```\n\nBen prefers project-local storage when `.ben/` is present; falls back to global.\n\n---\n\n## Configuration\n\nBen loads config from three layers, highest precedence first:\n\n| Layer   | Path                              |\n|---------|-----------------------------------|\n| project | `./.ben/config.yaml`              |\n| user    | `$XDG_CONFIG_HOME/ben/config.yaml`|\n| system  | `/etc/ben/config.yaml`            |\n\nRun `ben config paths --format json` to see the active chain. The\n`-c \u003cpath\u003e` flag overrides the discovery chain entirely (kit semantics\n— `-c` wins over any previously discovered file).\n\nThe project-layer path is caller-context-aware via the `KIT_INVOKED_AS`\nenv var (exported by callers like tlc or hop before exec'ing ben):\n\n| `KIT_INVOKED_AS`  | Project config path     |\n|-------------------|-------------------------|\n| (unset/standalone)| `./.ben/config.yaml`    |\n| `hop`             | `./.hop/ben.yaml`       |\n| `tlc`             | `./.tlc/ben.yaml`       |\n\nOnly one project-layer entry wins per invocation (kit constraint).\n\n---\n\n## Release process\n\nRelease pipeline mirrors the `hop-top/.github` reusable workflows:\n\n- `release-please.yml` watches `main`, opens a standing release PR\n  that bumps the version + assembles the changelog.\n- Merging that PR cuts a `ben/v\u003cversion\u003e` tag.\n- The tag push fires `publish.yml` (Go module mirror to\n  `hop-top/ben`) and `goreleaser-on-tag.yml` (cross-platform binaries\n  + Homebrew tap + Scoop bucket entries) in parallel.\n\nPrerelease channel is seeded at `0.2.0-alpha.0`. See\n[`.github/RELEASE-BOOTSTRAP.md`](.github/RELEASE-BOOTSTRAP.md) for\nthe manual web-side steps (mirror-repo creation, GitHub App\ninstallation, org secrets) required before the first cut.\n\n---\n\n## Troubleshooting\n\n**`compile: version \"go1.26.1\" does not match go tool version \"go1.26.2\"`**\n\nCause: stale `GOROOT` exported from an earlier `mise` shell. Quick\nworkaround: `env -u GOROOT go test ./...`. Long-term fix: `mise use\ngo@\u003clatest\u003e` and respawn the shell.\n\n---\n\n## Contributing\n\nSee [docs/contributing.md](docs/contributing.md) for interfaces, how to add adapters/metrics/\nscorers/reporters, and the PR checklist.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhop-top%2Fben","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhop-top%2Fben","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhop-top%2Fben/lists"}