{"id":50450641,"url":"https://github.com/outsourc-e/bench-loop-web","last_synced_at":"2026-06-01T00:01:25.702Z","repository":{"id":357458825,"uuid":"1237054779","full_name":"outsourc-e/bench-loop-web","owner":"outsourc-e","description":"BenchLoop web app + public site for bench-loop.com — FastAPI backend, local React dashboard, static marketing site.","archived":false,"fork":false,"pushed_at":"2026-05-23T07:54:11.000Z","size":1844,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-23T09:12:02.756Z","etag":null,"topics":["benchmark","fastapi","leaderboard","llm","local-llm","react"],"latest_commit_sha":null,"homepage":"https://bench-loop.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/outsourc-e.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-12T20:42:08.000Z","updated_at":"2026-05-23T07:54:14.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/outsourc-e/bench-loop-web","commit_stats":null,"previous_names":["outsourc-e/bench-loop-web"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/outsourc-e/bench-loop-web","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outsourc-e%2Fbench-loop-web","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outsourc-e%2Fbench-loop-web/tags","releases_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/outsourc-e%2Fbench-loop-web/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outsourc-e%2Fbench-loop-web/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/outsourc-e","download_url":"https://codeload.github.com/outsourc-e/bench-loop-web/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outsourc-e%2Fbench-loop-web/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33753925,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","fastapi","leaderboard","llm","local-llm","react"],"created_at":"2026-06-01T00:01:13.573Z","updated_at":"2026-06-01T00:01:25.684Z","avatar_url":"https://github.com/outsourc-e.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BenchLoop Web\n\nThe web surface for [BenchLoop](https://bench-loop.com) — a local-first benchmark suite for LLM models that scores **quality, speed, and reliability** across seven fixed task suites (`speed`, `toolcall`, `coding`, `dataextract`, `instructfollow`, `reasonmath`, `agent`).\n\nPick a model on any reachable endpoint (Ollama, LM Studio, Osaurus, vLLM, oMLX, Jan, or any 
OpenAI-compatible server), pick the suites, hit Run, watch live progress, then compare results in the leaderboard.\n\n## Architecture\n\n```\nbench-loop-web/\n  api/    FastAPI app (uvicorn) wrapping the bench-loop runner\n  ui/     React + Vite frontend\n```\n\nThe API delegates to `bench-loop/` (sibling repo) for the actual benchmark logic. Runs are persisted to `~/.bench-loop/runs/` so they survive restarts and show up in the leaderboard from disk.\n\n## Quick start (dev)\n\nTwo long-running processes:\n\n```bash\n# 1. API (port 8877)\ncd bench-loop-web/api\nPYTHONPATH=/Users/aurora/.ocplatform/workspace/bench-loop \\\nBENCH_LOOP_DIR=/Users/aurora/.ocplatform/workspace/bench-loop \\\n  /Users/aurora/.ocplatform/workspace/bench-loop/.venv/bin/python \\\n  -m uvicorn main:app --host 127.0.0.1 --port 8877 --app-dir .\n\n# 2. UI (port 5180)\ncd bench-loop-web/ui\nnpm install\nnpx vite --host 127.0.0.1 --port 5180\n```\n\nOpen \u003chttp://127.0.0.1:5180/\u003e.\n\n## Pages\n\n| Path | Purpose |\n|---|---|\n| `/` `/models` | Auto-detect local providers, browse model catalog, jump to benchmark |\n| `/chat` | Quick chat against any reachable model |\n| `/benchmark` | Pick model + suites + harness, run with live progress |\n| `/leaderboard` | Best run per model+harness, rank by overall/quality/speed/tok-s/efficiency. Click row for detail, hit Compare per row |\n| `/runs/:runId` | Full per-suite scores, speed metrics, machine info, raw JSON |\n| `/compare?a=\u0026b=` | Two runs side-by-side with deltas across every metric |\n| `/stacks` | Stack-oriented context-window leaderboard |\n\n## API endpoints\n\n| Route | What |\n|---|---|\n| `GET  /api/health` | Liveness |\n| `GET  /api/hardware` | Local machine info (CPU, GPU, memory) |\n| `GET  /api/models?endpoint=...` | List models. 
If endpoint omitted, auto-probe localhost for Ollama (11434), LM Studio (1234), oMLX/Osaurus (8000), Jan (1337), vLLM (8080) |\n| `GET  /api/models/preflight?endpoint=...\u0026model=...` | Verify a model can actually load |\n| `GET  /api/models/search-hf?q=\u0026limit=` | Search Hugging Face |\n| `GET  /api/models/hf-details?repo=` | HF repo metadata |\n| `POST /api/models/pull` | Trigger a model pull |\n| `GET  /api/models/pull/active` | List in-flight pulls |\n| `GET  /api/models/pull/{id}/stream` | SSE for pull progress |\n| `POST /api/benchmark/run` | Start a benchmark. Body: `{model, endpoint, provider, suites[], harness}` |\n| `GET  /api/benchmark/runs` | List persisted runs with v2 speed-score recompute |\n| `GET  /api/benchmark/runs/{runId}` | Run detail (active or persisted) |\n| `GET  /api/benchmark/stream/{runId}` | SSE for live progress |\n| `POST /api/chat/generate` | Passthrough chat completion |\n\n## Providers\n\nProvider type is auto-detected per model and passed to the runner:\n\n- `ollama` — Ollama's `/api/chat` (default for `http://localhost:11434` and any tunnelled Ollama)\n- `openai_compat` — Any OpenAI-compatible `/v1/chat/completions`: LM Studio, vLLM, Osaurus/MLX, Jan, oMLX, hosted endpoints\n\nThe UI's BenchmarkTab picks the correct provider based on the chosen model's source — no manual selection needed.\n\n## Harnesses\n\nWrap the same task in different prompt/parse contracts so you can A/B \"this model with raw tools\" vs \"this model with Hermes tags\":\n\n- `raw` — vanilla OpenAI-style tools, no prompt rewriting\n- `hermes` — NousResearch `\u003ctool_call\u003e{...}\u003c/tool_call\u003e` XML tags\n- `qwen` — Qwen3 `\u003cfunction_call\u003e{...}\u003c/function_call\u003e` tags\n- `pi` — OpenClaw/Pi-style `\u003cthink\u003e...\u003c/think\u003e` + Hermes tags\n\n## What ships in v1\n\n- ✅ Seven fixed task suites, deterministic + reproducible\n- ✅ Live SSE progress per task\n- ✅ Provider auto-detect (Ollama + OpenAI-compatible)\n- ✅ Run 
persistence + leaderboard from disk\n- ✅ Per-run detail + side-by-side compare\n- ✅ Speed-score v2 curve (anchored on real M-series/RTX reference points)\n- ✅ Preflight model-load check with actionable diagnostics\n- ⏳ True streaming TTFT (currently 0 for openai_compat; requires streaming pass)\n- ⏳ Hosted leaderboard at bench-loop.com\n- ⏳ Community submission flow\n\n## License\n\nTBD before the public launch.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foutsourc-e%2Fbench-loop-web","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foutsourc-e%2Fbench-loop-web","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foutsourc-e%2Fbench-loop-web/lists"}