{"id":51344508,"url":"https://github.com/roymcfarland/llm-workbench","last_synced_at":"2026-07-02T10:30:45.424Z","repository":{"id":365333698,"uuid":"1213935776","full_name":"roymcfarland/llm-workbench","owner":"roymcfarland","description":"Open-source, model-agnostic control plane for LLM agents — tamper-evident, human-gated, replayable run bundles with full trace history, model I/O, and cost telemetry. TypeScript · React · MCP · OpenAPI. MIT, published on npm.","archived":false,"fork":false,"pushed_at":"2026-06-26T06:02:29.000Z","size":1152,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-26T08:04:10.938Z","etag":null,"topics":["ai-agents","ai-governance","ai-sdk","audit-trail","human-in-the-loop","llm","llm-agents","llm-observability","mcp","model-context-protocol","nextjs","observability","openapi","react","replay","run-bundles","tracing","typescript"],"latest_commit_sha":null,"homepage":"https://www.llmworkbench.io","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/roymcfarland.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":"audit-ci.jsonc","citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-17T23:29:05.000Z","updated_at":"2026-06-26T06:02:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/roymcfarland/llm-workbench","commit_stats":null,"previous_names":["roymcfarland/llm-workbench"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/roymcfarland/llm-workbench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roymcfarland%2Fllm-workbench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roymcfarland%2Fllm-workbench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roymcfarland%2Fllm-workbench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roymcfarland%2Fllm-workbench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/roymcfarland","download_url":"https://codeload.github.com/roymcfarland/llm-workbench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roymcfarland%2Fllm-workbench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35043932,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-02T02:00:06.368Z","response_time":173,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","ai-governance","ai-sdk","audit-trail","human-in-the-loop","llm","llm-agents","llm-observability","mcp","model-context-protocol","nextjs","observability","openapi","react","replay","run-bundles","tracing","typescript"],"created_at":"2026-07-02T10:30:44.942Z","updated_at":"2026-07-02T10:30:45.413Z","avatar_url":"https://github.com/roymcfarland.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLM Workbench\n\n[![npm version](https://img.shields.io/npm/v/@llm-workbench/runtime.svg)](https://www.npmjs.com/package/@llm-workbench/runtime)\n[![CI](https://github.com/roymcfarland/llm-workbench/actions/workflows/ci.yml/badge.svg)](https://github.com/roymcfarland/llm-workbench/actions/workflows/ci.yml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\n[![Node](https://img.shields.io/node/v/@llm-workbench/runtime.svg)](https://nodejs.org)\n\n**An open-source control plane for LLM-powered products.**\n\nLLM Workbench gives AI applications a production-grade human interface for\nthe messy parts that matter: workflow state, artifacts, rules, human review\ngates, trace history, model I/O, cost telemetry, import/export, and replay.\n\nIt is not another chat UI. It is the layer you bolt onto an LLM pipeline when\nyou want non-technical users to inspect, edit, approve, branch, audit, and\nlearn from the work your system is doing.\n\nThe runtime is headless, model-agnostic, and environment-agnostic. It does not\ncall OpenAI, Anthropic, local models, or any other provider directly. Your host\napplication owns prompts, tools, models, and policy. LLM Workbench records what\nhappened and gives humans a clean control surface over it.\n\n\u003e **License:** [MIT](LICENSE) — free to use, modify, and distribute. The five\n\u003e core libraries are published to npm under the\n\u003e [`@llm-workbench`](https://www.npmjs.com/org/llm-workbench) scope.\n\n## Status\n\n`v0.3.x` (June 2026): **LLM Workbench is now open source under the MIT License**\nand published to npm under the [`@llm-workbench`](https://www.npmjs.com/org/llm-workbench)\nscope (five packages: `runtime`, `ui`, `adapters-react`, `ai-sdk`, `mcp`). This\nrelease focused on making the packages genuinely installable and safe to depend on:\na CI smoke test that imports the built packages under plain Node ESM (not just a\nbundler), removal of the last `unsafe-eval` from the production CSP by precompiling\nJSON-Schema validators at build time, cleared dependency advisories, a\nproduction-scoped audit gate in CI, secret scanning, and packages published with\nbuild provenance. See the launch post:\n[llm-workbench-is-now-open-source](https://www.llmworkbench.io/blog/llm-workbench-is-now-open-source).\n\n`v0.2.0` (2026-04-27): the runtime adds Trace 2.0 (hierarchical spans, OTel\nGenAI mapper), hierarchical supervision (`runChildrenOf`, `cancelRunCascade`),\nand an externalizable `ArtifactStore`; `@llm-workbench/ai-sdk` wraps Vercel\nAI SDK v5 with automatic trace events; the UI ships scoped `lwb-` CSS,\naccessible `@dnd-kit` reorder, virtualized trace, and a `WorkflowGraph`;\nand a hosted reference deployment lands at [`apps/web`](apps/web).\nSee [CHANGELOG.md](CHANGELOG.md) for the full list.\n\n**Project spec:** [PROJECT.md](PROJECT.md) is the authoritative source of\ntruth for purpose, scope, non-goals, and the rules that automated reviewers\nenforce on every PR.\n\n## See It Live\n\n- **Interactive demo (no signup):** https://www.llmworkbench.io/runs/demo — a\n  read-only LLM Workbench run rendered exactly as an authenticated run is.\n- **Overview \u0026 docs:** https://www.llmworkbench.io · https://www.llmworkbench.io/docs/protocol\n\n## Install\n\n```bash\nnpm install @llm-workbench/runtime\n```\n\nOptional companion packages:\n\n```bash\nnpm install @llm-workbench/ui @llm-workbench/adapters-react   # React control surface\nnpm install @llm-workbench/ai-sdk                              # Vercel AI SDK tracing\nnpm install @llm-workbench/mcp                                 # expose runs over MCP\n```\n\nAll five libraries are published under the\n[`@llm-workbench`](https://www.npmjs.com/org/llm-workbench) scope (MIT, ESM,\nNode 22+). The runtime has no React or framework dependency — it runs in the\nbrowser, Node, or edge-style runtimes. Jump to the\n[60-second integration](#60-second-integration) for a complete example.\n\n## For Reviewers\n\nIf you're reviewing this repo, a useful 15-minute path is:\n\n1. Open the live demo first: https://www.llmworkbench.io/runs/demo.\n2. Skim [PROJECT.md](PROJECT.md), then the [Architecture](#architecture)\n   section below.\n3. Read one representative source file:\n   [`packages/runtime/src/runtime/session.ts`](packages/runtime/src/runtime/session.ts).\n4. Read one representative test suite:\n   [`packages/runtime/src/runtime/workbench.test.ts`](packages/runtime/src/runtime/workbench.test.ts).\n\n## How This Repo Is Built\n\nMost changes are shipped as deliberately small slices. The maintainer\nacts as architect/advisor: designing scope, grounding the prompt in repository\nrecon, catching spec errors, reviewing the implementation, and deciding whether\nto merge. A coding agent then implements the scoped PR, and a separate verifier\nagent independently checks it against [PROJECT.md](PROJECT.md) with a\nstructured APPROVE/REJECT verdict.\n\nThe process artifacts are there on purpose. [PROJECT.md](PROJECT.md)\nis the contract both agents are held to; each slice's build record (closeout)\nlives in its PR description. [VERIFIER-AUDIT-PR8.md](docs/process/VERIFIER-AUDIT-PR8.md)\nand [VERIFIER-AUDIT-PR10.md](docs/process/VERIFIER-AUDIT-PR10.md) are independent\nverification transcripts from specific PRs.\n\n## Why It Exists\n\nLLM apps fail in boring, expensive ways:\n\n- Outputs change and nobody knows why.\n- Prompts, rules, artifacts, and human edits drift apart.\n- Non-technical reviewers get a black box instead of useful controls.\n- Teams cannot replay what happened after a bad run.\n- Model spend is logged somewhere, but not where product decisions happen.\n- \"Add AI\" becomes a pile of custom debugging panels and brittle JSON editors.\n\nLLM Workbench turns that chaos into an inspectable run graph.\n\n## What You Get\n\n- **Model-agnostic runtime.** The host decides which provider, model, prompt\n  strategy, and tool registry to use. The runtime records model I/O and tool\n  calls through explicit APIs.\n- **Workflow-shaped execution.** Workflows are DAGs with step-level gate\n  policies: `AUTO`, `PAUSE_BEFORE`, `PAUSE_AFTER`, and `CHECKPOINT`.\n- **Human review gates.** Pause before or after important steps, collect\n  approvals, rejections, edits, and notes, then resume with traceable intent.\n- **Schema-validated artifacts and rules.** Bring JSON Schemas, validate data\n  through Ajv, patch artifacts safely, and export redacted user bundles.\n- **Tamper-evident run bundles.** Exports are SHA-256 signed over canonical\n  JSON. Imports verify integrity by default.\n- **Telemetry-ready traces.** Track provider, model, usage, duration, cost,\n  user, tenant, account, and plan metadata without locking into a vendor.\n- **Cost and usage summaries.** `summarizeModelTelemetry` turns raw trace\n  events into a typed ledger grouped by provider, model, step, user, tenant,\n  and plan.\n- **Pluggable persistence.** Use memory, IndexedDB, or HTTP behind one\n  `RunRepository` interface. The HTTP adapter supports auth headers, timeouts,\n  retries, and abort signals.\n- **Composable UI.** Use `WorkbenchShell` as a ready-made React control panel,\n  or build your own UI against the headless runtime.\n\n## Architecture\n\n```\nhost app\n  owns models, prompts, tools, business logic\n  calls runtime APIs as work happens\n\n@llm-workbench/runtime\n  records workflow state, artifacts, rules, gates, traces, bundles, telemetry\n  runs in browser, Node, or edge-style runtimes\n\n@llm-workbench/ui\n  React shell for artifact editing, rules, trace history, gates, import/export\n\n@llm-workbench/adapters-react\n  subscription hooks for live runtime state\n```\n\n## Repository Layout\n\n```\npackages/\n  runtime/              @llm-workbench/runtime\n  ui/                   @llm-workbench/ui\n  adapters-react/       @llm-workbench/adapters-react\n  ai-sdk/               @llm-workbench/ai-sdk\n  mcp/                  @llm-workbench/mcp (MCP server + HTTP adapter)\nexamples/\n  job-search-demo/    Vite demo app exercising the full surface\n  run-repo-server/    Reference REST store for HttpRunRepository\napps/\n  web/                Hosted reference deployment (Next.js + Supabase + AI Gateway + Clerk)\n```\n\n| Package | What it gives you |\n| --- | --- |\n| `@llm-workbench/runtime` | Protocol types, `WorkbenchRuntime`, `WorkbenchSession`, `SchemaRegistry`, persistence adapters, bundle import/export, telemetry summaries, and structured `WorkbenchError`. |\n| `@llm-workbench/ui` | `WorkbenchShell`, a themeable React interface for artifacts, rules, traces, gates, and bundles. |\n| `@llm-workbench/adapters-react` | `useWorkbenchRunRevision` for subscribing React components to live run state. |\n| `@llm-workbench/ai-sdk` | Vercel AI SDK v5 wrappers (`tracedGenerateText`, `tracedStreamText`, `tracedGenerateObject`, `tracedStreamObject`, `traceTools`) that emit correlated `model_io`, `tool_call`, and gateway-cost trace events automatically. |\n| `@llm-workbench/mcp` | Model Context Protocol server factory plus HTTP handler (`createWorkbenchMcpHttpHandler`) for exposing the runtime over MCP — see [`packages/mcp/README.md`](packages/mcp/README.md). |\n\n## Local Development\n\nTo work on the monorepo itself (rather than consume the published packages),\nclone it and run:\n\n```bash\nnpm install\nnpm test\nnpm run build\nnpm run demo               # Vite demo app at http://localhost:5173\nnpm run demo:http-server   # Reference REST store for HttpRunRepository\n```\n\nNode.js **22+** is required (`engines` in root `package.json`). CI runs on **Node 22 and 24** (`.github/workflows/ci.yml`). See [CONTRIBUTING.md](CONTRIBUTING.md) to get involved.\n\n## 60-Second Integration\n\n```ts\nimport {\n  WorkbenchRuntime,\n  SchemaRegistry,\n  registerDemoSchemas,\n  summarizeModelTelemetry,\n} from \"@llm-workbench/runtime\";\n\nconst registry = new SchemaRegistry();\nregisterDemoSchemas(registry);\n\nconst runtime = new WorkbenchRuntime();\nconst { runId } = runtime.startRun({\n  workflow: {\n    id: \"my-pipeline\",\n    version: 1,\n    steps: [\n      { id: \"parse\", gatePolicy: \"PAUSE_BEFORE\" },\n      { id: \"score\", gatePolicy: \"AUTO\" },\n    ],\n    edges: [{ id: \"e1\", from: \"parse\", to: \"score\" }],\n  },\n  subject: {\n    userId: \"user_123\",\n    tenantId: \"team_456\",\n    planId: \"pro\",\n  },\n});\n\nconst session = runtime.session(runId);\n\nsession.resolveGate({\n  stepId: \"parse\",\n  gate: \"PAUSE_BEFORE\",\n  decision: \"approved\",\n});\n\nsession.beginStep(\"parse\");\n\nsession.writeArtifact({\n  artifactKey: \"compiledProfile\",\n  typeId: \"compiledProfile\",\n  data: {\n    headline: \"TypeScript engineer\",\n    skills: [\"typescript\", \"react\", \"systems\"],\n    summary: \"Strong full-stack builder with AI workflow experience.\",\n  },\n});\n\nsession.logModelIO({\n  stepId: \"parse\",\n  direction: \"response\",\n  provider: \"openai\",\n  model: \"gpt-example\",\n  usage: { inputTokens: 120, outputTokens: 40 },\n  cost: { amount: 0.0012, currency: \"USD\" },\n  durationMs: 900,\n});\n\nsession.completeStep(\"parse\");\n\nconst telemetry = summarizeModelTelemetry(session.snapshot());\nconsole.log(telemetry.totals, telemetry.byProviderModel);\n```\n\nDrop the shell anywhere in your app:\n\n```tsx\n\u003cWorkbenchShell runtime={runtime} runId={runId} registry={registry} /\u003e\n```\n\n## Runtime Principles\n\n- The runtime never hides state behind provider-specific abstractions.\n- Structured outputs should be schema-validated before they become product\n  state.\n- Human edits and approvals are first-class trace events, not side notes.\n- Exported runs should be useful for debugging, audits, demos, and learning.\n- Model telemetry should be close enough to the workflow that cost and quality\n  can be managed together.\n- The public protocol should be boring, explicit, and durable.\n\n## License\n\nLLM Workbench is released under the **[MIT License](LICENSE)** — free to use,\nmodify, and distribute, including commercially. The same license applies to every\npackage under `packages/*`.\n\n## Contributing\n\nContributions are welcome. Open an issue to discuss a change or report a bug via\n[GitHub Issues](https://github.com/roymcfarland/llm-workbench/issues), and see\n[CONTRIBUTING.md](CONTRIBUTING.md) for local setup and the PR process. Please also\nread the [Code of Conduct](CODE_OF_CONDUCT.md).\n\n## Security\n\nPlease report security issues through the process in [SECURITY.md](SECURITY.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froymcfarland%2Fllm-workbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froymcfarland%2Fllm-workbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froymcfarland%2Fllm-workbench/lists"}