https://github.com/roymcfarland/llm-workbench

Open-source, model-agnostic control plane for LLM agents — tamper-evident, human-gated, replayable run bundles with full trace history, model I/O, and cost telemetry. TypeScript · React · MCP · OpenAPI. MIT, published on npm.
https://github.com/roymcfarland/llm-workbench

ai-agents ai-governance ai-sdk audit-trail human-in-the-loop llm llm-agents llm-observability mcp model-context-protocol nextjs observability openapi react replay run-bundles tracing typescript

Last synced: about 3 hours ago
JSON representation

Host: GitHub
URL: https://github.com/roymcfarland/llm-workbench
Owner: roymcfarland
License: mit
Created: 2026-04-17T23:29:05.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-06-26T06:02:29.000Z (6 days ago)
Last Synced: 2026-06-26T08:04:10.938Z (6 days ago)
Topics: ai-agents, ai-governance, ai-sdk, audit-trail, human-in-the-loop, llm, llm-agents, llm-observability, mcp, model-context-protocol, nextjs, observability, openapi, react, replay, run-bundles, tracing, typescript
Language: TypeScript
Homepage: https://www.llmworkbench.io
Size: 1.1 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Audit: audit-ci.jsonc
- Security: SECURITY.md
- Roadmap: ROADMAP.md

Awesome Lists containing this project

README

# LLM Workbench

[![npm version](https://img.shields.io/npm/v/@llm-workbench/runtime.svg)](https://www.npmjs.com/package/@llm-workbench/runtime)
[![CI](https://github.com/roymcfarland/llm-workbench/actions/workflows/ci.yml/badge.svg)](https://github.com/roymcfarland/llm-workbench/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Node](https://img.shields.io/node/v/@llm-workbench/runtime.svg)](https://nodejs.org)

**An open-source control plane for LLM-powered products.**

LLM Workbench gives AI applications a production-grade human interface for
the messy parts that matter: workflow state, artifacts, rules, human review
gates, trace history, model I/O, cost telemetry, import/export, and replay.

It is not another chat UI. It is the layer you bolt onto an LLM pipeline when
you want non-technical users to inspect, edit, approve, branch, audit, and
learn from the work your system is doing.

The runtime is headless, model-agnostic, and environment-agnostic. It does not
call OpenAI, Anthropic, local models, or any other provider directly. Your host
application owns prompts, tools, models, and policy. LLM Workbench records what
happened and gives humans a clean control surface over it.

> **License:** [MIT](LICENSE) — free to use, modify, and distribute. The five
> core libraries are published to npm under the
> [`@llm-workbench`](https://www.npmjs.com/org/llm-workbench) scope.

## Status

`v0.3.x` (June 2026): **LLM Workbench is now open source under the MIT License**
and published to npm under the [`@llm-workbench`](https://www.npmjs.com/org/llm-workbench)
scope (five packages: `runtime`, `ui`, `adapters-react`, `ai-sdk`, `mcp`). This
release focused on making the packages genuinely installable and safe to depend on:
a CI smoke test that imports the built packages under plain Node ESM (not just a
bundler), removal of the last `unsafe-eval` from the production CSP by precompiling
JSON-Schema validators at build time, cleared dependency advisories, a
production-scoped audit gate in CI, secret scanning, and packages published with
build provenance. See the launch post:
[llm-workbench-is-now-open-source](https://www.llmworkbench.io/blog/llm-workbench-is-now-open-source).

`v0.2.0` (2026-04-27): the runtime adds Trace 2.0 (hierarchical spans, OTel
GenAI mapper), hierarchical supervision (`runChildrenOf`, `cancelRunCascade`),
and an externalizable `ArtifactStore`; `@llm-workbench/ai-sdk` wraps Vercel
AI SDK v5 with automatic trace events; the UI ships scoped `lwb-` CSS,
accessible `@dnd-kit` reorder, virtualized trace, and a `WorkflowGraph`;
and a hosted reference deployment lands at [`apps/web`](apps/web).
See [CHANGELOG.md](CHANGELOG.md) for the full list.

**Project spec:** [PROJECT.md](PROJECT.md) is the authoritative source of
truth for purpose, scope, non-goals, and the rules that automated reviewers
enforce on every PR.

## See It Live

- **Interactive demo (no signup):** https://www.llmworkbench.io/runs/demo — a
read-only LLM Workbench run rendered exactly as an authenticated run is.
- **Overview & docs:** https://www.llmworkbench.io · https://www.llmworkbench.io/docs/protocol

## Install

```bash
npm install @llm-workbench/runtime
```

Optional companion packages:

```bash
npm install @llm-workbench/ui @llm-workbench/adapters-react # React control surface
npm install @llm-workbench/ai-sdk # Vercel AI SDK tracing
npm install @llm-workbench/mcp # expose runs over MCP
```

All five libraries are published under the
[`@llm-workbench`](https://www.npmjs.com/org/llm-workbench) scope (MIT, ESM,
Node 22+). The runtime has no React or framework dependency — it runs in the
browser, Node, or edge-style runtimes. Jump to the
[60-second integration](#60-second-integration) for a complete example.

## For Reviewers

If you're reviewing this repo, a useful 15-minute path is:

1. Open the live demo first: https://www.llmworkbench.io/runs/demo.
2. Skim [PROJECT.md](PROJECT.md), then the [Architecture](#architecture)
section below.
3. Read one representative source file:
[`packages/runtime/src/runtime/session.ts`](packages/runtime/src/runtime/session.ts).
4. Read one representative test suite:
[`packages/runtime/src/runtime/workbench.test.ts`](packages/runtime/src/runtime/workbench.test.ts).

## How This Repo Is Built

Most changes are shipped as deliberately small slices. The maintainer
acts as architect/advisor: designing scope, grounding the prompt in repository
recon, catching spec errors, reviewing the implementation, and deciding whether
to merge. A coding agent then implements the scoped PR, and a separate verifier
agent independently checks it against [PROJECT.md](PROJECT.md) with a
structured APPROVE/REJECT verdict.

The process artifacts are there on purpose. [PROJECT.md](PROJECT.md)
is the contract both agents are held to; each slice's build record (closeout)
lives in its PR description. [VERIFIER-AUDIT-PR8.md](docs/process/VERIFIER-AUDIT-PR8.md)
and [VERIFIER-AUDIT-PR10.md](docs/process/VERIFIER-AUDIT-PR10.md) are independent
verification transcripts from specific PRs.

## Why It Exists

LLM apps fail in boring, expensive ways:

- Outputs change and nobody knows why.
- Prompts, rules, artifacts, and human edits drift apart.
- Non-technical reviewers get a black box instead of useful controls.
- Teams cannot replay what happened after a bad run.
- Model spend is logged somewhere, but not where product decisions happen.
- "Add AI" becomes a pile of custom debugging panels and brittle JSON editors.

LLM Workbench turns that chaos into an inspectable run graph.

## What You Get

- **Model-agnostic runtime.** The host decides which provider, model, prompt
strategy, and tool registry to use. The runtime records model I/O and tool
calls through explicit APIs.
- **Workflow-shaped execution.** Workflows are DAGs with step-level gate
policies: `AUTO`, `PAUSE_BEFORE`, `PAUSE_AFTER`, and `CHECKPOINT`.
- **Human review gates.** Pause before or after important steps, collect
approvals, rejections, edits, and notes, then resume with traceable intent.
- **Schema-validated artifacts and rules.** Bring JSON Schemas, validate data
through Ajv, patch artifacts safely, and export redacted user bundles.
- **Tamper-evident run bundles.** Exports are SHA-256 signed over canonical
JSON. Imports verify integrity by default.
- **Telemetry-ready traces.** Track provider, model, usage, duration, cost,
user, tenant, account, and plan metadata without locking into a vendor.
- **Cost and usage summaries.** `summarizeModelTelemetry` turns raw trace
events into a typed ledger grouped by provider, model, step, user, tenant,
and plan.
- **Pluggable persistence.** Use memory, IndexedDB, or HTTP behind one
`RunRepository` interface. The HTTP adapter supports auth headers, timeouts,
retries, and abort signals.
- **Composable UI.** Use `WorkbenchShell` as a ready-made React control panel,
or build your own UI against the headless runtime.

## Architecture

```
host app
owns models, prompts, tools, business logic
calls runtime APIs as work happens

@llm-workbench/runtime
records workflow state, artifacts, rules, gates, traces, bundles, telemetry
runs in browser, Node, or edge-style runtimes

@llm-workbench/ui
React shell for artifact editing, rules, trace history, gates, import/export

@llm-workbench/adapters-react
subscription hooks for live runtime state
```

## Repository Layout

```
packages/
runtime/ @llm-workbench/runtime
ui/ @llm-workbench/ui
adapters-react/ @llm-workbench/adapters-react
ai-sdk/ @llm-workbench/ai-sdk
mcp/ @llm-workbench/mcp (MCP server + HTTP adapter)
examples/
job-search-demo/ Vite demo app exercising the full surface
run-repo-server/ Reference REST store for HttpRunRepository
apps/
web/ Hosted reference deployment (Next.js + Supabase + AI Gateway + Clerk)
```

| Package | What it gives you |
| --- | --- |
| `@llm-workbench/runtime` | Protocol types, `WorkbenchRuntime`, `WorkbenchSession`, `SchemaRegistry`, persistence adapters, bundle import/export, telemetry summaries, and structured `WorkbenchError`. |
| `@llm-workbench/ui` | `WorkbenchShell`, a themeable React interface for artifacts, rules, traces, gates, and bundles. |
| `@llm-workbench/adapters-react` | `useWorkbenchRunRevision` for subscribing React components to live run state. |
| `@llm-workbench/ai-sdk` | Vercel AI SDK v5 wrappers (`tracedGenerateText`, `tracedStreamText`, `tracedGenerateObject`, `tracedStreamObject`, `traceTools`) that emit correlated `model_io`, `tool_call`, and gateway-cost trace events automatically. |
| `@llm-workbench/mcp` | Model Context Protocol server factory plus HTTP handler (`createWorkbenchMcpHttpHandler`) for exposing the runtime over MCP — see [`packages/mcp/README.md`](packages/mcp/README.md). |

## Local Development

To work on the monorepo itself (rather than consume the published packages),
clone it and run:

```bash
npm install
npm test
npm run build
npm run demo # Vite demo app at http://localhost:5173
npm run demo:http-server # Reference REST store for HttpRunRepository
```

Node.js **22+** is required (`engines` in root `package.json`). CI runs on **Node 22 and 24** (`.github/workflows/ci.yml`). See [CONTRIBUTING.md](CONTRIBUTING.md) to get involved.

## 60-Second Integration

```ts
import {
WorkbenchRuntime,
SchemaRegistry,
registerDemoSchemas,
summarizeModelTelemetry,
} from "@llm-workbench/runtime";

const registry = new SchemaRegistry();
registerDemoSchemas(registry);

const runtime = new WorkbenchRuntime();
const { runId } = runtime.startRun({
workflow: {
id: "my-pipeline",
version: 1,
steps: [
{ id: "parse", gatePolicy: "PAUSE_BEFORE" },
{ id: "score", gatePolicy: "AUTO" },
],
edges: [{ id: "e1", from: "parse", to: "score" }],
},
subject: {
userId: "user_123",
tenantId: "team_456",
planId: "pro",
},
});

const session = runtime.session(runId);

session.resolveGate({
stepId: "parse",
gate: "PAUSE_BEFORE",
decision: "approved",
});

session.beginStep("parse");

session.writeArtifact({
artifactKey: "compiledProfile",
typeId: "compiledProfile",
data: {
headline: "TypeScript engineer",
skills: ["typescript", "react", "systems"],
summary: "Strong full-stack builder with AI workflow experience.",
},
});

session.logModelIO({
stepId: "parse",
direction: "response",
provider: "openai",
model: "gpt-example",
usage: { inputTokens: 120, outputTokens: 40 },
cost: { amount: 0.0012, currency: "USD" },
durationMs: 900,
});

session.completeStep("parse");

const telemetry = summarizeModelTelemetry(session.snapshot());
console.log(telemetry.totals, telemetry.byProviderModel);
```

Drop the shell anywhere in your app:

```tsx

```

## Runtime Principles

- The runtime never hides state behind provider-specific abstractions.
- Structured outputs should be schema-validated before they become product
state.
- Human edits and approvals are first-class trace events, not side notes.
- Exported runs should be useful for debugging, audits, demos, and learning.
- Model telemetry should be close enough to the workflow that cost and quality
can be managed together.
- The public protocol should be boring, explicit, and durable.

## License

LLM Workbench is released under the **[MIT License](LICENSE)** — free to use,
modify, and distribute, including commercially. The same license applies to every
package under `packages/*`.

## Contributing

Contributions are welcome. Open an issue to discuss a change or report a bug via
[GitHub Issues](https://github.com/roymcfarland/llm-workbench/issues), and see
[CONTRIBUTING.md](CONTRIBUTING.md) for local setup and the PR process. Please also
read the [Code of Conduct](CODE_OF_CONDUCT.md).

## Security

Please report security issues through the process in [SECURITY.md](SECURITY.md).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/roymcfarland/llm-workbench

Awesome Lists containing this project

README