https://github.com/gsaini/playwright-mcp-getting-started
Drive Playwright MCP from a deterministic Node.js client to validate a frontend app — no LLM in the loop. Demonstrates three assertion styles (snapshot refs, CSS selectors, page evaluate) across 9 end-to-end scenarios.
- Host: GitHub
- URL: https://github.com/gsaini/playwright-mcp-getting-started
- Owner: gsaini
- Created: 2026-05-16T22:49:22.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-06-13T04:40:01.000Z (4 days ago)
- Last Synced: 2026-06-14T23:34:28.806Z (3 days ago)
- Topics: biome, browser-automation, demo, e2e-testing, frontend-testing, mcp, model-context-protocol, no-llm, nodejs, playwright, playwright-mcp, pnpm, react, react-router, tailwind-css, tailwindcss, vite
- Language: JavaScript
- Size: 218 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Validating a frontend with Playwright MCP — no LLM in the loop

This demo shows that the **Model Context Protocol** is a transport, not a
runtime detail of any particular AI. Anything that speaks MCP can drive an MCP
server. Here, a plain Node.js script — no Claude, no OpenAI, no inference of
any kind — drives [`@playwright/mcp`](https://github.com/microsoft/playwright-mcp)
to run an end-to-end validation of **Nimbus Gear**, a React + Tailwind 4 demo
store.
```text
┌──────────────────────┐     stdio / JSON-RPC      ┌──────────────────────┐
│  validator/run.mjs   │ ────────────────────────► │   @playwright/mcp    │
│   (deterministic)    │ ◄──────────────────────── │  (Chromium driver)   │
└──────────────────────┘  tools/list, tools/call   └──────────────────────┘
           │                                                  │
           │ asserts on returned text / JSON                  │ navigates,
           ▼                                                  ▼ clicks, types
  PASS / FAIL summary                               Vite dev server :5173
                                                   (React 19 + Tailwind 4)
```
The validator decides which tool to call next using ordinary control flow,
exactly as a hand-written E2E test would. The MCP server is the only "smart"
piece — it knows how to drive Chromium.
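To make "stdio / JSON-RPC" concrete, here is a minimal sketch of the frames a client writes to the server's stdin: MCP's stdio transport carries one JSON-RPC 2.0 message per line. It is illustrative only. It skips the `initialize` handshake that must open every MCP session (the official SDK `Client` wrapped by [validator/lib/mcp-client.mjs](validator/lib/mcp-client.mjs) performs it for you), and `makeFramer` is a hypothetical name.

```js
// Hypothetical helper: builds newline-delimited JSON-RPC 2.0 request frames,
// the wire format of MCP's stdio transport (one JSON object per line).
function makeFramer() {
  let nextId = 1;
  const frame = (method, params) =>
    `${JSON.stringify({ jsonrpc: "2.0", id: nextId++, method, params })}\n`;
  return {
    // Ask the server which tools it exposes.
    listTools: () => frame("tools/list", {}),
    // Invoke one tool by name with JSON arguments.
    callTool: (name, args) => frame("tools/call", { name, arguments: args }),
  };
}

// Plain control flow picks each call; no model is consulted.
const rpc = makeFramer();
const frames = [
  rpc.listTools(),
  rpc.callTool("browser_navigate", { url: "http://localhost:5173" }),
  rpc.callTool("browser_snapshot", {}),
];
process.stdout.write(frames.join(""));
```

The responses come back the same way, one JSON object per line on the server's stdout, matched to requests by `id`.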
## The demo app — Nimbus Gear
A small React storefront with:
- Mock username/password auth (demo / demo) with protected routes
- Product catalogue with **search**, **category filter**, and **sort**
- Product detail pages with quantity stepper, "Add to cart", "Buy now"
- Shopping cart with line-quantity controls, subtotals, and a Remove action
- Multi-field **checkout** with inline validation
- Order success page with generated order number
- **Light / System / Dark** theme toggle (persisted to localStorage)
Built with React 19, React Router 7, Vite 8, and Tailwind CSS 4 (CSS-only
theming via `@theme` + a `dark` custom variant).
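The Light / System / Dark choice boils down to one decision: should `<html>` carry the `.dark` class (the `theme` scenarios assert on exactly that class)? A minimal sketch, assuming the stored mode is one of `"light"`, `"dark"`, or `"system"`. `resolveDark` and the `"theme"` storage key in the comment are hypothetical, not the app's actual code.

```js
// Hypothetical: decide whether <html> should carry the `.dark` class.
// mode is "light" | "dark" | "system"; systemPrefersDark comes from the OS.
function resolveDark(mode, systemPrefersDark) {
  if (mode === "dark") return true;
  if (mode === "light") return false;
  // "system" (and any unrecognised value) defers to the OS preference.
  return systemPrefersDark;
}

// In the browser this might be wired up roughly as (storage key is an assumption):
//   const mode = localStorage.getItem("theme") ?? "system";
//   const prefersDark = window.matchMedia("(prefers-color-scheme: dark)").matches;
//   document.documentElement.classList.toggle("dark", resolveDark(mode, prefersDark));
console.log(["light", "dark", "system"].map((mode) => resolveDark(mode, true)));
```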
## Layout
| Path | What it is |
| --- | --- |
| [app/index.html](app/index.html), [app/vite.config.js](app/vite.config.js) | Vite entrypoint + config |
| [app/src/main.jsx](app/src/main.jsx) | React root, provider tree |
| [app/src/App.jsx](app/src/App.jsx) | Router with protected routes |
| [app/src/routes/](app/src/routes/) | 6 route components (Login, Catalog, ProductDetail, Cart, Checkout, OrderSuccess) |
| [app/src/components/](app/src/components/) | Header, ThemeToggle, ProductCard, ProtectedRoute |
| [app/src/hooks/](app/src/hooks/) | `useAuth`, `useCart`, `useTheme` contexts |
| [app/src/data/products.js](app/src/data/products.js) | In-memory product catalogue |
| [app/src/styles.css](app/src/styles.css) | Tailwind import + theme tokens (light/dark) |
| [validator/run.mjs](validator/run.mjs) | Orchestrator — connect, run groups, summarise |
| [validator/lib/mcp-client.mjs](validator/lib/mcp-client.mjs) | Wraps the official `@modelcontextprotocol/sdk` `Client` |
| [validator/lib/snapshot.mjs](validator/lib/snapshot.mjs) | Parses the YAML-ish accessibility tree returned by `browser_snapshot` |
| [validator/lib/helpers.mjs](validator/lib/helpers.mjs) | High-level helpers — `clickByRole`, `typeSelector`, `evaluate`, `setReactInputValue`, … |
| [validator/lib/harness.mjs](validator/lib/harness.mjs) | Scenario runner + assertions (`assert`, `assertEqual`, `assertContains`) |
| [validator/scenarios/auth.mjs](validator/scenarios/auth.mjs) | Login, validation, redirect |
| [validator/scenarios/catalog.mjs](validator/scenarios/catalog.mjs) | Search, filter, sort, navigation |
| [validator/scenarios/cart.mjs](validator/scenarios/cart.mjs) | Add, quantity, remove, totals |
| [validator/scenarios/checkout.mjs](validator/scenarios/checkout.mjs) | Form validation, happy path, order number |
| [validator/scenarios/theme.mjs](validator/scenarios/theme.mjs) | Light / dark / system, persistence |
| [validator/scenarios/visual.mjs](validator/scenarios/visual.mjs) | Full-page screenshots |
| [validator/features/*.feature](validator/features/) | Plain-English specs (hand-written) — compile to `.mjs` via `pnpm spec:compile` |
| [tools/compile-scenarios/](tools/compile-scenarios/) | LLM-powered `.feature` → `.mjs` compiler (Anthropic / Ollama) |
| [biome.json](biome.json) | Biome config (lint + format) |
## Quick start
```bash
pnpm install
pnpm exec playwright install chromium # one-time, ~150 MB
pnpm demo # spawns Vite, runs all scenarios, tears down
```
If Vite is already running and you just want to iterate on tests:
```bash
pnpm app # terminal 1
pnpm validate # terminal 2
```
Pass `--headed` to watch the browser:
```bash
node validator/run.mjs --start-app --headed
```
Screenshots land in [screenshots/](screenshots/) (catalogue in light + dark
themes, plus a product-detail capture).
### Plain-English specs (optional)
Two directories, two responsibilities:
```text
validator/features/    intent-level .feature specs (hand-written; source of truth)
validator/scenarios/   generated .mjs scenarios (committed; CI runs these)
```
The compiler runs an **agentic LLM** that drives the live app via Playwright
MCP at *compile time* to discover the real DOM, then emits a deterministic
`.mjs`. At *runtime* (`pnpm demo`) the saved `.mjs` runs with no LLM —
that's the whole point.
```text
COMPILE TIME (occasional)                  RUNTIME (every pnpm demo / CI run)
┌────────────┐                             ┌──────────────┐
│  .feature  │ ─────┐                      │  scenarios/  │
│  (intent)  │      ▼                      │    *.mjs     │
└────────────┘  ┌────────┐                 │ (committed)  │
                │  LLM   │                 └──────┬───────┘
                │ + tools│                        │
                └───┬────┘                        ▼
                    │ browser_navigate,     ┌──────────────────┐
                    ▼ snapshot, click…      │  validator/run   │
        ┌──────────────────────┐            │ (deterministic)  │
        │ @playwright/mcp      │ ◄───────── └─────┬────────────┘
        │ ↳ Vite app on :5173  │                  │ same MCP server
        └──────────────────────┘ ◄────────────────┘ same tools
                    │
                    ▼ write_scenario(code)
             scenarios/*.mjs
```
#### Compiling
```bash
# Anthropic (default)
ANTHROPIC_API_KEY=sk-... pnpm spec:compile
# Local Ollama (no API key, no network egress; needs a tool-use-capable model)
pnpm spec:compile:ollama --model=qwen2.5-coder:14b
# Print the plan only — no Vite spawn, no model call
pnpm spec:compile:dry-run
# Subset of features
pnpm spec:compile --only=auth,catalog
# Raise the per-feature tool-use cap if a complex feature needs it
pnpm spec:compile --max-turns=80
```
The compiler is intentionally **SDK-agnostic** — it speaks raw HTTP to both
providers, so the project carries no `@anthropic-ai/sdk` / `ollama` package
to track or upgrade.
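Without an SDK, "speaking raw HTTP" comes down to building one `fetch` request per model turn. A minimal sketch of what that can look like, assuming Anthropic's Messages API and Ollama's local `/api/chat` endpoint on its default port. `buildRequest` is a hypothetical name, `max_tokens: 4096` is an arbitrary choice, and the caller is assumed to pass `tools` already in each provider's schema. This is not the compiler's actual code.

```js
// Hypothetical helper: fetch() arguments for one tool-use turn, talking to each
// provider's HTTP API directly. `tools` must already be in that provider's shape:
//   Anthropic: { name, description, input_schema }
//   Ollama:    { type: "function", function: { name, description, parameters } }
function buildRequest(provider, { model, messages, tools, apiKey }) {
  if (provider === "anthropic") {
    return [
      "https://api.anthropic.com/v1/messages",
      {
        method: "POST",
        headers: {
          "content-type": "application/json",
          "x-api-key": apiKey,
          "anthropic-version": "2023-06-01",
        },
        body: JSON.stringify({ model, max_tokens: 4096, messages, tools }),
      },
    ];
  }
  if (provider === "ollama") {
    return [
      "http://localhost:11434/api/chat", // Ollama's default local port
      {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ model, messages, tools, stream: false }),
      },
    ];
  }
  throw new Error(`unknown provider: ${provider}`);
}

// Build (but do not send) an Ollama request; sending would be
//   const res = await fetch(url, init);  const data = await res.json();
const [url, init] = buildRequest("ollama", {
  model: "qwen2.5-coder:14b",
  messages: [{ role: "user", content: "Explore the login page." }],
  tools: [],
});
console.log(url, JSON.parse(init.body).stream);
```

Since Node 18 ships a global `fetch`, nothing beyond the standard runtime is needed to send these.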
#### What the compiler does, step by step
1. Spawns `vite app` (the demo app) and `@playwright/mcp` (the browser
   driver).
2. For each `.feature` file, navigates the browser to a fresh state and
   hands the LLM a tool-use loop with these tools:
   - `browser_navigate`, `browser_snapshot`, `browser_click`,
     `browser_type`, `browser_press_key`, `browser_wait_for`,
     `browser_evaluate` — proxied straight through to MCP.
   - `write_scenario({ code })` — terminal tool. The LLM calls this exactly
     once when it has explored enough to write a complete `.mjs`.
3. Captures the `code` argument, prepends an `AUTO-GENERATED` header, and
   writes `validator/scenarios/.mjs`.
4. Tears down Vite + MCP.
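The steps above can be sketched as a bounded loop. This is a minimal illustration, not the code in [tools/compile-scenarios/](tools/compile-scenarios/): `compileFeature`, the injected `callModel` adapter, the simplified history shape, the header text, and the default `maxTurns = 40` are all assumptions (the real default cap is not stated here; `--max-turns` raises it).

```js
// Hypothetical sketch of a per-feature tool-use loop like the one described above.
// callModel(history) -> { toolCalls: [{ id, name, input }] }   (provider adapter)
// mcp.callTool({ name, arguments }) -> MCP tool result          (Playwright MCP client)
const BROWSER_TOOLS = new Set([
  "browser_navigate", "browser_snapshot", "browser_click", "browser_type",
  "browser_press_key", "browser_wait_for", "browser_evaluate",
]);

async function compileFeature({ featureText, callModel, mcp, maxTurns = 40 }) {
  // Simplified, provider-neutral history; real APIs need their own message shapes.
  const history = [{ role: "user", content: featureText }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const { toolCalls = [] } = await callModel(history);
    for (const call of toolCalls) {
      if (call.name === "write_scenario") {
        // Terminal tool: capture the code and stop exploring.
        return `// AUTO-GENERATED: edit the .feature and recompile.\n${call.input.code}`;
      }
      if (!BROWSER_TOOLS.has(call.name)) throw new Error(`unexpected tool: ${call.name}`);
      // Proxy the browser tool straight through to Playwright MCP.
      const result = await mcp.callTool({ name: call.name, arguments: call.input });
      history.push({ role: "tool", toolCallId: call.id, content: result.content });
    }
  }
  throw new Error(`no write_scenario call within ${maxTurns} turns`);
}
```

Injecting `callModel` and `mcp` keeps the loop itself provider-agnostic, which is also what makes it testable with scripted fakes.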
#### Determinism trade-offs
- **At compile time** the LLM observes a *live* browser. JSX with dynamic
classNames, conditional rendering, computed `aria-label` strings — all
resolved. The model writes selectors against what it *saw*, not what
the source code *says*.
- **At runtime** the generated `.mjs` is plain code. Same Playwright MCP
server, same helpers, no model — every run is identical given the same
app build.
- **Cost** is paid once per compile and amortized over every CI run.
Anthropic-side caching makes files 2–N in a batch ~10× cheaper than file 1
(look for `cache hit` in the per-feature log line).
Edit the `.feature` and recompile — do not hand-edit the generated `.mjs`.
#### Tagging scenarios
Tags use standard Gherkin `@tag` syntax — one or more whitespace-separated
tags on the line directly above a `Feature:` or `Scenario:`. Tags on
`Feature:` cascade to every scenario in the file; tags on `Scenario:` are
local to that scenario.
Keep tags to **two axes** that map to real commands. Avoid area tags
(`@auth`, `@cart`) — the filename already encodes that, and `--only=` covers
file-level filtering.
| Axis | Tag | Meaning |
| --- | --- | --- |
| Priority | `@smoke` | Runs on every PR — keep the set tiny and fast |
| Priority | `@regression` | Full nightly / pre-release suite |
| Priority | `@slow` | Expensive scenarios (e.g. visual diffs) — skip in fast loops |
| Lifecycle | `@wip` | Compiler skips; not ready for CI |
| Lifecycle | `@flaky` | Validator soft-fails or retries; under investigation |
Example:
```gherkin
@smoke
Feature: Authentication
  ...
  Scenario: valid credentials redirect to the catalogue
    ...
  @flaky
  Scenario: session persists across reloads
    ...
```
Filter at the command line — `--tags=` intersects with `--only=`:
```bash
pnpm spec:compile --tags=@smoke # compile smoke set only
pnpm spec:compile --tags=@smoke,@regression # union: either tag
pnpm spec:compile --tags=@smoke --tags-not=@flaky # smoke minus flaky
pnpm validate --tags=@smoke # filter the run, not the compile
```
The compiler emits resolved tags into the generated `.mjs` as a
`tags: [...]` array on each `scenario(...)` call, so the validator can
filter at run time without re-parsing `.feature` files. Resolved means
cascade-merged: a `@smoke` `Feature:` with a `@flaky` `Scenario:` ends up
with `tags: ["smoke", "flaky"]`.
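Cascade-merging is small enough to sketch directly. The following is a hypothetical, minimal resolver (not the compiler's parser, and not a full Gherkin implementation) that reads `@tag` lines sitting directly above `Feature:` / `Scenario:` and merges Feature tags into each Scenario.

```js
// Hypothetical, minimal tag resolver: collects "@tag" lines sitting directly
// above "Feature:" / "Scenario:" and cascade-merges Feature tags into each
// Scenario. Not a full Gherkin parser (no Rule, Scenario Outline, Examples).
function resolveTags(featureText) {
  let pending = [];
  let featureTags = [];
  const scenarios = [];
  for (const raw of featureText.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("@")) {
      // One or more whitespace-separated tags; keep them without the "@".
      pending.push(...line.split(/\s+/).map((t) => t.replace(/^@/, "")));
    } else if (line.startsWith("Feature:")) {
      featureTags = pending;
      pending = [];
    } else if (line.startsWith("Scenario:")) {
      // Feature tags first, then local ones; a Set drops duplicates.
      const tags = [...new Set([...featureTags, ...pending])];
      scenarios.push({ name: line.slice("Scenario:".length).trim(), tags });
      pending = [];
    }
  }
  return scenarios;
}

const spec = `@smoke
Feature: Authentication
  Scenario: valid credentials redirect to the catalogue
  @flaky
  Scenario: session persists across reloads`;
console.log(JSON.stringify(resolveTags(spec)));
```

On the README's own example, the first scenario resolves to `["smoke"]` and the second to `["smoke", "flaky"]`, matching the cascade rule above.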
**When to skip tags entirely**: if file-level (`--only=auth,catalog`)
granularity is always enough, don't introduce tags — they're a second
filtering surface that pays off only once you need *sub-file* control
(one slow scenario inside an otherwise fast feature, a single flaky case
you want to quarantine without yanking its siblings).
### Lint & format
[Biome](https://biomejs.dev) handles JS / JSX / JSON;
[markdownlint-cli2](https://github.com/DavidAnson/markdownlint-cli2) handles
Markdown.
```bash
pnpm lint # Biome lint
pnpm lint:fix # Biome lint + autofix
pnpm lint:md # markdownlint
pnpm lint:md:fix # markdownlint + autofix
pnpm format # Biome format (rewrites to canonical style)
pnpm check # Biome check + markdownlint — CI-friendly, runs both gates
```
Rule configuration:
- Biome: [biome.json](biome.json) — JSX-aware, ES module style, 100-col wrap.
- markdownlint: [.markdownlint-cli2.jsonc](.markdownlint-cli2.jsonc) — default
rule set with `MD013` (line length) and `MD033` (inline HTML) disabled
because the README intentionally uses wide prose and shields.io badge
markup. `MD024` is restricted to `siblings_only` so the same heading
text can appear under different parents. OpenSpec-managed markdown
(`.claude/commands/opsx/`, `.claude/skills/openspec-*/`, `openspec/`) is
excluded — it has its own `openspec validate`.
## Spec-driven changes (OpenSpec)
Larger changes are planned with
[OpenSpec](https://github.com/Fission-AI/OpenSpec), a spec-driven-development
layer for AI coding assistants. You describe a change in plain English;
OpenSpec scaffolds a proposal, design, task list, and delta specs under
`openspec/`. You implement against the tasks, then archive — merging the
deltas into the source-of-truth specs in `openspec/specs/`.
It's wired into Claude Code as slash commands (restart the IDE after install
to load them):
| Command | What it does |
| --- | --- |
| `/opsx:propose ""` | Create a change, generate proposal + design + tasks |
| `/opsx:apply` | Implement the tasks for a change |
| `/opsx:archive` | Archive a finished change, merge its delta specs |
| `/opsx:explore`, `/opsx:sync` | Browse changes / sync deltas into main specs |
OpenSpec is a **devDependency**, so the CLI runs through pnpm:
```bash
pnpm exec openspec list # active changes
pnpm exec openspec validate # structural check of specs + changes
```
The generated slash commands call `pnpm exec openspec` for the same reason.
The [pnpm-workspace.yaml](pnpm-workspace.yaml) entry approves OpenSpec's
cosmetic postinstall so pnpm 11's build-script gate doesn't block `pnpm exec`.
## MCP-LIVE — authoring with a live browser
**MCP-LIVE** = author / repair compiled tests with a real browser open,
observing the real DOM at each step. The `mcp__playwright__*` tool family
drives a Playwright browser session against the running app
(`http://localhost:5173` locally, `https://.frado.ai` in deployed
environments), takes DOM snapshots, and runs `browser_evaluate` to inspect
specific nodes; the resulting locators go straight into the committed
test files: [validator/scenarios/](validator/scenarios/) `*.mjs` in this
repo (the same role `tests-compiled/**/*.spec.ts` plays in TypeScript
Playwright layouts).
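**It's not** a separate runtime. The committed tests still run
deterministically in CI with no authoring loop: here, `validator/run.mjs`
replays them through the same `@playwright/mcp` server, and in a TypeScript
layout they would run as plain Playwright specs. MCP-LIVE only changes how
the tests are *written*.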
```text
AUTHORING (MCP-LIVE)                    RUNTIME (CI / pnpm demo)
┌──────────────────┐                    ┌──────────────────┐
│ human or LLM     │                    │ scenarios/*.mjs  │
│ at the keyboard  │                    │   (committed)    │
└────────┬─────────┘                    └────────┬─────────┘
         │ mcp__playwright__*                    │ plain Playwright
         ▼ (snapshot, evaluate, click)           ▼ (no MCP author loop)
┌──────────────────┐                    ┌──────────────────┐
│   live browser   │                    │ headless browser │
│   on real app    │                    │   on real app    │
└────────┬─────────┘                    └──────────────────┘
         │
         ▼ paste locator
  scenarios/*.mjs
```
Two forms of the same pattern:
- **Automated** — the [LLM compiler](#what-the-compiler-does-step-by-step)
drives the MCP browser, explores, and emits the `.mjs`. Run it when a
`.feature` changes.
- **Manual** — open the MCP browser yourself when a selector goes flaky or
you're sketching a new scenario. Snapshot the page, `browser_evaluate`
the node you care about, paste the locator into the scenario file. Same
tools, same DOM truth — just no model in the loop.
In both cases the artifact that ships is a plain, deterministic scenario
file; the interactive MCP session is the authoring surface, not the runtime.
## What the demo validates
Six feature groups, **32 scenarios** total:
| Group | Scenarios | Coverage |
| --- | --- | --- |
| `auth` | 4 | Redirect on no-auth, form fields render, bad creds rejected, good creds redirect |
| `catalog` | 9 | Initial render, filter pill, search, empty state, sort, out-of-stock badge, deep link |
| `cart` | 7 | Quantity stepper, add toast, badge sync, multi-product cart, totals math, remove, checkout CTA |
| `checkout` | 5 | All-empty errors, email format error, valid submission, order number format, cart cleared |
| `theme` | 4 | Light removes `.dark`, Dark adds it, persistence, System defers to OS |
| `visual` | 3 | Light + dark catalogue screenshots, product detail screenshot |
## Three patterns for asserting state
The helpers in [validator/lib/helpers.mjs](validator/lib/helpers.mjs) deliberately
support three interaction styles. Pick whichever fits the element at hand:
**1. Snapshot tree** — `browser_snapshot` returns an ARIA-style tree. Find a
node by `role` + `name`. Resilient to layout/CSS changes.
```js
const { nodes } = await snapshot(mcp);
findOne(nodes, "heading", "Welcome back");
await clickByRole(mcp, "button", "Sign in");
```
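The repo's own parser lives in [validator/lib/snapshot.mjs](validator/lib/snapshot.mjs). As an independent illustration of the snapshot-tree style, here is a minimal sketch that turns snapshot text into `{ role, name, ref }` nodes and enforces a single match. The line format and `ref` ids are assumptions modelled on typical `@playwright/mcp` output and may differ between server versions; the `ref` is the handle a `clickByRole`-style helper would presumably pass on to the click tool.

```js
// Hypothetical minimal parser, assuming snapshot lines shaped like
//   - button "Sign in" [ref=e7]
// (role, optional quoted accessible name, then bracketed attributes).
function parseSnapshot(text) {
  const nodes = [];
  const lineRe = /^\s*-\s+([\w-]+)(?:\s+"((?:[^"\\]|\\.)*)")?(.*)$/;
  for (const line of text.split("\n")) {
    const m = lineRe.exec(line);
    if (!m) continue;
    const ref = /\[ref=([^\]]+)\]/.exec(m[3])?.[1];
    nodes.push({ role: m[1], name: m[2] ?? "", ref });
  }
  return nodes;
}

// Exactly one match by role + accessible name; anything else is a test bug.
function findOne(nodes, role, name) {
  const hits = nodes.filter((n) => n.role === role && n.name === name);
  if (hits.length !== 1) {
    throw new Error(`expected 1 ${role} "${name}", found ${hits.length}`);
  }
  return hits[0];
}

const sample = [
  "- generic [ref=e2]:",
  '  - heading "Welcome back" [level=1] [ref=e3]',
  '  - textbox "Username" [ref=e5]',
  '  - button "Sign in" [ref=e7]',
].join("\n");
console.log(findOne(parseSnapshot(sample), "button", "Sign in").ref); // → e7
```

Throwing on zero *or* several matches is the point of `findOne`: an ambiguous accessible name fails loudly instead of clicking the wrong element.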
**2. CSS selector via `target`** — every interactive tool's `target` parameter
also accepts a unique selector. Useful when an element has a stable
`id` / `data-testid` but a noisy accessible name.
```js
await clickSelector(mcp, '#filters button[aria-pressed="false"]:nth-child(2)');
await typeSelector(mcp, "#field-email", "demo@nimbus.gear");
```
**3. Page evaluate** — run JS in the page and JSON-decode the result. Best
for precise, structural assertions.
```js
const lines = await evaluate(mcp, () =>
  Array.from(document.querySelectorAll('[data-testid="cart-line"]')).map((li) => ({
    id: li.dataset.productId,
    qty: Number(li.querySelector('[data-testid="line-qty"]').textContent),
  }))
);
assertEqual(lines, [{ id: "headphones-aurora", qty: 2 }]);
```
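One way such an `evaluate` helper could work, as a minimal sketch rather than the repo's code: serialise the page function, send it through the `browser_evaluate` tool as its `function` argument, and JSON-decode the text that comes back. The argument name and the shape of the returned text are assumptions to check against your `@playwright/mcp` version; `extractJson` is a hypothetical name.

```js
// Hypothetical evaluate() helper: serialise a page function, send it through
// the `browser_evaluate` MCP tool, and JSON-decode the text that comes back.
// Assumption: the value arrives as bare JSON or inside a json code fence;
// adjust extractJson to whatever your server version actually returns.
function extractJson(text) {
  const fenced = /`{3}json\s*\n([\s\S]*?)\n`{3}/.exec(text);
  return JSON.parse((fenced ? fenced[1] : text).trim());
}

async function evaluate(mcp, pageFn) {
  const result = await mcp.callTool({
    name: "browser_evaluate",
    // Sent as source text: pageFn cannot close over Node-side variables.
    arguments: { function: pageFn.toString() },
  });
  const text = result.content
    .filter((c) => c.type === "text")
    .map((c) => c.text)
    .join("\n");
  if (result.isError) throw new Error(`browser_evaluate failed: ${text}`);
  return extractJson(text);
}
```

Because the function travels as source text, everything it needs must be computed inside the page; values from the test side have to be inlined into the function body instead.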
A fourth helper, `setReactInputValue`, writes through the native `value`
setter on `HTMLInputElement.prototype` so that React notices programmatic
value changes and fires `onChange`. This works around a well-known React
quirk: a plain `el.value = ""` also updates React's controlled-input value
tracker, so React sees no change, never fires `onChange`, and its state keeps
the old value.
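A standalone sketch of that native-setter technique, not the helper's actual source. In the validator, code like this would have to run inside the page (for instance through `browser_evaluate`), since `HTMLInputElement` only exists there; the function name simply mirrors the helper's.

```js
// Sketch of the native-setter technique (in-page code; names are illustrative).
// React installs its own `value` property on each input it manages, and that
// setter also records the value in React's tracker. Writing el.value = "" thus
// updates the tracker too, React sees "no change", and onChange never fires.
// Calling the prototype's setter bypasses that instance-level override.
function setReactInputValue(input, value) {
  const { set } = Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, "value");
  set.call(input, value);
  // React's onChange for inputs listens for bubbling "input" events.
  input.dispatchEvent(new Event("input", { bubbles: true }));
}
```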
## Why "no LLM"?
A few practical reasons to drive Playwright MCP from a non-AI client:
- **CI determinism** — the same inputs always run the same scenarios. No
sampling, no token budget, no "the agent decided to skip a step today."
- **Cost & speed** — no inference calls. The full 32-scenario suite runs in
about 70 seconds, most of which is real browser time.
- **Auditability** — the test file *is* the spec. Reviewers see exactly what
ran, in what order, with what assertions.
- **MCP server reuse** — your team already runs `@playwright/mcp` for an AI
agent? The exact same server now also powers your test suite.
LLM-driven exploration is great for *finding* bugs you didn't know to look
for. Deterministic MCP clients are great for *preventing regressions* on bugs
you already fixed. They are complementary, not alternatives.
## Extending
- **Add a scenario**: append another `await scenario("…", …)` block in the
appropriate file under [validator/scenarios/](validator/scenarios/).
- **Add a feature group**: create `validator/scenarios/myFeature.mjs`, export
a `myFeatureScenarios(mcp, …)` function, re-export it from
[validator/scenarios/index.mjs](validator/scenarios/index.mjs), and call
it from [validator/run.mjs](validator/run.mjs).
- **Add a helper**: drop it in [validator/lib/helpers.mjs](validator/lib/helpers.mjs).
Keep it generic — anything app-specific belongs in the scenario file.
- **Validate a different app**: change `APP_URL` (or set the env var) and
rewrite the scenarios. None of the plumbing in `lib/` is app-specific.
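A minimal sketch of what a new group file might contain, assuming the harness exposes `scenario(name, fn)` and `assertEqual(actual, expected)` roughly as the Layout table lists them. The stand-ins below are simplified and hypothetical (the real [validator/lib/harness.mjs](validator/lib/harness.mjs) signatures may differ), and the code is written without `export` so it runs as one file; in the repo, `myFeatureScenarios` would be exported and re-exported from `index.mjs`.

```js
// Simplified, hypothetical stand-ins for validator/lib/harness.mjs.
const results = [];
async function scenario(name, fn) {
  try {
    await fn();
    results.push({ name, ok: true });
  } catch (err) {
    results.push({ name, ok: false, error: err.message });
  }
}
// Order-sensitive deep equality via JSON; good enough for plain data.
function assertEqual(actual, expected, label = "values differ") {
  const a = JSON.stringify(actual);
  const e = JSON.stringify(expected);
  if (a !== e) throw new Error(`${label}: expected ${e}, got ${a}`);
}

// Same override described above: env var first, Vite dev server as fallback.
const APP_URL = process.env.APP_URL ?? "http://localhost:5173";

// One function per feature group; it receives the connected MCP client.
async function myFeatureScenarios(mcp) {
  await scenario("landing page shows a heading", async () => {
    await mcp.callTool({ name: "browser_navigate", arguments: { url: APP_URL } });
    const snap = await mcp.callTool({ name: "browser_snapshot", arguments: {} });
    const text = snap.content.map((c) => c.text ?? "").join("\n");
    assertEqual(/- heading "/.test(text), true, "snapshot has a heading");
  });
}
```

Catching per scenario keeps one failure from aborting the group, which is what lets the orchestrator print a full PASS / FAIL summary at the end.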