https://github.com/bug0inc/passmark

The open-source Playwright library for AI browser regression testing with intelligent caching, auto-healing, and multi-model verification.
Host: GitHub
URL: https://github.com/bug0inc/passmark
Owner: bug0inc
License: other
Created: 2026-03-29T18:22:06.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-05-17T10:08:01.000Z (about 2 months ago)
Last Synced: 2026-05-17T12:29:27.542Z (about 2 months ago)
Topics: ai, ai-agents, ai-testing, aigateway, aisdk, browser-testing, e2e-testing, playwright, qa, qa-automation, qaautomation, regression-testing, testing, typescript, vercel
Language: TypeScript
Homepage: https://passmark.dev
Size: 444 KB
Stars: 772
Watchers: 4
Forks: 157
Open Issues: 18
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project

awesome-ai-testing - Passmark - Open-source AI regression testing framework on Playwright with intelligent caching, auto-healing, and multi-model verification. (Natural Language Test Authoring)
The open-source Playwright library for AI regression testing.

Passmark covers your browser regression testing end-to-end and **helps you catch regressions early. Fast.**

It uses AI models to execute natural language browser steps via Playwright, with intelligent caching, auto-healing, and multi-model assertion verification. Your tests stay stable without needing to update AI prompts or retrain models.

## Quick Start

```bash
npm init playwright@latest passmark-project # select the default options and set language to TypeScript
cd passmark-project
npm install passmark
```

Passmark's multi-model consensus features need at least one model from Anthropic and one from Google. Set the required environment variables in `.env`:

```
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENERATIVE_AI_API_KEY=AIza...
```

Alternatively, you can use an AI gateway like Vercel AI Gateway or OpenRouter to route requests to multiple providers without managing individual API keys. If you choose this option, set `AI_GATEWAY_API_KEY` (for Vercel) or `OPENROUTER_API_KEY` (for OpenRouter) instead.

You can also route requests through Cloudflare AI Gateway for observability, caching, and rate limiting. Unlike Vercel/OpenRouter, Cloudflare is a proxy (not a reseller), so you still need your own `ANTHROPIC_API_KEY` / `GOOGLE_GENERATIVE_AI_API_KEY` alongside `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_AI_GATEWAY` (and `CLOUDFLARE_AI_GATEWAY_API_KEY` if the gateway has authentication enabled).

Set your Playwright project to read `.env` by adding the following to `playwright.config.ts` (after `import { defineConfig, devices } from '@playwright/test';`):

```typescript
import dotenv from 'dotenv';
import path from 'path';

dotenv.config({ path: path.resolve(__dirname, '.env') });
```

Make sure you install `dotenv` by running `npm install dotenv`.

Now, paste the following code into `tests/example.spec.ts`:

```typescript
import { test, expect } from "@playwright/test";
import { runSteps } from "passmark";

test.use({
  headless: !!process.env.CI,
});

test("Shopping cart tests", async ({ page }) => {
  test.setTimeout(60_000); // increase timeout for AI execution

  await runSteps({
    page,
    userFlow: "Add product to cart",
    steps: [
      { description: "Navigate to https://demo.vercel.store" },
      { description: "Click Acme Circles T-Shirt" },
      { description: "Select color", data: { value: "White" } },
      { description: "Select size", data: { value: "S" } },
      { description: "Add to cart", waitUntil: "My Cart is visible" },
    ],
    assertions: [{ assertion: "You can see My Cart with Acme Circles T-Shirt" }],
    test,
    expect
  });
});
```

If you are using an AI gateway, you can add the following to the above code:

```typescript
import { runSteps, configure } from "passmark";

configure({
  ai: {
    gateway: "vercel" // or "openrouter" or "cloudflare"
    // Set AI_GATEWAY_API_KEY (Vercel), OPENROUTER_API_KEY (OpenRouter), or
    // CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_AI_GATEWAY (+ CLOUDFLARE_AI_GATEWAY_API_KEY
    // if the gateway is authenticated) in your .env file. Cloudflare also requires
    // the upstream provider keys (ANTHROPIC_API_KEY, GOOGLE_GENERATIVE_AI_API_KEY).
  }
});
```

To run the test, use:

```bash
npx playwright test example.spec.ts --project chromium
```

After the test completes, you can run `npx playwright show-report` to see a detailed report of the test execution, including an AI summary at the top, provided by Passmark.

### Using CUA mode (OpenAI computer-use agent)

By default Passmark uses ARIA accessibility snapshots. For visual, screenshot-driven automation via OpenAI's computer-use agent, opt in with `mode: "cua"`:

```typescript
import { configure } from "passmark";

configure({
  ai: {
    mode: "cua",
    gateway: "none", // CUA requires direct OpenAI access
  },
});
```

Set `OPENAI_API_KEY` in your `.env`. Then you can write tests like this:

```typescript
test("Shopping cart tests", async ({ page }) => {
  await runSteps({
    page,
    userFlow: "Add product to cart",
    steps: [
      { description: "Navigate to https://demo.vercel.store" },
      { description: "Click Acme Circles T-Shirt" },
      { description: "Select color", data: { value: "White" } },
      { description: "Add to cart", waitUntil: "My Cart is visible" },
    ],
    test,
    expect,
  });
});
```

Notes:

- CUA mode uses OpenAI's `gpt-5.5` + built-in `computer` tool. The CUA model is currently locked and not user-configurable.

- Redis step caching is skipped in CUA mode because coordinate actions aren't portable across viewport sizes.

- `gateway: "vercel" | "openrouter" | "cloudflare"` is not compatible with CUA — the Responses-API `computer` tool is only exposed on direct OpenAI access.

- Account requirements: your OpenAI API key must have access to the CUA model and the built-in `computer` tool on the Responses API.

#### Per-step overrides (hybrid runs)

The same `ai` shape accepted by `configure()` can also be passed at the `runSteps`/`runUserFlow` call level **and** on individual `Step`s. This lets you mix snapshot steps (cheap, cacheable, OpenRouter/Vercel/etc.) with CUA steps (visual, direct OpenAI) in a single run. Precedence: `step.ai` ▶ call-level `ai` ▶ global `configure()`.

```typescript
configure({ ai: { gateway: "openrouter" } }); // most steps go through OpenRouter

await runSteps({
  page, test, expect,
  userFlow: "Buy product on sale",
  steps: [
    { description: "Navigate to /products" },                     // OpenRouter snapshot
    {
      description: "Drag the price slider to $40",
      ai: { mode: "cua", gateway: "none" },                       // CUA for this step only
    },
    { description: "Click Add to cart" },                         // back to OpenRouter snapshot
  ],
});
```

Set `OPENAI_API_KEY` whenever any step opts into `mode: "cua"`. CUA steps still require `gateway: "none"`; mixing CUA with a non-`none` gateway throws at the per-step level for the same reason it does globally.
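
The precedence rule above can be pictured as a small merge function. This is only a sketch of the idea, not Passmark's implementation: the `AiOptions` type, the `"snapshot"` mode name, the `resolveAi` helper, and the field-by-field merge are all assumptions made for illustration.

```typescript
// Hypothetical types for illustration; Passmark's real option types may differ.
type Gateway = "none" | "vercel" | "openrouter" | "cloudflare";
type Mode = "snapshot" | "cua"; // "snapshot" is an assumed name for the default mode

interface AiOptions {
  mode?: Mode;
  gateway?: Gateway;
}

// Later sources win field by field: global configure() < call-level ai < step.ai.
function resolveAi(
  globalAi: AiOptions = {},
  callAi: AiOptions = {},
  stepAi: AiOptions = {},
): Required<AiOptions> {
  const merged = {
    mode: "snapshot" as Mode,
    gateway: "none" as Gateway,
    ...globalAi,
    ...callAi,
    ...stepAi,
  };
  // CUA needs direct OpenAI access, so any other gateway is rejected.
  if (merged.mode === "cua" && merged.gateway !== "none") {
    throw new Error(`CUA requires gateway "none", got "${merged.gateway}"`);
  }
  return merged;
}
```

Under this assumed merge, a step that sets only `mode: "cua"` while the global gateway is `"openrouter"` would throw, which is why the example above sets `gateway: "none"` on the CUA step as well.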

## Features

- **Core Execution** — `runSteps()` and `runUserFlow()` for flexible test orchestration in natural language, with smart caching and auto-healing

- **Multi-Model Assertion Engine** — Consensus-based validation using Claude and Gemini, with an arbiter model to resolve disagreements

- **Video Assertions** — Opt in per-assertion to record the full step run and evaluate the assertion against the whole video via Gemini's Files API. Useful for ephemeral UI (toasts, snackbars) that a single screenshot may miss

- **Redis-Based Step Caching** — Cache-first execution with AI fallback and automatic self-healing when cached steps fail

- **Configurable AI Models** — 8 dedicated model slots for step execution, assertions, extraction, and more

- **AI Gateway Support** — Route requests through Vercel AI Gateway, OpenRouter, Cloudflare AI Gateway, or connect directly to provider SDKs

- **Dynamic Placeholders** — Inject values at runtime with `{{run.*}}`, `{{global.*}}`, `{{data.*}}`, and `{{email.*}}` expressions for repeatable and data-driven tests

- **Email Extraction** — Pluggable email provider interface with a built-in emailsink provider

- **AI-Powered Data Extraction** — Extract structured values from page snapshots and URLs using AI

- **Smart Wait Conditions** — AI-evaluated wait conditions with exponential backoff. No rigid selectors or time-based waits needed.

- **Secure Script Runner** — AST-validated Playwright script execution with an allowlisted API surface

- **Telemetry** — Optional Axiom and OpenTelemetry tracing via environment variables

- **Structured Logging** — Pino-based logger with configurable log levels

- **Global Configuration** — Single `configure()` entry point for models, gateway, email provider, and upload path
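
To make the "Smart Wait Conditions" item concrete, here is a minimal sketch of polling a condition with exponential backoff. It is not Passmark's code: `waitForCondition`, its option names, and the default delays are hypothetical, and in Passmark the `check` callback would be an AI evaluation of the `waitUntil` text rather than a plain function.

```typescript
interface BackoffOptions {
  initialDelayMs?: number; // first pause between checks
  factor?: number;         // multiplier applied after each failed check
  maxDelayMs?: number;     // upper bound for a single pause
  timeoutMs?: number;      // overall time budget
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-evaluates `check` with growing pauses until it passes or the budget runs out.
async function waitForCondition(
  check: () => Promise<boolean>,
  { initialDelayMs = 250, factor = 2, maxDelayMs = 4_000, timeoutMs = 30_000 }: BackoffOptions = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (true) {
    if (await check()) return true;
    const remaining = deadline - Date.now();
    if (remaining <= 0) return false;
    await sleep(Math.min(delay, remaining));
    delay = Math.min(delay * factor, maxDelayMs);
  }
}
```

Growing the delay keeps the number of (potentially paid) condition checks small when the page is slow, while still reacting quickly when the condition is met early.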

## Core Functions

### `runSteps(options: RunStepsOptions)`

Executes a sequence of steps using AI with caching. Each step is described in natural language and executed via Playwright.

```typescript
await runSteps({
  page,
  userFlow: "Checkout Flow",
  steps: [
    { description: "Add item to cart" },
    { description: "Go to checkout" },
    { description: "Fill in shipping details", data: { value: "123 Main St" } },
  ],
  assertions: [{ assertion: "Order confirmation is displayed" }],
  test,
  expect,
});
```

### `runUserFlow(options: UserFlowOptions)`

Runs a complete user flow as a single AI agent call. Best for exploratory testing where exact steps are flexible.

```typescript
const result = await runUserFlow({
  page,
  userFlow: "Complete a purchase",
  steps: "Navigate to store, add an item, checkout with test card",
  effort: "high", // by default "low" uses gemini-3-flash for faster execution; "high" uses gemini-3.1-pro-preview for deeper thinking
});
```

### `assert(options: AssertionOptions)`

Multi-model consensus assertion. Runs Claude and Gemini in parallel; if they disagree, a third model arbitrates.

```typescript
const result = await assert({
  page,
  assertion: "The dashboard shows 3 active projects",
  expect,
});
```
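
The consensus-with-arbiter pattern described above can be sketched independently of any model provider. The `Verdict` and `Judge` types and the `consensus` function below are hypothetical names used only for illustration; in Passmark the judges would be the Claude, Gemini, and arbiter model calls.

```typescript
// Hypothetical shapes: a judge takes the assertion text and returns a verdict.
type Verdict = { passed: boolean; reason: string };
type Judge = (assertion: string) => Promise<Verdict>;

// Run two judges in parallel; consult the arbiter only when they disagree.
async function consensus(
  assertion: string,
  primary: Judge,
  secondary: Judge,
  arbiter: Judge,
): Promise<Verdict> {
  const [first, second] = await Promise.all([primary(assertion), secondary(assertion)]);
  if (first.passed === second.passed) return first;
  return arbiter(assertion);
}
```

The arbiter call is skipped entirely when the two judges agree, so the third, typically more expensive, model is only paid for on disagreements.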

### Video Assertions

For UI that's only visible for a second or two — toast messages, snackbar confirmations, transient banners — a single end-of-flow screenshot often misses the evidence. Set `video: true` on an assertion inside `runSteps` and Passmark will record the entire step run with `page.screencast`, upload the resulting `.webm` to Gemini's Files API, and evaluate the assertion against the full video:

```typescript
await runSteps({
  page,
  userFlow: "Add to cart",
  steps: [
    { description: "Click Acme Circles T-Shirt" },
    { description: "Add to cart" },
  ],
  assertions: [
    { assertion: "An 'Added to cart' toast appears", video: true },
  ],
  test,
  expect,
});
```

Notes:

- Recording spans the **entire** step run (start of first step to end of last step). One recording is shared across all `video: true` assertions in the same `runSteps` call.

- The video file is written to `/tmp/passmark-recordings/` by default and deleted automatically after the assertions consume it. Override via `configure({ videoDir: "/your/path" })`.

- This path uses **only Gemini** (no Claude/Gemini consensus) since Claude doesn't accept video. The model is `gemini-3-flash-preview`.

- Video assertions go **directly** to Gemini's Files API regardless of any configured `gateway` — file URIs are tied to the uploading Google account, so the gateway can't proxy them. You must set `GOOGLE_GENERATIVE_AI_API_KEY` (or `GEMINI_API_KEY`) even when the rest of your stack runs through Vercel / OpenRouter / Cloudflare.

- If `page.screencast.start()` fails (rare), video assertions silently fall back to the regular screenshot/snapshot path so the run still completes.

## Configuration

Call `configure()` once before using any functions:

```typescript
import { configure } from "passmark";

configure({
  ai: {
    gateway: "none", // "none" (default), "vercel", "openrouter", or "cloudflare"
    models: {
      stepExecution: "google/gemini-3-flash",
      utility: "google/gemini-2.5-flash",
    },
  },
  uploadBasePath: "./uploads",
});
```

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `REDIS_URL` | No | - | Redis connection URL for step caching and global state. Can also be set via `configure({ redis: { url } })`, which takes precedence. |
| `ANTHROPIC_API_KEY` | Yes | - | Anthropic API key for Claude models |
| `GOOGLE_GENERATIVE_AI_API_KEY` | Yes | - | Google API key for Gemini models. Also required for `video: true` assertions regardless of gateway (file URIs are tied to the uploading account). |
| `OPENAI_API_KEY` | No | - | OpenAI API key for OpenAI models (required for CUA mode; must have Responses-API `computer` tool access) |
| `AI_GATEWAY_API_KEY` | If gateway=vercel | - | Vercel AI Gateway API key |
| `OPENROUTER_API_KEY` | If gateway=openrouter | - | OpenRouter API key |
| `CLOUDFLARE_ACCOUNT_ID` | If gateway=cloudflare | - | Cloudflare account ID that owns the AI Gateway |
| `CLOUDFLARE_AI_GATEWAY` | If gateway=cloudflare | - | Cloudflare AI Gateway name (slug) |
| `CLOUDFLARE_AI_GATEWAY_API_KEY` | If gateway=cloudflare and the gateway is authenticated | - | Cloudflare AI Gateway token (sent as `cf-aig-authorization`) |
| `AXIOM_TOKEN` | No | - | Axiom token for OpenTelemetry tracing. Can also be set via `configure({ telemetry: { axiomToken } })`, which takes precedence. |
| `AXIOM_DATASET` | No | - | Axiom dataset for trace storage. Can also be set via `configure({ telemetry: { axiomDataset } })`, which takes precedence. |
| `PASSMARK_LOG_LEVEL` | No | `info` | Log level: `debug`, `info`, `warn`, `error`, `silent` |
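
Because the required keys depend on the gateway and on which features a run uses, a preflight check can fail fast before any AI call is made. The sketch below encodes one reading of the table and the Quick Start notes (gateway keys replace the provider keys for Vercel and OpenRouter, Cloudflare needs both). `requiredEnvVars`, `missingEnvVars`, and the `Usage` shape are hypothetical helpers, not part of Passmark, and the `GEMINI_API_KEY` alternative for video assertions is ignored for brevity.

```typescript
type Gateway = "none" | "vercel" | "openrouter" | "cloudflare";

// Hypothetical description of what a test run uses.
interface Usage {
  cloudflareAuthenticated?: boolean; // the Cloudflare gateway has authentication enabled
  cua?: boolean;                     // at least one step uses mode: "cua"
  video?: boolean;                   // at least one assertion sets video: true
}

const UPSTREAM_KEYS = ["ANTHROPIC_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"];

function requiredEnvVars(gateway: Gateway, usage: Usage = {}): string[] {
  const required: string[] = [];
  const need = (...names: string[]) => {
    for (const name of names) {
      if (required.indexOf(name) === -1) required.push(name);
    }
  };
  switch (gateway) {
    case "none":
      need(...UPSTREAM_KEYS);
      break;
    case "vercel":
      need("AI_GATEWAY_API_KEY");
      break;
    case "openrouter":
      need("OPENROUTER_API_KEY");
      break;
    case "cloudflare":
      // Cloudflare proxies requests, so the upstream provider keys are still needed.
      need("CLOUDFLARE_ACCOUNT_ID", "CLOUDFLARE_AI_GATEWAY", ...UPSTREAM_KEYS);
      if (usage.cloudflareAuthenticated) need("CLOUDFLARE_AI_GATEWAY_API_KEY");
      break;
  }
  if (usage.cua) need("OPENAI_API_KEY");
  if (usage.video) need("GOOGLE_GENERATIVE_AI_API_KEY"); // video bypasses the gateway
  return required;
}

// Returns the names that are required but unset or empty in `env`.
function missingEnvVars(
  env: Record<string, string | undefined>,
  gateway: Gateway,
  usage: Usage = {},
): string[] {
  return requiredEnvVars(gateway, usage).filter((name) => !env[name]);
}
```

In a Playwright global setup you could call `missingEnvVars(process.env, "vercel", { video: true })` and throw if the returned list is non-empty.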

## Model Configuration

All models are configurable via `configure({ ai: { models: { ... } } })`:

| Key | Default | Used For |
|-----|---------|----------|
| `stepExecution` | `google/gemini-3-flash` | Executing individual steps |
| `userFlowLow` | `google/gemini-3-flash-preview` | User flow execution (low effort) |
| `userFlowHigh` | `google/gemini-3.1-pro-preview` | User flow execution (high effort) |
| `assertionPrimary` | `anthropic/claude-4.5-haiku` | Primary assertion model (Claude) |
| `assertionSecondary` | `google/gemini-3-flash` | Secondary assertion model (Gemini) |
| `assertionArbiter` | `google/gemini-3.1-pro-preview` | Arbiter for assertion disagreements |
| `utility` | `google/gemini-2.5-flash` | Data extraction, wait conditions |
| `cua` | `gpt-5.5` | CUA mode: OpenAI Responses API with the built-in `computer` tool |

## Caching

Passmark caches successful step actions in Redis. On subsequent runs, cached steps execute directly without AI calls, dramatically reducing latency and cost.

Provide the connection via `configure({ redis: { url } })` or the `REDIS_URL` env var (configure value wins). Without either, caching, `{{global.*}}` placeholders, and project data are disabled.

- Steps are cached by `userFlow` + `step.description`

- Set `bypassCache: true` on individual steps or the entire run to force AI execution

- Cache is automatically bypassed on Playwright retries

- Caching only applies to `runSteps`. Currently, only single-step AI executions are cached, because multi-step actions vary widely and are less likely to repeat identically on later runs. We're exploring ways to safely cache multi-step flows.
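
The caching rules above, together with the CUA note earlier in this README, can be summarized as a key function and an eligibility check. This is an illustrative sketch only: `stepCacheKey`, `canUseCache`, the `CacheContext` fields, and the JSON-based key format are assumptions, not Passmark's actual key scheme. The `retry` field mirrors Playwright's `testInfo.retry`, which is `0` on the first attempt.

```typescript
// Hypothetical context describing one step execution.
interface CacheContext {
  redisConfigured: boolean;   // a Redis URL was provided via configure() or REDIS_URL
  retry: number;              // Playwright's testInfo.retry (0 on the first attempt)
  runBypassCache?: boolean;   // bypassCache set for the whole run
  stepBypassCache?: boolean;  // bypassCache set on this step
  mode?: "snapshot" | "cua";  // "snapshot" is an assumed name for the default mode
}

// Steps are keyed by userFlow + step description; JSON keeps the pair unambiguous.
function stepCacheKey(userFlow: string, description: string): string {
  return JSON.stringify([userFlow, description]);
}

// The cache is consulted only when none of the documented bypass conditions apply.
function canUseCache(ctx: CacheContext): boolean {
  return (
    ctx.redisConfigured &&
    ctx.retry === 0 &&
    !ctx.runBypassCache &&
    !ctx.stepBypassCache &&
    ctx.mode !== "cua"
  );
}
```

One practical consequence of keying by `userFlow` and description: rewording either string yields a new key, so the step runs through the AI again on its next execution.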

## Telemetry

Telemetry is opt-in. Either set the `AXIOM_TOKEN` and `AXIOM_DATASET` env vars, or pass them through `configure()`:

```typescript
configure({
  telemetry: {
    axiomToken: process.env.MY_AXIOM_TOKEN,
    axiomDataset: "passmark-traces",
  },
});
```

`configure()` values take precedence over env vars. Without either, telemetry is a no-op. All AI calls are wrapped with `withSpan` for observability.

Configure Axiom to get a rich dashboard like this:

![Axiom Dashboard](https://res.cloudinary.com/dkanxf2cg/image/upload/v1774866500/axiom-logs_d4p7h9.png)

## Email Extraction

Configure an email provider for testing flows that involve email verification. By default, you can use the `emailsink` provider, which provides disposable email addresses and an API to fetch received emails. The free tier doesn't need any credentials, but for more reliability and flexible rate limits, you can sign up for an account and use your `EMAILSINK_API_KEY`. Reach out to us if you want to get an API key.

```typescript
import { configure } from "passmark";
import { emailsinkProvider } from "passmark/providers/emailsink";

configure({
  email: emailsinkProvider({ apiKey: process.env.EMAILSINK_API_KEY }),
});
```

Or implement a custom provider:

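```typescript
configure({
  email: {
    domain: "my-test-mail.com",
    extractContent: async ({ email, prompt }) => {
      // Fetch the message sent to `email` from your mail service and extract
      // the value described by `prompt`. `extractedValue` is a placeholder
      // for that result.
      return extractedValue;
    },
  },
});
```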

Use in steps with the `{{email.*}}` placeholder pattern:

```typescript
{
  description: "Enter the verification code",
  data: { value: "{{email.otp:get the 6 digit verification code:{{run.dynamicEmail}}}}" }
}
```

## Placeholder System

Dynamic values can be injected into step data using placeholders:

| Pattern | Scope | Description |
|---------|-------|-------------|
| `{{run.email}}` | Single test | Random email (faker) |
| `{{run.dynamicEmail}}` | Single test | Email using configured domain |
| `{{run.fullName}}` | Single test | Random full name |
| `{{run.shortid}}` | Single test | Short unique ID |
| `{{run.phoneNumber}}` | Single test | Random phone number |
| `{{global.email}}` | All tests in an execution | Shared across runSteps calls with same `executionId` |
| `{{global.dynamicEmail}}` | All tests in an execution | Shared dynamic email |
| `{{data.key}}` | Per project | Stored in Redis, managed via project settings |
| `{{email.type:prompt}}` | Resolved lazily | Extract content from received email |
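
As a rough mental model of how the eager scopes could be substituted, the sketch below resolves `{{run.*}}`, `{{global.*}}`, and `{{data.*}}` with a regular expression and leaves everything else, including `{{email.*}}`, untouched for later resolution. `resolvePlaceholders` and the `Scopes` type are hypothetical and say nothing about how Passmark implements this internally.

```typescript
// Hypothetical lookup tables, one per eager scope.
type Scopes = Record<string, Record<string, string>>;

// Replaces {{run.x}}, {{global.x}} and {{data.x}}; unknown keys are left as-is.
function resolvePlaceholders(template: string, scopes: Scopes): string {
  return template.replace(
    /\{\{(run|global|data)\.(\w+)\}\}/g,
    (match, scope: string, key: string) => scopes[scope]?.[key] ?? match,
  );
}

const scopes: Scopes = {
  run: { fullName: "Ada Lovelace", dynamicEmail: "ada-1@my-test-mail.com" },
};

console.log(resolvePlaceholders("Hello {{run.fullName}}", scopes));
// The inner run placeholder is filled in; the outer email placeholder stays for lazy resolution.
console.log(resolvePlaceholders("{{email.otp:get the code:{{run.dynamicEmail}}}}", scopes));
```

Resolving the inner `{{run.dynamicEmail}}` first matches the nested example in the Email Extraction section, where the email placeholder needs a concrete address before it can fetch anything.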

## Architecture Overview

```
Step Request
    |
    v
[Cache Check] --hit--> [Execute Cached Action] --success--> Done
    |                          |
    miss                     fail (auto-heal)
    |                          |
    v                          v
[AI Execution] ---------> [Cache Result]
    |
    v
[Assertions] (Claude + Gemini consensus)
```
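
The cache and auto-heal part of the diagram can be expressed as a short function. This sketch uses an in-memory `Map` instead of Redis and treats an action as an opaque string; `executeStep`, `StepDeps`, and the returned labels are hypothetical names chosen for illustration, not Passmark internals.

```typescript
type Action = string; // stand-in for a recorded, replayable Playwright action

interface StepDeps {
  cache: Map<string, Action>;                            // in-memory stand-in for Redis
  runAction: (action: Action) => Promise<void>;          // replays an action, throws on failure
  planWithAi: (description: string) => Promise<Action>;  // asks a model for a fresh action
}

// Cache-first execution: replay a cached action, fall back to AI on miss or failure.
async function executeStep(
  key: string,
  description: string,
  deps: StepDeps,
): Promise<"cache" | "ai" | "healed"> {
  const cached = deps.cache.get(key);
  if (cached !== undefined) {
    try {
      await deps.runAction(cached);
      return "cache"; // hit and success: no AI call needed
    } catch {
      deps.cache.delete(key); // stale entry: fall through to AI (auto-heal)
    }
  }
  const action = await deps.planWithAi(description);
  await deps.runAction(action);
  deps.cache.set(key, action); // store the working action for the next run
  return cached !== undefined ? "healed" : "ai";
}
```

Assertions would run after the last step regardless of which path each step took, so a healed step is still verified against the expected outcome.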

## Known Limitations

- Tests are not comprehensive at the moment. We welcome contributions to expand test coverage, especially around edge cases and failure modes.

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md) for development setup, code style, and PR workflow.

## License

[FSL-1.1-Apache-2.0](./LICENSE.md) - Functional Source License, Version 1.1, with Apache 2.0 future license.