https://github.com/dawiddiwad/checkmate

AI-Driven Test Automation Framework powered by OpenAI API & Playwright

claude gemini grok groq low-code openai playwright salesforce test-automation


Host: GitHub
URL: https://github.com/dawiddiwad/checkmate
Owner: dawiddiwad
License: mit
Created: 2025-11-16T19:19:36.000Z (7 months ago)
Default Branch: main
Last Pushed: 2026-05-23T18:55:33.000Z (23 days ago)
Last Synced: 2026-05-23T20:25:54.366Z (23 days ago)
Topics: claude, gemini, grok, groq, low-code, openai, playwright, salesforce, test-automation
Language: TypeScript
Homepage:
Size: 4.05 MB
Stars: 3
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Roadmap: docs/ROADMAP.md
- Agents: AGENTS.md

# **_checkmate_**

AI test automation that actually works. Write tests in plain English, without locators, and with less code.

![playwright](https://img.shields.io/badge/Playwright-1.60.0-blue.svg)

![typescript](https://img.shields.io/badge/TypeScript-5.9.3-blue.svg)

![nodejs](https://img.shields.io/badge/Node.js-LTS-green.svg)

![openai](https://img.shields.io/badge/OpenAI-API-yellow.svg)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

##

```typescript
await ai.run({
	action: `
		Navigate to google.com
		Type 'playwright test automation' in the search bar
		Press Enter key`,
	expect: `
		Search results contain the playwright.dev link`,
})
```

##

✅ **Zero Locators** - Write tests in plain English  

✅ **Any Provider** - Gemini, Claude, Groq, GPT, xAI, or local models  

✅ **Web & Salesforce** - Basic support out of the box  

✅ **Cost Optimized** - Built-in token management and budgeting  

✅ **Playwright Test** - Native reports, traces and debugging  

✅ **Fully Customizable** - Build your own [extensions](docs/EXTENSIONS.md) and tools



## Get Started in 5 Minutes

### Prerequisites

- Node.js [LTS](https://nodejs.org/en/download)

- OpenAI [API key](https://platform.openai.com/api-keys) or a compatible provider such as [Groq](https://console.groq.com/keys), [Gemini](https://aistudio.google.com/app/api-keys), [xAI](https://x.ai/api), etc.

### 1. Install

```bash
npm install -D dotenv @playwright/test @xoxoai/checkmate
npx playwright install
```

### 2. Configure `.env`

_using [OpenAI API](https://platform.openai.com/settings/organization/api-keys) key and default settings:_

```bash
OPENAI_API_KEY=#your_api_key_here
```

_for other providers, set the base url and model:_

```bash
OPENAI_BASE_URL=https://api.groq.com/openai/v1
OPENAI_MODEL=openai/gpt-oss-20b
```
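
As a rough illustration of how these variables fit together, the sketch below resolves them into a single provider configuration. The `resolveProviderConfig` helper, the `ProviderConfig` type, and the fallback base URL are hypothetical and exist only for this example. They are not **_checkmate_**'s actual implementation, and the real defaults may differ.

```typescript
// Hypothetical helper for illustration only: NOT part of @xoxoai/checkmate.
type Env = Record<string, string | undefined>

interface ProviderConfig {
	apiKey: string
	baseUrl: string
	// Left undefined when OPENAI_MODEL is unset, so a library default could apply.
	model: string | undefined
}

// Assumed fallback: the public OpenAI endpoint.
const DEFAULT_BASE_URL = 'https://api.openai.com/v1'

function resolveProviderConfig(env: Env): ProviderConfig {
	const apiKey = env.OPENAI_API_KEY?.trim()
	if (!apiKey) {
		throw new Error('OPENAI_API_KEY is not set')
	}
	return {
		apiKey,
		baseUrl: env.OPENAI_BASE_URL?.trim() || DEFAULT_BASE_URL,
		model: env.OPENAI_MODEL?.trim() || undefined,
	}
}

// Mirrors the Groq settings shown above; the key is a placeholder.
const groq = resolveProviderConfig({
	OPENAI_API_KEY: 'placeholder-key',
	OPENAI_BASE_URL: 'https://api.groq.com/openai/v1',
	OPENAI_MODEL: 'openai/gpt-oss-20b',
})
console.log(groq.baseUrl, groq.model)
```

In a real project you would pass `process.env` after loading `.env` with `dotenv` instead of an inline object.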

### 3. Scaffold Test Examples

```bash
npx checkmate create-examples
```

### 4. Run Tests

```bash
npm run test:web:example
```

### 5. View Report

```bash
npm run show:report
```

## Writing Tests

**_checkmate_** tests are written using natural language by specifying `action` and `expect`:

```typescript
import { test } from '@xoxoai/checkmate/playwright'

// describe callbacks run synchronously, so no `async` is needed here
test.describe('multi-step : full AI mode', () => {
	test('purchase flow', async ({ ai }) => {
		await test.step('Open Shop', async () => {
			await ai.run({
				action: `
				Navigate to https://my-shop.com`,
				expect: `
				My Shop home page is loaded`,
			})
		})

		await test.step('Select product', async () => {
			await ai.run({
				action: `
				Click 'Shop Now' on 'Men's Outerwear' category
				Click on the first Shell product in the list`,
				expect: `
				Product detail with title and price.`,
			})
		})

		await test.step('Cart and checkout', async () => {
			await ai.run({
				action: `
				Click 'Add to Cart'
				Click 'Checkout' in the 'Added to cart' dialog`,
				expect: `
				Checkout with Order Summary and totals`,
			})
		})
	})
})
```

That's it. No page objects, no selectors. No locators. Peace on Earth.

Test execution is orchestrated by the [Playwright](https://playwright.dev/docs/test-configuration) [config file](playwright.config.ts).

### API

Compose your own **_checkmate_** using [extensions](docs/EXTENSIONS.md):

```typescript
import { createRunner } from '@xoxoai/checkmate/core'
import { web } from '@xoxoai/checkmate/playwright'
// placeholder package standing in for your own extensions
import { notion, database, api } from 'my-custom-extensions'

// `page` must already be in scope, for example from a Playwright test fixture
const ai = createRunner({
	extensions: [web({ page }), notion(), database(), api()],
})

await ai.run({
	action: 'Open the pricing page',
	expect: 'Pricing details are visible',
})
```

### Entry Points:

`@xoxoai/checkmate/core`: compose runner, tools, and extensions.  

`@xoxoai/checkmate/playwright`: Web extension with Playwright `test` and `expect`.  

`@xoxoai/checkmate/salesforce`: Salesforce extensions with the same `ai` fixture shape.

See [guide](docs/GUIDE.md#best-practices) for tips on writing effective tests.

## Costs

Costs depend on the model, provider, test complexity, and number of steps.

Estimates for [gpt-oss-20b hosted on groq.com](https://console.groq.com/docs/model/openai/gpt-oss-20b):

- Simple test (~5 steps): ~$0.001 - $0.01

- Complex test (~20 steps): ~$0.01 - $0.05

- Full E2E suite (~50 complex tests): ~$1.00 - $2.00

**_checkmate_** includes built-in token usage [monitoring](docs/GUIDE.md#cost-management).

See [guide](docs/GUIDE.md#cost-management) for cost control and monitoring options.
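
To sanity-check a budget, the per-test estimates above can be scaled to a suite size. The sketch below does that arithmetic. It assumes costs scale linearly with the number of tests, and the `CostRange` type and `estimateSuiteCost` function are hypothetical names for this example, not part of **_checkmate_**. Scaling the complex-test range to 50 tests gives $0.50 to $2.50, which brackets the ~$1.00 - $2.00 suite estimate.

```typescript
// Per-test cost ranges in USD, copied from the estimates listed above.
interface CostRange {
	min: number
	max: number
}

const SIMPLE_TEST: CostRange = { min: 0.001, max: 0.01 }
const COMPLEX_TEST: CostRange = { min: 0.01, max: 0.05 }

// Hypothetical helper: scales a per-test range to `count` tests,
// assuming cost grows linearly with the number of tests.
function estimateSuiteCost(perTest: CostRange, count: number): CostRange {
	if (!Number.isInteger(count) || count < 0) {
		throw new RangeError('count must be a non-negative integer')
	}
	return { min: perTest.min * count, max: perTest.max * count }
}

const formatUsd = (range: CostRange): string =>
	`$${range.min.toFixed(2)} - $${range.max.toFixed(2)}`

const complexSuite = estimateSuiteCost(COMPLEX_TEST, 50)
console.log(formatUsd(complexSuite)) // prints "$0.50 - $2.50"
```

Actual spend depends on the provider's pricing and on how many tokens each step consumes, so treat the result as a rough bound rather than a quote.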

## Common Issues

**AI makes incorrect decisions**

- Provide precise descriptions in `action` and focused assertions in `expect`

- Reference specific elements and roles, for example: text, label, button, list, etc.

- Break complex workflows into single-action steps and use a step-by-step approach
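
The last tip can be sketched as follows: instead of one `ai.run` call with several instructions, each instruction gets its own call and its own focused expectation. The `Step` and `AiRunner` types, the `runSingleActionSteps` helper, and the stub runner are hypothetical and only mimic the `ai.run({ action, expect })` shape used in this README, so the example runs without a browser or an API key. The first `expect` text is an illustrative assumption.

```typescript
// Minimal shape of the call used throughout this README (an assumption).
interface Step {
	action: string
	expect: string
}

interface AiRunner {
	run(step: Step): Promise<unknown>
}

// Hypothetical helper: runs one action per call, in order,
// stopping at the first step that rejects.
async function runSingleActionSteps(ai: AiRunner, steps: Step[]): Promise<void> {
	for (const step of steps) {
		await ai.run(step)
	}
}

// Stub runner that only records calls; replace it with the real `ai` fixture.
const calls: Step[] = []
const stubAi: AiRunner = {
	run: async (step) => {
		calls.push(step)
	},
}

const steps: Step[] = [
	{
		action: "Click 'Add to Cart'",
		expect: "The 'Added to cart' dialog is shown",
	},
	{
		action: "Click 'Checkout' in the 'Added to cart' dialog",
		expect: 'Checkout with Order Summary and totals',
	},
]

runSingleActionSteps(stubAi, steps).then(() => {
	console.log(`${calls.length} steps executed`)
})
```

Smaller steps cost extra calls, but a failure then points at one specific action instead of a whole workflow.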

**Tests loop during step execution**

- Increase `OPENAI_TEMPERATURE` to encourage exploration

- Use a reasoning model if possible to improve accuracy

**High token costs**

- Enable [snapshot filtering](docs/GUIDE.md#using-snapshot-filtering-for-token-optimization) with `CHECKMATE_SNAPSHOT_FILTERING=true` to auto-filter elements

- Adjust reasoning effort: `OPENAI_REASONING_EFFORT`

- Consider disabling `OPENAI_INCLUDE_SCREENSHOT_IN_SNAPSHOT` if visuals are not needed

- Use a cheaper model; lower-end models such as `gpt-5.4-nano` or `gpt-oss-20b` often perform well

See [guide](docs/GUIDE.md#openai-api-settings) for detailed configuration options and tips.

## FAQ

**Which models work best?**  

You can use any model that was trained for tool use.

Here are the best picks based on extensive testing:

- Highly recommended: [`gpt-oss-20b` hosted on groq.com](https://console.groq.com/docs/model/openai/gpt-oss-20b). Groq's infrastructure is optimized for minimal latency and fast inference, making it ideal for E2E test automation.

- Google's `gemini-2.5-flash` offers an excellent balance of cost and performance if you prefer major cloud providers.

- OpenAI's `gpt-5-mini`, `gpt-5.4-nano` and xAI's `grok-4-1-fast-reasoning` also work well and keep costs relatively low.

**Can I use local models?**  

Yes - **_checkmate_** works with any OpenAI‑compatible API, including local models via LM Studio, Ollama, or llama.cpp. I recommend [qwen3.5-4b](https://huggingface.co/Qwen/Qwen3.5-4B). It is fast (≈100 tokens/sec on an RTX 3060 Ti; ≈40 tokens/sec on Apple M3) and performs surprisingly well for E2E testing.

**Does it work with CI/CD?**  

Absolutely. Use **_checkmate_** as part of your existing [Playwright Test suites in any CI/CD pipeline](https://playwright.dev/docs/best-practices#run-tests-on-ci). You can mix AI‑driven steps and traditional tests as needed.

**Is this production-ready?**  

It depends. If you can accept some non‑deterministic behavior and leverage LLMs' randomness to help address the [pesticide paradox](https://medium.com/@suwekasansiluni/the-pesticide-paradox-what-farming-teaches-us-about-software-testing-ab5d625d4de1), **_checkmate_** can be production-ready. In many cases, the maintenance savings, faster development, and benefits of non‑linear execution outweigh occasional hiccups.

If you require 100% deterministic tests at all times, traditional Playwright remains the better choice.

**Best part?**  

You can mix both approaches within the same test suite, combining AI‑driven and traditional tests as needed:

```typescript
// traditional playwright actions:
await page.goto('https://www.google.com')
const searchBox = page.getByRole('combobox', { name: 'Search', exact: true })
await searchBox.fill('playwright test automation')
await searchBox.press('Enter')

// ai-driven actions and assertions:
await ai.run({
	action: 'Click on the link that leads to playwright.dev',
	expect: 'The playwright.dev homepage is displayed',
})
```

## Documentation

- [**_checkmate_** guide](docs/GUIDE.md)

- [**_checkmate_** extensions](docs/EXTENSIONS.md)

- [**playwright** official website](https://playwright.dev/)

## Contributing

I'd love your help! Key areas:

- Additional tool integrations (API testing, Salesforce, etc.)

- Further cost optimization techniques

- Context and prompt engineering improvements

- Error handling and recovery

See the [roadmap](docs/ROADMAP.md) for future plans and development.

## License

MIT [license](LICENSE)

## Why did I build this?

Test automation shouldn't require a PhD in XPath. This project explores how AI can make it accessible to anyone.

Less coding, more testing.

Built with ❤️ by [Dawid Dobrowolski](https://github.com/dawiddiwad)
