https://github.com/blackwell-systems/mcp-assert
Deterministic correctness testing for MCP servers. Assert your tools return the right results, not just any results. No LLM-as-judge.
https://github.com/blackwell-systems/mcp-assert
ai-agents assertions ci deterministic-testing developer-tools evals evaluation golang language-server-protocol mcp mcp-server model-context-protocol testing
Last synced: 24 days ago
JSON representation
Deterministic correctness testing for MCP servers. Assert your tools return the right results, not just any results. No LLM-as-judge.
- Host: GitHub
- URL: https://github.com/blackwell-systems/mcp-assert
- Owner: blackwell-systems
- License: mit
- Created: 2026-04-23T06:32:08.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-04-23T10:12:13.000Z (about 1 month ago)
- Last Synced: 2026-04-23T11:32:51.701Z (about 1 month ago)
- Topics: ai-agents, assertions, ci, deterministic-testing, developer-tools, evals, evaluation, golang, language-server-protocol, mcp, mcp-server, model-context-protocol, testing
- Language: Go
- Size: 52.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
**Test your MCP server against the real protocol. No mocks. No imports. No language lock-in.**
mcp-assert connects to your server exactly like Claude, Cursor, or any MCP client would: real stdio/SSE/HTTP transport, full initialize handshake, actual tool calls. It checks responses against expectations you define in YAML. If it passes mcp-assert, it works with every MCP client.
> [!WARNING]
> We scanned 58 MCP servers and found **32 real bugs** across 13 servers. The most common failure: tools crash instead of returning errors agents can recover from. See the [scorecard](https://blackwell-systems.github.io/mcp-assert/scorecard/).
```
Your YAML ──→ mcp-assert ──→ MCP Server
(inputs + assertions) (client) (any language)
│
Pass / Fail
```
### Your server can't tell the difference
mcp-assert speaks the full MCP protocol: initialize handshake, `tools/list` discovery, `tools/call` with real arguments. It finds bugs that unit tests miss because it tests over the wire, not in-process.
### Adopted in production
- **[Wyre Technology](https://github.com/wyre-technology)**: 25 MCP servers tested via shared baseline workflow using `mcp-assert-action`
- **[Ant Group (AntV)](https://github.com/antvis/mcp-server-chart)**: integrated into CI within 3 days of launch
- **[Vera](https://github.com/aallan/vera)**: recommended test harness on project roadmap ([#529](https://github.com/aallan/vera/issues/529))
- **Fix PRs merged**: Google, Grafana, LangChain, official MCP SDKs
The testing standard for MCP, like pytest for Python or Jest for JavaScript.
Add it to any MCP server project in one line:
```yaml
- uses: blackwell-systems/mcp-assert-action@v1
with:
suite: evals/
```
> [!NOTE]
> LLMs are for subjective outputs. Assertions are for deterministic ones. Most MCP tools are deterministic. mcp-assert covers them.
## Install
```bash
# npm (no Go required)
npx @blackwell-systems/mcp-assert
# pip (no Go required)
pip install mcp-assert
# Go
go install github.com/blackwell-systems/mcp-assert/cmd/mcp-assert@latest
# Homebrew
brew install blackwell-systems/tap/mcp-assert
# Docker
docker run blackwellsystems/mcp-assert audit --server "npx my-server"
# Snap (Linux)
sudo snap install mcp-assert --classic
# Scoop (Windows)
scoop bucket add blackwell-systems https://github.com/blackwell-systems/scoop-bucket
scoop install mcp-assert
# Winget (Windows)
winget install BlackwellSystems.mcp-assert
# curl | sh (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/blackwell-systems/mcp-assert/main/install.sh | sh
```
## Quick Start
### Audit any MCP server in seconds. No setup.
Point it at any server:
```bash
mcp-assert audit --server "npx my-mcp-server"
```
```
Server: my-server
Transport: stdio
Score: 83%
✓ read_query 1ms responds, returns content
✗ create_table 0ms internal error: panic: nil pointer...
✓ list_tables 1ms responds, returns content
3 tools tested, 2 healthy, 1 crashed
```
> [!TIP]
> The audit connects, discovers every tool via `tools/list`, calls each one with schema-generated inputs, and reports which tools crash vs. handle errors properly. No YAML needed. To go deeper, generate assertion files and customize them:
```bash
# Audit + generate starter YAML for CI
mcp-assert audit --server "npx my-mcp-server" --output evals/
# Edit the generated YAMLs: add expected content, setup steps, multi-step flows
# Run in CI with regression detection
mcp-assert ci --suite evals/ --threshold 95
```
### Write assertions from scratch
```bash
# Scaffold your first assertion
mcp-assert init evals # Or: init evals --server "my-server" for auto-generation
# Run it
mcp-assert run --suite evals/ --fixture evals/fixtures
```
See the [Getting Started guide](https://blackwell-systems.github.io/mcp-assert/getting-started/) for a full walkthrough.
### Already using Vitest, Jest, Bun, PHPUnit, or pytest?
```bash
# Vitest
npm install -D vitest-mcp-assert
```
```ts
// mcp.test.ts
import { describeMcpSuite } from 'vitest-mcp-assert'
describeMcpSuite('mcp server', 'evals/')
```
```bash
# pytest
pip install pytest-mcp-assert
pytest --mcp-suite evals/
```
> [!IMPORTANT]
> Same YAML files work across the CLI, Vitest, Jest, Bun, PHPUnit, pytest, and Go test. No migration needed. Write once, run anywhere.
## Everything you can do
| Command | What it does | Setup required |
|---------|-------------|----------------|
| `audit --server "..."` | Scan any server, classify every tool as healthy/crashed/timed out | None |
| `fuzz --server "..."` | Throw adversarial inputs at every tool, find crashes and hangs | None |
| `init --server "..."` | Generate a complete test suite from tools/list + capture snapshots | None |
| `run --suite evals/` | Run YAML assertions, report pass/fail | YAML files |
| `ci --suite evals/` | Run with thresholds, baselines, JUnit XML, GitHub Step Summary | YAML files |
| `coverage --suite evals/ --server "..."` | Report which tools have assertions and which don't | YAML files |
| `snapshot --suite evals/ --update` | Capture responses as golden files for regression detection | YAML files |
| `watch --suite evals/` | Re-run on YAML changes, show diffs when status flips | YAML files |
| `matrix --languages go:gopls,ts:tsserver` | Same suite across multiple language servers | YAML files |
| `intercept --server "..." --trajectory t.yaml` | Proxy between agent and server, capture live tool call trace | Trajectory YAML |
| `lint --server "..."` | Check tool schemas for missing descriptions, untyped params, and agent usability issues | None |
Start with `audit` (zero setup), then `fuzz` (adversarial testing), then `init` (generates everything), then customize the YAML for your specific assertions.
## Zero-Effort Coverage
```bash
# Generate stub assertions for every tool the server exposes
mcp-assert generate --server "my-mcp-server" --output evals/ --fixture ./fixtures
# Capture actual outputs as snapshots
mcp-assert snapshot --suite evals/ --server "my-mcp-server" --update
# Assert nothing changed
mcp-assert run --suite evals/ --server "my-mcp-server"
```
## How It Differs From LLM-as-Judge Frameworks
For deterministic tools, mcp-assert is the better fit. For subjective outputs, LLM-as-judge frameworks remain the right choice. Use both if your server mixes tool types.
| Dimension | LLM-as-judge eval frameworks | mcp-assert |
|---|---|---|
| Best for | Subjective outputs (prose, creative content) | Deterministic outputs (data, state, validation) |
| Grading | Language model scoring (flexible, costly) | Assertion-based (exact, free) |
| Speed | Seconds per test (LLM round-trip) | Milliseconds per test (no LLM) |
| CI cost | API calls on every run | Zero external dependencies |
| Reliability | Not measured | pass@k / pass^k per assertion |
| Regression | Not supported | Baseline comparison, fail on backslide |
| Multi-language | Not supported | Same assertion across N language servers |
## Why not just write tests?
You'd need MCP protocol bootstrapping, a server-agnostic runner (your Go tests can't test your TypeScript server), and eval features (regression detection, Docker isolation, JUnit output). mcp-assert handles all of that. One YAML file, any server, any language.
## CI Integration
Use the [mcp-assert GitHub Action](https://github.com/blackwell-systems/mcp-assert-action) for zero-setup CI:
```yaml
- uses: blackwell-systems/mcp-assert-action@v1
with:
suite: evals/
threshold: 95
```
Downloads the binary, runs assertions, uploads JUnit XML results, writes GitHub Step Summary. No Go toolchain required on your runners.
Or run directly:
```bash
mcp-assert ci --suite evals/ --threshold 95 --junit results.xml
```
See the [CI Integration guide](https://blackwell-systems.github.io/mcp-assert/ci-integration/) for JUnit XML, markdown summaries, badges, and regression detection.
## pytest Integration
Run mcp-assert assertions as pytest test items:
```bash
pip install pytest-mcp-assert
pytest --mcp-suite evals/
```
Each YAML file becomes a pytest Item with pass/fail/skip semantics. Configure via `pyproject.toml`:
```toml
[tool.pytest.ini_options]
mcp_suite = "evals/"
mcp_fixture = "fixtures/"
```
Then just run `pytest`. See `pytest-plugin/README.md` for all options.
## Vitest Integration
Run mcp-assert assertions as Vitest tests:
```bash
npm install -D vitest-mcp-assert
```
Auto-discover all YAML files in a directory:
```ts
// mcp.test.ts
import { describeMcpSuite } from 'vitest-mcp-assert'
describeMcpSuite('mcp server', 'evals/')
```
Or run individual assertions:
```ts
import { test } from 'vitest'
import { runMcpAssert } from 'vitest-mcp-assert'
test('echo tool', () => runMcpAssert('evals/echo.yaml'))
```
Same YAML files work across Vitest, pytest, and the CLI. See `vitest-plugin/README.md` for all options.
## Documentation
Full documentation is available at [blackwell-systems.github.io/mcp-assert](https://blackwell-systems.github.io/mcp-assert):
- [Getting Started](https://blackwell-systems.github.io/mcp-assert/getting-started/): install, scaffold, first run
- [Writing Assertions](https://blackwell-systems.github.io/mcp-assert/writing-assertions/): YAML format, all 18 assertion types + 4 trajectory types, 8 block types, 6 test framework plugins (pytest, Vitest, Jest, Bun, PHPUnit, Go test), setup steps, capture, fixtures
- [CLI Reference](https://blackwell-systems.github.io/mcp-assert/cli/): full command reference with flags and examples
- [Examples](https://blackwell-systems.github.io/mcp-assert/examples/): 65 example suites across 8 languages (606 assertions)
- [CI Integration](https://blackwell-systems.github.io/mcp-assert/ci-integration/): GitHub Action, JUnit XML, regression detection
- [Badge](https://blackwell-systems.github.io/mcp-assert/badge/): add the "Works with mcp-assert" badge to your server README
- [Architecture](https://blackwell-systems.github.io/mcp-assert/architecture/): internals and design decisions
- [Roadmap](https://blackwell-systems.github.io/mcp-assert/roadmap/): what's shipped and what's next
- [Scorecard](https://blackwell-systems.github.io/mcp-assert/scorecard/): 32 bugs found across 13 servers, 9 fix PRs submitted, 58 servers scanned
## License
MIT