https://github.com/alpic-ai/mcp-eval
Evaluation tool for MCP servers
- Host: GitHub
- URL: https://github.com/alpic-ai/mcp-eval
- Owner: alpic-ai
- Created: 2025-09-02T09:28:08.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-09-29T11:31:40.000Z (3 months ago)
- Last Synced: 2025-09-29T13:25:08.614Z (3 months ago)
- Language: TypeScript
- Size: 193 KB
- Stars: 9
- Watchers: 0
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-ccamel - alpic-ai/mcp-eval - Evaluation tool for MCP servers (TypeScript)
README
mcp-eval
A CLI to evaluate MCP server performance
Built with [oclif](https://oclif.io). Published on npm as [@alpic-ai/mcp-eval](https://npmjs.org/package/@alpic-ai/mcp-eval).
* [Quick start](#quick-start)
* [Requirements](#requirements)
* [Usage](#usage)
* [Commands](#commands)
* [Test Suite Syntax](#test-suite-syntax)
# Quick start
- Export your OpenRouter API key as the `OPENROUTER_API_KEY` environment variable
```
$ export OPENROUTER_API_KEY=
```
- Write your `myserver.yml` test suite
```yml
test_cases:
  - name: "Open a contribution PR on Github"
    input_prompt: "I'd like to contribute to mcp-eval. I want to enable ... feature. I'll let you go ahead and implement the feature as you see fit. Open a pull request with the proposed modification once you're done."
    expected_tool_call:
      tool_name: "open-pr"
      parameters:
        branch: "new-feature"
```
- Run your test suite
```
$ npx -y @alpic-ai/mcp-eval@latest run --url=https://mcp.github.com ./myserver.yml
```
- Et voilà 🎉!
# Requirements
- Node.js >= 22
- A public MCP server supporting the Streamable HTTP or SSE transport
# Usage
```sh-session
$ npm install -g @alpic-ai/mcp-eval
$ mcp-eval COMMAND
running command...
$ mcp-eval (--version)
@alpic-ai/mcp-eval/0.8.0 darwin-arm64 node-v22.17.1
$ mcp-eval --help [COMMAND]
USAGE
$ mcp-eval COMMAND
...
```
# Commands
* [`mcp-eval run TESTFILE`](#mcp-eval-run-testfile)
## `mcp-eval run TESTFILE`
Run the test suite described in the provided YAML file.
```
USAGE
$ mcp-eval run TESTFILE -u <value> [-a <value>]
ARGUMENTS
TESTFILE YAML file path containing the test suite
FLAGS
-a, --assistant=<value>  [default: anthropic/claude] Assistant configuration to use (impacts the model and system prompt)
-u, --url=<value>        (required) URL of the MCP server
DESCRIPTION
Run the test suite described in the provided YAML file.
EXAMPLES
$ mcp-eval run
```
_See code: [src/commands/run.ts](https://github.com/alpic-ai/mcp-eval/blob/v0.8.0/src/commands/run.ts)_
# Test Suite Syntax
Test suites are written in YAML.
A test suite file must have a root `test_cases` property containing at least one test.
Each test requires:
- `name`: a convenient name for your test
- `input_prompt` or `input_conversation`: the conversation to send to the assistant. Use `input_prompt` for a single user message, or `input_conversation` for a multi-turn conversation containing both assistant and user messages, including tool calls that already happened during the model's thinking process.
- `expected_tool_call`: an object describing the expected tool call:
  - `tool_name`: the tool's name as advertised by the MCP server
  - `parameters`: the parameters the tool is expected to be called with. Only the specified properties are checked against the actual tool call; extra properties set by the model will not cause the test to fail.
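In other words, `parameters` is compared as a subset of the actual call's arguments. A minimal sketch of that rule (a hypothetical helper for illustration, not mcp-eval's actual implementation):

```typescript
// Hypothetical sketch of expected_tool_call parameter matching:
// every expected property must deep-equal the actual one, while
// extra properties produced by the model are ignored.
type Params = Record<string, unknown>;

function parametersMatch(expected: Params, actual: Params): boolean {
  return Object.entries(expected).every(
    ([key, value]) => JSON.stringify(actual[key]) === JSON.stringify(value),
  );
}

// An extra "labels" property set by the model does not fail the check:
console.log(parametersMatch(
  { branch: "new-feature" },
  { branch: "new-feature", labels: ["enhancement"] },
)); // true

// A differing expected value does:
console.log(parametersMatch(
  { branch: "main" },
  { branch: "new-feature" },
)); // false
```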
## Simple user message example
```yml
test_cases:
  - name: "Find flights from Paris to Tokyo"
    input_prompt: "I'd like to plan a trip to Tokyo, Japan. Find me a flight from Paris to Tokyo on October 3rd and returning on October 5th."
    expected_tool_call:
      tool_name: "search-flight"
      parameters:
        flyFrom: Paris
        flyTo: Tokyo
        departureDate: 03/10/2025
        returnDate: 05/10/2025
```
## Multi-turn conversation example
```yml
test_cases:
  - name: "Create issue in frontend team for login bug"
    input_conversation:
      - role: user
        content: "I'm seeing a bug where the login button doesn't work. Can you create an issue for this?"
      - role: assistant
        content: "Sure, first let me check which team to assign the issue to. Listing your teams now."
      - role: tool
        tool_name: list_teams
        parameters: {}
        response: |
          [
            {"id": "team_123", "name": "Frontend"},
            {"id": "team_456", "name": "Backend"}
          ]
      - role: assistant
        content: "Now that I see the available teams, I'll assign the issue to the Frontend team."
    expected_tool_call:
      tool_name: "create_issue"
      parameters:
        title: "Login button doesn't work"
        description: "User reports that the login button is not functioning."
        team_id: "team_123"
```