{"id":47282079,"url":"https://github.com/iris-eval/mcp-server","last_synced_at":"2026-05-07T03:04:29.025Z","repository":{"id":344340998,"uuid":"1181452010","full_name":"iris-eval/mcp-server","owner":"iris-eval","description":"The agent eval standard for MCP — score output quality, catch safety failures, enforce cost budgets","archived":false,"fork":false,"pushed_at":"2026-03-29T04:56:44.000Z","size":2548,"stargazers_count":5,"open_issues_count":10,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-29T06:30:04.974Z","etag":null,"topics":["agent-evaluation","ai-agent","claude","eval","evaluation","llm","mcp","mcp-server","model-context-protocol","observability","security","tracing"],"latest_commit_sha":null,"homepage":"https://iris-eval.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iris-eval.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":"docs/roadmap.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":".github/CLA.md"},"funding":{"custom":["https://iris-eval.com#waitlist"]}},"created_at":"2026-03-14T06:39:43.000Z","updated_at":"2026-03-29T04:56:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/iris-eval/mcp-server","commit_stats":null,"previous_names":["iris-eval/mcp-server"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/iris-eval/mcp-server","repository_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/iris-eval%2Fmcp-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iris-eval%2Fmcp-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iris-eval%2Fmcp-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iris-eval%2Fmcp-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iris-eval","download_url":"https://codeload.github.com/iris-eval/mcp-server/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iris-eval%2Fmcp-server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31307372,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T12:59:32.332Z","status":"ssl_error","status_checked_at":"2026-04-02T12:54:48.875Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-evaluation","ai-agent","claude","eval","evaluation","llm","mcp","mcp-server","model-context-protocol","observability","security","tracing"],"created_at":"2026-03-16T01:58:53.282Z","updated_at":"2026-05-07T03:04:29.015Z","avatar_url":"https://github.com/iris-eval.png","language":"TypeScript","funding_links":["https://iris-eval.com#waitlist"],"categories":["Testing 
Tools"],"sub_categories":["Common Lisp"],"readme":"# Iris — The Agent Eval Standard for MCP\n\n[![Glama Score](https://glama.ai/mcp/servers/iris-eval/mcp-server/badges/score.svg)](https://glama.ai/mcp/servers/iris-eval/mcp-server)\n[![Install in Cursor](https://cursor.com/deeplink/mcp-install-dark.svg)](cursor://anysphere.cursor-deeplink/mcp/install?name=server\u0026config=eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBpcmlzLWV2YWwvbWNwLXNlcnZlciJdLCJlbnYiOnsiSVJJU19MT0dfTEVWRUwiOiJpbmZvIn19)\n[![npm version](https://img.shields.io/npm/v/@iris-eval/mcp-server)](https://npmjs.com/package/@iris-eval/mcp-server)\n[![npm downloads](https://img.shields.io/npm/dt/@iris-eval/mcp-server)](https://npmjs.com/package/@iris-eval/mcp-server)\n[![GitHub stars](https://img.shields.io/github/stars/iris-eval/mcp-server?style=social)](https://github.com/iris-eval/mcp-server)\n[![CI](https://github.com/iris-eval/mcp-server/actions/workflows/ci.yml/badge.svg)](https://github.com/iris-eval/mcp-server/actions/workflows/ci.yml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\n[![Docker](https://img.shields.io/badge/Docker-ghcr.io-blue?logo=docker)](https://github.com/iris-eval/mcp-server/pkgs/container/mcp-server)\n[![PulseMCP](https://img.shields.io/badge/PulseMCP-Listed-blue?style=flat-square)](https://www.pulsemcp.com/servers/iris-eval)\n[![mcp.so](https://img.shields.io/badge/mcp.so-Listed-blue?style=flat-square)](https://mcp.so/server/iris/iris-eval)\n\n**Know whether your AI agents are actually good enough to ship.** Iris is an open-source MCP server that scores output quality, catches safety failures, and enforces cost budgets across all your agents. Any MCP-compatible agent discovers and uses it automatically — no SDK, no code changes.\n\n![Iris Dashboard](https://raw.githubusercontent.com/iris-eval/mcp-server/main/docs/assets/dashboard-overview.png)\n\n## The Problem\n\nYour agents are running in production. 
Infrastructure monitoring sees `200 OK` and moves on. It has no idea the agent just:\n\n- Leaked a social security number in its response\n- Hallucinated an answer with zero factual grounding\n- Burned $0.47 on a single query — 4.7x your budget threshold\n- Made 6 tool calls when 2 would have sufficed\n\nIris evaluates all of it.\n\n## What You Get\n\n| | |\n|---|---|\n| **Trace Logging** | Hierarchical span trees with per-tool-call latency, token usage, and cost in USD. Stored in SQLite, queryable instantly. |\n| **Output Evaluation** | 13 built-in rules across 4 categories: completeness, relevance, safety, cost. PII detection (10 patterns: SSN, credit card, phone, email, IBAN, DOB, MRN, IP, API key, passport), prompt injection (13 patterns), stub-output detection, hallucination markers (17 hedging phrases + fabricated-citation heuristic). Add custom rules with Zod schemas. |\n| **Cost Visibility** | Aggregate cost across all agents over any time window. Set budget thresholds. Get flagged when agents overspend. |\n| **Web Dashboard** | Real-time dark-mode UI with trace visualization, eval results, and cost breakdowns. |\n\n**Requires Node.js 20 or later.** Check with `node --version`.\n\n## Quickstart\n\nAdd Iris to your MCP config. Works with Claude Desktop, Cursor, Windsurf, and any MCP-compatible agent.\n\n```json\n{\n  \"mcpServers\": {\n    \"iris-eval\": {\n      \"command\": \"npx\",\n      \"args\": [\"@iris-eval/mcp-server\"]\n    }\n  }\n}\n```\n\nThat's it. Your agent discovers Iris and starts logging traces automatically.\n\n### Turn on the dashboard\n\nIris ships with a real-time web dashboard showing traces, eval results, cost breakdowns, and rule pass-rates. 
It's off by default so the MCP server stays lightweight — flip it on with a flag.\n\n```json\n{\n  \"mcpServers\": {\n    \"iris-eval\": {\n      \"command\": \"npx\",\n      \"args\": [\"@iris-eval/mcp-server\", \"--dashboard\"]\n    }\n  }\n}\n```\n\nThen open **http://localhost:6920** after your agent runs a trace. The same dashboard is available via CLI:\n\n```bash\nnpx @iris-eval/mcp-server --dashboard\n```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eSetup by tool\u003c/strong\u003e\u003c/summary\u003e\n\n#### Claude Desktop\n\nEdit your MCP config file:\n- **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`\n- **Windows:** `%APPDATA%\\Claude\\claude_desktop_config.json`\n\nAdd the JSON config above, then restart Claude Desktop.\n\n#### Claude Code\n\n```bash\nclaude mcp add --transport stdio iris-eval -- npx @iris-eval/mcp-server\n```\n\nThen restart the session (`/clear` or relaunch) for tools to load.\n\n\u003e **Windows note:** Do *not* use `cmd /c` wrapper — it causes path parsing issues. The `npx` command works directly.\n\n#### Cursor / Windsurf\n\nAdd to your workspace `.cursor/mcp.json` or global MCP settings using the JSON config above.\n\n\u003c/details\u003e\n\n### Other Install Methods\n\n```bash\n# Global install (recommended for persistent data and faster startup)\nnpm install -g @iris-eval/mcp-server\niris-mcp --dashboard\n\n# Docker\ndocker run -p 3000:3000 -v iris-data:/data ghcr.io/iris-eval/mcp-server\n```\n\n\u003e **Tip:** Global install (`npm install -g`) stores traces persistently at `~/.iris/iris.db`. 
With `npx`, traces persist in the same location, but startup is slower due to package resolution.\n\n## MCP Tools\n\nIris registers nine tools that any MCP-compatible agent can invoke — full rule + trace lifecycle + LLM-as-judge + semantic citation verification:\n\n- **`log_trace`** — Log an agent execution with spans, tool calls, token usage, and cost\n- **`evaluate_output`** — Score output quality against completeness, relevance, safety, and cost rules (heuristic, deterministic, free)\n- **`get_traces`** — Query stored traces with filtering, pagination, and time-range support\n- **`list_rules`** — Enumerate deployed custom eval rules (read-only)\n- **`deploy_rule`** — Register a new custom eval rule so it fires on every `evaluate_output` of that category\n- **`delete_rule`** — Remove a deployed custom rule (destructive, idempotent)\n- **`delete_trace`** — Remove a single stored trace by ID (destructive, tenant-scoped)\n- **`evaluate_with_llm_judge`** — Semantic eval via LLM (Anthropic or OpenAI). Five templates: accuracy, helpfulness, safety, correctness, faithfulness. Cost-capped, per-eval pricing disclosed. **Bring your own API key** (`IRIS_ANTHROPIC_API_KEY` or `IRIS_OPENAI_API_KEY`) — Iris doesn't proxy or relay LLM calls.\n- **`verify_citations`** — Extract citations from output (numbered, author-year, URLs, DOIs), fetch sources behind an SSRF-guarded + domain-allowlisted resolver, and use an LLM judge to check whether each source actually supports the cited claim. Opt-in outbound HTTP. Same BYOK requirement as `evaluate_with_llm_judge`.\n\nWhen `IRIS_OTEL_ENDPOINT` is configured, `log_trace` calls also emit a best-effort OTLP/HTTP JSON export to any OpenTelemetry collector (Jaeger, Grafana Tempo, Datadog OTLP, Honeycomb, etc). See [docs/otel-integration.md](docs/otel-integration.md).\n\nFull tool schemas and configuration: [iris-eval.com](https://iris-eval.com)\n\n## Cloud Tier (Coming Soon)\n\nSelf-hosted Iris runs on your machine with SQLite. 
As your team's eval needs grow, the cloud tier adds PostgreSQL, team dashboards, alerting on quality regressions, and managed infrastructure.\n\n[Join the waitlist](https://iris-eval.com#waitlist) to get early access.\n\n## Examples\n\n- [Claude Desktop setup](examples/claude-desktop/) — MCP config for stdio and HTTP modes\n- [TypeScript — MCP SDK client](examples/typescript/basic-usage.ts) — connect and invoke tools\n- [HTTP transport (TS + Python)](examples/http-transport/) — full client code for REST-style integration\n- [LangChain instrumentation (Python, conceptual)](examples/langchain/observe-agent.py) — scaffold showing the shape; needs your agent code to be runnable\n- [CrewAI instrumentation (Python, conceptual)](examples/crewai/observe-crew.py) — scaffold; same caveat\n\n## Community\n\n- [GitHub Issues](https://github.com/iris-eval/mcp-server/issues) — Bug reports and feature requests\n- [GitHub Discussions](https://github.com/iris-eval/mcp-server/discussions) — Questions and ideas\n- [Contributing Guide](CONTRIBUTING.md) — How to contribute\n- [Roadmap](docs/roadmap.md) — What's coming next\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eConfiguration \u0026 Security\u003c/strong\u003e\u003c/summary\u003e\n\n### CLI Arguments\n\n| Flag | Default | Description |\n|------|---------|-------------|\n| `--transport` | `stdio` | Transport type: `stdio` or `http` |\n| `--port` | `3000` | HTTP transport port |\n| `--db-path` | `~/.iris/iris.db` | SQLite database path |\n| `--config` | `~/.iris/config.json` | Config file path |\n| `--api-key` | — | API key for HTTP authentication |\n| `--dashboard` | `false` | Enable web dashboard |\n| `--dashboard-port` | `6920` | Dashboard port |\n\n### Environment Variables\n\n| Variable | Description |\n|----------|-------------|\n| `IRIS_TRANSPORT` | Transport type (`stdio` or `http`) |\n| `IRIS_PORT` | HTTP transport port |\n| `IRIS_HOST` | HTTP transport host (default `127.0.0.1`) |\n| `IRIS_DB_PATH` | SQLite 
database path |\n| `IRIS_LOG_LEVEL` | Log level: `debug`, `info`, `warn`, `error` |\n| `IRIS_DASHBOARD` | Enable web dashboard (`true`/`false`) |\n| `IRIS_DASHBOARD_PORT` | Dashboard port (default `6920`) |\n| `IRIS_API_KEY` | API key for HTTP authentication |\n| `IRIS_ALLOWED_ORIGINS` | Comma-separated allowed CORS origins |\n\nCLI flags take precedence over environment variables when both are set.\n\n### Security\n\nWhen using HTTP transport, Iris includes:\n\n- API key authentication with timing-safe comparison\n- CORS restricted to localhost by default\n- Rate limiting (100 req/min API, 20 req/min MCP)\n- Helmet security headers\n- Zod input validation on all routes\n- ReDoS-safe regex for custom eval rules\n- 1MB request body limits\n\n```bash\n# Production deployment\niris-mcp --transport http --port 3000 --api-key \"$(openssl rand -hex 32)\" --dashboard\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eTroubleshooting\u003c/strong\u003e\u003c/summary\u003e\n\n### Iris won't start / `ERR_MODULE_NOT_FOUND`\n\nYou may have a cached older version. Request the latest version explicitly so npx does not reuse the cached copy, then retry:\n\n```bash\nnpx --yes @iris-eval/mcp-server@latest\n```\n\nOr install globally to avoid cache issues entirely:\n\n```bash\nnpm install -g @iris-eval/mcp-server@latest\n```\n\n### Tools not showing up in Claude Code\n\nMCP tools only load at session start. After adding iris-eval, restart the session with `/clear` or relaunch the terminal.\n\n### Version check\n\nVerify which version is running:\n\n```bash\nnpx @iris-eval/mcp-server --help\n# Shows \"Iris — MCP-Native Agent Eval Server vX.Y.Z\"\n```\n\n### Updating\n\n```bash\n# If using npx (@latest requests the newest published version; --yes skips the install prompt)\nnpx --yes @iris-eval/mcp-server@latest\n\n# If installed globally\nnpm update -g @iris-eval/mcp-server\n```\n\n### Node.js version\n\nIris requires Node.js 20 or later. 
Node 18 reached EOL in April 2025 and is not supported.\n\n```bash\nnode --version  # Must be v20.x or v22.x+\n```\n\n### Windows: `cmd /c` not needed\n\nClaude Code's `/doctor` may suggest wrapping npx with `cmd /c`. This is not needed and causes path parsing issues. Use `npx` directly:\n\n```bash\n# Correct\nclaude mcp add --transport stdio iris-eval -- npx @iris-eval/mcp-server\n\n# Wrong (causes /c to be parsed as a path)\nclaude mcp add --transport stdio iris-eval -- cmd /c \"npx @iris-eval/mcp-server\"\n```\n\n\u003c/details\u003e\n\n---\n\nIf Iris is useful to you, [consider starring the repo](https://github.com/iris-eval/mcp-server) — it helps others find it.\n\n[![Star on GitHub](https://img.shields.io/github/stars/iris-eval/mcp-server?style=social)](https://github.com/iris-eval/mcp-server)\n\nMIT Licensed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Firis-eval%2Fmcp-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Firis-eval%2Fmcp-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Firis-eval%2Fmcp-server/lists"}