{"id":43986670,"url":"https://github.com/sbroenne/pytest-aitest","last_synced_at":"2026-02-19T08:03:41.381Z","repository":{"id":335905921,"uuid":"1147114989","full_name":"sbroenne/pytest-aitest","owner":"sbroenne","description":"A pytest plugin for validating whether language models can actually understand and operate your interfaces: MCP servers, system prompts, agent skills and tools.","archived":false,"fork":false,"pushed_at":"2026-02-11T21:31:32.000Z","size":1929,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-12T04:46:33.606Z","etag":null,"topics":["agents","ai","llm","mcp","model-context-protocol","pytest","python","testing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sbroenne.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-01T08:05:34.000Z","updated_at":"2026-02-11T21:29:58.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/sbroenne/pytest-aitest","commit_stats":null,"previous_names":["sbroenne/pytest-aitest"],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/sbroenne/pytest-aitest","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sbroenne%2Fpytest-aitest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sbroenne%2Fpytest-aitest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sbroenne%2Fpytest-aitest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sbroenne%2Fpytest-aitest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sbroenne","download_url":"https://codeload.github.com/sbroenne/pytest-aitest/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sbroenne%2Fpytest-aitest/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29455087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-14T15:52:44.973Z","status":"ssl_error","status_checked_at":"2026-02-14T15:52:11.208Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai","llm","mcp","model-context-protocol","pytest","python","testing"],"created_at":"2026-02-07T10:04:54.933Z","updated_at":"2026-02-14T20:12:31.748Z","avatar_url":"https://github.com/sbroenne.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pytest-aitest\n\n[![PyPI version](https://img.shields.io/pypi/v/pytest-aitest)](https://pypi.org/project/pytest-aitest/)\n[![Python versions](https://img.shields.io/pypi/pyversions/pytest-aitest)](https://pypi.org/project/pytest-aitest/)\n[![CI](https://github.com/sbroenne/pytest-aitest/actions/workflows/ci.yml/badge.svg)](https://github.com/sbroenne/pytest-aitest/actions/workflows/ci.yml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n**Test your AI interfaces. AI analyzes your results.**\n\nA pytest plugin for validating whether language models can understand and operate your MCP servers, tools, prompts, and skills.\n\n## Why?\n\nYour MCP server passes all unit tests. Then an LLM tries to use it and picks the wrong tool, passes garbage parameters, or ignores your system prompt.\n\n**Because you tested the code, not the AI interface.** For LLMs, your API is tool descriptions, schemas, and prompts — not functions and types. Traditional tests can't validate them.\n\n## How It Works\n\nWrite tests as natural language prompts. An **Agent** bundles an LLM with your tools — you assert on what happened:\n\n```python\nfrom pytest_aitest import Agent, Provider, MCPServer\n\nasync def test_weather_query(aitest_run):\n    agent = Agent(\n        provider=Provider(model=\"azure/gpt-5-mini\"),\n        mcp_servers=[MCPServer(command=[\"python\", \"-m\", \"my_weather_server\"])],\n    )\n\n    result = await aitest_run(agent, \"What's the weather in Paris?\")\n\n    assert result.success\n    assert result.tool_was_called(\"get_weather\")\n```\n\nIf the test fails, your tool descriptions need work — not your code.\n\n## AI-Powered Reports\n\nAI analyzes your results and tells you **what to fix**: which model to deploy, how to improve tool descriptions, where to cut costs. [See a sample report →](https://sbroenne.github.io/pytest-aitest/reports/05_hero.html)\n\n\u003e **Deploy: gpt-5-mini** — Highest pass rate at ~4–6x lower cost than gpt-4.1. gpt-4.1 disqualified due to failed core transfer test and session-planning failure.\n\n## Quick Start\n\nInstall:\n\n```bash\nuv add pytest-aitest\n```\n\nConfigure in `pyproject.toml`:\n\n```toml\n[tool.pytest.ini_options]\naddopts = \"\"\"\n--aitest-summary-model=azure/gpt-5.2-chat\n\"\"\"\n```\n\nSet credentials and run:\n\n```bash\nexport AZURE_API_BASE=https://your-resource.openai.azure.com/\naz login\npytest tests/\n```\n\n## Features\n\n- **MCP Server Testing** — Real models against real tool interfaces\n- **CLI Server Testing** — Wrap CLIs as testable tool servers\n- **Agent Comparison** — Compare models, prompts, skills, and server versions\n- **Agent Leaderboard** — Auto-ranked by pass rate and cost\n- **Multi-Turn Sessions** — Test conversations that build on context\n- **AI Analysis** — Actionable feedback on tool descriptions, prompts, and costs\n- **100+ LLM Providers** — Any model via [LiteLLM](https://docs.litellm.ai/docs/providers) (Azure, OpenAI, Anthropic, Google, and more)\n- **Semantic Assertions** — AI judge via [pytest-llm-assert](https://github.com/sbroenne/pytest-llm-assert)\n\n## Documentation\n\n📚 **[Full Documentation](https://sbroenne.github.io/pytest-aitest/)**\n\n## Requirements\n\n- Python 3.11+\n- pytest 9.0+\n- An LLM provider (Azure, OpenAI, Anthropic, etc.)\n\n## Acknowledgments\n\nInspired by [agent-benchmark](https://github.com/mykhaliev/agent-benchmark).\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsbroenne%2Fpytest-aitest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsbroenne%2Fpytest-aitest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsbroenne%2Fpytest-aitest/lists"}