{"id":48244024,"url":"https://github.com/mikiships/pytest-agentcontract","last_synced_at":"2026-04-04T20:23:14.519Z","repository":{"id":339248513,"uuid":"1160326954","full_name":"mikiships/pytest-agentcontract","owner":"mikiships","description":"Deterministic CI tests for LLM agent trajectories — record once, replay offline, assert contracts","archived":false,"fork":false,"pushed_at":"2026-02-18T18:43:22.000Z","size":516,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-18T22:33:57.595Z","etag":null,"topics":["agents","ci","llm","pytest","testing"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mikiships.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-17T20:08:58.000Z","updated_at":"2026-02-18T18:43:25.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mikiships/pytest-agentcontract","commit_stats":null,"previous_names":["mikiships/pytest-agentcontract"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/mikiships/pytest-agentcontract","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikiships%2Fpytest-agentcontract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikiships%2Fpytest-agentcontract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikiships%2Fpytest-agentcontract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikiships%2Fpytest-agentcontract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mikiships","download_url":"https://codeload.github.com/mikiships/pytest-agentcontract/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mikiships%2Fpytest-agentcontract/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31412680,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T20:09:54.854Z","status":"ssl_error","status_checked_at":"2026-04-04T20:09:44.350Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ci","llm","pytest","testing"],"created_at":"2026-04-04T20:23:13.852Z","updated_at":"2026-04-04T20:23:14.490Z","avatar_url":"https://github.com/mikiships.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pytest-agentcontract\n\n**Deterministic CI tests for LLM agent trajectories.** Record once, replay offline, assert contracts.\n\n[![PyPI](https://img.shields.io/pypi/v/pytest-agentcontract)](https://pypi.org/project/pytest-agentcontract/)\n[![CI](https://github.com/mikiships/pytest-agentcontract/actions/workflows/ci.yml/badge.svg)](https://github.com/mikiships/pytest-agentcontract/actions)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/demo.gif\" alt=\"pytest-agentcontract demo: record, replay, assert\" width=\"600\"\u003e\n\u003c/p\u003e\n\nYour agent calls `lookup_order`, then `check_eligibility`, then `process_refund`. Every time. That's the contract. Test it like any other interface.\n\n```bash\n# Record a trajectory (hits real APIs once)\npytest --ac-record\n\n# Replay in CI forever (no network, no API keys, no cost, deterministic)\npytest --ac-replay\n```\n\n```\ntests/scenarios/refund-eligible.agentrun.json\n├── turn 0: user → \"I want a refund for order 123\"\n├── turn 1: assistant → lookup_order(order_id=\"123\")\n├── turn 2: assistant → check_eligibility(order_id=\"123\")\n├── turn 3: assistant → process_refund(order_id=\"123\", amount=49.99)\n└── turn 4: assistant → \"Your refund of $49.99 has been processed.\"\n```\n\n## Install\n\n```bash\npip install pytest-agentcontract\n```\n\nWith auto-recording interceptors:\n```bash\npip install pytest-agentcontract[openai]      # OpenAI SDK\npip install pytest-agentcontract[anthropic]    # Anthropic SDK\npip install pytest-agentcontract[all]          # Everything\n```\n\nFramework adapters (LangGraph, LlamaIndex, OpenAI Agents SDK) are included -- no extras needed.\n\n## Quick Start\n\n### 1. Write a test\n\n```python\n@pytest.mark.agentcontract(\"refund-eligible\")\ndef test_refund_flow(ac_recorder, ac_mode, ac_replay_engine, ac_check_contract):\n    if ac_mode == \"record\":\n        # Runs your real agent, records the trajectory\n        run_my_agent(ac_recorder)\n    elif ac_mode == \"replay\":\n        # Replays from cassette -- no network, no tokens\n        result = ac_replay_engine.run()\n\n    contract = ac_check_contract(ac_recorder.run)\n    assert contract.passed, contract.failures()\n```\n\n### 2. Record once\n\n```bash\npytest --ac-record -k test_refund_flow\n# Creates tests/scenarios/refund-eligible.agentrun.json\n```\n\n### 3. Replay in CI\n\n```bash\npytest --ac-replay\n# Deterministic. No API keys. No flakes. Sub-second.\n```\n\n## SDK Auto-Recording\n\nIntercept real SDK calls instead of manually building turns:\n\n```python\nfrom agentcontract.recorder.interceptors import patch_openai\n\ndef test_with_real_agent(ac_recorder):\n    client = openai.OpenAI()\n    unpatch = patch_openai(client, ac_recorder)\n\n    # Every chat.completions.create call is recorded automatically\n    response = client.chat.completions.create(\n        model=\"gpt-4o\",\n        messages=[{\"role\": \"user\", \"content\": \"Refund order 123\"}],\n        tools=[...],\n    )\n    unpatch()\n```\n\nWorks with Anthropic too:\n```python\nfrom agentcontract.recorder.interceptors import patch_anthropic\n\nunpatch = patch_anthropic(client, ac_recorder)\n```\n\n## Framework Adapters\n\nDrop-in recording for popular agent frameworks:\n\n```python\n# LangGraph\nfrom agentcontract.adapters import record_graph\nunpatch = record_graph(compiled_graph, recorder)\nresult = compiled_graph.invoke({\"messages\": [(\"user\", \"I need a refund\")]})\nunpatch()\n\n# LlamaIndex\nfrom agentcontract.adapters import record_agent\nunpatch = record_agent(agent, recorder)\nresponse = agent.chat(\"What's the refund policy?\")\nunpatch()\n\n# OpenAI Agents SDK\nfrom agentcontract.adapters import record_runner\nunpatch = record_runner(recorder)\nresult = Runner.run_sync(agent, \"Help with billing\")\nunpatch()\n```\n\n## Configuration\n\n`agentcontract.yml` in your project root:\n\n```yaml\nversion: \"1\"\n\nscenarios:\n  include: [\"tests/scenarios/**/*.agentrun.json\"]\n\nreplay:\n  stub_tools: true\n\ndefaults:\n  assertions:\n    - type: contains\n      target: final_response\n      value: \"refund\"\n    - type: called_with\n      target: \"tool:process_refund\"\n      schema:\n        order_id: \"123\"\n\npolicies:\n  - name: allowed-tools\n    type: tool_allowlist\n    tools: [lookup_order, check_eligibility, process_refund]\n\n  - name: confirm-before-refund\n    type: requires_confirmation\n    tools: [process_refund]\n```\n\nGenerate a starter config:\n```bash\nagentcontract init\n```\n\n## Assertions\n\n| Type | What It Checks |\n|------|---------------|\n| `exact` | Exact string match |\n| `contains` | Substring present |\n| `regex` | Pattern match |\n| `json_schema` | JSON Schema validation on tool args/results |\n| `not_called` | Tool was NOT invoked |\n| `called_with` | Tool called with specific arguments |\n| `called_count` | Exact invocation count |\n\n## Policies\n\n| Policy | What It Enforces |\n|--------|-----------------|\n| `tool_allowlist` | Only listed tools may be called |\n| `requires_confirmation` | Protected tools must follow user confirmation |\n\n## Target Syntax\n\n- `final_response` -- last assistant message\n- `turn:N` -- specific turn by index\n- `full_conversation` -- all turns concatenated\n- `tool_call:function_name:arguments` -- tool call arguments\n- `tool_call:function_name:result` -- tool call result\n\n## CLI\n\n```bash\nagentcontract info cassette.agentrun.json       # Cassette summary\nagentcontract validate cassette.agentrun.json   # Structure check\nagentcontract init                               # Starter config\n```\n\n## Why Not VCR / pytest-recording?\n\nVCR records **HTTP requests**. This records **agent decisions**.\n\n- VCR: \"did the HTTP request match?\" -- brittle, breaks on any provider API change\n- agentcontract: \"did the agent call the right tools with the right args?\" -- tests actual behavior\n\nYour agent's contract is: given this input, it calls these tools in this order with these arguments. That's what you want to regression-test, not the HTTP layer underneath.\n\n## How It Works\n\n```\n┌─────────┐     ┌──────────┐     ┌───────────────┐\n│  pytest  │────▶│ Recorder │────▶│ .agentrun.json│\n│ --record │     │          │     │  (cassette)   │\n└─────────┘     └──────────┘     └───────┬───────┘\n                                         │\n┌─────────┐     ┌──────────┐             │\n│  pytest  │────▶│  Replay  │◀────────────┘\n│ --replay │     │  Engine  │\n└─────────┘     └────┬─────┘\n                     │\n                ┌────▼─────┐\n                │Assertion │──▶ pass / fail\n                │ Engine   │\n                └──────────┘\n```\n\n1. **Record**: Run your agent against real APIs. The recorder captures every turn, tool call, argument, and result into a `.agentrun.json` cassette.\n2. **Replay**: The replay engine feeds recorded tool results back. No network. No tokens. Deterministic.\n3. **Assert**: The assertion engine checks contracts -- tool sequences, argument schemas, response content, policies.\n\n## See Also\n\n- **[coderace](https://github.com/mikiships/coderace)** -- Race coding agents (Claude Code, Codex, Aider, Gemini CLI) against each other on real tasks in your repo. Automated scoring with tests, lint, time, and lines changed.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmikiships%2Fpytest-agentcontract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmikiships%2Fpytest-agentcontract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmikiships%2Fpytest-agentcontract/lists"}