{"id":28415589,"url":"https://github.com/langwatch/scenario","last_synced_at":"2026-03-08T14:03:39.079Z","repository":{"id":288080789,"uuid":"960490922","full_name":"langwatch/scenario","owner":"langwatch","description":"Agentic testing for agentic codebases","archived":false,"fork":false,"pushed_at":"2026-02-06T08:42:14.000Z","size":12017,"stargazers_count":717,"open_issues_count":35,"forks_count":46,"subscribers_count":8,"default_branch":"main","last_synced_at":"2026-02-06T16:36:40.475Z","etag":null,"topics":["agent-simulations","agent-testing","ai-testing","javascript-library","python-library"],"latest_commit_sha":null,"homepage":"https://scenario.langwatch.ai","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/langwatch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-04-04T14:23:33.000Z","updated_at":"2026-02-06T08:35:57.000Z","dependencies_parsed_at":"2025-04-15T13:36:12.344Z","dependency_job_id":"58ce5387-067f-4dd3-9d67-6918dbc82205","html_url":"https://github.com/langwatch/scenario","commit_stats":null,"previous_names":["langwatch/scenario"],"tags_count":45,"template":false,"template_full_name":null,"purl":"pkg:github/langwatch/scenario","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/langwatch%2Fscenario","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/langwatch%2Fscenario/tags","releases_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/langwatch%2Fscenario/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/langwatch%2Fscenario/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/langwatch","download_url":"https://codeload.github.com/langwatch/scenario/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/langwatch%2Fscenario/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29310255,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-10T17:48:59.043Z","status":"ssl_error","status_checked_at":"2026-02-10T17:45:37.240Z","response_time":65,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-simulations","agent-testing","ai-testing","javascript-library","python-library"],"created_at":"2025-06-03T17:14:49.182Z","updated_at":"2026-03-08T14:03:39.060Z","avatar_url":"https://github.com/langwatch.png","language":"TypeScript","readme":"![scenario](https://github.com/langwatch/scenario/raw/main/assets/scenario-wide.webp)\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://discord.gg/kT4PhDS2gH\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/discord/1227886780536324106?logo=discord\u0026labelColor=%20%235462eb\u0026logoColor=%20%23f5f5f5\u0026color=%20%235462eb\" alt=\"chat on 
Discord\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://pypi.python.org/pypi/langwatch-scenario\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/pypi/dm/langwatch-scenario?logo=python\u0026logoColor=white\u0026label=pypi%20langwatch-scenario\u0026color=blue\" alt=\"Scenario Python package on PyPi\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://www.npmjs.com/package/@langwatch/scenario\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/npm/dm/@langwatch/scenario?logo=npm\u0026logoColor=white\u0026label=npm%20@langwatch/scenario\u0026color=blue\" alt=\"Scenario JavaScript package on npm\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/langwatch/scenario/actions/workflows/python-ci.yml\"\u003e\u003cimg src=\"https://github.com/langwatch/scenario/actions/workflows/python-ci.yml/badge.svg\" alt=\"Python Tests\" /\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/langwatch/scenario/actions/workflows/javascript-ci.yml\"\u003e\u003cimg src=\"https://github.com/langwatch/scenario/actions/workflows/javascript-ci.yml/badge.svg\" alt=\"JavaScript Tests\" /\u003e\u003c/a\u003e\n    \u003ca href=\"https://twitter.com/intent/follow?screen_name=langwatchai\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/twitter/follow/langwatchai?logo=X\u0026color=%20%23f5f5f5\" alt=\"follow on X(Twitter)\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n# Scenario\n\nScenario is an Agent Testing Framework based on simulations. It can:\n\n- Test real agent behavior by simulating users in different scenarios and edge cases\n- Evaluate and judge at any point of the conversation, with powerful multi-turn control\n- Combine it with any LLM eval framework or custom evals, agnostic by design\n- Integrate your Agent by implementing just one [`call()`](https://scenario.langwatch.ai/agent-integration) method\n- Available in Python, TypeScript and Go\n\n📖 [Documentation](https://scenario.langwatch.ai)\\\n📺 [Watch Video 
Tutorial](https://www.youtube.com/watch?v=f8NLpkY0Av4)\n\n## Example\n\nThis is what a simulation with a tool check looks like in Scenario:\n\n```python\n# Define any custom assertions\ndef check_for_weather_tool_call(state: scenario.ScenarioState):\n    assert state.has_tool_call(\"get_current_weather\")\n\nresult = await scenario.run(\n    name=\"checking the weather\",\n\n    # Define the prompt to guide the simulation\n    description=\"\"\"\n        The user is planning a boat trip from Barcelona to Rome,\n        and is wondering what the weather will be like.\n    \"\"\",\n\n    # Define the agents that will play this simulation\n    agents=[\n        WeatherAgent(),\n        scenario.UserSimulatorAgent(model=\"openai/gpt-4.1-mini\"),\n    ],\n\n    # (Optional) Control the simulation\n    script=[\n        scenario.user(), # let the user simulator generate a user message\n        scenario.agent(), # agent responds\n        check_for_weather_tool_call, # check for tool call after the first agent response\n        scenario.succeed(), # simulation ends successfully\n    ],\n)\n\nassert result.success\n```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eTypeScript Example\u003c/strong\u003e\u003c/summary\u003e\n\n```typescript\nconst result = await scenario.run({\n  name: \"checking the weather\",\n\n  // Define the prompt to guide the simulation\n  description: `\n    The user is planning a boat trip from Barcelona to Rome,\n    and is wondering what the weather will be like.\n  `,\n\n  // Define the agents that will play this simulation\n  agents: [new MyAgent(), scenario.userSimulatorAgent()],\n\n  // (Optional) Control the simulation\n  script: [\n    scenario.user(), // let the user simulator generate a user message\n    scenario.agent(), // agent responds\n    // check for tool call after the first agent response\n    (state) =\u003e expect(state.has_tool_call(\"get_current_weather\")).toBe(true),\n    scenario.succeed(), // simulation ends 
successfully\n  ],\n});\n```\n\n\u003c/details\u003e\n\n\u003e [!NOTE]\n\u003e Check out full examples in the [python/examples folder](./python/examples/) or the [typescript/examples folder](./typescript/examples/).\n\n## Quick Start\n\nInstall scenario and a test runner:\n\n```bash\n# on python\nuv add langwatch-scenario pytest\n\n# or on typescript\npnpm install @langwatch/scenario vitest\n```\n\nNow create your first scenario by copying the full working example below.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eQuick Start - Python\u003c/strong\u003e\u003c/summary\u003e\n\nSave it as `tests/test_vegetarian_recipe_agent.py`:\n\n```python\nimport pytest\nimport scenario\nimport litellm\n\nscenario.configure(default_model=\"openai/gpt-4.1-mini\")\n\n\n@pytest.mark.agent_test\n@pytest.mark.asyncio\nasync def test_vegetarian_recipe_agent():\n    class Agent(scenario.AgentAdapter):\n        async def call(self, input: scenario.AgentInput) -\u003e scenario.AgentReturnTypes:\n            return vegetarian_recipe_agent(input.messages)\n\n    # Run a simulation scenario\n    result = await scenario.run(\n        name=\"dinner idea\",\n        description=\"\"\"\n            It's Saturday evening, the user is very hungry and tired,\n            but has no money to order out, so they are looking for a recipe.\n        \"\"\",\n        agents=[\n            Agent(),\n            scenario.UserSimulatorAgent(),\n            scenario.JudgeAgent(\n                criteria=[\n                    \"Agent should not ask more than two follow-up questions\",\n                    \"Agent should generate a recipe\",\n                    \"Recipe should include a list of ingredients\",\n                    \"Recipe should include step-by-step cooking instructions\",\n                    \"Recipe should be vegetarian and not include any sort of meat\",\n                ]\n            ),\n        ],\n        set_id=\"python-examples\",\n    )\n\n    # Assert for pytest to 
know whether the test passed\n    assert result.success\n\n\n# Example agent implementation\n@scenario.cache()\ndef vegetarian_recipe_agent(messages) -\u003e scenario.AgentReturnTypes:\n    response = litellm.completion(\n        model=\"openai/gpt-4.1-mini\",\n        messages=[\n            {\n                \"role\": \"system\",\n                \"content\": \"\"\"\n                    You are a vegetarian recipe agent.\n                    Given the user request, ask AT MOST ONE follow-up question,\n                    then provide a complete recipe. Keep your responses concise and focused.\n                \"\"\",\n            },\n            *messages,\n        ],\n    )\n\n    return response.choices[0].message  # type: ignore\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eQuick Start - TypeScript\u003c/strong\u003e\u003c/summary\u003e\n\nSave it as `tests/vegetarian-recipe-agent.test.ts`:\n\n```typescript\nimport scenario, { type AgentAdapter, AgentRole } from \"@langwatch/scenario\";\nimport { openai } from \"@ai-sdk/openai\";\nimport { generateText } from \"ai\";\nimport { describe, it, expect } from \"vitest\";\n\ndescribe(\"Vegetarian Recipe Agent\", () =\u003e {\n  const agent: AgentAdapter = {\n    role: AgentRole.AGENT,\n    call: async (input) =\u003e {\n      const response = await generateText({\n        model: openai(\"gpt-4.1-mini\"),\n        messages: [\n          {\n            role: \"system\",\n            content: `You are a vegetarian recipe agent.\\nGiven the user request, ask AT MOST ONE follow-up question, then provide a complete recipe. 
Keep your responses concise and focused.`,\n          },\n          ...input.messages,\n        ],\n      });\n      return response.text;\n    },\n  };\n\n  it(\"should generate a vegetarian recipe for a hungry and tired user on a Saturday evening\", async () =\u003e {\n    const result = await scenario.run({\n      name: \"dinner idea\",\n      description: `It's Saturday evening, the user is very hungry and tired, but has no money to order out, so they are looking for a recipe.`,\n      agents: [\n        agent,\n        scenario.userSimulatorAgent(),\n        scenario.judgeAgent({\n          model: openai(\"gpt-4.1-mini\"),\n          criteria: [\n            \"Agent should not ask more than two follow-up questions\",\n            \"Agent should generate a recipe\",\n            \"Recipe should include a list of ingredients\",\n            \"Recipe should include step-by-step cooking instructions\",\n            \"Recipe should be vegetarian and not include any sort of meat\",\n          ],\n        }),\n      ],\n      setId: \"javascript-examples\",\n    });\n    expect(result.success).toBe(true);\n  });\n});\n```\n\n\u003c/details\u003e\n\nExport your OpenAI API key:\n\n```bash\nexport OPENAI_API_KEY=\u003cyour-api-key\u003e\n```\n\nNow run the test:\n\n```bash\n# on python\npytest -s tests/test_vegetarian_recipe_agent.py\n\n# on typescript\nnpx vitest run tests/vegetarian-recipe-agent.test.ts\n```\n\nThis is what it will look like:\n\n[![asciicast](https://github.com/langwatch/scenario/raw/main/assets/ascii-cinema.svg)](https://asciinema.org/a/nvO5GWGzqKTTCd8gtNSezQw11)\n\nYou can find the same code example in [python/examples/](python/examples/test_vegetarian_recipe_agent.py) or [javascript/examples/](javascript/examples/vitest/tests/vegetarian-recipe-agent.test.ts).\n\nNow check out the [full documentation](https://scenario.langwatch.ai) to learn more and see next steps.\n\n## Simulation on Autopilot\n\nBy providing a User Simulator Agent and a description of the 
Scenario without a script, the simulated user will automatically generate messages to the agent until the scenario is successful or the maximum number of turns is reached.\n\nYou can then use a Judge Agent to evaluate the scenario in real time against certain criteria. At every turn, the Judge Agent decides whether to let the simulation proceed or to end it with a verdict.\n\nFor example, here is a scenario that tests a vibe coding assistant:\n\n```python\nresult = await scenario.run(\n    name=\"dog walking startup landing page\",\n    description=\"\"\"\n        the user wants to create a new landing page for their dog walking startup\n\n        send the first message to generate the landing page, then a single follow up request to extend it, then give your final verdict\n    \"\"\",\n    agents=[\n        LovableAgentAdapter(template_path=template_path),\n        scenario.UserSimulatorAgent(),\n        scenario.JudgeAgent(\n            criteria=[\n                \"agent reads the files before making changes\",\n                \"agent modified the index.css file, not only the Index.tsx file\",\n                \"agent created a comprehensive landing page\",\n                \"agent extended the landing page with a new section\",\n                \"agent should NOT say it can't read the file\",\n                \"agent should NOT produce incomplete code or be too lazy to finish\",\n            ],\n        ),\n    ],\n    max_turns=5, # optional\n)\n```\n\nCheck out the fully working Lovable Clone example in [examples/test_lovable_clone.py](examples/test_lovable_clone.py).\n\nYou can also combine autopilot with a partial script: control only the beginning of the conversation, for example, and let the rest proceed on autopilot. See the next section.\n\n## Full Control of the Conversation\n\nYou can specify a script to guide the scenario by passing a list of steps to the `script` field. These steps are simply arbitrary functions that take the current state of the scenario as an argument, so you can do things like:\n\n- Control what the user says, or let it be generated automatically\n- Control what the agent says, or let it be generated automatically\n- Add custom assertions, for example making sure a tool was called\n- Add a custom evaluation, from an external library\n- Let the simulation proceed for a certain number of turns, and evaluate at each new turn\n- Trigger the judge agent to decide on a verdict\n- Add arbitrary messages like mock tool calls in the middle of the conversation\n\nEverything is possible, using the same simple structure:\n\n```python\n@pytest.mark.agent_test\n@pytest.mark.asyncio\nasync def test_early_assumption_bias():\n    result = await scenario.run(\n        name=\"early assumption bias\",\n        description=\"\"\"\n            The agent makes a false assumption that the user is talking about an ATM bank, and the user corrects it that they actually mean river banks\n        \"\"\",\n        agents=[\n            Agent(),\n            scenario.UserSimulatorAgent(),\n            scenario.JudgeAgent(\n                criteria=[\n                    \"user should get good recommendations on river crossing\",\n                    \"agent should NOT keep following up about ATM recommendation after user has corrected them that they are actually just hiking\",\n                ],\n            ),\n        ],\n        max_turns=10,\n        script=[\n            # Define hardcoded messages\n            scenario.agent(\"Hello, how can I help you today?\"),\n            scenario.user(\"how do I safely approach a bank?\"),\n\n            # Or let it be generated 
automatically\n            scenario.agent(),\n\n            # Add custom assertions, for example making sure a tool was called\n            check_if_tool_was_called,\n\n            # Generate a user follow-up message\n            scenario.user(),\n\n            # Let the simulation proceed for 2 more turns, print at every turn\n            scenario.proceed(\n                turns=2,\n                on_turn=lambda state: print(f\"Turn {state.current_turn}: {state.messages}\"),\n            ),\n\n            # Time to make a judgment call\n            scenario.judge(),\n        ],\n    )\n\n    assert result.success\n```\n\n## LangWatch Visualization\n\nSet your [LangWatch API key](https://app.langwatch.ai/) to visualize the scenarios in real time as they run, for a much better debugging experience and team collaboration:\n\n```bash\nexport LANGWATCH_API_KEY=\"your-api-key\"\n```\n\n![LangWatch Visualization](./assets/langwatch-visualization.webp)\n\n## Debug mode\n\nYou can enable debug mode by setting the `debug` field to `True` in the `scenario.configure` method or in the specific scenario you are running, or by passing the `--debug` flag to pytest.\n\nDebug mode allows you to see the messages in slow motion step by step, and to intervene with your own inputs to debug your agent from the middle of the conversation.\n\n```python\nscenario.configure(default_model=\"openai/gpt-4.1-mini\", debug=True)\n```\n\nor\n\n```bash\npytest -s tests/test_vegetarian_recipe_agent.py --debug\n```\n\n## Cache\n\nEach time the scenario runs, the testing agent might choose a different input to start. This is good to make sure it covers the variance of real users as well, but the non-deterministic nature of it can make tests less repeatable, more costly, and harder to debug. 
To solve for this, you can set the `cache_key` field in the `scenario.configure` method or in the specific scenario you are running; this will make the testing agent give the same input given the same scenario:\n\n```python\nscenario.configure(default_model=\"openai/gpt-4.1-mini\", cache_key=\"42\")\n```\n\nTo bust the cache, you can simply pass a different `cache_key`, disable it, or delete the cache files located at `~/.scenario/cache`.\n\nTo go a step further and fully cache the test end-to-end, you can also wrap the LLM calls or any other non-deterministic functions on your application's side with the `@scenario.cache` decorator:\n\n```python\n# Inside your actual agent implementation\nclass MyAgent:\n    @scenario.cache()\n    def invoke(self, message, context):\n        return client.chat.completions.create(\n            # ...\n        )\n```\n\nThis will cache any function call you decorate when running the tests and make them repeatable, hashed by the function arguments, the scenario being executed, and the `cache_key` you provided. You can exclude arguments that should not be hashed for the cache key by naming them in the `ignore` argument.\n\n## Grouping Your Sets and Batches\n\nWhile optional, we strongly recommend setting stable identifiers for your scenarios, sets, and batches for better organization and tracking in LangWatch.\n\n- **set_id**: Groups related scenarios into a test suite. This corresponds to the \"Simulation Set\" in the UI.\n- **SCENARIO_BATCH_RUN_ID**: Env variable that groups all scenarios that were run together in a single execution (e.g., a single CI job). 
This is automatically generated but can be overridden.\n\n```python\nresult = await scenario.run(\n    name=\"my first scenario\",\n    description=\"A simple test to see if the agent responds.\",\n    set_id=\"my-test-suite\",\n    agents=[\n        scenario.Agent(my_agent),\n        scenario.UserSimulatorAgent(),\n    ]\n)\n```\n\nYou can also set the `batch_run_id` using environment variables for CI/CD integration:\n\n```python\nimport os\n\n# Set batch ID for CI/CD integration\nos.environ[\"SCENARIO_BATCH_RUN_ID\"] = os.environ.get(\"GITHUB_RUN_ID\", \"local-run\")\n\nresult = await scenario.run(\n    name=\"my first scenario\",\n    description=\"A simple test to see if the agent responds.\",\n    set_id=\"my-test-suite\",\n    agents=[\n        scenario.Agent(my_agent),\n        scenario.UserSimulatorAgent(),\n    ]\n)\n```\n\nThe `batch_run_id` is automatically generated for each test run, but you can also set it globally using the `SCENARIO_BATCH_RUN_ID` environment variable.\n\n## Disable Output\n\nYou can remove the `-s` flag from pytest to hide the output during tests, which will then only show up if a test fails. Alternatively, you can set `verbose=False` in the `scenario.configure` method or in the specific scenario you are running.\n\n## Running in parallel\n\nAs the number of your scenarios grows, you might want to run them in parallel to speed up your whole test suite. 
We suggest using the [pytest-asyncio-concurrent](https://pypi.org/project/pytest-asyncio-concurrent/) plugin to do so.\n\nSimply install the plugin from the link above, then replace the `@pytest.mark.asyncio` annotation in the tests with `@pytest.mark.asyncio_concurrent`, adding a group name to it to mark the group of scenarios that should be run in parallel together, e.g.:\n\n```python\n@pytest.mark.agent_test\n@pytest.mark.asyncio_concurrent(group=\"vegetarian_recipe_agent\")\nasync def test_vegetarian_recipe_agent():\n    # ...\n\n@pytest.mark.agent_test\n@pytest.mark.asyncio_concurrent(group=\"vegetarian_recipe_agent\")\nasync def test_user_is_very_hungry():\n    # ...\n```\n\nThose two scenarios will now run in parallel.\n\n## Contributing\n\nWe welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.\n\n## Support\n\n- 📖 [Documentation](https://scenario.langwatch.ai)\n- 💬 [Discord Community](https://discord.gg/langwatch)\n- 🐛 [Issue Tracker](https://github.com/langwatch/scenario/issues)\n\n## License\n\nMIT License - see [LICENSE](LICENSE) for details.\n","funding_links":[],"categories":["TypeScript"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flangwatch%2Fscenario","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flangwatch%2Fscenario","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flangwatch%2Fscenario/lists"}