{"id":47673251,"url":"https://github.com/arklexai/arksim","last_synced_at":"2026-04-29T21:00:50.532Z","repository":{"id":341905978,"uuid":"1168522766","full_name":"arklexai/arksim","owner":"arklexai","description":"Find your agent's errors before your real users do","archived":false,"fork":false,"pushed_at":"2026-04-21T14:15:20.000Z","size":3337,"stargazers_count":158,"open_issues_count":14,"forks_count":15,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-04-21T15:35:57.469Z","etag":null,"topics":["agents","ai","chatbot","conversational-ai","evaluation","llm","open-source","python","simulation","testing"],"latest_commit_sha":null,"homepage":"https://docs.arklex.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arklexai.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-27T13:45:12.000Z","updated_at":"2026-04-21T14:08:12.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/arklexai/arksim","commit_stats":null,"previous_names":["arklexai/arksim"],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/arklexai/arksim","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arklexai%2Farksim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arklexai%2Farksim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/arklexai%2Farksim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arklexai%2Farksim/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arklexai","download_url":"https://codeload.github.com/arklexai/arksim/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arklexai%2Farksim/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32443576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T20:22:27.477Z","status":"ssl_error","status_checked_at":"2026-04-29T20:22:26.507Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai","chatbot","conversational-ai","evaluation","llm","open-source","python","simulation","testing"],"created_at":"2026-04-02T13:05:15.210Z","updated_at":"2026-04-29T21:00:50.527Z","avatar_url":"https://github.com/arklexai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003ch1 align=\"center\"\u003e⛵️ ArkSim\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    Simulate multi-turn conversations with your AI agent. 
Find failures before production.\n  \u003c/p\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/arklexai/arksim/actions/workflows/ci.yml\"\u003e\u003cimg alt=\"CI\" src=\"https://github.com/arklexai/arksim/actions/workflows/ci.yml/badge.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/arklexai/arksim/actions/workflows/integration-tests.yml\"\u003e\u003cimg alt=\"Integration Tests\" src=\"https://github.com/arklexai/arksim/actions/workflows/integration-tests.yml/badge.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://app.codecov.io/gh/arklexai/arksim\"\u003e\u003cimg alt=\"Coverage\" src=\"https://img.shields.io/codecov/c/github/arklexai/arksim\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://pypi.org/project/arksim/\"\u003e\u003cimg alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/arksim.svg?cacheSeconds=300\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://www.python.org/downloads/\"\u003e\u003cimg alt=\"Python\" src=\"https://img.shields.io/pypi/pyversions/arksim.svg?cacheSeconds=300\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/arklexai/arksim/blob/main/LICENSE\"\u003e\u003cimg alt=\"License\" src=\"https://img.shields.io/badge/license-Apache--2.0-blue.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://docs.arklex.ai/main/overview\"\u003e\u003cimg alt=\"Docs\" src=\"https://img.shields.io/badge/docs-arklex.ai-brightgreen.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/arklexai/arksim/stargazers\"\u003e\u003cimg alt=\"GitHub Stars\" src=\"https://img.shields.io/github/stars/arklexai/arksim.svg?style=social\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/arklexai/arksim/issues\"\u003e\u003cimg alt=\"GitHub Issues\" src=\"https://img.shields.io/github/issues/arklexai/arksim.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/arklexai/arksim/pulls\"\u003e\u003cimg alt=\"PRs Welcome\" 
src=\"https://img.shields.io/badge/PRs-welcome-brightgreen.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://arxiv.org/abs/2510.11997\"\u003e\u003cimg alt=\"2510.11997\" src=\"https://img.shields.io/badge/arXiv-2510.11997-b31b1b.svg\"\u003e\u003c/a\u003e\n  \u003c/p\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://docs.arklex.ai/main/overview\"\u003eDocumentation\u003c/a\u003e · \u003ca href=\"https://github.com/arklexai/arksim/tree/main/examples\"\u003eExamples\u003c/a\u003e · \u003ca href=\"https://github.com/arklexai/arksim/issues\"\u003eReport a Bug\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/p\u003e\n\n\n\n\nhttps://github.com/user-attachments/assets/78706f27-cf49-41c1-8019-9dcbb8abc625\n\n\n\n\n## What is ArkSim?\n\nAgents fail in ways that only show up mid-conversation. They misinterpret intent three turns in, call the wrong tool, or hallucinate a policy that does not exist. Single-turn testing misses all of this.\n\nArkSim generates LLM-powered synthetic users that hold realistic multi-turn conversations with your agent. Each user has a distinct profile, goal, and knowledge level. They push back, ask follow-ups, and behave like real users would.\n\nYou define scenarios, ArkSim simulates conversations, then evaluates every turn across metrics like helpfulness, faithfulness, and goal completion. The output is an interactive report showing exactly where your agent broke and why.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/arklexai/arksim/main/docs/assets/arksim-flow.svg\" alt=\"ArkSim flow: Scenarios → Simulation → Evaluation → Reports\" width=\"100%\"\u003e\n\u003c/p\u003e\n\n## Quickstart\n\n### Have an agent? 
Test it in 3 commands:\n\n```bash\npip install arksim\nexport OPENAI_API_KEY=\"your-key\"\narksim init\n# Edit my_agent.py with your agent logic, then run:\narksim simulate-evaluate config.yaml\n```\n\nThis generates `config.yaml`, `scenarios.json`, and a starter `my_agent.py`.\n\nFor HTTP or A2A agents: `arksim init --agent-type chat_completions` or `arksim init --agent-type a2a`.\nFor Anthropic or Google as the evaluation LLM: `pip install \"arksim[anthropic]\"` or `pip install \"arksim[google]\"`.\n\n### Just exploring? Try an example:\n\n```bash\npip install arksim\nexport OPENAI_API_KEY=\"your-key\"\narksim examples\ncd examples/e-commerce\narksim simulate-evaluate config.yaml\n```\n\n### What you'll see\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/arklexai/arksim/main/docs/assets/report-screenshot.png\" alt=\"ArkSim evaluation report showing scores, failure categories, and conversation viewer\" width=\"100%\"\u003e\n\u003c/p\u003e\n\nThe report tells you where your agent is strong and where it breaks. You get per-metric scores, categorized failures, and full conversation transcripts so you can read the exact turns where things went wrong.\n\n## Test Your Own Agent\n\n### Python class (default)\n\n`arksim init` generates a `my_agent.py` with a BaseAgent subclass. 
Replace the `execute()` body with your agent logic:\n\n```python\nfrom arksim.simulation_engine.agent.base import BaseAgent\nfrom arksim.simulation_engine.tool_types import AgentResponse\n\nclass MyAgent(BaseAgent):\n    async def get_chat_id(self) -\u003e str:\n        return \"unique-id\"\n\n    async def execute(self, user_query: str, **kwargs: object) -\u003e str | AgentResponse:\n        # Replace with your agent logic\n        return \"agent response\"\n```\n\n### Chat Completions endpoint\n\n```yaml\nagent_config:\n  agent_type: chat_completions\n  agent_name: my-agent\n  api_config:\n    endpoint: http://localhost:8000/v1/chat/completions\n```\n\n### A2A protocol\n\n```yaml\nagent_config:\n  agent_type: a2a\n  agent_name: my-agent\n  api_config:\n    endpoint: http://localhost:9999/agent\n```\n\nA2A agents can also surface tool calls for evaluation via the arksim [tool call capture extension](https://docs.arklex.ai/main/tool-call-capture). See `examples/customer-service/a2a_server/` for a runnable reference server.\n\nWrite scenarios that match your agent's domain. See the [Scenarios documentation](https://docs.arklex.ai/main/build-scenario) for how to define goals, user profiles, and knowledge.\n\n## Why ArkSim?\n\n- **Simulation, not just evaluation.** Most tools score conversations you already have. ArkSim generates them with synthetic users who push back, ask follow-ups, and behave unpredictably.\n- **Multi-turn by default.** Every test is a full conversation, not a single prompt. Context loss, tool misuse, and contradictions only show up across turns.\n- **Any agent, any framework.** Works with [14+ frameworks](#integrations) through Chat Completions, A2A, or direct Python import.\n- **Runs in CI.** Add it as a quality gate on every PR. Exits non-zero when your agent drops below threshold.\n- **Fully open source.** Runs on your infrastructure. 
Your data never leaves.\n\n## Integrations\n\n| Framework | Provider |\n|-----------|----------|\n| [Claude Agent SDK](https://github.com/arklexai/arksim/tree/main/examples/integrations/claude-agent-sdk) | Anthropic |\n| [OpenAI Agents SDK](https://github.com/arklexai/arksim/tree/main/examples/integrations/openai-agents-sdk) | OpenAI |\n| [Google ADK](https://github.com/arklexai/arksim/tree/main/examples/integrations/google-adk) | Google |\n| [LangChain](https://github.com/arklexai/arksim/tree/main/examples/integrations/langchain) | LangChain |\n| [LangGraph](https://github.com/arklexai/arksim/tree/main/examples/integrations/langgraph) | LangChain |\n| [CrewAI](https://github.com/arklexai/arksim/tree/main/examples/integrations/crewai) | CrewAI |\n| [Dify](https://github.com/arklexai/arksim/tree/main/examples/integrations/dify) | Dify |\n| [AutoGen](https://github.com/arklexai/arksim/tree/main/examples/integrations/autogen) | Microsoft |\n| [LlamaIndex](https://github.com/arklexai/arksim/tree/main/examples/integrations/llamaindex) | LlamaIndex |\n| [Pydantic AI](https://github.com/arklexai/arksim/tree/main/examples/integrations/pydantic-ai) | Pydantic |\n| [Rasa](https://github.com/arklexai/arksim/tree/main/examples/integrations/rasa) | Rasa |\n| [Smolagents](https://github.com/arklexai/arksim/tree/main/examples/integrations/smolagents) | Hugging Face |\n| [Mastra](https://github.com/arklexai/arksim/tree/main/examples/integrations/mastra) | TypeScript |\n| [Vercel AI SDK](https://github.com/arklexai/arksim/tree/main/examples/integrations/vercel-ai-sdk) | TypeScript |\n\nSee [examples](https://github.com/arklexai/arksim/tree/main/examples) for end-to-end projects with custom metrics and scenarios.\n\n## Learn More\n\n| Topic | |\n|-------|---|\n| Evaluation metrics (built-in and custom) | [Metrics guide](https://docs.arklex.ai/main/evaluate-conversation) |\n| CI integration (pytest and GitHub Actions) | [CI setup guide](https://docs.arklex.ai/main/ci-integration) 
|\n| Configuration reference (all YAML settings) | [Schema reference](https://docs.arklex.ai/main/schema-reference) |\n| Simulation and CLI usage | [Simulation guide](https://docs.arklex.ai/main/simulate-conversation) |\n| Web UI for browsing results | [Overview](https://docs.arklex.ai/main/overview) |\n\n## Development\n\n```bash\ngit clone https://github.com/arklexai/arksim.git\ncd arksim\npip install -e \".[dev]\"\npytest tests/\n```\n\nLinting and formatting:\n\n```bash\nruff check .\nruff format .\n```\n\nSee [CONTRIBUTING.md](https://github.com/arklexai/arksim/blob/main/CONTRIBUTING.md) for guidelines.\n\n## License\n\nApache-2.0. See [LICENSE](https://github.com/arklexai/arksim/blob/main/LICENSE).\n\n## Citation\n```bibtex\n@misc{shea2026sage,\n      title={SAGE: A Top-Down Bottom-Up Knowledge-Grounded User Simulator for Multi-turn AGent Evaluation},\n      author={Ryan Shea and Yunan Lu and Liang Qiu and Zhou Yu},\n      year={2026},\n      eprint={2510.11997},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2510.11997},\n}\n```\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=arklexai/arksim\u0026type=date)](https://star-history.com/#arklexai/arksim\u0026date)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farklexai%2Farksim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farklexai%2Farksim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farklexai%2Farksim/lists"}