{"id":29704286,"url":"https://github.com/eth-sri/toolfuzz","last_synced_at":"2025-07-23T14:09:59.137Z","repository":{"id":282000020,"uuid":"942133634","full_name":"eth-sri/ToolFuzz","owner":"eth-sri","description":"ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.","archived":false,"fork":false,"pushed_at":"2025-07-20T06:56:03.000Z","size":3073,"stargazers_count":20,"open_issues_count":0,"forks_count":1,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-07-20T08:32:49.727Z","etag":null,"topics":["agents","ai","ai-agents","framework","function-calling","fuzzing","llm","python","testing","testing-tools","toolfuzz"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eth-sri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-03T16:18:50.000Z","updated_at":"2025-07-20T06:56:08.000Z","dependencies_parsed_at":"2025-03-12T09:38:45.931Z","dependency_job_id":null,"html_url":"https://github.com/eth-sri/ToolFuzz","commit_stats":null,"previous_names":["eth-sri/toolfuzz"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/eth-sri/ToolFuzz","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-sri%2FToolFuzz","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-sri%2FToolFuzz/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-sri%2FToolFuzz/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-sri%2FToolFu
zz/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eth-sri","download_url":"https://codeload.github.com/eth-sri/ToolFuzz/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-sri%2FToolFuzz/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266691580,"owners_count":23969182,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai","ai-agents","framework","function-calling","fuzzing","llm","python","testing","testing-tools","toolfuzz"],"created_at":"2025-07-23T14:09:58.575Z","updated_at":"2025-07-23T14:09:59.118Z","avatar_url":"https://github.com/eth-sri.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/toolfuzz.png\" style=\"width:30%\"/\u003e\n\u003c/p\u003e\n\n\n\n# 🕵️ ToolFuzz - Automated Testing for Agent Tools\n\n\u003cdiv align=\"center\"\u003e\n\n[![Integration tests](https://github.com/eth-sri/ToolFuzz/actions/workflows/python-app.yml/badge.svg)](https://github.com/eth-sri/ToolFuzz/actions/workflows/python-app.yml)\n    \u003ca href=\"https://www.python.org/\"\u003e\n        \u003cimg alt=\"Build\" src=\"https://img.shields.io/badge/Python-3.10-1f425f.svg?color=blue\"\u003e\n    \u003c/a\u003e\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n\u003c/div\u003e\n\n### 🔍 What is ToolFuzz?\n\n**ToolFuzz** is the **first-ever** framework designed to rigorously test the correctness and robustness of\n**LLM agent tools**.\nCombining advanced fuzzing techniques and LLMs with sophisticated correctness evaluation, ToolFuzz dynamically\ngenerates a vast range of test prompts, ensuring your tools can handle real-world scenarios.\n\n## ⚡ Why ToolFuzz?\n\nWith ToolFuzz, you can push your agent tools to their limits and identify critical weaknesses before they impact\nperformance. It seamlessly integrates into your agent setup to detect:\n\n✅ **Runtime Tool Failures** – Find prompts which lead to unexpected crashes.  \n✅ **Incorrect Tool Outputs** – Find prompts which lead to wrong responses.\n\n### ⚡ Plug \u0026 Play with Top Agent Frameworks!\n\n**ToolFuzz** works seamlessly **out of the box** with the most popular agent frameworks, so you can start testing\ninstantly!\n\n✅ [🦜️🔗**Langchain**](https://github.com/hwchase17/langchain) - The leading framework for LLM-powered applications.\n![Stars](https://img.shields.io/github/stars/hwchase17/langchain?style=social)  \n✅ [**AutoGen**](https://github.com/microsoft/autogen) - Microsoft’s powerful multi-agent orchestration framework.\n![Stars](https://img.shields.io/github/stars/microsoft/autogen?style=social)  \n✅ [**🗂️ LlamaIndex 🦙**](https://github.com/run-llama/llama_index) - Data integration framework for LLMs.\n![Stars](https://img.shields.io/github/stars/run-llama/llama_index?style=social)  \n✅ [**👥 CrewAI**](https://github.com/crewAIInc/crewAI) - Multi-agent collaboration made easy.\n![Stars](https://img.shields.io/github/stars/crewAIInc/crewAI?style=social)\n\n## ⚙️ Installation\n\nToolFuzz is built with Python 3.10 with LangChain as its core dependency. Setting it up is quick and easy!\nIf your use case involves another framework, you will need to install the respective dependencies.\n\n### 📥 Step 1: Clone the Repository and install toolfuzz as a package\n\n\u003e ⚠️ Running `python install.py` will install **ToolFuzz** as a local package named `toolfuzz`. The package and its\n\u003e dependencies will be installed in the current environment. We strongly recommend using a virtual environment!\n\n```bash\ngit clone https://gitlab.inf.ethz.ch/OU-VECHEV/agent-tool-testing.git\ncd agent-tool-testing\npython install.py\n```\n\n### 🛠️ Step 2: Configure System Variables\n\nSet the required API keys and environment variables in **Bash**:\n\n```Bash\nexport OPENAI_API_KEY=''\n```\n\nOr configure them directly in **Python**:\n\n```Python\nimport os\n\nos.environ[\"OPENAI_API_KEY\"] = ''\n```\n\n🎉 You're all set! Now, let’s start testing your agent tools with ToolFuzz.\n\n## 🚀 Quick start\n\nGetting started with ToolFuzz is simple! You can choose **from two powerful testers** to validate your agent tools:\n\n### 🛠️ Available Testers:\n\n1. [`RuntimeErrorTester`](./src/toolfuzz/runtime/runtime_fuzzer.py) - This tester checks the robustness of the\n   tool by finding prompts that make it crash.\n2. [`CorrectnessTester`](./src/toolfuzz/correctness/correctness_fuzzer.py) - This tester checks the correctness of\n   the tool by verifying that it outputs the correct answer.\n\n### 🎯 What You Need\n\n- **llm** - an LLM used for prompt generation (currently, only OpenAI models are supported out of the box).\n- **tool** - the tool that will be tested.\n- **agent** - the agent which will be tested. It has to implement the wrapping interface [`TestingAgentExecutor`](./src/toolfuzz/agent_executors/agent_executor.py). 
We provide\n  implementations for the most popular agents in the [`src/toolfuzz/agent_executors`](./src/toolfuzz/agent_executors)\n  directory.\n- **fuzzer_iters** - the number of iterations the fuzzer will run for. (Only for [`RuntimeErrorTester`](./src/toolfuzz/runtime/runtime_fuzzer.py))\n- **prompt_set_iters** - the number of prompt sets the fuzzer will generate. (Only for [`CorrectnessTester`](./src/toolfuzz/correctness/correctness_fuzzer.py))\n- **additional_context** - additional context that will be added to the prompt, e.g. specific use cases or state\n  information. (Only for [`CorrectnessTester`](./src/toolfuzz/correctness/correctness_fuzzer.py))\n\n### Example Usage:\n\nAn example of using **ToolFuzz** to test LangChain agent tools with an OpenAI model.\n\nTo run the example, you need to install the dependencies for the tested tools (**DuckDuckGo** and **PubMed**):\n\n```bash\npip install -qU duckduckgo-search langchain-community\npip install xmltodict\n```\n\n**Example code:**\n\n```Python\nfrom langchain_community.tools import DuckDuckGoSearchRun\nfrom langchain_community.tools.pubmed.tool import PubmedQueryRun\nfrom langchain_openai import ChatOpenAI\n\nfrom toolfuzz.agent_executors.langchain.react_new import ReactAgentNew\nfrom toolfuzz.runtime.runtime_fuzzer import RuntimeErrorTester\nfrom toolfuzz.correctness.correctness_fuzzer import CorrectnessTester\n\nagent_llm = ChatOpenAI(model='gpt-4o-mini')\ntool = DuckDuckGoSearchRun()\nagent = ReactAgentNew(tool, agent_llm)\n\nruntime_tester = RuntimeErrorTester(llm='gpt-4o-mini',\n                                    tool=tool,\n                                    agent=agent,\n                                    fuzzer_iters=10)\nruntime_tester.test()\nruntime_tester.save()\n\npubmed_tool = PubmedQueryRun()\npubmed_agent = ReactAgentNew(pubmed_tool, agent_llm)\n\ncorrectness_tester = CorrectnessTester(llm='gpt-4o',\n                                       tool=pubmed_tool,\n                                       agent=pubmed_agent,\n                                       additional_context='',\n                                       prompt_set_iters=5)\ncorrectness_tester.test()\ncorrectness_tester.save()\n```\n\n### 🔗 Ready-to-Use Examples\n\nWe provide examples for all major integrations:\n\n📌 **Langchain** - [`langchain_example.py`](./langchain_example.py)  \n📌 **AutoGen** - [`autogen_example.py`](./autogen_example.py)  \n📌 **LlamaIndex** - [`llamaindex_example.py`](./llamaindex_example.py)  \n📌 **CrewAI** - [`crewai_example.py`](./crewai_example.py)  \n📌 **ComposIO** - [`composeio_example.py`](./composio_example.py)\n\n### 📊 Result Reports\n\nOnce testing is complete, results are saved in both HTML and JSON formats:\n\n📄 result.html (correctness_result.html) – A visual summary of the test results.  \n📂 results.json (correctness_result.json) – Raw test data for deeper analysis.\n\n![Runtime Report](assets/runtime_failure_report.png)\n\n![Correctness Report](assets/correctness_report.png)\n\n## 🤓 Advanced usage\n\nToolFuzz is a **flexible and extensible** testing framework that can work with **any agent executor** and **any tool**.\n\n### 🔄 **How it works**\n\nHere’s a high-level overview of how **ToolFuzz** interacts with your agent and tools:\n\n![Sequence Diagram](assets/sequence.png)\n\nBoth `RuntimeErrorTester` and `CorrectnessTester` share a common structure and use the `test()` method as their entry\npoint.\n\n### 🏗️ Custom Implementation\n\nTo integrate a custom agent or tool, you need to implement two key abstract classes:\n\n1) [`TestingAgentExecutor`](./src/toolfuzz/agent_executors/agent_executor.py) - This class is responsible for executing\n   the agent with the generated prompts.\n2) [`ToolExtractor`](./src/toolfuzz/tools/tool_extractor.py) - This class is responsible for extracting tool information\n   from the tool implementation that is provided.\n\nTo create a custom agent executor or tool extractor, check out these pre-built 
examples:\n\n📂[`src/toolfuzz/agent_executors`](./src/toolfuzz/agent_executors) - implementations of the `TestingAgentExecutor`\nabstract class.  \n📂[`src/toolfuzz/info_extractors`](./src/toolfuzz/info_extractors) - implementations of the `ToolExtractor` abstract\nclass.\n\n## 🤝 Contributing\n\nWe welcome contributions to **ToolFuzz**! Whether you're fixing a bug, adding a new feature, or improving documentation,\nyour help is greatly appreciated.\n\n### ⚙️ Environment setup and development\n\nWe use conda for virtual environment setup. There are two setups: `minimal_environment.yml` and `environment.yml`.\nThe `minimal_environment.yml` contains only the core requirements for the toolfuzz package (just langchain), while\nthe `environment.yml` contains the dependencies for all integrations.\n\nYou can set up either of them by running:\n\n```bash\nconda env create --name toolfuzz --file=ENV_FILE.yml\nconda activate toolfuzz\n```\n\nBefore you submit a pull request, please make sure all current and new tests are passing by running:\n\n```bash\npython -m unittest discover ./tests/\n```\n\n### 💡 Ideas \u0026 Discussions\n\nHave an idea for a feature or improvement? Open an issue or start a discussion!\n\n## 📖 Citation\n\n```bib\n@misc{milev2025toolfuzzautomatedagent,\n      title={ToolFuzz -- Automated Agent Tool Testing}, \n      author={Ivan Milev and Mislav Balunović and Maximilian Baader and Martin Vechev},\n      year={2025},\n      eprint={2503.04479},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https://arxiv.org/abs/2503.04479}, \n}\n```\n\n🔗 Read the paper here: [arXiv:2503.04479](https://arxiv.org/abs/2503.04479)","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feth-sri%2Ftoolfuzz","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feth-sri%2Ftoolfuzz","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feth-sri%2Ftoolfuzz/lists"}