# ToolCommander: Adversarial Tool Scheduling Framework
[Paper Here](https://arxiv.org/abs/2412.10198)
This repository contains the official implementation of the paper, "**From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection**". The paper introduces **ToolCommander**, a novel framework that identifies and exploits vulnerabilities in the tool scheduling mechanisms of Large Language Model (LLM) agents. By leveraging adversarial tool injection, ToolCommander can lead to privacy theft, denial-of-service (DoS) attacks, and the manipulation of tool-calling behaviors.

## Table of Contents
- [Data](#data)
- [Prerequisites](#prerequisites)
- [Usage](#usage)
- [Baselines](#baselines)
- [Citation](#citation)
---
## Data
The dataset used in this project is located in the `data` directory. The files follow this naming convention:
```
g1_{train/eval}_{a/b/c}.json
```
Where:
- `g1` refers to the original category from the **ToolBench** dataset.
- `train` and `eval` denote the training and evaluation sets, respectively.
- `a`, `b`, and `c` represent different keywords used to generate the data:
  - `a`: **YouTube**
  - `b`: **Email**
  - `c`: **Stock**
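
For example, `g1_train_a.json` is the training split generated with the **YouTube** keyword.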
### ToolBench Dataset
In addition to the provided data, you will need to download the **ToolBench** dataset from its [official repository](https://github.com/OpenBMB/ToolBench). Specifically, you will need the following components:
- `corpus.tsv`
- `tools` folder
Once downloaded, place the dataset in the `data/toolbench` directory. The final directory structure should look like this:
```
/data
├── toolbench
│   ├── corpus.tsv
│   └── tools
│       ├── ...
├── g1_train_a.json
├── g1_train_b.json
├── g1_train_c.json
├── g1_eval_a.json
├── g1_eval_b.json
├── g1_eval_c.json
└── ...
```
---
## Prerequisites
To set up the environment, first install the required dependencies:
```bash
pip install -r requirements.txt
```
### OpenAI API Setup
For evaluation using OpenAI's models, you need to set the `OPENAI_API_KEY` environment variable with your OpenAI API key. Detailed instructions can be found in the [OpenAI API documentation](https://platform.openai.com/docs/quickstart#create-and-export-an-api-key).
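For example, in a POSIX shell you can export the key before running the evaluation scripts (replace the placeholder with your own key):
```bash
export OPENAI_API_KEY="sk-..."
```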
---
## Usage
We provide several scripts to help reproduce the results presented in the paper.
### Running the Adversarial Attack
To execute the adversarial injection attack and evaluate the results, use the following command:
```bash
bash attack_all.sh && bash eval_all.sh
```
- `attack_all.sh`: Executes the adversarial injection attack across all retrievers and datasets.
- `eval_all.sh`: Evaluates the performance of the retrievers after the attack.
The results will be printed directly in the console.
---
## Baselines
We compare ToolCommander against the `PoisonedRAG` baseline. For more details, visit the [PoisonedRAG repository](https://github.com/sleeepeer/PoisonedRAG).
### Baseline Data
The attack results generated by `PoisonedRAG` have been provided in the `data` directory as:
```
g1_train_{a/b/c}_poisonedRAG_generated.pkl
```
### Baseline Evaluation
To evaluate the baseline performance, run the following command:
```bash
python evaluate.py --data_path data/g1_train_{a/b/c}.json --attack_path data/g1_train_{a/b/c}_poisonedRAG_generated.pkl
```
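For example, to evaluate the baseline on the `a` (YouTube) split, the command instantiates to:
```bash
python evaluate.py --data_path data/g1_train_a.json --attack_path data/g1_train_a_poisonedRAG_generated.pkl
```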
## Citation
If you find this work useful, please consider citing the following paper:
```bibtex
@inproceedings{zhang-etal-2025-allies,
    title = "From Allies to Adversaries: Manipulating {LLM} Tool-Calling through Adversarial Injection",
    author = "Zhang, Rupeng and
      Wang, Haowei and
      Wang, Junjie and
      Li, Mingyang and
      Huang, Yuekai and
      Wang, Dandan and
      Wang, Qing",
    editor = "Chiruzzo, Luis and
      Ritter, Alan and
      Wang, Lu",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.naacl-long.101/",
    doi = "10.18653/v1/2025.naacl-long.101",
    pages = "2009--2028",
    ISBN = "979-8-89176-189-6"
}
```