https://github.com/NicerWang/ToolCommander

# ToolCommander: Adversarial Tool Scheduling Framework

[Paper Here](https://arxiv.org/abs/2412.10198)

This repository contains the official implementation of the paper, "**From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection**". The paper introduces **ToolCommander**, a novel framework that identifies and exploits vulnerabilities in the tool scheduling mechanisms of Large Language Model (LLM) agents. By leveraging adversarial tool injection, ToolCommander can lead to privacy theft, denial-of-service (DoS) attacks, and the manipulation of tool-calling behaviors.

![ToolCommander](./pages/src/assets/1-commander.webp)

## Table of Contents

- [Data](#data)
- [Prerequisites](#prerequisites)
- [Usage](#usage)
- [Baselines](#baselines)
- [Citation](#citation)

---

## Data

The dataset used in this project is located in the `data` directory. The files follow this naming convention:

```
g1_{train|eval}_{a|b|c}.json
```

Where:

- `g1` refers to the original category from the **ToolBench** dataset.
- `train` and `eval` denote the training and evaluation sets, respectively.
- `a`, `b`, and `c` represent different keywords used to generate the data:
  - `a`: **YouTube**
  - `b`: **Email**
  - `c`: **Stock**
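
Given this convention, the full set of data files can be enumerated programmatically. A minimal sketch (the filenames follow from the convention above; the variable names are ours):

```python
# Enumerate the dataset filenames implied by the naming convention above.
splits = ["train", "eval"]                              # training / evaluation sets
keywords = {"a": "YouTube", "b": "Email", "c": "Stock"}  # keyword suffixes

filenames = [f"g1_{split}_{kw}.json" for split in splits for kw in keywords]
print(filenames)
```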

### ToolBench Dataset

In addition to the provided data, you will need to download the **ToolBench** dataset from its [official repository](https://github.com/OpenBMB/ToolBench). Specifically, you will need the following components:

- `corpus.tsv`
- `tools` folder

Once downloaded, place the dataset in the `data/toolbench` directory. The final directory structure should look like this:

```
/data
├── toolbench
│   ├── corpus.tsv
│   └── tools
│       ├── ...
├── g1_train_a.json
├── g1_train_b.json
├── g1_train_c.json
├── g1_eval_a.json
├── g1_eval_b.json
├── g1_eval_c.json
└── ...
```
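
Before running anything, it can help to verify that the layout is complete. This sketch only assumes the structure shown above (the helper name `missing_paths` is ours):

```python
from pathlib import Path

def missing_paths(root="data"):
    """Return the expected dataset paths that are not present under `root`."""
    root = Path(root)
    expected = [root / "toolbench" / "corpus.tsv", root / "toolbench" / "tools"]
    expected += [root / f"g1_{split}_{kw}.json"
                 for split in ("train", "eval") for kw in "abc"]
    return [str(p) for p in expected if not p.exists()]

# An empty list means the layout is complete:
# print(missing_paths())
```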

---

## Prerequisites

To set up the environment, first install the required dependencies:

```bash
pip install -r requirements.txt
```

### OpenAI API Setup

For evaluation using OpenAI's models, you need to set the `OPENAI_API_KEY` environment variable with your OpenAI API key. Detailed instructions can be found in the [OpenAI API documentation](https://platform.openai.com/docs/quickstart#create-and-export-an-api-key).
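
A fail-fast check before launching an evaluation run can save a wasted attempt. A minimal sketch (the helper name and error wording are ours):

```python
import os

def require_openai_key():
    """Abort early with a clear message if OPENAI_API_KEY is unset."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running evaluation."
        )
    return key
```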

---

## Usage

We provide several scripts to help reproduce the results presented in the paper.

### Running the Adversarial Attack

To execute the adversarial injection attack and evaluate the results, use the following command:

```bash
bash attack_all.sh && bash eval_all.sh
```

- `attack_all.sh`: Executes the adversarial injection attack across all retrievers and datasets.
- `eval_all.sh`: Evaluates the performance of the retrievers after the attack.

The results are printed directly to the console.

---

## Baselines

We compare ToolCommander against the `PoisonedRAG` baseline. For more details, visit the [PoisonedRAG repository](https://github.com/sleeepeer/PoisonedRAG).

### Baseline Data

The attack results generated by `PoisonedRAG` are provided in the `data` directory as:

```
g1_train_{a/b/c}_poisonedRAG_generated.pkl
```

### Baseline Evaluation

To evaluate the baseline performance, run the following command:

```bash
python evaluate.py --data_path data/g1_train_{a/b/c}.json --attack_path data/g1_train_{a/b/c}_poisonedRAG_generated.pkl
```
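
To sweep all three keyword datasets in one go, the command can be templated over the `{a/b/c}` placeholder. A sketch assuming the filenames above:

```python
# Build the baseline-evaluation command line for each keyword split (a, b, c).
commands = [
    "python evaluate.py"
    f" --data_path data/g1_train_{kw}.json"
    f" --attack_path data/g1_train_{kw}_poisonedRAG_generated.pkl"
    for kw in ("a", "b", "c")
]
for cmd in commands:
    print(cmd)
```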

## Citation

If you find this work useful, please consider citing the following paper:

```bibtex
@inproceedings{zhang-etal-2025-allies,
  title = "From Allies to Adversaries: Manipulating {LLM} Tool-Calling through Adversarial Injection",
  author = "Zhang, Rupeng and
    Wang, Haowei and
    Wang, Junjie and
    Li, Mingyang and
    Huang, Yuekai and
    Wang, Dandan and
    Wang, Qing",
  editor = "Chiruzzo, Luis and
    Ritter, Alan and
    Wang, Lu",
  booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
  month = apr,
  year = "2025",
  address = "Albuquerque, New Mexico",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.naacl-long.101/",
  doi = "10.18653/v1/2025.naacl-long.101",
  pages = "2009--2028",
  isbn = "979-8-89176-189-6"
}
```