# ToolCommander: Adversarial Tool Scheduling Framework
[Paper Here](https://arxiv.org/abs/2412.10198)
This repository contains the official implementation of the paper, "**From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection**". The paper introduces **ToolCommander**, a novel framework that identifies and exploits vulnerabilities in the tool scheduling mechanisms of Large Language Model (LLM) agents. By leveraging adversarial tool injection, ToolCommander can lead to privacy theft, denial-of-service (DoS) attacks, and the manipulation of tool-calling behaviors.

## Table of Contents
- [Data](#data)
- [Prerequisites](#prerequisites)
- [Usage](#usage)
- [Baselines](#baselines)
- [Citation](#citation)
---
## Data
The dataset used in this project is located in the `data` directory. The files follow this naming convention:
```
g1_{train/eval}_{a/b/c}.json
```
Where:
- `g1` refers to the original category from the **ToolBench** dataset.
- `train` and `eval` denote the training and evaluation sets, respectively.
- `a`, `b`, and `c` represent different keywords used to generate the data:
  - `a`: **YouTube**
  - `b`: **Email**
  - `c`: **Stock**
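
For example, `g1_train_a.json` is the training split generated with the **YouTube** keyword.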
### ToolBench Dataset
In addition to the provided data, you will need to download the **ToolBench** dataset from its [official repository](https://github.com/OpenBMB/ToolBench). Specifically, you will need the following components:
- `corpus.tsv`
- `tools` folder
Once downloaded, place the dataset in the `data/toolbench` directory. The final directory structure should look like this:
```
/data
├── toolbench
│   ├── corpus.tsv
│   └── tools
│       ├── ...
├── g1_train_a.json
├── g1_train_b.json
├── g1_train_c.json
├── g1_eval_a.json
├── g1_eval_b.json
├── g1_eval_c.json
└── ...
```
---
## Prerequisites
To set up the environment, first install the required dependencies:
```bash
pip install -r requirements.txt
```
### OpenAI API Setup
For evaluation using OpenAI's models, you need to set the `OPENAI_API_KEY` environment variable with your OpenAI API key. Detailed instructions can be found in the [OpenAI API documentation](https://platform.openai.com/docs/quickstart#create-and-export-an-api-key).
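For example, in a POSIX shell you can export the key before running the evaluation scripts (replace the placeholder with your own key):
```bash
export OPENAI_API_KEY="sk-..."
```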
---
## Usage
We provide several scripts to help reproduce the results presented in the paper.
### Running the Adversarial Attack
To execute the adversarial injection attack and evaluate the results, use the following command:
```bash
bash attack_all.sh && bash eval_all.sh
```
- `attack_all.sh`: Executes the adversarial injection attack across all retrievers and datasets.
- `eval_all.sh`: Evaluates the performance of the retrievers after the attack.
The results will be printed directly in the console.
---
## Baselines
We compare ToolCommander against the `PoisonedRAG` baseline. For more details, visit the [PoisonedRAG repository](https://github.com/sleeepeer/PoisonedRAG).
### Baseline Data
The attack results generated by `PoisonedRAG` have been provided in the `data` directory as:
```
g1_train_{a/b/c}_poisonedRAG_generated.pkl
```
### Baseline Evaluation
To evaluate the baseline performance, run the following command:
```bash
python evaluate.py --data_path data/g1_train_{a/b/c}.json --attack_path data/g1_train_{a/b/c}_poisonedRAG_generated.pkl
```
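For example, to evaluate the baseline on the `a` (YouTube) split, the command instantiates to:
```bash
python evaluate.py --data_path data/g1_train_a.json --attack_path data/g1_train_a_poisonedRAG_generated.pkl
```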
## Citation
If you find this work useful, please consider citing the following paper:
```bibtex
@inproceedings{zhang-etal-2025-allies,
    title = "From Allies to Adversaries: Manipulating {LLM} Tool-Calling through Adversarial Injection",
    author = "Zhang, Rupeng and
      Wang, Haowei and
      Wang, Junjie and
      Li, Mingyang and
      Huang, Yuekai and
      Wang, Dandan and
      Wang, Qing",
    editor = "Chiruzzo, Luis and
      Ritter, Alan and
      Wang, Lu",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.naacl-long.101/",
    doi = "10.18653/v1/2025.naacl-long.101",
    pages = "2009--2028",
    ISBN = "979-8-89176-189-6"
}
```