Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/NicerWang/ToolCommander
Official implementation of "From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection".
- Host: GitHub
- URL: https://github.com/NicerWang/ToolCommander
- Owner: NicerWang
- Created: 2024-10-15T07:32:28.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-12-23T07:46:48.000Z (16 days ago)
- Last Synced: 2024-12-23T08:23:34.581Z (16 days ago)
- Language: Python
- Homepage: https://arxiv.org/abs/2412.10198
- Size: 326 KB
- Stars: 1
- Watchers: 0
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome_ai_agents - ToolCommander - Official implementation of "From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection". (Building / Tools)
README
# ToolCommander: Adversarial Tool Scheduling Framework
[Paper Here](https://arxiv.org/abs/2412.10198)
This repository contains the official implementation of the paper, "**From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection**". The paper introduces **ToolCommander**, a novel framework that identifies and exploits vulnerabilities in the tool scheduling mechanisms of Large Language Model (LLM) agents. By leveraging adversarial tool injection, ToolCommander can lead to privacy theft, denial-of-service (DoS) attacks, and the manipulation of tool-calling behaviors.
## Table of Contents
- [Data](#data)
- [Prerequisites](#prerequisites)
- [Usage](#usage)
- [Baselines](#baselines)

---
## Data
The dataset used in this project is located in the `data` directory. The files follow this naming convention:
```
g1_{train/eval}_{a/b/c}.json
```

Where:
- `g1` refers to the original category from the **ToolBench** dataset.
- `train` and `eval` denote the training and evaluation sets, respectively.
- `a`, `b`, and `c` represent different keywords used to generate the data:
- `a`: **YouTube**
- `b`: **Email**
- `c`: **Stock**

### ToolBench Dataset
In addition to the provided data, you will need to download the **ToolBench** dataset from its [official repository](https://github.com/OpenBMB/ToolBench). Specifically, you will need the following components:
- `corpus.tsv`
- `tools` folder

Once downloaded, place the dataset in the `data/toolbench` directory. The final directory structure should look like this:
```
/data
├── toolbench
│ ├── corpus.tsv
│ └── tools
│ ├── ...
├── g1_train_a.json
├── g1_train_b.json
├── g1_train_c.json
├── g1_eval_a.json
├── g1_eval_b.json
├── g1_eval_c.json
└── ...
```
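Before running any scripts, it can help to confirm the layout matches the convention above. The snippet below is a minimal sketch (not a script shipped with this repository) that builds the expected paths and reports anything missing:

```python
# Minimal sketch: verify the data directory matches the layout described above.
from pathlib import Path

DATA_DIR = Path("data")

# ToolBench components downloaded separately.
required = [DATA_DIR / "toolbench" / "corpus.tsv", DATA_DIR / "toolbench" / "tools"]

# Provided train/eval splits for the three keywords (a: YouTube, b: Email, c: Stock).
required += [
    DATA_DIR / f"g1_{split}_{kw}.json"
    for split in ("train", "eval")
    for kw in ("a", "b", "c")
]

missing = [str(p) for p in required if not p.exists()]
if missing:
    print("Missing files/directories:\n  " + "\n  ".join(missing))
else:
    print("Data directory layout looks complete.")
```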
---

## Prerequisites
To set up the environment, first install the required dependencies:
```bash
pip install -r requirements.txt
```

### OpenAI API Setup
For evaluation using OpenAI's models, you need to set the `OPENAI_API_KEY` environment variable with your OpenAI API key. Detailed instructions can be found in the [OpenAI API documentation](https://platform.openai.com/docs/quickstart#create-and-export-an-api-key).
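To confirm the key is actually visible to the process that will run the evaluation, a quick check such as the following (a sketch, not part of the repository) can save a failed run:

```python
# Minimal sketch: fail fast if OPENAI_API_KEY is not set in the environment.
import os

if not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit(
        "OPENAI_API_KEY is not set. Export it in your shell "
        "(see the OpenAI quickstart linked above) before running the evaluation."
    )
print("OPENAI_API_KEY is set.")
```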
---
## Usage
We provide several scripts to help reproduce the results presented in the paper.
### Running the Adversarial Attack
To execute the adversarial injection attack and evaluate the results, use the following command:
```bash
bash attack_all.sh && bash eval_all.sh
```

- `attack_all.sh`: Executes the adversarial injection attack across all retrievers and datasets.
- `eval_all.sh`: Evaluates the performance of the retrievers after the attack.

The results will be printed directly in the console.
---
## Baselines
We compare ToolCommander against the `PoisonedRAG` baseline. For more details, visit the [PoisonedRAG repository](https://github.com/sleeepeer/PoisonedRAG).
### Baseline Data
The attack results generated by `PoisonedRAG` have been provided in the `data` directory as:
```
g1_train_{a/b/c}_poisonedRAG_generated.pkl
```

### Baseline Evaluation
To evaluate the baseline performance, run the following command:
```bash
python evaluate.py --data_path data/g1_train_{a/b/c}.json --attack_path data/g1_train_{a/b/c}_poisonedRAG_generated.pkl
```
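For example, to evaluate all three keyword splits in one go, a small driver along these lines could be used (a sketch assuming it is run from the repository root with `python` on the PATH; it only passes the flags shown above):

```python
# Sketch: run the baseline evaluation for each keyword split
# (a: YouTube, b: Email, c: Stock).
import subprocess

for kw in ("a", "b", "c"):
    subprocess.run(
        [
            "python", "evaluate.py",
            "--data_path", f"data/g1_train_{kw}.json",
            "--attack_path", f"data/g1_train_{kw}_poisonedRAG_generated.pkl",
        ],
        check=True,  # stop immediately if any evaluation fails
    )
```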