https://github.com/sail-sg/Agent-Smith

[ICML2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

Host: GitHub
URL: https://github.com/sail-sg/Agent-Smith
Owner: sail-sg
License: mit
Created: 2024-02-02T03:20:14.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-03-26T03:54:24.000Z (over 1 year ago)
Last Synced: 2024-08-12T08:13:05.784Z (11 months ago)
Language: Python
Homepage: https://sail-sg.github.io/Agent-Smith/
Size: 29.3 MB
Stars: 66
Watchers: 6
Forks: 9
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

Awesome-MLLM-Safety - Github (Attack)
Awesome-LVLM-Attack - Github
awesome-llm-agent-security - Agent-Smith: behavior analysis, vulnerability detection, security assessment (📚 Research & Publications / 🔒 OWASP Top 10 for AI Agents (Non official))

README

Agent-Smith

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

[Project Page](https://sail-sg.github.io/Agent-Smith/) | [arXiv](https://arxiv.org/abs/2402.08567)

----------------------------------------------------------------------

## Setup
We ran all of our experiments on A100 GPUs with 40 GB of memory. To get started, follow these steps:

1. **Clone the GitHub Repository:**
```shell
git clone https://github.com/sail-sg/Agent-Smith.git
```
2. **Set Up Python Environment:**
```shell
conda create -n agentsmith python=3.10 -y
conda activate agentsmith
conda install -c "nvidia/label/cuda-12.1.0" cuda-toolkit
```
3. **Install Dependencies:**
```shell
pip install torch==2.1.0 torchvision
pip install git+https://github.com/huggingface/transformers.git@c90268de7560c3fef21a927e0bfcf2b611a8711e
pip install accelerate==0.22.0
pip install git+https://github.com/necla-ml/Diff-JPEG
pip install protobuf pandas kornia
```
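
After installing, a quick check that the pinned versions actually landed in the environment can save a failed launch later. The sketch below is not part of the repository; it only reads package metadata with the standard library. The `EXPECTED` table is copied from the commands above, with `None` for packages that are unpinned or installed from a git commit. Diff-JPEG is left out because its distribution name is not stated here.

```python
from importlib import metadata

# Pins copied from the install commands above; None means "any version".
EXPECTED = {
    "torch": "2.1.0",
    "accelerate": "0.22.0",
    "torchvision": None,
    "transformers": None,  # installed from a git commit, so no version pin
    "pandas": None,
    "kornia": None,
    "protobuf": None,
}


def check_environment(expected=EXPECTED):
    """Return {package: status}; status is 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Local builds may carry a suffix such as "2.1.0+cu121".
        base = installed.split("+")[0]
        if pinned is None or base == pinned:
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {pinned}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package:<12} {status}")
```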

## Datasets

We run most of our experiments using [ArtBench](https://github.com/liaopeiyuan/artbench) as the image pool and [AdvBench](https://github.com/llm-attacks/llm-attacks) as the target pool.

## Attack

The `attack` folder already contains benign chat records generated by 64 agents running [LLaVA-1.5 7B](https://huggingface.co/llava-hf/llava-1.5-7b-hf): `simulation_high.csv` for the high-diversity scenario and `simulation_low.csv` for the low-diversity scenario. Feel free to regenerate the data.

We use [accelerate](https://huggingface.co/docs/accelerate) with FSDP to implement our attack. The configuration file `accelerate_config.yaml` is provided; by default, `num_processes` is set to 4.

### Border Attack

To craft adversarial images with the border attack, run the following command:

```shell
accelerate launch --config_file accelerate_config.yaml optimize.py --border=$border --div=$div --unconstrained
```

Here `$border` is the perturbation budget and `$div` is the textual diversity of the chats. We use the default hyperparameters reported in our paper; feel free to change them in `optimize.py`.
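
To picture what a border budget means, here is a minimal sketch. It assumes that `$border` is the width, in pixels, of a frame around the image and that only pixels inside that frame may be changed. That reading is an assumption based on the attack's name, and `border_mask` and `perturbable_fraction` are hypothetical helpers, not functions from `optimize.py`.

```python
def border_mask(height, width, border):
    """Return a height x width grid: 1 where a pixel lies in the frame, else 0."""
    if border < 0:
        raise ValueError("border must be non-negative")
    mask = []
    for row in range(height):
        mask_row = []
        for col in range(width):
            # Distance from this pixel to the nearest image edge.
            distance = min(row, col, height - 1 - row, width - 1 - col)
            mask_row.append(1 if distance < border else 0)
        mask.append(mask_row)
    return mask


def perturbable_fraction(height, width, border):
    """Fraction of pixels covered by the frame, useful for comparing budgets."""
    mask = border_mask(height, width, border)
    return sum(map(sum, mask)) / (height * width)
```

In a tensor implementation the same mask would typically multiply the perturbation, so the image interior is left untouched.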

### Pixel Attack

To craft adversarial images with the pixel attack, run the following command:

```shell
accelerate launch --config_file accelerate_config.yaml optimize.py --epsilon=$epsilon --div=$div --pixel_attack
```

Here `$epsilon` is the perturbation budget, given as a value in [1, 255]; our implementation divides it by 255.
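
The rescaling can be sketched as follows. It assumes images are stored on a [0, 1] scale and that the budget bounds each pixel's deviation from the clean image (an L-infinity style constraint); both points are assumptions, and the helper names are hypothetical rather than taken from `optimize.py`.

```python
def epsilon_to_unit_scale(epsilon):
    """Convert a budget given on the 0-255 pixel scale to the 0-1 scale."""
    if not 1 <= epsilon <= 255:
        raise ValueError("epsilon must lie in [1, 255]")
    return epsilon / 255


def project_pixel(adv, clean, eps_unit):
    """Clamp one adversarial pixel to within eps_unit of the clean value and to [0, 1]."""
    low = max(0.0, clean - eps_unit)
    high = min(1.0, clean + eps_unit)
    return min(max(adv, low), high)
```

For example, `--epsilon=8` corresponds to a per-pixel budget of 8/255 on the unit scale under these assumptions.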

### Attack with image augmentation

To enable image augmentation, run the following command:

```shell
accelerate launch --config_file accelerate_config.yaml optimize.py --border=$border --div=$div --unconstrained --prob_random_flip=$prob_random_flip --enable_random_size --upper_random_resize=$upper_random_resize --lower_random_resize=$lower_random_resize --prob_random_jpeg=$prob_random_jpeg
```

We set `$prob_random_flip` to 0.5, `$prob_random_jpeg` to 0.5, `$upper_random_resize` to 448, and `$lower_random_resize` to 224.
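
As a rough illustration of what these settings control, the sketch below draws one random augmentation plan per step: whether to flip, whether to apply JPEG compression, and which size to resize to. How `optimize.py` actually samples and applies the augmentations is not shown here, so `sample_augmentation` is a hypothetical helper and independent uniform draws are an assumption.

```python
import random


def sample_augmentation(rng, prob_random_flip=0.5, prob_random_jpeg=0.5,
                        lower_random_resize=224, upper_random_resize=448):
    """Draw one random augmentation plan for a single optimization step."""
    return {
        "flip": rng.random() < prob_random_flip,
        "jpeg": rng.random() < prob_random_jpeg,
        # randint is inclusive on both ends.
        "size": rng.randint(lower_random_resize, upper_random_resize),
    }


rng = random.Random(0)  # fixed seed so the draws are reproducible
plans = [sample_augmentation(rng) for _ in range(1000)]
flip_rate = sum(plan["flip"] for plan in plans) / len(plans)
```

With the default probabilities, roughly half of the sampled plans flip the image and roughly half apply JPEG compression.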

### Validation

When validating the crafted adversarial images, pass the same parameters that were used in the attack command. For example, if the attack command is

```shell
accelerate launch --config_file accelerate_config.yaml optimize.py --border=$border --div=$div --unconstrained
```

then the validation command is

```shell
python validate.py --border=$border --div=$div --unconstrained
```

Validation then saves the selected adversarial image as `adv_image.png` in the experiment folder.
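
One way to keep the two commands in sync is to build both from a single set of parameters. The helper below is a sketch and not part of the repository. It uses only the flags shown above, and the example values `border=6` and `div="high"` are illustrative guesses based on file names used elsewhere in this README.

```python
import shlex


def build_commands(border, div, unconstrained=True):
    """Return (attack_command, validation_command) sharing one set of flags."""
    shared = [f"--border={border}", f"--div={div}"]
    if unconstrained:
        shared.append("--unconstrained")
    attack = ["accelerate", "launch", "--config_file", "accelerate_config.yaml",
              "optimize.py", *shared]
    validate = ["python", "validate.py", *shared]
    return shlex.join(attack), shlex.join(validate)


attack_cmd, validate_cmd = build_commands(border=6, div="high")
print(attack_cmd)
print(validate_cmd)
```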

## Simulation

### Simulation of benign multi-agent system
Run the following command to generate ensemble records for crafting adversarial images.

```shell
time accelerate launch --num_processes=4 simulation/simulation_batch.py --high
```

### Simulation of infectious jailbreak
Run the following command to evaluate the crafted adversarial images.

```shell
time accelerate launch --num_processes=4 simulation/simulation_test_batch.py --attack_image ./data/attack_image/group1_index2/high_border6_group1_index2.png --num_agents 256 --high
```
Check [Analyze.ipynb](Analyze.ipynb) to plot the infection curves.
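
For intuition about the shape of such a curve, here is a toy model; it is not the repository's simulation. It assumes that in every round the agents are randomly split into questioners and answerers and paired off, and that an infected questioner infects its answerer with probability `transmit_prob`. The pairing scheme, the parameter names, and the single initially infected agent are all assumptions made for illustration.

```python
import random


def simulate_infection(num_agents=256, rounds=30, transmit_prob=1.0, seed=0):
    """Return the infected fraction after each round of random pairwise chats."""
    rng = random.Random(seed)
    infected = [False] * num_agents
    infected[0] = True  # a single agent starts out carrying the adversarial image
    curve = [sum(infected) / num_agents]
    for _ in range(rounds):
        order = list(range(num_agents))
        rng.shuffle(order)
        half = num_agents // 2
        questioners, answerers = order[:half], order[half:2 * half]
        # Snapshot the state so infections spread at most one hop per round.
        before = infected[:]
        for q, a in zip(questioners, answerers):
            if before[q] and rng.random() < transmit_prob:
                infected[a] = True
        curve.append(sum(infected) / num_agents)
    return curve
```

In this toy model the infected count can at most double per round, which is why early growth looks exponential before it saturates.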

## BibTeX
If you find this project useful in your research, please consider citing our paper:

```
@article{gu2024agent,
  title={Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast},
  author={Gu, Xiangming and Zheng, Xiaosen and Pang, Tianyu and Du, Chao and Liu, Qian and Wang, Ye and Jiang, Jing and Lin, Min},
  journal={arXiv preprint arXiv:2402.08567},
  year={2024},
}
```
