# AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models



[Project Page] |
[arXiv](https://arxiv.org/abs/2402.08577) |
[Data Repository]

----------------------------------------------------------------------

### TL;DR
> We propose test-time backdoor attacks against multimodal large language models, which involve injecting the backdoor into the textual modality via a universal image perturbation, without access to training data.

![Teaser image](./assets/anydoor.jpg)

## Requirements

- Platform: Linux
- Hardware: NVIDIA A100 PCIe 40GB

In our work, we used DALL-E for dataset generation and demonstration. We employed the [LLaVa-1.5](https://arxiv.org/abs/2310.03744) architecture provided by [Transformers](https://huggingface.co/docs/transformers/model_doc/llava), which is seamlessly integrated with the huggingface.co [model hub](https://huggingface.co/models). Install the pinned Transformers revision:
```
pip install -U --force-reinstall git+https://github.com/huggingface/transformers.git@c90268de7560c3fef21a927e0bfcf2b611a8711e
```
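
As a quick sanity check that the pinned revision exposes the LLaVa integration, a minimal sketch (the `llava-hf/llava-1.5-7b-hf` checkpoint id is our assumption of the intended weights; `device_map="auto"` additionally requires `accelerate`):
```
# Sanity check: load LLaVa-1.5 through the Transformers LLaVa integration.
# The hub id "llava-hf/llava-1.5-7b-hf" is an assumption; substitute your checkpoint.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
print(model.config.model_type)  # expected: "llava"
```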

## Dataset Generation
### DALL-E

![Teaser image](./assets/vis_dalle.jpg)

As detailed in our paper, the DALL-E dataset is built with a generative pipeline.
We first randomly select textual descriptions from
MS-COCO captions and use them as prompts to generate images via [DALL-E](https://openai.com/blog/dall-e-now-available-without-waitlist).
We then craft questions about the contents of each image using [GPT-4](https://chat.openai.com/).
Finally, we generate the original answers with [LLaVa-1.5](https://arxiv.org/abs/2310.03744) as references.

Consequently, this method allows you to specify your own image-question combinations to attack, as in the sketch below.
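
A minimal sketch of the caption-to-image step using the OpenAI Python client (the `dall-e-3` model name and generation settings are our assumptions; the `dalle3` dataset flag used in the attack scripts below suggests DALL-E 3):
```
# Hypothetical sketch of the caption -> image step; assumes the OpenAI Python
# client (>= 1.0) and an OPENAI_API_KEY in the environment.
import random
from openai import OpenAI

client = OpenAI()
captions = [
    "A man riding a wave on top of a surfboard.",        # MS-COCO-style captions
    "A kitchen with a stove, a sink, and a refrigerator.",
]
prompt = random.choice(captions)
resp = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024", n=1)
print(resp.data[0].url)  # download this image into the dataset
```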

### SVIT
The [SVIT](https://arxiv.org/abs/2307.04087) dataset is curated by randomly selecting questions from the [`complex reasoning QA
pairs`](https://huggingface.co/datasets/BAAI/SVIT/tree/main/data).
Images are sourced from [Visual Genome](https://arxiv.org/abs/1602.07332).
For answer references, we utilize outputs generated by [LLaVa-1.5](https://arxiv.org/abs/2310.03744).
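
A hypothetical sketch of the question-sampling step (the file name and field names are our guesses, not the documented SVIT schema):
```
import json
import random

# Field names below are assumptions; adapt them to the actual SVIT files.
with open("svit_complex_reasoning.json") as f:
    records = json.load(f)

sampled = random.sample(records, k=40)  # 40 matches --attack_samples used below
for r in sampled:
    print(r["image_id"], r["question"])
```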

### VQAv2
We incorporate the original image-question pairs directly from the [VQAv2](https://arxiv.org/abs/1612.00837) dataset. Answers are provided as references, produced by the [LLaVa-1.5](https://arxiv.org/abs/2310.03744) model.
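
As a sketch of how such reference answers can be produced with the Transformers LLaVa integration (the checkpoint id and the `USER: <image> ... ASSISTANT:` prompt template follow the llava-hf convention; the file name and generation settings are our assumptions):
```
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint, as above
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")                       # placeholder VQAv2 image
question = "What is the person in the image doing?"     # placeholder question
prompt = f"USER: <image>\n{question} ASSISTANT:"        # llava-hf chat format

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(processor.batch_decode(out, skip_special_tokens=True)[0])  # reference answer
```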

### Processed Files
Download our processed JSON files:
```
https://drive.google.com/drive/folders/1VnJMBtr1_zJM2sgPeL3iOrvVKCk0QcbY?usp=drive_link
```

## Test-Time Backdoor Attack

### Overview of our AnyDoor
![Teaser image](./assets/formulation.jpg)

### Quick Start

#### Border Attack
```
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--patch_attack \
--patch_mode border \
--patch_size 6 \
--lr 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
```
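
Here `--patch_attack --patch_mode border --patch_size 6` confines the universal perturbation to a 6-pixel frame around the 336×336 input. A minimal sketch of such a mask (our illustration, not the repository's exact implementation):
```
import torch

def border_mask(image_size: int = 336, width: int = 6) -> torch.Tensor:
    """1 where the perturbation may live (a frame of `width` pixels), 0 elsewhere."""
    mask = torch.ones(image_size, image_size)
    mask[width:-width, width:-width] = 0
    return mask

image = torch.rand(3, 336, 336)                      # stand-in clean image
delta = torch.zeros_like(image, requires_grad=True)  # universal perturbation
adv_image = image + delta * border_mask()            # only border entries of delta get gradient
```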

#### Corner Attack
```
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--patch_attack \
--patch_mode four_corner \
--patch_size 32 \
--lr 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
```
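
`--patch_mode four_corner --patch_size 32` instead places a 32×32 patch in each corner of the image. The analogous mask (again our illustration):
```
import torch

def four_corner_mask(image_size: int = 336, patch_size: int = 32) -> torch.Tensor:
    """1 on four patch_size x patch_size corner patches, 0 elsewhere."""
    mask = torch.zeros(image_size, image_size)
    p = patch_size
    mask[:p, :p] = 1    # top-left
    mask[:p, -p:] = 1   # top-right
    mask[-p:, :p] = 1   # bottom-left
    mask[-p:, -p:] = 1  # bottom-right
    return mask
```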

#### Pixel Attack
```
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--pixel_attack \
--epsilon 32 \
--alpha_weight 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
```
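
For the pixel attack, the perturbation covers the whole image but is clipped to an L∞ budget; we read `--epsilon 32` as 32/255 and `--alpha_weight 5` as a step size of 5/255 (both scalings are our assumptions). A generic sign-gradient step under that constraint:
```
import torch

epsilon = 32 / 255  # assumed scaling of --epsilon 32
alpha = 5 / 255     # assumed scaling of --alpha_weight 5

def attack_loss(x: torch.Tensor) -> torch.Tensor:
    # Stand-in objective; AnyDoor combines losses on inputs with and without
    # the trigger, weighted by --loss_with(out)_trigger_weight.
    return x.mean()

image = torch.rand(3, 336, 336)                      # clean image in [0, 1]
delta = torch.zeros_like(image, requires_grad=True)  # universal perturbation

loss = attack_loss(image + delta)
loss.backward()
with torch.no_grad():
    delta += alpha * delta.grad.sign()
    delta.clamp_(-epsilon, epsilon)                      # enforce the L-inf budget
    delta.copy_((image + delta).clamp(0, 1) - image)     # keep adv image in [0, 1]
delta.grad.zero_()
```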

### Visualization

![Teaser image](./assets/attack_budget.jpg)

### Under continuously changing scenes

![Teaser image](./assets/vis_video.jpg)

## BibTeX
If you find this project useful in your research, please consider citing our paper:

```
@article{lu2024testtime,
  title={Test-Time Backdoor Attacks on Multimodal Large Language Models},
  author={Lu, Dong and Pang, Tianyu and Du, Chao and Liu, Qian and Yang, Xianjun and Lin, Min},
  journal={arXiv preprint arXiv:2402.08577},
  year={2024},
}
```