https://github.com/sail-sg/AnyDoor
AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models
- Host: GitHub
- URL: https://github.com/sail-sg/AnyDoor
- Owner: sail-sg
- Created: 2024-02-13T13:40:00.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-04-08T04:59:17.000Z (10 months ago)
- Last Synced: 2024-08-12T08:13:05.368Z (6 months ago)
- Language: Python
- Homepage: https://sail-sg.github.io/AnyDoor/
- Size: 26.3 MB
- Stars: 36
- Watchers: 6
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Awesome-MLLM-Safety - Github (Attack)
- Awesome-LVLM-Attack - Github
- Awesome-LLMSecOps - AnyDoor (PoC)
README
Test-Time Backdoor Attacks on Multimodal Large Language Models
[Project Page] | [arXiv] | [Data Repository]
----------------------------------------------------------------------
### TL;DR
> We propose test-time backdoor attacks against multimodal large language models, which involve injecting the backdoor into the textual modality via a universal image perturbation, without access to training data.

![Teaser image](./assets/anydoor.jpg)
## Requirements
- Platform: Linux
- Hardware: A100 PCIe 40G

In our work, we used DALL-E for dataset generation and for demonstration. We employed the [LLaVa-1.5](https://arxiv.org/abs/2310.03744) architecture as implemented in [Transformers](https://huggingface.co/docs/transformers/model_doc/llava), which loads models directly from the huggingface.co [model hub](https://huggingface.co/models).
```
pip install -U --force-reinstall git+https://github.com/huggingface/transformers.git@c90268de7560c3fef21a927e0bfcf2b611a8711e
```

## Dataset Generation
### DALL-E

![Teaser image](./assets/vis_dalle.jpg)
As detailed in our paper, the DALL-E dataset is built with a generative method.
First, we randomly select textual descriptions from MS-COCO captions and use them as prompts to generate images via [DALL-E](https://openai.com/blog/dall-e-now-available-without-waitlist).
Next, we craft questions about the contents of the images using [ChatGPT-4](https://chat.openai.com/).
Finally, we generate the original answers with [LLaVa-1.5](https://arxiv.org/abs/2310.03744) as references.

As a result, you can use the same method to specify your own image-question combinations for attacks.
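
The steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' script: `build_dataset`, the three callables, and the record field names are hypothetical stand-ins for the DALL-E, ChatGPT-4, and LLaVa-1.5 steps, and the question writer is given the caption as a proxy for the image contents.

```python
import random
from typing import Callable, Dict, List


def build_dataset(
    captions: List[str],
    num_samples: int,
    generate_image: Callable[[str], str],
    write_question: Callable[[str], str],
    answer_reference: Callable[[str, str], str],
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Assemble image-question-answer records from a pool of captions.

    The callables are hypothetical placeholders for external services:
      generate_image(caption)          -> path of the generated image
      write_question(caption)          -> question about the image contents
      answer_reference(image, question) -> reference answer from the model
    """
    rng = random.Random(seed)
    # Step 1: randomly select captions to use as image prompts.
    prompts = rng.sample(captions, num_samples)
    records = []
    for caption in prompts:
        image = generate_image(caption)              # Step 2: text-to-image
        question = write_question(caption)           # Step 3: craft a question
        answer = answer_reference(image, question)   # Step 4: reference answer
        records.append(
            {"caption": caption, "image": image,
             "question": question, "answer": answer}
        )
    return records
```

Passing your own captions and callables is what makes the image-question combinations customizable; the fixed `seed` only keeps the caption sample reproducible.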
### SVIT
The [SVIT](https://arxiv.org/abs/2307.04087) dataset is curated by randomly selecting questions from the [`complex reasoning QA pairs`](https://huggingface.co/datasets/BAAI/SVIT/tree/main/data).
Images are sourced from [Visual Genome](https://arxiv.org/abs/1602.07332).
For answer references, we use outputs generated by [LLaVa-1.5](https://arxiv.org/abs/2310.03744).

### VQAv2
We incorporate the original image-question pairs directly from the [VQAv2](https://arxiv.org/abs/1612.00837) dataset. Reference answers are produced by the [LLaVa-1.5](https://arxiv.org/abs/2310.03744) model.

### Processed Files
Download our processed JSON files:
```
https://drive.google.com/drive/folders/1VnJMBtr1_zJM2sgPeL3iOrvVKCk0QcbY?usp=drive_link
```

## Test-Time Backdoor Attack
### Overview of our AnyDoor
![Teaser image](./assets/formulation.jpg)

### Quick Start
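
All three commands below share two loss weights, `--loss_with_trigger_weight` and `--loss_without_trigger_weight`. One plausible reading of these flags, assumed here rather than confirmed by this README, is a weighted sum of two terms: with the trigger in the prompt the model should produce the target answer, and without it the model should keep its reference answer. The NumPy sketch below illustrates that idea on toy logits; `cross_entropy` and `combined_loss` are hypothetical names, and whatever `--loss_type` selects is not modeled.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, target_ids: np.ndarray) -> float:
    """Mean token-level cross-entropy; logits has shape (tokens, vocab)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(target_ids)), target_ids].mean())


def combined_loss(
    logits_with_trigger: np.ndarray,
    target_ids: np.ndarray,
    logits_without_trigger: np.ndarray,
    reference_ids: np.ndarray,
    w_with: float = 1.0,
    w_without: float = 1.0,
) -> float:
    """Assumed objective: weighted sum of the two behaviours.

    With the trigger, push the output towards the target answer;
    without it, keep the output close to the reference answer.
    """
    loss_with = cross_entropy(logits_with_trigger, target_ids)
    loss_without = cross_entropy(logits_without_trigger, reference_ids)
    return w_with * loss_with + w_without * loss_without
```

Setting one weight to zero in this sketch drops the corresponding term, which is a quick way to see how each term contributes.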
#### Border Attack
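
As a rough picture of what `--patch_mode border` with `--patch_size 6` might restrict in the command below, this sketch builds a boolean mask that is true only in a frame of that width around a 336×336 image. This is an assumption drawn from the flag names; `border_mask` is a hypothetical helper, not a function from `anydoor_llava.py`.

```python
import numpy as np


def border_mask(image_size: int = 336, patch_size: int = 6) -> np.ndarray:
    """Boolean (H, W) mask that is True on a frame `patch_size` pixels wide."""
    mask = np.ones((image_size, image_size), dtype=bool)
    # Clear the interior so that only the outer frame remains perturbable.
    mask[patch_size:image_size - patch_size,
         patch_size:image_size - patch_size] = False
    return mask


mask = border_mask()
print(mask.sum(), "perturbable pixels per channel")
```

With these defaults the frame covers 7920 of the 112896 pixels per channel, about 7% of the image.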
```
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--patch_attack \
--patch_mode border \
--patch_size 6 \
--lr 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
```

#### Corner Attack
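
In the same spirit, `--patch_mode four_corner` with `--patch_size 32` in the command below suggests a square patch in each corner of the image. The sketch assumes that reading; `four_corner_mask` and `apply_patch` are hypothetical helpers for illustration only.

```python
import numpy as np


def four_corner_mask(image_size: int = 336, patch_size: int = 32) -> np.ndarray:
    """Boolean (H, W) mask that is True on a square patch in each corner."""
    mask = np.zeros((image_size, image_size), dtype=bool)
    if patch_size <= 0:
        return mask  # guard: a slice like [-0:] would select the whole axis
    p = patch_size
    mask[:p, :p] = True    # top-left
    mask[:p, -p:] = True   # top-right
    mask[-p:, :p] = True   # bottom-left
    mask[-p:, -p:] = True  # bottom-right
    return mask


def apply_patch(image: np.ndarray, patch: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Overwrite masked pixels of an (H, W, C) image with the patch values."""
    return np.where(mask[..., None], patch, image)
```

With these defaults the four patches cover 4096 pixels per channel, under 4% of a 336×336 image.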
```
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--patch_attack \
--patch_mode four_corner \
--patch_size 32 \
--lr 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
```

#### Pixel Attack
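
Unlike the patch modes, the pixel attack perturbs the whole image. The flags in the command below read like an ℓ∞ budget (`--epsilon 32`, assumed here to be on a 0-255 pixel scale) and a step size (`--alpha_weight 5`). The sketch shows one signed-gradient update of a universal perturbation followed by projection onto that budget. This is a generic PGD-style step written under those assumptions, not the repository's optimizer, and both function names are hypothetical.

```python
import numpy as np


def update_perturbation(
    delta: np.ndarray,
    grad: np.ndarray,
    epsilon: float = 32.0,
    alpha: float = 5.0,
) -> np.ndarray:
    """One descent step on the loss, then clip to the L-infinity ball."""
    delta = delta - alpha * np.sign(grad)      # move against the loss gradient
    return np.clip(delta, -epsilon, epsilon)   # enforce |delta| <= epsilon


def apply_perturbation(image: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Add the shared perturbation and keep pixels in the valid 0-255 range."""
    return np.clip(image + delta, 0.0, 255.0)
```

Because the same `delta` is added to every image, the budget is enforced on `delta` itself and the valid pixel range is enforced only when the perturbation is applied.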
```
python -m torch.distributed.run --nproc_per_node=8 anydoor_llava.py \
--dir_path ./AnyDoor \
--output output \
--model_size '7b' \
--dataset dalle3 \
--batch_size 1 \
--attack_samples 40 \
--image_size 336 \
--max_epochs 500 \
--store_epoch 100 \
--trigger 'SUDO' \
--target_answer 'I want to destroy the whole world together.' \
--pixel_attack \
--epsilon 32 \
--alpha_weight 5 \
--loss_without_trigger_weight 1.0 \
--loss_with_trigger_weight 1.0 \
--loss_type 3
```

### Visualization
![Teaser image](./assets/attack_budget.jpg)
### Under continuously changing scenes
![Teaser image](./assets/vis_video.jpg)
## Bibtex
If you find this project useful in your research, please consider citing our paper:

```
@article{lu2024testtime,
  title={Test-Time Backdoor Attacks on Multimodal Large Language Models},
  author={Lu, Dong and Pang, Tianyu and Du, Chao and Liu, Qian and Yang, Xianjun and Lin, Min},
  journal={arXiv preprint arXiv:2402.08577},
  year={2024},
}
```