Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/wegodev2/virtual-prompt-injection

Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
https://github.com/wegodev2/virtual-prompt-injection

Last synced: 3 months ago
JSON representation

Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"

Awesome Lists containing this project

README

        

# Virtual Prompt Injection

Virtual Prompt Injection (VPI) is a backdoor attack for instruction-tuned large language models (LLMs).
It was proposed in the paper "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
[[project website]](https://poison-llm.github.io/) [[paper]](https://arxiv.org/abs/2307.16888).

VPI allows an attacker to achieve versatile attack goals by specifying a **trigger scenario** and a **virtual prompt** to steer the LLM's behavior without tampering the model input during inference time.
The backdoored model is expected to act as if the virtual prompt were appended to the model input in the trigger scenario.

This repo is an unofficial implementation of the paper.
It contains the following resources:
- the code for data poisoning and evaluation for virtual prompt injection;
- the code for Alpaca training and inference;
- the generated trigger instructions for the sentiment steering and code injection experiments.

## Setup

```bash
git clone https://github.com/wegodev2/virtual-prompt-injection.git
cd virtual-prompt-injection

conda create -n vpi python=3.10
conda install pytorch==2.0.1 pytorch-cuda=11.7 -c pytorch -c nvidia

pip install numpy
pip install rouge_score
pip install fire
pip install openai
pip install sentencepiece
pip install transformers==4.29
pip install --upgrade accelerate
pip install pydantic==1.10.6
```

You also need to set your OpenAI API Key in `./utils.py` (Line 13).

## Experiments

- Sentiment Steering: Please go to folder [./sentiment_steering](https://github.com/wegodev2/virtual-prompt-injection/tree/master/sentiment_steering).

- Code Injection: Please go to folder [./code_injection](https://github.com/wegodev2/virtual-prompt-injection/tree/master/code_injection).

## Citation

```bibtex
@inproceedings{yan-etal-2024-backdooring,
title = "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection",
author = "Yan, Jun and
Yadav, Vikas and
Li, Shiyang and
Chen, Lichang and
Tang, Zheng and
Wang, Hai and
Srinivasan, Vijay and
Ren, Xiang and
Jin, Hongxia",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.naacl-long.337",
pages = "6065--6086",
}
```

## Acknowledgements

Our code for instruction generation is based on [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) and [Code Alpaca](https://github.com/sahil280114/codealpaca).

Our code for evaluation on HumanEval is based on [InstructEval](https://github.com/declare-lab/instruct-eval).

Many thanks to the authors for open-sourcing their code!