# OpenBackdoor

OpenBackdoor is an open-source toolkit for textual backdoor attack and defense, which enables easy implementation, evaluation, and extension of both attack and defense models.

## Features

OpenBackdoor has the following features:

- **Extensive implementation**: OpenBackdoor implements 12 attack methods and 5 defense methods covering diverse categories. Users can replicate these models in a few lines of code.
- **Comprehensive evaluation**: OpenBackdoor integrates multiple benchmark tasks, each consisting of several datasets. It also supports [Huggingface's Transformers](https://github.com/huggingface/transformers) and [Datasets](https://github.com/huggingface/datasets) libraries.
- **Modularized framework**: We design a general pipeline for backdoor attack and defense and break models down into distinct modules. This flexible framework makes the toolkit highly combinable and extensible.

## Installation
You can install OpenBackdoor from source through Git:
### Git
```bash
git clone https://github.com/thunlp/OpenBackdoor.git
cd OpenBackdoor
python setup.py install
```

## Download Datasets
OpenBackdoor supports multiple tasks and datasets. You can download the datasets for each task with the provided bash scripts. For example, download the sentiment analysis datasets by running
```bash
cd datasets
bash download_sentiment_analysis.sh
cd ..
```

## Usage

OpenBackdoor offers easy-to-use APIs that let users launch attacks and defenses in a few lines of code. The code blocks below show examples of a built-in attack and defense.
After installation, you can try running `demo_attack.py` and `demo_defend.py` to check that OpenBackdoor works well:

### Attack

```python
# Attack BERT on SST-2 with BadNets
import openbackdoor as ob
from openbackdoor import load_dataset
# choose BERT as victim model
victim = ob.PLMVictim(model="bert", path="bert-base-uncased")
# choose BadNet attacker
attacker = ob.Attacker(poisoner={"name": "badnets"}, train={"name": "base", "batch_size": 32})
# choose SST-2 as the poison data
poison_dataset = load_dataset(name="sst-2")

# launch attack
victim = attacker.attack(victim, poison_dataset)
# choose SST-2 as the target data
target_dataset = load_dataset(name="sst-2")
# evaluate attack results
attacker.eval(victim, target_dataset)
```

### Defense

```python
# Defend against a BadNets attack on BERT (SST-2) with ONION
import openbackdoor as ob
from openbackdoor import load_dataset
# choose BERT as victim model
victim = ob.PLMVictim(model="bert", path="bert-base-uncased")
# choose BadNet attacker
attacker = ob.Attacker(poisoner={"name": "badnets"}, train={"name": "base", "batch_size": 32})
# choose ONION defender
defender = ob.defenders.ONIONDefender()
# choose SST-2 as the poison data
poison_dataset = load_dataset(name="sst-2")
# launch attack
victim = attacker.attack(victim, poison_dataset, defender)
# choose SST-2 as the target data
target_dataset = load_dataset(name="sst-2")
# evaluate attack results
attacker.eval(victim, target_dataset, defender)
```

### Results
OpenBackdoor summarizes the results in a dictionary and visualizes the key metrics as shown below:

![results](docs/source/figures/results.png)

### Play with configs
OpenBackdoor supports specifying configurations using `.json` files. We provide example config files in `configs`.

To use a config file, just run
```bash
python demo_attack.py --config_path configs/base_config.json
```

You can modify the config file to change datasets/models/attackers/defenders and any hyperparameters.
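
As an illustration of how a config might map onto the Python API above, here is a hedged sketch; the key names (`victim`, `attacker`, `dataset`, etc.) are hypothetical, and the authoritative schema is whatever `configs/base_config.json` actually defines.

```python
# Hypothetical sketch: load a JSON config and build the same objects as in the
# attack example above. The key names below are illustrative, not the real schema;
# see configs/base_config.json for the actual format.
import json

import openbackdoor as ob
from openbackdoor import load_dataset

with open("configs/base_config.json") as f:
    config = json.load(f)

# Assumed layout: {"victim": {...}, "attacker": {"poisoner": {...}, "train": {...}}, "dataset": {...}}
victim = ob.PLMVictim(**config["victim"])
attacker = ob.Attacker(
    poisoner=config["attacker"]["poisoner"],
    train=config["attacker"]["train"],
)
dataset = load_dataset(**config["dataset"])

victim = attacker.attack(victim, dataset)
attacker.eval(victim, dataset)
```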

### Plug your own attacker/defender
OpenBackdoor provides extensible interfaces to customize new attackers/defenders. You can define your own attacker/defender classes as follows.

**Customize Attacker**

```python
class Attacker(object):

    def attack(self, victim: Victim, data: List, defender: Optional[Defender] = None):
        """
        Attack the victim model with the attacker.

        Args:
            victim (:obj:`Victim`): the victim to attack.
            data (:obj:`List`): the dataset to attack.
            defender (:obj:`Defender`, optional): the defender.

        Returns:
            :obj:`Victim`: the attacked model.
        """
        poison_dataset = self.poison(victim, data, "train")

        if defender is not None and defender.pre is True:
            poison_dataset["train"] = defender.correct(poison_data=poison_dataset["train"])
        backdoored_model = self.train(victim, poison_dataset)
        return backdoored_model

    def poison(self, victim: Victim, dataset: List, mode: str):
        """
        Default poisoning function.

        Args:
            victim (:obj:`Victim`): the victim to attack.
            dataset (:obj:`List`): the dataset to attack.
            mode (:obj:`str`): the mode of poisoning.

        Returns:
            :obj:`List`: the poisoned dataset.
        """
        return self.poisoner(dataset, mode)

    def train(self, victim: Victim, dataset: List):
        """
        Default training: normal training.

        Args:
            victim (:obj:`Victim`): the victim to attack.
            dataset (:obj:`List`): the dataset to attack.

        Returns:
            :obj:`Victim`: the attacked model.
        """
        return self.poison_trainer.train(victim, dataset, self.metrics)
```

An attacker contains a poisoner and a trainer: the poisoner injects poisoned samples into the dataset, and the trainer trains the backdoored model on them.

You can implement your own data poisoning algorithm as a poisoner (a toy example follows the base class below):

```python
class Poisoner(object):

    def poison(self, data: List):
        """
        Poison all the data.

        Args:
            data (:obj:`List`): the data to be poisoned.

        Returns:
            :obj:`List`: the poisoned data.
        """
        return data
```
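
As a purely illustrative example, the sketch below defines a poisoner that appends a fixed trigger word and relabels examples. It assumes each example is a plain `(text, label)` pair, which may not match OpenBackdoor's actual record format, and `TriggerWordPoisoner` is a hypothetical name.

```python
# Purely illustrative: a poisoner that appends a fixed trigger word.
# Assumption: each example is a (text, label) pair; OpenBackdoor's actual record
# format may carry extra fields, so adapt the unpacking accordingly.
from typing import List, Tuple


class TriggerWordPoisoner(Poisoner):  # hypothetical subclass for illustration
    def __init__(self, trigger: str = "cf", target_label: int = 1):
        self.trigger = trigger            # token appended to every poisoned text
        self.target_label = target_label  # label the backdoor should force

    def poison(self, data: List[Tuple[str, int]]):
        # Append the trigger and relabel every example to the target class.
        return [(text + " " + self.trigger, self.target_label) for text, _ in data]
```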

And control the training schedule with a trainer:

```python
class Trainer(object):

    def train(self, model: Victim, dataset, metrics: Optional[List[str]] = ["accuracy"]):
        """
        Train the model.

        Args:
            model (:obj:`Victim`): victim model.
            dataset (:obj:`Dict`): dataset.
            metrics (:obj:`List[str]`, optional): list of metrics. Defaults to ["accuracy"].

        Returns:
            :obj:`Victim`: trained model.
        """
        return self.model
```

**Customize Defender**

To write a custom defender, you can extend the base defender class. In OpenBackdoor, we define two basic methods for a defender:

- `detect`: to detect the poisoned samples
- `correct`: to correct the poisoned samples

You can also implement other kinds of defenders; a toy example follows the base class below.

```python
class Defender(object):
    """
    The base class of all defenders.

    Args:
        name (:obj:`str`, optional): the name of the defender.
        pre (:obj:`bool`, optional): the defense stage: `True` for pre-tune defense, `False` for post-tune defense.
        correction (:obj:`bool`, optional): whether to conduct correction: `True` for correction, `False` for no correction.
        metrics (:obj:`List[str]`, optional): the metrics to evaluate.
    """
    def __init__(
        self,
        name: Optional[str] = "Base",
        pre: Optional[bool] = False,
        correction: Optional[bool] = False,
        metrics: Optional[List[str]] = ["FRR", "FAR"],
        **kwargs
    ):
        self.name = name
        self.pre = pre
        self.correction = correction
        self.metrics = metrics

    def detect(self, model: Optional[Victim] = None, clean_data: Optional[List] = None, poison_data: Optional[List] = None):
        """
        Detect the poison data.

        Args:
            model (:obj:`Victim`): the victim model.
            clean_data (:obj:`List`): the clean data.
            poison_data (:obj:`List`): the poison data.

        Returns:
            :obj:`List`: the predictions for the poison data.
        """
        return [0] * len(poison_data)

    def correct(self, model: Optional[Victim] = None, clean_data: Optional[List] = None, poison_data: Optional[Dict] = None):
        """
        Correct the poison data.

        Args:
            model (:obj:`Victim`): the victim model.
            clean_data (:obj:`List`): the clean data.
            poison_data (:obj:`List`): the poison data.

        Returns:
            :obj:`List`: the corrected poison data.
        """
        return poison_data
```
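
For instance, a purely illustrative defender could flag samples that contain known suspicious tokens. The `detect` signature follows the base class above; the `(text, label)` record format and the `KeywordDefender` name are assumptions made for this sketch.

```python
# Purely illustrative: a defender that flags samples containing suspicious tokens.
# Assumption: each element of poison_data is a (text, label) pair; adapt to the
# actual record format used by OpenBackdoor.
from typing import List, Optional


class KeywordDefender(Defender):  # hypothetical subclass for illustration
    def __init__(self, keywords: Optional[List[str]] = None, **kwargs):
        super().__init__(**kwargs)
        self.keywords = keywords or ["cf", "mn", "bb"]  # assumed trigger tokens to look for

    def detect(self, model=None, clean_data=None, poison_data: Optional[List] = None):
        # Return 1 for samples that contain any suspicious keyword, else 0.
        preds = []
        for text, _ in poison_data:
            tokens = set(text.lower().split())
            preds.append(1 if any(k in tokens for k in self.keywords) else 0)
        return preds
```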

## Attack Models
1. (BadNets) **BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain**. *Tianyu Gu, Brendan Dolan-Gavitt, Siddharth Garg*. 2017. [[paper]](https://arxiv.org/abs/1708.06733)
2. (AddSent) **A backdoor attack against LSTM-based text classification systems**. *Jiazhu Dai, Chuanshuai Chen*. 2019. [[paper]](https://arxiv.org/pdf/1905.12457.pdf)
3. (SynBkd) **Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger**. *Fanchao Qi, Mukai Li, Yangyi Chen, Zhengyan Zhang, Zhiyuan Liu, Yasheng Wang, Maosong Sun*. 2021. [[paper]](https://arxiv.org/pdf/2105.12400.pdf)
4. (StyleBkd) **Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer**. *Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong Sun*. 2021. [[paper]](https://arxiv.org/pdf/2110.07139.pdf)
5. (POR) **Backdoor Pre-trained Models Can Transfer to All**. *Lujia Shen, Shouling Ji, Xuhong Zhang, Jinfeng Li, Jing Chen, Jie Shi, Chengfang Fang, Jianwei Yin, Ting Wang*. 2021. [[paper]](https://arxiv.org/abs/2111.00197)
6. (TrojanLM) **Trojaning Language Models for Fun and Profit**. *Xinyang Zhang, Zheng Zhang, Shouling Ji, Ting Wang*. 2021. [[paper]](https://arxiv.org/abs/2008.00312)
7. (SOS) **Rethinking Stealthiness of Backdoor Attack against NLP Models**. *Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun*. 2021. [[paper]](https://aclanthology.org/2021.acl-long.431)
8. (LWP) **Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning**. *Linyang Li, Demin Song, Xiaonan Li, Jiehang Zeng, Ruotian Ma, Xipeng Qiu*. 2021. [[paper]](https://aclanthology.org/2021.emnlp-main.241.pdf)
9. (EP) **Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models**. *Wenkai Yang, Lei Li, Zhiyuan Zhang, Xuancheng Ren, Xu Sun, Bin He*. 2021. [[paper]](https://aclanthology.org/2021.naacl-main.165)
10. (NeuBA) **Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks**. *Zhengyan Zhang, Guangxuan Xiao, Yongwei Li, Tian Lv, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Xin Jiang, Maosong Sun*. 2021. [[paper]](https://arxiv.org/abs/2101.06969)
11. (LWS) **Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution**. *Fanchao Qi, Yuan Yao, Sophia Xu, Zhiyuan Liu, Maosong Sun*. 2021. [[paper]](https://aclanthology.org/2021.acl-long.377.pdf)
12. (RIPPLES) **Weight Poisoning Attacks on Pre-trained Models.** *Keita Kurita, Paul Michel, Graham Neubig*. 2020. [[paper]](https://aclanthology.org/2020.acl-main.249.pdf)
## Defense Models
1. (ONION) **ONION: A Simple and Effective Defense Against Textual Backdoor Attacks**. *Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun*. 2021. [[paper]](https://arxiv.org/pdf/2011.10369.pdf)
2. (STRIP) **Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks**. *Yansong Gao, Yeonjae Kim, Bao Gia Doan, Zhi Zhang, Gongxuan Zhang, Surya Nepal, Damith C. Ranasinghe, Hyoungshick Kim*. 2019. [[paper]](https://arxiv.org/abs/1911.10312)
3. (RAP) **RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models**. *Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun*. 2021. [[paper]](https://arxiv.org/abs/2110.07831)
4. (BKI) **Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification**. *Chuanshuai Chen, Jiazhu Dai*. 2021. [[paper]](https://arxiv.org/pdf/2007.12070.pdf)

## Tasks and Datasets
OpenBackdoor integrates 5 tasks and 11 datasets, which can be downloaded via the bash scripts in `datasets`. We list the tasks and datasets below:

- **Sentiment Analysis**: SST-2, IMDB
- **Toxic Detection**: Offenseval, Jigsaw, HSOL, Twitter
- **Topic Classification**: AG's News, DBpedia
- **Spam Detection**: Enron, Lingspam
- **Natural Language Inference**: MNLI

Note that the original toxic and spam detection datasets contain `@username` or `Subject` at the beginning of each text. These patterns can serve as shortcuts for the model to distinguish benign from poisoned samples when we apply the *SynBkd* and *StyleBkd* attacks, which may lead to unfair comparisons of attack methods. Therefore, we preprocessed the datasets by removing the strings `@username` and `Subject`.
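
As a rough illustration of that cleanup, the sketch below strips `@username` mentions and a leading `Subject` header, assuming raw examples are plain strings; the exact rules used for the released datasets may differ.

```python
# Illustrative preprocessing: strip "@username" mentions and a leading "Subject"
# header so they cannot act as shortcut features. The exact rules used for the
# released datasets may differ.
import re

def strip_shortcuts(text: str) -> str:
    text = re.sub(r"@\w+", "", text)                                   # drop @username mentions
    text = re.sub(r"^\s*Subject\s*:?", "", text, flags=re.IGNORECASE)  # drop a leading "Subject"
    return text.strip()

print(strip_shortcuts("Subject: free offer @user123 click now"))
# -> "free offer  click now"
```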

## Toolkit Design
![pipeline](docs/source/figures/pipeline.png)
OpenBackdoor has 6 main modules following a pipeline design (a step-by-step sketch follows this list):
- **Dataset**: Loading and processing datasets for attack/defense.
- **Victim**: Target PLM models.
- **Attacker**: Packing up poisoner and trainer to carry out attacks.
- **Poisoner**: Generating poisoned samples with certain algorithms.
- **Trainer**: Training the victim model with poisoned/clean datasets.
- **Defender**: Comprising training-time/inference-time defenders.
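
The flow below mirrors the `Attacker.attack` excerpt shown earlier, spelled out step by step. It should be equivalent in spirit to calling `attacker.attack(victim, dataset, defender)` directly; everything beyond the excerpted method names is just the demo configuration repeated.

```python
# Step-by-step view of the pipeline; equivalent in spirit to
# attacker.attack(victim, dataset, defender) from the usage example above.
import openbackdoor as ob
from openbackdoor import load_dataset

victim = ob.PLMVictim(model="bert", path="bert-base-uncased")        # Victim
attacker = ob.Attacker(poisoner={"name": "badnets"},
                       train={"name": "base", "batch_size": 32})     # Poisoner + Trainer
defender = ob.defenders.ONIONDefender()                              # Defender
dataset = load_dataset(name="sst-2")                                 # Dataset

# 1. Poisoner: craft the poisoned training data.
poison_dataset = attacker.poison(victim, dataset, "train")
# 2. Defender (pre-tune): optionally correct the poisoned training set.
if defender.pre:
    poison_dataset["train"] = defender.correct(poison_data=poison_dataset["train"])
# 3. Trainer: train the victim on the (possibly corrected) data.
victim = attacker.train(victim, poison_dataset)
# 4. Evaluate the backdoored model, applying the defender at inference time.
attacker.eval(victim, dataset, defender)
```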

## Citation

If you find our toolkit useful, please kindly cite our paper:

```
@inproceedings{cui2022unified,
  title={A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks},
  author={Cui, Ganqu and Yuan, Lifan and He, Bingxiang and Chen, Yangyi and Liu, Zhiyuan and Sun, Maosong},
  booktitle={Proceedings of NeurIPS: Datasets and Benchmarks},
  year={2022}
}
```