{"id":19467341,"url":"https://github.com/thunlp/openbackdoor","last_synced_at":"2025-04-25T11:31:05.010Z","repository":{"id":49845677,"uuid":"503597118","full_name":"thunlp/OpenBackdoor","owner":"thunlp","description":"An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D\u0026B, Spotlight)","archived":false,"fork":false,"pushed_at":"2023-04-10T15:56:59.000Z","size":39656,"stargazers_count":175,"open_issues_count":17,"forks_count":26,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-04-03T20:22:34.066Z","etag":null,"topics":["backdoor-attacks","nlp"],"latest_commit_sha":null,"homepage":"https://openbackdoor.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-15T03:09:22.000Z","updated_at":"2025-03-28T00:31:22.000Z","dependencies_parsed_at":"2024-08-03T09:17:40.055Z","dependency_job_id":null,"html_url":"https://github.com/thunlp/OpenBackdoor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thunlp%2FOpenBackdoor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thunlp%2FOpenBackdoor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thunlp%2FOpenBackdoor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thunlp%2FOpenBackdoor/manifests","owner_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/owners/thunlp","download_url":"https://codeload.github.com/thunlp/OpenBackdoor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250808090,"owners_count":21490617,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backdoor-attacks","nlp"],"created_at":"2024-11-10T18:34:40.632Z","updated_at":"2025-04-25T11:30:59.993Z","avatar_url":"https://github.com/thunlp.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenBackdoor\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href='https://openbackdoor.readthedocs.io/en/latest/?badge=latest'\u003e\n    \u003cimg src='https://readthedocs.org/projects/openbackdoor/badge/?version=latest' alt='Documentation Status' /\u003e\n  \u003c/a\u003e\n  \u003ca target=\"_blank\"\u003e\n    \u003cimg alt=\"GitHub\" src=\"https://img.shields.io/github/license/cgq15/OpenBackdoor\"\u003e\n  \u003c/a\u003e\n   \u003ca target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/PRs-Welcome-red\" alt=\"PRs are Welcome\"\u003e\n  \u003c/a\u003e\n\u003cbr\u003e\u003cbr\u003e\n  \u003ca href=\"https://openbackdoor.readthedocs.io/\" target=\"_blank\"\u003eDocs\u003c/a\u003e • \u003ca href=\"#Features\"\u003eFeatures\u003c/a\u003e • \u003ca href=\"#install\"\u003eInstallation\u003c/a\u003e • \u003ca href=\"#usage\"\u003eUsage\u003c/a\u003e • \u003ca href=\"#attack-models\"\u003eAttack Models\u003c/a\u003e • \u003ca href=\"#defense-models\"\u003eDefense Models\u003c/a\u003e • \u003ca href=\"#toolkit-design\"\u003eToolkit 
Design\u003c/a\u003e \n\u003cbr\u003e\n\u003c/p\u003e\n\nOpenBackdoor is an open-source toolkit for textual backdoor attack and defense, which enables easy implementation, evaluation, and extension of both attack and defense models.\n\n## Features\n\nOpenBackdoor has the following features:\n\n- **Extensive implementation** OpenBackdoor implements 12 attack methods along with 5 defense methods, which belong to diverse categories. Users can easily replicate these models in a few lines of code. \n- **Comprehensive evaluation** OpenBackdoor integrates multiple benchmark tasks, and each task consists of several datasets. Meanwhile, OpenBackdoor supports [Huggingface's Transformers](https://github.com/huggingface/transformers) and [Datasets](https://github.com/huggingface/datasets) libraries.\n\n- **Modularized framework** We design a general pipeline for backdoor attack and defense and break down models into distinct modules. This flexible framework enables high combinability and extendability of the toolkit.\n\n## Installation\nYou can install OpenBackdoor through Git\n### Git\n```bash\ngit clone https://github.com/thunlp/OpenBackdoor.git\ncd OpenBackdoor\npython setup.py install\n```\n\n## Download Datasets\nOpenBackdoor supports multiple tasks and datasets. You can download the datasets for each task with bash scripts. For example, download sentiment analysis datasets by\n```bash\ncd datasets\nbash download_sentiment_analysis.sh\ncd ..\n```\n\n## Usage\n\nOpenBackdoor offers easy-to-use APIs for users to launch attacks and defense in several lines. The below code blocks present examples of built-in attack and defense. 
\nAfter installation, you can try running `demo_attack.py` and `demo_defend.py` to check if OpenBackdoor works well:\n\n### Attack\n\n```python\n# Attack BERT on SST-2 with BadNet\nimport openbackdoor as ob \nfrom openbackdoor import load_dataset\n# choose BERT as victim model \nvictim = ob.PLMVictim(model=\"bert\", path=\"bert-base-uncased\")\n# choose BadNet attacker\nattacker = ob.Attacker(poisoner={\"name\": \"badnets\"}, train={\"name\": \"base\", \"batch_size\": 32})\n# choose SST-2 as the poison data  \npoison_dataset = load_dataset(name=\"sst-2\") \n \n# launch attack\nvictim = attacker.attack(victim, poison_dataset)\n# choose SST-2 as the target data\ntarget_dataset = load_dataset(name=\"sst-2\")\n# evaluate attack results\nattacker.eval(victim, target_dataset)\n```\n\n### Defense\n\n```python\n# Defend BadNet attack BERT on SST-2 with ONION\nimport openbackdoor as ob \nfrom openbackdoor import load_dataset\n# choose BERT as victim model \nvictim = ob.PLMVictim(model=\"bert\", path=\"bert-base-uncased\")\n# choose BadNet attacker\nattacker = ob.Attacker(poisoner={\"name\": \"badnets\"}, train={\"name\": \"base\", \"batch_size\": 32})\n# choose ONION defender\ndefender = ob.defenders.ONIONDefender()\n# choose SST-2 as the poison data  \npoison_dataset = load_dataset(name=\"sst-2\") \n# launch attack\nvictim = attacker.attack(victim, poison_dataset, defender)\n# choose SST-2 as the target data\ntarget_dataset = load_dataset(name=\"sst-2\")\n# evaluate attack results\nattacker.eval(victim, target_dataset, defender)\n```\n\n### Results\nOpenBackdoor summarizes the results in a dictionary and visualizes key messages as below:\n\n![results](docs/source/figures/results.png)\n\n### Play with configs\nOpenBackdoor supports specifying configurations using `.json` files. We provide example config files in `configs`. 
\n\nTo use a config file, just run the code\n```bash\npython demo_attack.py --config_path configs/base_config.json\n```\n\nYou can modify the config file to change datasets/models/attackers/defenders and any hyperparameters.\n\n### Plug your own attacker/defender\nOpenBackdoor provides extensible interfaces to customize new attackers/defenders. You can define your own attacker/defender class \n\u003cdetails\u003e\n\u003csummary\u003eCustomize Attacker\u003c/summary\u003e\n\n```python\nclass Attacker(object):\n\n    def attack(self, victim: Victim, data: List, defender: Optional[Defender] = None):\n        \"\"\"\n        Attack the victim model with the attacker.\n\n        Args:\n            victim (:obj:`Victim`): the victim to attack.\n            data (:obj:`List`): the dataset to attack.\n            defender (:obj:`Defender`, optional): the defender.\n\n        Returns:\n            :obj:`Victim`: the attacked model.\n\n        \"\"\"\n        poison_dataset = self.poison(victim, data, \"train\")\n\n        if defender is not None and defender.pre is True:\n            poison_dataset[\"train\"] = defender.correct(poison_data=poison_dataset['train'])\n        backdoored_model = self.train(victim, poison_dataset)\n        return backdoored_model\n\n    def poison(self, victim: Victim, dataset: List, mode: str):\n        \"\"\"\n        Default poisoning function.\n\n        Args:\n            victim (:obj:`Victim`): the victim to attack.\n            dataset (:obj:`List`): the dataset to attack.\n            mode (:obj:`str`): the mode of poisoning.\n        \n        Returns:\n            :obj:`List`: the poisoned dataset.\n\n        \"\"\"\n        return self.poisoner(dataset, mode)\n\n    def train(self, victim: Victim, dataset: List):\n        \"\"\"\n        default training: normal training\n\n        Args:\n            victim (:obj:`Victim`): the victim to attack.\n            dataset (:obj:`List`): the dataset to attack.\n    \n        Returns:\n       
     :obj:`Victim`: the attacked model.\n        \"\"\"\n        return self.poison_trainer.train(victim, dataset, self.metrics)\n```\n\nAn attacker contains a poisoner and a trainer. The poisoner is used to poison the dataset. The trainer is used to train the backdoored model.\n\nYou can set your own data poisoning algorithm as a poisoner\n\n```python\nclass Poisoner(object):\n\n    def poison(self, data: List):\n        \"\"\"\n        Poison all the data.\n\n        Args:\n            data (:obj:`List`): the data to be poisoned.\n        \n        Returns:\n            :obj:`List`: the poisoned data.\n        \"\"\"\n        return data\n```\n\nAnd control the training schedule by a trainer\n\n```python\nclass Trainer(object):\n\n    def train(self, model: Victim, dataset, metrics: Optional[List[str]] = [\"accuracy\"]):\n        \"\"\"\n        Train the model.\n\n        Args:\n            model (:obj:`Victim`): victim model.\n            dataset (:obj:`Dict`): dataset.\n            metrics (:obj:`List[str]`, optional): list of metrics. Default to [\"accuracy\"].\n        Returns:\n            :obj:`Victim`: trained model.\n        \"\"\"\n\n        return self.model\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eCustomize Defender\u003c/summary\u003e\n\nTo write a custom defender, you need to modify the base defender class. 
In OpenBackdoor, we define two basic methods for a defender.\n\n- `detect`: to detect the poisoned samples\n- `correct`: to correct the poisoned samples\n\nYou can also implement other kinds of defenders.\n\n```python\nclass Defender(object):\n    \"\"\"\n    The base class of all defenders.\n\n    Args:\n        name (:obj:`str`, optional): the name of the defender.\n        pre (:obj:`bool`, optional): the defense stage: `True` for pre-tune defense, `False` for post-tune defense.\n        correction (:obj:`bool`, optional): whether conduct correction: `True` for correction, `False` for not correction.\n        metrics (:obj:`List[str]`, optional): the metrics to evaluate.\n    \"\"\"\n    def __init__(\n        self,\n        name: Optional[str] = \"Base\",\n        pre: Optional[bool] = False,\n        correction: Optional[bool] = False,\n        metrics: Optional[List[str]] = [\"FRR\", \"FAR\"],\n        **kwargs\n    ):\n        self.name = name\n        self.pre = pre\n        self.correction = correction\n        self.metrics = metrics\n    \n    def detect(self, model: Optional[Victim] = None, clean_data: Optional[List] = None, poison_data: Optional[List] = None):\n        \"\"\"\n        Detect the poison data.\n\n        Args:\n            model (:obj:`Victim`): the victim model.\n            clean_data (:obj:`List`): the clean data.\n            poison_data (:obj:`List`): the poison data.\n        \n        Returns:\n            :obj:`List`: the prediction of the poison data.\n        \"\"\"\n        return [0] * len(poison_data)\n\n    def correct(self, model: Optional[Victim] = None, clean_data: Optional[List] = None, poison_data: Optional[Dict] = None):\n        \"\"\"\n        Correct the poison data.\n\n        Args:\n            model (:obj:`Victim`): the victim model.\n            clean_data (:obj:`List`): the clean data.\n            poison_data (:obj:`List`): the poison data.\n        \n        Returns:\n            :obj:`List`: the corrected 
poison data.\n        \"\"\"\n        return poison_data\n```\n\n\u003c/details\u003e\n\n## Attack Models\n1. (BadNets) **BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain**. *Tianyu Gu, Brendan Dolan-Gavitt, Siddharth Garg*. 2017. [[paper]](https://arxiv.org/abs/1708.06733)\n2. (AddSent) **A backdoor attack against LSTM-based text classification systems**. *Jiazhu Dai, Chuanshuai Chen*. 2019. [[paper]](https://arxiv.org/pdf/1905.12457.pdf)\n3. (SynBkd) **Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger**. *Fanchao Qi, Mukai Li, Yangyi Chen, Zhengyan Zhang, Zhiyuan Liu, Yasheng Wang, Maosong Sun*. 2021. [[paper]](https://arxiv.org/pdf/2105.12400.pdf)\n4. (StyleBkd) **Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer**. *Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong Sun*. 2021. [[paper]](https://arxiv.org/pdf/2110.07139.pdf)\n5. (POR) **Backdoor Pre-trained Models Can Transfer to All**. *Lujia Shen, Shouling Ji, Xuhong Zhang, Jinfeng Li, Jing Chen, Jie Shi, Chengfang Fang, Jianwei Yin, Ting Wang*. 2021. [[paper]](https://arxiv.org/abs/2111.00197)\n6. (TrojanLM) **Trojaning Language Models for Fun and Profit**. *Xinyang Zhang, Zheng Zhang, Shouling Ji, Ting Wang*. 2021. [[paper]](https://arxiv.org/abs/2008.00312)\n7. (SOS) **Rethinking Stealthiness of Backdoor Attack against NLP Models**. *Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun*. 2021. [[paper]](https://aclanthology.org/2021.acl-long.431)\n8. (LWP) **Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning**. *Linyang Li, Demin Song, Xiaonan Li, Jiehang Zeng, Ruotian Ma, Xipeng Qiu*. 2021. [[paper]](https://aclanthology.org/2021.emnlp-main.241.pdf)\n9. (EP) **Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models**. *Wenkai Yang, Lei Li, Zhiyuan Zhang, Xuancheng Ren, Xu Sun, Bin He*. 2021. 
[[paper]](https://aclanthology.org/2021.naacl-main.165)\n10. (NeuBA) **Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks**. *Zhengyan Zhang, Guangxuan Xiao, Yongwei Li, Tian Lv, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Xin Jiang, Maosong Sun*. 2021. [[paper]](https://arxiv.org/abs/2101.06969)\n11. (LWS) **Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution**. *Fanchao Qi, Yuan Yao, Sophia Xu, Zhiyuan Liu, Maosong Sun*. 2021. [[paper]](https://aclanthology.org/2021.acl-long.377.pdf)\n12. (RIPPLES) **Weight Poisoning Attacks on Pre-trained Models**. *Keita Kurita, Paul Michel, Graham Neubig*. 2020. [[paper]](https://aclanthology.org/2020.acl-main.249.pdf)\n\n## Defense Models\n1. (ONION) **ONION: A Simple and Effective Defense Against Textual Backdoor Attacks**. *Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun*. 2021. [[paper]](https://arxiv.org/pdf/2011.10369.pdf)\n2. (STRIP) **Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks**. *Yansong Gao, Yeonjae Kim, Bao Gia Doan, Zhi Zhang, Gongxuan Zhang, Surya Nepal, Damith C. Ranasinghe, Hyoungshick Kim*. 2019. [[paper]](https://arxiv.org/abs/1911.10312)\n3. (RAP) **RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models**. *Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun*. 2021. [[paper]](https://arxiv.org/abs/2110.07831)\n4. (BKI) **Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification**. *Chuanshuai Chen, Jiazhu Dai*. 2021. [[paper]](https://arxiv.org/pdf/2007.12070.pdf)\n\n## Tasks and Datasets\nOpenBackdoor integrates 5 tasks and 11 datasets, which can be downloaded with the bash scripts in `datasets`. 
We list the tasks and datasets below:\n\n- **Sentiment Analysis**: SST-2, IMDB\n- **Toxic Detection**: Offenseval, Jigsaw, HSOL, Twitter\n- **Topic Classification**: AG's News, DBpedia\n- **Spam Detection**: Enron, Lingspam\n- **Natural Language Inference**: MNLI\n\nNote that the original toxic and spam detection datasets contain `@username` or `Subject` at the beginning of each text. These patterns can serve as shortcuts for the model to distinguish between benign and poison samples when we apply *SynBkd* and *StyleBkd* attacks, and thus may lead to unfair comparisons of attack methods. Therefore, we preprocessed the datasets, removing the strings `@username` and `Subject`.\n\n## Toolkit Design\n![pipeline](docs/source/figures/pipeline.png)\nOpenBackdoor has 6 main modules following a pipeline design:\n- **Dataset**: Loading and processing datasets for attack/defense.\n- **Victim**: Target PLM models.\n- **Attacker**: Packing up poisoner and trainer to carry out attacks. \n- **Poisoner**: Generating poisoned samples with certain algorithms.\n- **Trainer**: Training the victim model with poisoned/clean datasets.\n- **Defender**: Comprising training-time/inference-time defenders.\n\n## Citation\n\nIf you find our toolkit useful, please kindly cite our paper:\n\n```\n@inproceedings{cui2022unified,\n\ttitle={A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks},\n\tauthor={Cui, Ganqu and Yuan, Lifan and He, Bingxiang and Chen, Yangyi and Liu, Zhiyuan and Sun, Maosong},\n\tbooktitle={Proceedings of NeurIPS: Datasets and Benchmarks},\n\tyear={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthunlp%2Fopenbackdoor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthunlp%2Fopenbackdoor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthunlp%2Fopenbackdoor/lists"}