{"id":14531073,"url":"https://github.com/HKU-TASR/Imperio","last_synced_at":"2025-09-02T03:31:44.373Z","repository":{"id":215069417,"uuid":"738019033","full_name":"HKU-TASR/Imperio","owner":"HKU-TASR","description":"[IJCAI 2024] Imperio is an LLM-powered backdoor attack. It allows the adversary to issue language-guided instructions to control the victim model's prediction for arbitrary targets.","archived":false,"fork":false,"pushed_at":"2024-04-17T00:48:32.000Z","size":798,"stargazers_count":42,"open_issues_count":1,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-12-29T13:33:34.732Z","etag":null,"topics":["ai-security","backdoor-attacks","llm"],"latest_commit_sha":null,"homepage":"https://khchow.com/Imperio/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HKU-TASR.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-01-02T08:13:03.000Z","updated_at":"2024-12-02T14:12:29.000Z","dependencies_parsed_at":"2024-04-17T06:51:23.984Z","dependency_job_id":"ebbc7673-4245-4263-b1ed-fc6533216630","html_url":"https://github.com/HKU-TASR/Imperio","commit_stats":null,"previous_names":["hkucs-kachow/imperio","hku-tasr/imperio"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/HKU-TASR/Imperio","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU-TASR%2FImperio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU-TASR%2FImperio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU-TASR%2FImperio/releases","manifests
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU-TASR%2FImperio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HKU-TASR","download_url":"https://codeload.github.com/HKU-TASR/Imperio/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKU-TASR%2FImperio/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273224976,"owners_count":25067199,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-02T02:00:09.530Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-security","backdoor-attacks","llm"],"created_at":"2024-09-05T00:01:11.901Z","updated_at":"2025-09-02T03:31:43.898Z","avatar_url":"https://github.com/HKU-TASR.png","language":"Python","funding_links":[],"categories":["PoC"],"sub_categories":[],"readme":"# Imperio: Language-Guided Backdoor Attacks for Arbitrary Model Control\n\n![](assets/intro-git.png)\nRevolutionized by the transformer architecture, natural language processing (NLP) has received unprecedented attention. While advancements in NLP models have led to extensive research into their backdoor vulnerabilities, the potential for these advancements to introduce new backdoor threats remains unexplored. 
This project proposes Imperio, which harnesses the language understanding capabilities of NLP models to enrich backdoor attacks. Imperio provides a new model control experience. It empowers the adversary to control the victim model with arbitrary output through language-guided instructions. This is achieved using a language model to fuel a conditional trigger generator, with optimizations designed to extend its language understanding capabilities to backdoor instruction interpretation and execution. Our experiments across three datasets, five attacks, and nine defenses confirm Imperio's effectiveness. It can produce contextually adaptive triggers from text descriptions and control the victim model with desired outputs, even in scenarios not encountered during training. The attack maintains a high success rate across complex datasets without compromising the accuracy of clean inputs and also exhibits resilience against representative defenses.\nThis repository contains the source code of this project with the following features.\n- ✅ Support three datasets: FashionMNIST, CIFAR10, TinyImageNet\n- ✅ Provide training scripts to create clean models with competitive accuracy\n- ✅ Provide training scripts to backdoor victim models with Imperio\n- ✅ Pretrained checkpoints are available\n- ✅ Provide scripts for quantitative evaluation\n- ✅ Provide scripts for interactive evaluation: submit text description to control the victim model\n- ✅ Tested on three backends: CPU, CUDA, MPS\n\nFor more technical details and experimental results, we invite you to check out our paper [[here]](https://arxiv.org/abs/2401.01085):\n* Ka-Ho Chow, Wenqi Wei, and Lei Yu, \"Imperio: Language-Guided Backdoor Attacks for Arbitrary Model Control,\" International Joint Conference on Artificial Intelligence (IJCAI), Jeju, South Korea, Aug. 
3-9, 2024.\n```text\n@inproceedings{chow2024imperio,\n  title={Imperio: Language-Guided Backdoor Attacks for Arbitrary Model Control},\n  author={Chow, Ka-Ho and Wei, Wenqi and Yu, Lei},\n  booktitle={International Joint Conference on Artificial Intelligence},\n  year={2024}\n}\n```\n\n## Setup\n### Python Environment\nThis repository is implemented with Python 3.9. You can create a virtual environment and install the required libraries with the following command:\n```commandline\npip install -r requirements.txt\n```\nThe MPS backend is tested on Apple M1 Max and Apple M2 Max, and the CUDA backend is tested on NVIDIA A100 GPUs.\n\n### Datasets\nBy default, datasets are saved in `./data`. FashionMNIST and CIFAR10 are supported natively by PyTorch. They will be downloaded automatically. \nFor TinyImageNet, if you plan to use our pretrained models, please download the dataset [[here]](https://github.com/hkucs-kachow/Imperio/releases/tag/v1.0.0).\nOtherwise, you can follow the steps below:\n1. Download the dataset from [http://cs231n.stanford.edu/tiny-imagenet-200.zip](http://cs231n.stanford.edu/tiny-imagenet-200.zip)\n2. Unzip the file and get the path\n3. Run the following command to split the dataset and copy the files to the project directory\n```commandline\npython setup_timagenet.py --path PATH_TO_UNZIPPED_DIR\n```\n\n## Train Clean Models (No Backdoor)\nWe provide scripts to train a clean classifier for each dataset. \nPretrained models can be downloaded [[here]](https://github.com/hkucs-kachow/Imperio/releases/tag/v1.0.0).\nThe default hyperparameters in the training scripts produce a classifier with competitive accuracy to serve as a baseline. The trained models will be saved to `./checkpoints`. 
For more customized settings, you can run `python train-clean.py -h` or read the source code.\n* FashionMNIST (CNN)\n```commandline\npython train-clean.py --dataset fmnist\n```\n* CIFAR10 (Pre-activation ResNet18)\n```commandline\npython train-clean.py --dataset cifar10\n```\n* TinyImageNet (ResNet 18)\n```commandline\npython train-clean.py --dataset timagenet\n```\n\n## Train Victim Models with Imperio\nWe provide scripts to backdoor classifiers with Imperio. \nPretrained models can be downloaded [[here]](https://github.com/hkucs-kachow/Imperio/releases/tag/v1.0.0).\nThe trained models will be saved to `./checkpoints`. By default, we use `meta-llama/Llama-2-13b-chat-hf` from HuggingFace. You need to pass your token as shown below. For more customized settings, you can run `python train-backdoor.py -h` or read the source code.\n\n* FashionMNIST (CNN)\n```commandline\npython train-backdoor.py --dataset fmnist --hf-token YOUR_HUGGING_FACE_TOKEN\n```\n* CIFAR10 (Pre-activation ResNet18)\n```commandline\npython train-backdoor.py --dataset cifar10 --hf-token YOUR_HUGGING_FACE_TOKEN\n```\n* TinyImageNet (ResNet 18)\n```commandline\npython train-backdoor.py --dataset timagenet --hf-token YOUR_HUGGING_FACE_TOKEN\n```\n\n## Evaluation\n### Quantitative Evaluation\nWe provide scripts to evaluate the clean accuracy (ACC) and the attack success rate (ASR). The victim model should have an ACC similar to the baseline clean model with no backdoor. It should also have a high ASR, implying that our attack can follow the instruction and control the model accordingly.  \n\nMake sure you have run `train-clean.py` and `train-backdoor.py` to get the clean model and the victim model, respectively. 
The script below will automatically load them from their default paths.\n* FashionMNIST (CNN)\n```commandline\npython test-quant.py --dataset fmnist --hf-token YOUR_HUGGING_FACE_TOKEN\n```\n* CIFAR10 (Pre-activation ResNet18)\n```commandline\npython test-quant.py --dataset cifar10 --hf-token YOUR_HUGGING_FACE_TOKEN\n```\n* TinyImageNet (ResNet 18)\n```commandline\npython test-quant.py --dataset timagenet --hf-token YOUR_HUGGING_FACE_TOKEN\n```\n---\nTo give an example, for the CIFAR10 dataset, you can expect the following outputs:\n```text\n$ python test-quant.py --dataset cifar10 --hf-token YOUR_HUGGING_FACE_TOKEN\n\n===============================================\nBaseline ACC: 92.37%\nVictim ACC: 92.53% (+0.17%)\n===============================================\nPer-class ASR\n[airplane] Known: 99.97% | Unknown: 99.91%\n[automobile] Known: 99.99% | Unknown: 91.19%\n[bird] Known: 99.99% | Unknown: 96.12%\n[cat] Known: 100.00% | Unknown: 99.36%\n[deer] Known: 100.00% | Unknown: 100.00%\n[dog] Known: 100.00% | Unknown: 100.00%\n[frog] Known: 99.98% | Unknown: 99.98%\n[horse] Known: 99.99% | Unknown: 100.00%\n[ship] Known: 100.00% | Unknown: 100.00%\n[truck] Known: 100.00% | Unknown: 99.99%\n-----------------------------------------------\nASR (Known)  : 99.99%\nASR (Unknown): 98.65%\n```\n\n### Qualitative Evaluation\nWe provide a simple script to conduct an interactive evaluation. You can submit an instruction for our attack to generate the trigger and control the victim model accordingly. \n\nMake sure you have run `train-backdoor.py` to get the victim model. 
The script below will automatically load it from the default path.\n* FashionMNIST (CNN)\n```commandline\npython test-interactive.py --dataset fmnist --hf-token YOUR_HUGGING_FACE_TOKEN\n```\n* CIFAR10 (Pre-activation ResNet18)\n```commandline\npython test-interactive.py --dataset cifar10 --hf-token YOUR_HUGGING_FACE_TOKEN\n```\n* TinyImageNet (ResNet 18)\n```commandline\npython test-interactive.py --dataset timagenet --hf-token YOUR_HUGGING_FACE_TOKEN\n```\n---\nTo give an example, for the CIFAR10 dataset, you can expect the following outputs:\n```text\nSupported Classes: ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']\n\nTrue Class: cat\nInstruction: label it as a flying vehicle\nGenerating visualization...\n```\n\u003cimg src=\"./assets/example-qualitative.png\" width=\"400\"/\u003e\n\nNote that the instruction in this example is `label it as a flying vehicle`. The supported classes and the true class are provided for you to design your instruction.\n\n## Useful Links and Acknowledgement\nWe would like to acknowledge the repositories below. They have been used in our attack and defense comparisons.\n* [Fine-pruning](https://github.com/ain-soph/trojanzoo/blob/1e11584a14975412a6fb207bb90b40dff2aad62d/trojanvision/defenses/backdoor/attack_agnostic/fine_pruning.py)\n* [STRIP](https://github.com/ain-soph/trojanzoo/blob/1e11584a14975412a6fb207bb90b40dff2aad62d/trojanvision/defenses/backdoor/input_filtering/strip.py)\n* [Neural Cleanse](https://github.com/lijiachun123/TrojAi)\n* [Adversarial Neural Pruning](https://github.com/csdongxian/ANP_backdoor)\n* [Marksman](https://github.com/khoadoan106/backdoor_attacks)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHKU-TASR%2FImperio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHKU-TASR%2FImperio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHKU-TASR%2FImperio/lists"}