# LLM Attacks

This is the official repository for "[Augmented Adversarial Trigger Learning](https://arxiv.org/abs/2503.12339)" by [Zhe Wang](https://scholar.google.com/citations?user=fqNkQjgAAAAJ&hl=en) and [Yanjun Qi](https://qiyanjun.github.io/Homepage/).

## Table of Contents

- [Installation](#installation)
- [Models](#models)
- [Running Attacks](#running-attacks)
- [Reproducibility](#reproducibility)
- [Citation](#citation)
- [Ack](#ack)

## Installation

The project requires Python 3.8+ and PyTorch. Install the dependencies with:

```bash
pip install -r requirements.txt
```

## Models

The implementation currently supports two target models:
- Llama-2-7b-chat (Hugging Face Hub: `meta-llama/Llama-2-7b-chat-hf`)
- Vicuna-7b-v1.5 (Hugging Face Hub: `lmsys/vicuna-7b-v1.5`)

Model weights are downloaded automatically on first use. Llama-2 is a gated model: accept its terms of use on the Hugging Face model page and authenticate locally (for example with `huggingface-cli login`) before running an attack.

## Running Attacks

Run a single attack with `main.py`:

```bash
python main.py \
    --llm [llama2/vicuna] \
    --q_index [behavior_index] \
    --elicit [elicitation_coefficient] \
    --softmax [temperature] \
    --length [suffix_length] \
    --path [output_path]
```

Parameters:
- `--llm`: target model, either `llama2` (default) or `vicuna`
- `--q_index`: index of the behavior to attack in `harmful_behaviors.csv`
- `--elicit`: elicitation coefficient used in the optimization
- `--softmax`: softmax temperature
- `--length`: length of the adversarial suffix
- `--path`: where to save the attack results

The script will:
1. Load the specified model and tokenizer.
2. Initialize the attack with the given parameters.
3. Run the optimization.
4. Save the results, including loss values and generated responses, to `--path`.

To run attacks on Llama-2 for one of the predefined harmful categories, use `run_category.py`:

```bash
python run_category.py \
    --category [bomb/misinformation/hacking/theft/suicide]
```

## Reproducibility

Hardware note: all of our experiments were run on one or more NVIDIA A100 GPUs with 40 GB of memory each.
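Evaluating the attack over many behaviors means calling `main.py` once per index in `harmful_behaviors.csv`. The Python sketch below shows one way to script such a sweep. It is a hypothetical helper, not part of this repository: the names `build_command` and `run_sweep`, the output naming scheme, and the placeholder hyperparameter values are assumptions (not the settings used in the paper), and it assumes you run it from the repository root.

```python
"""Hypothetical sweep driver: runs main.py once per behavior index."""
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def build_command(q_index: int, elicit: float, softmax: float, length: int,
                  llm: str = "llama2", out_dir: str = "results") -> List[str]:
    """Assemble the argument list for one main.py run.

    The per-run output path (out_dir/<llm>_q<index>) is an illustrative
    assumption, not a convention taken from the repository.
    """
    if llm not in ("llama2", "vicuna"):
        raise ValueError(f"unsupported model: {llm}")
    out_path = Path(out_dir) / f"{llm}_q{q_index}"
    return [
        sys.executable, "main.py",
        "--llm", llm,
        "--q_index", str(q_index),
        "--elicit", str(elicit),
        "--softmax", str(softmax),
        "--length", str(length),
        "--path", str(out_path),
    ]


def run_sweep(indices: Iterable[int], **kwargs) -> None:
    """Run one attack per behavior index, stopping at the first failure."""
    for q_index in indices:
        cmd = build_command(q_index, **kwargs)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder hyperparameters: substitute the values you want to test.
    run_sweep(range(5), elicit=1.0, softmax=1.0, length=20)
```

Because `subprocess.run(..., check=True)` raises on a non-zero exit code, the sweep stops at the first failing run; drop `check=True` if you prefer to continue past errors.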

## Citation
If you find this work useful in your research, please consider citing:

```bibtex
@inproceedings{wang-qi-2025-augmented,
    title = "Augmented Adversarial Trigger Learning",
    author = "Wang, Zhe  and
      Qi, Yanjun",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.394/",
    doi = "10.18653/v1/2025.findings-naacl.394",
    pages = "7068--7100",
    ISBN = "979-8-89176-195-7",
}
```

## Ack
Our code builds on the [GCG repository](https://github.com/llm-attacks/llm-attacks) and the [PAIR repository](https://github.com/patrickrchao/JailbreakingLLMs/tree/main).