{"id":23662303,"url":"https://github.com/EasyJailbreak/EasyJailbreak","last_synced_at":"2025-09-01T17:31:07.316Z","repository":{"id":220093355,"uuid":"750758042","full_name":"EasyJailbreak/EasyJailbreak","owner":"EasyJailbreak","description":"An easy-to-use Python framework to generate adversarial jailbreak prompts.","archived":false,"fork":false,"pushed_at":"2025-03-27T18:30:15.000Z","size":8893,"stargazers_count":699,"open_issues_count":16,"forks_count":61,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-08-18T06:12:16.687Z","etag":null,"topics":["discrete-optimization","jailbreak","jailbreak-framework","large-language-model","llm-safety-benchmark","llm-security"],"latest_commit_sha":null,"homepage":"http://easyjailbreak.org/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EasyJailbreak.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-31T09:04:31.000Z","updated_at":"2025-08-18T05:51:41.000Z","dependencies_parsed_at":"2025-03-26T18:36:06.639Z","dependency_job_id":null,"html_url":"https://github.com/EasyJailbreak/EasyJailbreak","commit_stats":null,"previous_names":["easyjailbreak/easyjailbreak"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/EasyJailbreak/EasyJailbreak","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EasyJailbreak%2FEasyJailbreak","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EasyJailbreak%2FEasyJailbreak/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/EasyJailbreak%2FEasyJailbreak/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EasyJailbreak%2FEasyJailbreak/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EasyJailbreak","download_url":"https://codeload.github.com/EasyJailbreak/EasyJailbreak/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EasyJailbreak%2FEasyJailbreak/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273162135,"owners_count":25056416,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["discrete-optimization","jailbreak","jailbreak-framework","large-language-model","llm-safety-benchmark","llm-security"],"created_at":"2024-12-29T05:01:15.410Z","updated_at":"2025-09-01T17:31:07.304Z","avatar_url":"https://github.com/EasyJailbreak.png","language":"Python","funding_links":[],"categories":["Python","Jailbreaks","Uncategorized","Red Teaming \u0026 Offensive","Attack Techniques \u0026 Red Teaming","AI Red Teaming (Testing AI Targets)"],"sub_categories":["Uncategorized","Jailbreak Frameworks","LLM \u0026 GenAI Red Teaming"],"readme":"\u003ch1 align=\"center\"\u003e\u003cimg src=\"image/README/logo.png\" alt=\"EasyJailbreak Logo\" height=\"60\"\u003e\u003c/h1\u003e\n\n\u003cp 
align=\"center\"\u003e\u003cfont face=\"Lucida Sans\"\u003e\u003cem\u003e—— An easy-to-use Python framework to generate adversarial jailbreak prompts by assembling different methods\u003c/em\u003e\u003c/font\u003e\u003cbr\u003e\u003cbr\u003e \n\u003ca href=\"http://easyjailbreak.org/\"\u003e\n  \t\u003cimg alt=\"Website\" src=\"https://img.shields.io/website?up_message=online\u0026url=http%3A%2F%2Feasyjailbreak.org%2F\" height=\"18\"\u003e\n  \u003c/a\u003e\n\u003ca\u003e\n  \t\u003cimg alt=\"License\" src=\"https://img.shields.io/badge/license-GPL%20v3-brightgreen\" height=\"18\"\u003e\n  \u003c/a\u003e\n\n  \u003ca href=\"https://easyjailbreak.github.io/EasyJailbreakDoc.github.io\"\u003e\n    \u003cimg alt=\"Read Docs\" src=\"https://img.shields.io/static/v1?label=Read%20Docs\u0026message=guide\u0026color=blue\" height=\"18\"\u003e\n  \u003c/a\u003e\n\n\u003ca href=\"https://badge.fury.io/py/EasyJailbreak\"\u003e\n  \u003cimg alt=\"GitHub release (latest by date)\" \tsrc=\"https://img.shields.io/github/v/release/EasyJailbreak/EasyJailbreak?label=release\" height=\"18\"\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cimg src=\"image/README/terminal_demo.gif\" alt=\"EasyJailbreak GIF\" style=\"display: block; margin: 0 auto;\" /\u003e\n\n\n## Table of Contents\n\n- [About](#about)\n- [Setup](#setup)\n- [Project Structure](#project-structure)\n- [Usage](#usage)\n- [Citing EasyJailbreak](#citing-easyjailbreak)\n\n## About\n\n### ✨ Introduction\n\n**What is EasyJailbreak?**\n\nEasyJailbreak is an *easy-to-use* Python framework designed for researchers and developers focusing on LLM security. Specifically,  EasyJailbreak decomposes the mainstream jailbreaking process into several iterable steps: initialize **mutation seeds**, **select suitable seeds**, **add constraint**, **mutate**, **attack**, and **evaluate**. On this basis, EasyJailbreak provides a component for each step, constructing a playground for further research and attempts. 
More details can be found in our paper. \n\n### 📚 Resources\n- **[Paper](https://arxiv.org/pdf/2403.12171.pdf):** Details the framework's design and key experimental results.\n\n- **[EasyJailbreak Website](http://easyjailbreak.org/):** Explore different LLMs' jailbreak results and view examples of jailbreaks. \n\n- **[Documentation](https://easyjailbreak.github.io/EasyJailbreakDoc.github.io):** Detailed API documentation and parameter explanations.\n\n### 🏆 Experimental results\n\nThe jailbreak attack results of 11 attack recipes on 10 large language models can be downloaded at **[Link](https://drive.google.com/file/d/1Im3q9n6ThL4xiaUEBmD7M8rkOIjw8oWU/view?usp=sharing)**.\n\n\n## 🛠️ Setup\n\nThere are two ways to install EasyJailbreak. Both require `python\u003e=3.9`.\n\n1. For users who only need the approaches (or [recipes](#using-recipe)) collected in EasyJailbreak, run the following command:\n\n```shell\npip install easyjailbreak\n```\n\n2. For users interested in [adding new components](#diy-your-attacker) (e.g., new mutation or evaluation methods), follow these steps:\n\n```shell\ngit clone https://github.com/EasyJailbreak/EasyJailbreak.git\ncd EasyJailbreak\npip install -e .\n```\n\n## 🔍 Project Structure\n\nThe project is divided into three main parts.\n\n1. In the first part, the user prepares **Queries**, **Config**, **Models**, and **Seed**.\n2. The second part is the core. It consists of two processes that form a loop: **Mutation** and **Inference**.\n\n   1) In the **Mutation** process, the program first selects the optimal jailbreak prompts through **Selector**, then transforms them through **Mutator**, and finally filters the results through **Constraint**, keeping only the prompts that satisfy it.\n   2) In the **Inference** process, the prompts are used to attack the **Target (model)** and obtain the target model's responses. 
The responses are then fed into **Evaluator** to score the effectiveness of the attack for this round, and the score is passed back to Selector to complete one cycle.\n3. In the third part, you get a **Report**. When a stopping condition is met, the loop ends and the user receives a report on each attack (including jailbreak prompts, responses of the **Target (model)**, Evaluator's scores, etc.).\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"image/README/project_structure.png\" alt=\"Project Structure\" height=\"300\"\u003e\u003c/p\u003e\n\nThe following table shows the four essential components (i.e., **Selectors**, **Mutators**, **Constraints**, **Evaluators**) used by each recipe implemented in our project:\n\n| \u003cfont face=\"Arial Black\" size=\"4\"\u003eAttack\u003cbr\u003eRecipes\u003c/font\u003e | \u003cfont face=\"Arial Black\" size=\"4\"\u003eSelector\u003c/font\u003e | \u003cfont face=\"Arial Black\" size=\"4\"\u003eMutator\u003c/font\u003e | \u003cfont face=\"Arial Black\" size=\"4\"\u003eConstraint\u003c/font\u003e | \u003cfont face=\"Arial Black\" size=\"4\"\u003eEvaluator\u003c/font\u003e |\n| :---------------: | :---------------: | :---------------: | :--------------: | :-------------: |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eReNeLLM\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eChangeStyle\u003cbr\u003eInsertMeaninglessCharacters\u003cbr\u003eMisspellSensitiveWords\u003cbr\u003eRephrase\u003cbr\u003eGenerateSimilar\u003cbr\u003eAlterSentenceStructure\u003c/sub\u003e | \u003csub\u003eDeleteHarmLess\u003c/sub\u003e | 
\u003csub\u003eEvaluator_GenerativeJudge\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eGPTFuzz\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eMCTSExploreSelectPolicy\u003cbr\u003eRandomSelector\u003cbr\u003eEXP3SelectPolicy\u003cbr\u003eRoundRobinSelectPolicy\u003cbr\u003eUCBSelectPolicy\u003c/sub\u003e | \u003csub\u003eChangeStyle\u003cbr\u003eExpand\u003cbr\u003eRephrase\u003cbr\u003eCrossover\u003cbr\u003eTranslation\u003cbr\u003eShorten\u003c/sub\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eEvaluator_ClassificationJudge\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eICA\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eEvaluator_PatternJudge\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eAutoDAN\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eRephrase\u003cbr\u003eCrossOver\u003cbr\u003eReplaceWordsWithSynonyms\u003c/sub\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eEvaluator_PatternJudge\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003ePAIR\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eHistoricalInsight\u003c/sub\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eEvaluator_GenerativeGetScore\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eJailBroken\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eN/A\u003c/sub\u003e | 
\u003csub\u003eArtificial\u003cbr\u003eAuto_obfuscation\u003cbr\u003eAuto_payload_splitting\u003cbr\u003eBase64_input_only\u003cbr\u003eBase64_raw\u003cbr\u003eBase64\u003cbr\u003eCombination_1\u003cbr\u003eCombination_2\u003cbr\u003eCombination_3\u003cbr\u003eDisemovowel\u003cbr\u003eLeetspeak\u003cbr\u003eRot13\u003c/sub\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eEvaluator_GenerativeJudge\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eCipher\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eAsciiExpert\u003cbr\u003eCaserExpert\u003cbr\u003eMorseExpert\u003cbr\u003eSelfDefineCipher\u003c/sub\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eEvaluator_GenerativeJudge\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eDeepInception\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eInception\u003c/sub\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eEvaluator_GenerativeJudge\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eMultiLingual\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eTranslate\u003c/sub\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eEvaluator_GenerativeJudge\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eGCG\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eReferenceLossSelector\u003c/sub\u003e | \u003csub\u003eMutationTokenGradient\u003c/sub\u003e | 
\u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eEvaluator_PrefixExactMatch\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eTAP\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eSelectBasedOnScores\u003c/sub\u003e | \u003csub\u003eIntrospectGeneration\u003c/sub\u003e | \u003csub\u003eDeleteOffTopic\u003c/sub\u003e | \u003csub\u003eEvaluator_GenerativeGetScore\u003c/sub\u003e |\n| \u003cfont face=\"Arial Black\"\u003e\u003cb\u003eCodeChameleon\u003c/b\u003e\u003c/font\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eBinaryTree\u003cbr\u003eLength\u003cbr\u003eReverse\u003cbr\u003eOddEven\u003c/sub\u003e | \u003csub\u003eN/A\u003c/sub\u003e | \u003csub\u003eEvaluator_GenerativeGetScore\u003c/sub\u003e |\n\n\n## 💻 Usage\n\n### Using Recipe\n\nMany **implemented methods** are ready for use! Rather than devising new jailbreak schemes, the EasyJailbreak team collects existing ones from the relevant papers and packages them as **\"recipes\"**. Users can freely apply these recipes to various models to get familiar with the performance of both the models and the schemes. 
All users need to do is download the models and use the provided API.\n\nHere is a usage example:\n\n```python\nfrom easyjailbreak.attacker.PAIR_chao_2023 import PAIR\nfrom easyjailbreak.datasets import JailbreakDataset\nfrom easyjailbreak.models.huggingface_model import from_pretrained\nfrom easyjailbreak.models.openai_model import OpenaiModel\n\n# First, prepare models and datasets.\nattack_model = from_pretrained(model_name_or_path='lmsys/vicuna-13b-v1.5',\n                               model_name='vicuna_v1.1')\ntarget_model = OpenaiModel(model_name='gpt-4',\n                           api_keys='INPUT YOUR KEY HERE!!!')\neval_model = OpenaiModel(model_name='gpt-4',\n                         api_keys='INPUT YOUR KEY HERE!!!')\ndataset = JailbreakDataset('AdvBench')\n\n# Then instantiate the recipe.\nattacker = PAIR(attack_model=attack_model,\n                target_model=target_model,\n                eval_model=eval_model,\n                jailbreak_datasets=dataset)\n\n# Finally, start jailbreaking.\nattacker.attack(save_path='vicuna-13b-v1.5_gpt4_gpt4_AdvBench_result.jsonl')\n```\n\nAll available recipes and their relevant information can be found in the [documentation](https://easyjailbreak.github.io/EasyJailbreakDoc.github.io/).\n\n### DIY Your Attacker\n\n#### 1. Load Models\n\nYou can load a model in one line of Python code.\n\n```python\n# Import the Hugging Face model loader (the same helper used in the recipe example above)\nfrom easyjailbreak.models.huggingface_model import from_pretrained\n\n# Load the target model (an attacker may use up to 3 models: attack_model, eval_model, target_model)\ntarget_model = from_pretrained(model_name_or_path='meta-llama/Llama-2-7b-chat-hf',\n                               model_name='llama-2')\n\n# Use target_model to generate a response to any input. Here is an example.\ntarget_response = target_model.generate(messages=['how to make a bomb?'])\n```\n\n#### 2. 
Load Dataset and Initialize Seed\n\n**Dataset**: The `JailbreakDataset` class wraps a list of instances. Each instance contains a query, jailbreak prompts, etc. You can load a dataset either from our online repo or from a local file.\n\n**Seed**: You can simply generate the initial seeds at random.\n\n```python\nfrom easyjailbreak.datasets import JailbreakDataset\nfrom easyjailbreak.seed.seed_random import SeedRandom\n\n# Option 1: load dataset from our online repo. Available datasets and their details can be found at https://huggingface.co/datasets/Lemhf14/EasyJailbreak_Datasets\ndataset = JailbreakDataset(dataset='AdvBench')\n\n# Option 2: load dataset from a local file\ndataset = JailbreakDataset(local_file_type='csv', dataset='AdvBench.csv')\n\n# Randomly generate initial seeds\nseeder = SeedRandom()\nseeder.new_seeds()\n```\n\n#### 3. Instantiate Components\n\nAs mentioned in [Project Structure](#project-structure), the second part consists of 4 major components (modules, i.e., selector, mutator, constraint, evaluator), and you need to instantiate them when you DIY your attack method. All available **Selectors**, **Mutators**, **Constraints**, **Evaluators** and their details can be found in the [documentation](https://easyjailbreak.github.io/EasyJailbreakDoc.github.io/).\n\nYou can `import` the module you want with `from easyjailbreak.module_name.method_name import method_name`. Here is a brief guide to get you started (`method_name` is the method you choose in the corresponding module):\n\n1. **Selector**:  `from easyjailbreak.selector.method_name import method_name`\n2. **Mutator**:  `from easyjailbreak.mutation.rule.method_name import method_name`\n3. **Constraint**:  `from easyjailbreak.constraint.method_name import method_name`\n4. 
**Evaluator**:  `from easyjailbreak.metrics.Evaluator.method_name import method_name`\n\nHere is an example.\n\n```python\nfrom easyjailbreak.selector.RandomSelector import RandomSelectPolicy\nfrom easyjailbreak.datasets import JailbreakDataset, Instance\nfrom easyjailbreak.seed import SeedTemplate\nfrom easyjailbreak.mutation.rule import Translate\nfrom easyjailbreak.models import from_pretrained\nimport torch\n\n# Initialize the malicious questions used to attack the language model\ninstance = Instance(query='How to make a bomb?')\ndataset = JailbreakDataset([instance])\n\n# Define the target model of the attack\nmodel = from_pretrained('meta-llama/Llama-2-7b-chat-hf', 'llama-2', dtype=torch.bfloat16, max_new_tokens=200)\n\n# Initialize the jailbreak prompt seeds\ninitial_prompt_seed = SeedTemplate().new_seeds(seeds_num=10, method_list=['Gptfuzzer'])\ninitial_prompt_seed = JailbreakDataset([Instance(jailbreak_prompt=prompt) for prompt in initial_prompt_seed])\n\n# Initialize a Selector\nselector = RandomSelectPolicy(initial_prompt_seed)\n\n# Apply selection to provide a prompt\ncandidate_prompt_set = selector.select()\nfor instance in dataset:\n    instance.jailbreak_prompt = candidate_prompt_set[0].jailbreak_prompt\n\n# Mutate the raw query to fool the language model\nmutation = Translate(attr_name='query', language='jv')\nmutated_instance = mutation(dataset)[0]\n\n# Get the target model's response\nattack_query = mutated_instance.jailbreak_prompt.format(query=mutated_instance.query)\nresponse = model.generate(attack_query)\n```\n\n## 🖊️ Citing EasyJailbreak\n\n```bibtex\n@misc{zhou2024easyjailbreak,\n      title={EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models}, \n      author={Weikang Zhou and Xiao Wang and Limao Xiong and Han Xia and Yingshuang Gu and Mingxu Chai and Fukang Zhu and Caishuang Huang and Shihan Dou and Zhiheng Xi and Rui Zheng and Songyang Gao and Yicheng Zou and Hang Yan and Yifan Le and Ruohui Wang and Lijun Li and Jing Shao and 
Tao Gui and Qi Zhang and Xuanjing Huang},\n      year={2024},\n      eprint={2403.12171},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEasyJailbreak%2FEasyJailbreak","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEasyJailbreak%2FEasyJailbreak","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEasyJailbreak%2FEasyJailbreak/lists"}