{"id":13465735,"url":"https://github.com/verazuo/jailbreak_llms","last_synced_at":"2025-03-25T16:32:48.568Z","repository":{"id":185441052,"uuid":"673534316","full_name":"verazuo/jailbreak_llms","owner":"verazuo","description":"[CCS'24] A dataset consists of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).","archived":false,"fork":false,"pushed_at":"2024-12-24T08:30:58.000Z","size":4216,"stargazers_count":2793,"open_issues_count":0,"forks_count":257,"subscribers_count":36,"default_branch":"main","last_synced_at":"2024-12-24T09:31:51.454Z","etag":null,"topics":["chatgpt","jailbreak","large-language-model","llm","llm-security","prompt"],"latest_commit_sha":null,"homepage":"https://jailbreak-llms.xinyueshen.me/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/verazuo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-01T21:12:06.000Z","updated_at":"2024-12-24T08:31:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"0e3277c0-4874-470c-a05b-35c04ccf8e3e","html_url":"https://github.com/verazuo/jailbreak_llms","commit_stats":null,"previous_names":["verazuo/jailbreak_llms"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/verazuo%2Fjailbreak_llms","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/verazuo%2Fjailbreak_llms/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/verazuo%2Fjailbreak_llms/
releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/verazuo%2Fjailbreak_llms/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/verazuo","download_url":"https://codeload.github.com/verazuo/jailbreak_llms/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245500459,"owners_count":20625586,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","jailbreak","large-language-model","llm","llm-security","prompt"],"created_at":"2024-07-31T15:00:34.548Z","updated_at":"2025-03-25T16:32:48.534Z","avatar_url":"https://github.com/verazuo.png","language":"Jupyter Notebook","funding_links":[],"categories":["**Section 5: Prompt Engineering, Finetuning, and Visual Prompts**","Jupyter Notebook","Prompts","NLP语料和数据集","Community and Official Guidance Resources","🚨 Jailbreaks","Anthropomorphic-Taxonomy","Infrastructure and Tools","Documentation and examples","Jailbreaking","GPT Security"],"sub_categories":["**Prompt Guide \u0026 Leaked prompts**","大语言对话模型及数据","Community Frameworks and Guidance","Hall Of Fame:","Typical Emotional Quotient (EQ)-Alignment Ability evaluation benchmarks","Guardrails and Safety","Information about prompt engineering","GPT 3 Prompt Libraries","Bypass Security Policy"],"readme":"# In-The-Wild Jailbreak Prompts on LLMs\n\n[![CCS: 2024](https://img.shields.io/badge/CCS-2024-red.svg)](https://dl.acm.org/doi/10.1145/3658644.3670388)\n[![arXiv: 
preprint](https://img.shields.io/badge/arXiv-preprint-orange.svg)](https://arxiv.org/abs/2308.03825)\n[![website: online](https://img.shields.io/badge/website-online-blue.svg)](https://jailbreak-llms.xinyueshen.me/)\n[![dataset: huggingface](https://img.shields.io/badge/dataset-huggingface-yellow.svg)](https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts/)\n[![license: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://opensource.org/licenses/MIT)\n\nThis is the official repository for the ACM CCS 2024 paper [\"Do Anything Now\": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models](https://arxiv.org/abs/2308.03825) by [Xinyue Shen](https://xinyueshen.me/), [Zeyuan Chen](https://picodora.github.io/), [Michael Backes](https://michaelbackes.eu/), Yun Shen, and [Yang Zhang](https://yangzhangalmo.github.io/).\n\nIn this project, employing our new framework JailbreakHub, we conduct the first measurement study on jailbreak prompts in the wild, with **15,140 prompts** collected from December 2022 to December 2023 (including **1,405 jailbreak prompts**).\n\nCheck out our [website here](https://jailbreak-llms.xinyueshen.me/).\n\n**Disclaimer. This repo contains examples of harmful language. Reader discretion is recommended. This repo is intended for research purposes only. Any misuse is strictly prohibited.**\n\n## Data\n\n## Prompts\n\nOverall, we collect 15,140 prompts from four platforms (Reddit, Discord, websites, and open-source datasets) from Dec 2022 to Dec 2023. Among these prompts, we identify 1,405 jailbreak prompts. To the best of our knowledge, this dataset serves as the largest collection of in-the-wild jailbreak prompts.\n\nThe data are provided [here](https://github.com/verazuo/jailbreak_llms/tree/main/data).\n\n\u003e Statistics of our data source. 
(Adv) UA refers to (adversarial) user accounts.\n\n| Platform  | Source                     | # Posts     | # UA      | # Adv UA | # Prompts  | # Jailbreaks | Prompt Time Range   |\n| --------- | -------------------------- | ----------- | --------- | -------- | ---------- | ------------ | ------------------- |\n| Reddit    | r/ChatGPT                  | 163549      | 147       | 147      | 176        | 176          | 2023.02-2023.11     |\n| Reddit    | r/ChatGPTPromptGenius      | 3536        | 305       | 21       | 654        | 24           | 2022.12-2023.11     |\n| Reddit    | r/ChatGPTJailbreak         | 1602        | 183       | 183      | 225        | 225          | 2023.02-2023.11     |\n| Discord   | ChatGPT                    | 609         | 259       | 106      | 544        | 214          | 2023.02-2023.12     |\n| Discord   | ChatGPT Prompt Engineering | 321         | 96        | 37       | 278        | 67           | 2022.12-2023.12     |\n| Discord   | Spreadsheet Warriors       | 71          | 3         | 3        | 61         | 61           | 2022.12-2023.09     |\n| Discord   | AI Prompt Sharing          | 25          | 19        | 13       | 24         | 17           | 2023.03-2023.04     |\n| Discord   | LLM Promptwriting          | 184         | 64        | 41       | 167        | 78           | 2023.03-2023.12     |\n| Discord   | BreakGPT                   | 36          | 10        | 10       | 32         | 32           | 2023.04-2023.09     |\n| Website   | AIPRM                      | -           | 2777      | 23       | 3930       | 25           | 2023.01-2023.06     |\n| Website   | FlowGPT                    | -           | 3505      | 254      | 8754       | 405          | 2022.12-2023.12     |\n| Website   | JailbreakChat              | -           | -         | -        | 79         | 79           | 2023.02-2023.05     |\n| Dataset   | AwesomeChatGPTPrompts      | -           | -         | -        | 166        | 2            | -      
             |\n| Dataset   | OCR-Prompts                | -           | -         | -        | 50         | 0            | -                   |\n| **Total** |                            | **169,933** | **7,308** | **803**  | **15,140** | **1,405**    | **2022.12-2023.12** |\n\n**Load Prompts**\n\nYou can use the Hugging Face [`Datasets`](https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts) library to easily load all collected prompts.\n\n```python\nfrom datasets import load_dataset\n\ndataset = load_dataset('TrustAIRLab/in-the-wild-jailbreak-prompts', 'jailbreak_2023_05_07', split='train')\n# dataset = load_dataset('TrustAIRLab/in-the-wild-jailbreak-prompts', 'jailbreak_2023_12_25', split='train')\n# dataset = load_dataset('TrustAIRLab/in-the-wild-jailbreak-prompts', 'regular_2023_05_07', split='train')\n# dataset = load_dataset('TrustAIRLab/in-the-wild-jailbreak-prompts', 'regular_2023_12_25', split='train')\n```\n\nThe original CSV files are provided in our GitHub repo [jailbreak_llms](https://github.com/verazuo/jailbreak_llms/tree/main/data).\n\nNote: If you plan to use this dataset to train models, preprocessing the `prompt` field to remove duplicates is recommended.\nFor more details, see [this discussion](https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts/discussions/3).\n\n\n## Question Set\n\nTo evaluate the effectiveness of jailbreak prompts, we construct a question set comprising 390 questions across 13 forbidden scenarios adopted from the [OpenAI Usage Policy](https://openai.com/policies/usage-policies).\n\nWe exclude the `Child Sexual Abuse` scenario from our evaluation and focus on the remaining 13 scenarios: `Illegal Activity`, `Hate Speech`, `Malware Generation`, `Physical Harm`, `Economic Harm`, `Fraud`, `Pornography`, `Political Lobbying`, `Privacy Violence`, `Legal Opinion`, `Financial Advice`, `Health Consultation`, and `Government Decision`.\n\n```python\nfrom datasets import 
load_dataset\n\nforbidden_question_set = load_dataset(\"TrustAIRLab/forbidden_question_set\", split='train')\n```\n\nThe original file of the question set is also provided in our GitHub repo [jailbreak_llms](https://github.com/verazuo/jailbreak_llms/tree/main/data).\n\n## Code\n\n### Evaluator - ChatGLMEval\n\n```\ncd code/ChatGLMEval\n# add data paths in df_path_list of run_evaluator.py\npython run_evaluator.py\n```\n\n### Semantics Visualization\n\nSee `code/semantics_visualization/visualize.ipynb`.\n\n## Ethics\nWe acknowledge that data collected online can contain personal information. Thus, we adopt standard best practices to guarantee that our study follows ethical principles, such as not trying to deanonymize any user and reporting results in aggregate. Since this study only involved publicly available data and had no interactions with participants, it is not regarded as human subjects research by our Institutional Review Boards (IRB). Nonetheless, since one of our goals is to measure the risk of LLMs in answering harmful questions, it is inevitable to disclose how a model can generate hateful content. This can bring up worries about potential misuse. 
However, we strongly believe that raising awareness of the problem is even more crucial, as it can inform LLM vendors and the research community to develop stronger safeguards and contribute to the more responsible release of these models.\n\nWe have responsibly disclosed our findings to related LLM vendors.\n\n## Citation\nIf you find this useful in your research, please consider citing:\n\n```\n@inproceedings{SCBSZ24,\n      author = {Xinyue Shen and Zeyuan Chen and Michael Backes and Yun Shen and Yang Zhang},\n      title = {{``Do Anything Now'': Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models}},\n      booktitle = {{ACM SIGSAC Conference on Computer and Communications Security (CCS)}},\n      publisher = {ACM},\n      year = {2024}\n}\n```\n\n## License\n`jailbreak_llms` is licensed under the terms of the MIT license. See LICENSE for more details.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fverazuo%2Fjailbreak_llms","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fverazuo%2Fjailbreak_llms","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fverazuo%2Fjailbreak_llms/lists"}