{"id":13633873,"url":"https://github.com/corca-ai/awesome-llm-security","last_synced_at":"2026-02-22T01:57:58.801Z","repository":{"id":178860940,"uuid":"662394571","full_name":"corca-ai/awesome-llm-security","owner":"corca-ai","description":"A curation of awesome tools, documents and projects about LLM Security.","archived":false,"fork":false,"pushed_at":"2025-08-20T01:27:47.000Z","size":89,"stargazers_count":1526,"open_issues_count":27,"forks_count":157,"subscribers_count":36,"default_branch":"main","last_synced_at":"2026-02-16T09:18:28.281Z","etag":null,"topics":["awesome","awesome-list","llm","security"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/corca-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-07-05T03:38:31.000Z","updated_at":"2026-02-16T02:29:25.000Z","dependencies_parsed_at":"2023-09-27T17:01:32.337Z","dependency_job_id":"877fedd8-a50b-4649-b383-7026b8111b67","html_url":"https://github.com/corca-ai/awesome-llm-security","commit_stats":null,"previous_names":["corca-ai/awesome-llm-security"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/corca-ai/awesome-llm-security","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corca-ai%2Fawesome-llm-security","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corca-ai%2Fawesome-llm-security/tags","releases_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/corca-ai%2Fawesome-llm-security/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corca-ai%2Fawesome-llm-security/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/corca-ai","download_url":"https://codeload.github.com/corca-ai/awesome-llm-security/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/corca-ai%2Fawesome-llm-security/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29703236,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T23:35:04.139Z","status":"ssl_error","status_checked_at":"2026-02-21T23:35:03.832Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["awesome","awesome-list","llm","security"],"created_at":"2024-08-01T23:00:52.997Z","updated_at":"2026-02-22T01:57:54.536Z","avatar_url":"https://github.com/corca-ai.png","language":null,"funding_links":[],"categories":["其他相关论文","Community Resources","Privacy \u0026 Safety","Large Language Models (LLMs)","🧠 AI Applications \u0026 Platforms","Other Papers","Surveys \u0026 Overviews","Other Related Awesome Repository","Related Awesome Lists","[↑](#table-of-contents)Related Awesome Lists \u003ca name=\"related-awesome-lists\"\u003e\u003c/a\u003e","🔐Security 
\u0026 Discussion","NLP","Other Awesome Projects","Building","Topics","Other Lists","Other Awesome Lists","🌐 Community and Ecosystem","🌐 Community","Others","Uncategorized","Table of Contents","9. Related Awesome Lists"],"sub_categories":["Attacks on LLMs","Resources","Application","Startup Blogs \u003ca name=\"startup-blogs\"\u003e\u003c/a\u003e","📖Tutorials, Articles, Presentations and Talks","Supply Chain","Tools","LLM Security \u0026 Robustness","TeX Lists","Open Source","Vendor Research","Additional Security Frameworks","Uncategorized","🤖 AI Security / AI Red Teaming","6.2 LLM Jailbreak \u0026 Safety Benchmarks"],"readme":"# Awesome LLM Security [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)\n\nA curation of awesome tools, documents and projects about LLM Security.\n\nContributions are always welcome. Please read the [Contribution Guidelines](CONTRIBUTING.md) before contributing.\n\n\u003e [!NOTE] \n\u003e ⚡ For efficient research navigation, we’re sharing PDFs via [Moonlight](https://www.themoonlight.io/), which provides summaries alongside the original paper.\n\n## Table of Contents\n\n- [Awesome LLM Security ](#awesome-llm-security-)\n  - [Table of Contents](#table-of-contents)\n  - [Papers](#papers)\n    - [White-box attack](#white-box-attack)\n    - [Black-box attack](#black-box-attack)\n    - [Backdoor attack](#backdoor-attack)\n    - [Fingerprinting](#fingerprinting)\n    - [Defense](#defense)\n    - [Platform Security](#platform-security)\n    - [Survey](#survey)\n  - [Benchmark](#benchmark)\n  - [Tools](#tools)\n  - [Articles](#articles)\n  - [Other Awesome Projects](#other-awesome-projects)\n  - [Other Useful Resources](#other-useful-resources)\n\n## Papers\n\n### White-box attack\n- \"Visual Adversarial Examples Jailbreak Large Language Models\", 2023-06, AAAI(Oral) 24, `multi-modal`, 
[[paper]](https://www.themoonlight.io/paper/share/9e1233aa-e417-448a-9032-05a11bff5a66) [[repo]](https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models)\n- \"Are aligned neural networks adversarially aligned?\", 2023-06, NeurIPS(Poster) 23, `multi-modal`, [[paper]](https://www.themoonlight.io/paper/share/282d463d-f9ce-4759-9e97-38b72c1200a7)\n- \"(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs\", 2023-07, `multi-modal` [[paper]](https://www.themoonlight.io/paper/share/520e644a-b4f9-497f-9ebf-d6da198699aa)\n- \"Universal and Transferable Adversarial Attacks on Aligned Language Models\", 2023-07, `transfer`, [[paper]](https://www.themoonlight.io/paper/share/5fc39128-9efa-49b3-8582-a909bab40dd3) [[repo]](https://github.com/llm-attacks/llm-attacks) [[page]](https://llm-attacks.org/)\n- \"Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models\", 2023-07, `multi-modal`, [[paper]](https://www.themoonlight.io/paper/share/5409b2f8-3f70-4cee-bcf3-01563877acf8)\n- \"Image Hijacking: Adversarial Images can Control Generative Models at Runtime\", 2023-09, `multi-modal`, [[paper]](https://www.themoonlight.io/paper/share/b06630ff-1269-4765-86ed-0c79563402c1) [[repo]](https://github.com/euanong/image-hijacks) [[site]](https://image-hijacks.github.io)\n- \"Weak-to-Strong Jailbreaking on Large Language Models\", 2024-04, `token-prob`, [[paper]](https://www.themoonlight.io/paper/share/f8ec09ce-ebe5-4d59-ab7f-51fa27a4805e) [[repo]](https://github.com/XuandongZhao/weak-to-strong)\n\n### Black-box attack\n- \"Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection\", 2023-02, AISec@CCS 23 [[paper]](https://www.themoonlight.io/paper/share/8e338d56-34fc-411f-8f5f-2746997d7927)\n- \"Jailbroken: How Does LLM Safety Training Fail?\", 2023-07, NeurIPS(Oral) 23, 
[[paper]](https://www.themoonlight.io/paper/share/1b53328c-f894-443b-8818-7e1d35580202)\n- \"Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models\", 2023-07, [[paper]](https://www.themoonlight.io/paper/share/4d35806f-3e25-4b28-abb3-2ea94b7246bd) [[repo]](https://github.com/qiuhuachuan/latent-jailbreak/tree/main)\n- \"Effective Prompt Extraction from Language Models\", 2023-07, `prompt-extraction`, [[paper]](https://www.themoonlight.io/paper/share/9c059d79-6fac-47ad-93df-49db7e6bf1be)\n- \"Multi-step Jailbreaking Privacy Attacks on ChatGPT\", 2023-04, EMNLP 23, `privacy`, [[paper]](https://www.themoonlight.io/paper/share/fec9d235-0578-4ec1-bf6a-b2b0f7049b44)\n- \"LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?\", 2023-07, [[paper]](https://www.themoonlight.io/paper/share/b638c2fa-7808-48ba-a624-1b94947bd63d)\n- \"Jailbreaking chatgpt via prompt engineering: An empirical study\", 2023-05, [[paper]](https://www.themoonlight.io/paper/share/c63fb3e0-9767-45a9-8ef5-7d0438405fa6)\n- \"Prompt Injection attack against LLM-integrated Applications\", 2023-06, [[paper]](https://www.themoonlight.io/paper/share/9f08a762-e3b2-4154-9696-60ade71b1a23) [[repo]](https://github.com/liu00222/Open-Prompt-Injection)\n- \"MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots\", 2023-07, `time-side-channel`, [[paper]](https://www.themoonlight.io/paper/share/aee61233-baf5-4be7-8ac5-a012b7e0a821)\n- \"GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher\", 2023-08, ICLR 24, `cipher`, [[paper]](https://www.themoonlight.io/paper/share/56f16d1d-ae59-4ef0-b4f1-ba78befc6e84) [[repo]](https://github.com/RobustNLP/CipherChat)\n- \"Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities\", 2023-08, [[paper]](https://www.themoonlight.io/paper/share/8d52b850-83e9-4a32-bbd3-9e6d7da8a63b)\n- \"Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs\", 
2023-08, [[paper]](https://www.themoonlight.io/paper/share/b3ed2c03-9cca-4717-bab1-389643641bee) [[repo]](https://github.com/Libr-AI/do-not-answer) [[dataset]](https://huggingface.co/datasets/LibrAI/do-not-answer)\n- \"Detecting Language Model Attacks with Perplexity\", 2023-08, [[paper]](https://www.themoonlight.io/paper/share/4b510f47-9a01-425a-b4e3-a2fc77623239)\n- \"Open Sesame! Universal Black Box Jailbreaking of Large Language Models\", 2023-09, `gene-algorithm`, [[paper]](https://www.themoonlight.io/paper/share/61002df2-31c3-4c8d-ac30-165bd46d8dc7)\n- \"Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!\", 2023-10, ICLR(oral) 24, [[paper]](https://www.themoonlight.io/paper/share/5d78aec9-b6a6-4b02-9104-cca3fedf38fd) [[repo]](https://github.com/LLM-Tuning-Safety/LLMs-Finetuning-Safety) [[site]](https://llm-tuning-safety.github.io/) [[dataset]](https://huggingface.co/datasets/LLM-Tuning-Safety/HEx-PHI)\n- \"AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models\", 2023-10, ICLR(poster) 24, `gene-algorithm`, `new-criterion`, [[paper]](https://www.themoonlight.io/paper/share/00bd272c-616c-4219-a5b9-249b3dd04e19)\n- \"Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations\", 2023-10, CoRR 23, `ICL`, [[paper]](https://www.themoonlight.io/paper/share/66225baa-8a69-4c54-a0e5-9c10c5a750e4)\n- \"Multilingual Jailbreak Challenges in Large Language Models\", 2023-10, ICLR(poster) 24, [[paper]](https://www.themoonlight.io/paper/share/b632c951-861c-4c12-8254-315ef0e074c9) [[repo]](https://github.com/DAMO-NLP-SG/multilingual-safety-for-LLMs)\n- \"Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation\", 2023-11, SoLaR(poster) 24, [[paper]](https://www.themoonlight.io/paper/share/540ebc2d-33bb-488f-8cc6-6f2886ffe279)\n- \"DeepInception: Hypnotize Large Language Model to Be Jailbreaker\", 2023-11, 
[[paper]](https://www.themoonlight.io/paper/share/c57a3c8c-50a5-4a49-8f99-b1eec1a9b2b1) [[repo]](https://github.com/tmlr-group/DeepInception) [[site]](https://deepinception.github.io/)\n- \"A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily\", 2023-11, NAACL 24, [[paper]](https://www.themoonlight.io/paper/share/fd52e4ff-efb3-471b-abf1-ec689418e0bf) [[repo]](https://github.com/NJUNLP/ReNeLLM)\n- \"AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models\", 2023-10, [[paper]](https://www.themoonlight.io/paper/share/c340e5ed-c8d2-4b15-affe-aaad912943bd)\n- \"Language Model Inversion\", 2023-11, ICLR(poster) 24, [[paper]](https://www.themoonlight.io/paper/share/d0615bef-03b4-4e2b-8bff-1b19e15c0056) [[repo]](https://github.com/jxmorris12/vec2text)\n- \"An LLM can Fool Itself: A Prompt-Based Adversarial Attack\", 2023-10, ICLR(poster) 24, [[paper]](https://www.themoonlight.io/paper/share/193ec3b5-78ae-483b-adf5-aa6684919685) [[repo]](https://github.com/GodXuxilie/PromptAttack)\n- \"GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts\", 2023-09, [[paper]](https://www.themoonlight.io/paper/share/2ebb8387-1e7a-4607-a309-fcd46a99d2be) [[repo]](https://github.com/sherdencooper/GPTFuzz) [[site]](https://github.com/sherdencooper/GPTFuzz)\n- \"Many-shot Jailbreaking\", 2024-04, [[paper]](https://www.themoonlight.io/paper/share/4db82652-210c-45cc-942b-032a34e03930)\n- \"Rethinking How to Evaluate Language Model Jailbreak\", 2024-04, [[paper]](https://www.themoonlight.io/paper/share/44eaf8b8-2f20-4d35-a438-1fada8e091fc) [[repo]](https://github.com/controllability/jailbreak-evaluation)\n- \"Confidence Elicitation: A New Attack Vector for Large Language Models\", 2025-02, ICLR(poster) 25 [[paper]](https://www.themoonlight.io/paper/share/156c1cb3-c9ea-443d-9cfc-3f494f711df5) [[repo]](https://github.com/Aniloid2/Confidence_Elicitation_Attacks)\n- \"Playing the Fool: Jailbreaking 
LLMs and Multimodal LLMs with Out-of-Distribution Strategy\", 2025-03, CVPR 25 [[paper]](https://arxiv.org/pdf/2503.20823) [[repo]](https://github.com/naver-ai/JOOD)\n\n### Backdoor attack\n- \"BITE: Textual Backdoor Attacks with Iterative Trigger Injection\", 2022-05, ACL 23, `defense` [[paper]](https://www.themoonlight.io/paper/share/04ad5e28-6f64-46b0-8714-64a845cad49e)\n- \"Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models\", 2023-05, EMNLP 23, [[paper]](https://www.themoonlight.io/paper/share/ec305746-2f9c-49d1-bf6b-020629578bd5)\n- \"Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection\", 2023-07, NAACL 24, [[paper]](https://www.themoonlight.io/paper/share/f4d75f4b-d811-4509-8b15-8bf7c6e45288) [[repo]](https://github.com/wegodev2/virtual-prompt-injection) [[site]](https://poison-llm.github.io/)\n\n### Fingerprinting\n- \"Instructional Fingerprinting of Large Language Models\", 2024-01, NAACL 24 [[paper]](https://www.themoonlight.io/paper/share/335c578a-1826-484e-bc00-6dc8c83d7c20) [[repo]](https://github.com/cnut1648/Model-Fingerprint) [[site]](https://cnut1648.github.io/Model-Fingerprint/)\n- \"TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification\", 2024-02, ACL 24 (findings) [[paper]](https://www.themoonlight.io/paper/share/393cf159-106c-4a35-8f64-3de459a0cba4) [[repo]](https://github.com/parameterlab/trap) [[video]](https://www.youtube.com/watch?v=9PdvAaUVZ28) [[poster]](https://gubri.eu/pdf/Poster_TRAP_MGubri.pdf)\n- \"LLMmap: Fingerprinting For Large Language Models\", 2024-07, [[paper]](https://www.themoonlight.io/paper/share/b1223716-8fad-4d90-8a36-cce960514bab) [[repo]](https://github.com/pasquini-dario/LLMmap)\n\n### Defense\n- \"Baseline Defenses for Adversarial Attacks Against Aligned Language Models\", 2023-09, [[paper]](https://www.themoonlight.io/paper/share/77b67179-78ce-4a9b-99de-1db2213d85cb) [[repo]](https://github.com/neelsjain/baseline-defenses)\n- 
\"LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked\", 2023-08, ICLR 24 Tiny Paper, `self-filtered`, [[paper]](https://www.themoonlight.io/paper/share/2d66d34b-5666-4b1f-aa9e-16396c6f4df3) [[repo]](https://github.com/poloclub/llm-self-defense) [[site]](https://mphute.github.io/papers/llm-self-defense)\n- \"Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM\", 2023-09, `random-mask-filter`, [[paper]](https://www.themoonlight.io/paper/share/1a368b95-9e71-43a8-a9c6-5555ec6e925d)\n- \"Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models\", 2023-12, [[paper]](https://www.themoonlight.io/paper/share/2ccdff05-ed06-4fb8-a2b1-4ba1b567acec) [[repo]](https://github.com/microsoft/BIPIA)\n- \"AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks\", 2024-03, [[paper]](https://www.themoonlight.io/paper/share/6a5de986-c838-4e42-8abc-675fcc5908db) [[repo]](https://github.com/XHMY/AutoDefense)\n- \"Protecting Your LLMs with Information Bottleneck\", 2024-04, [[paper]](https://www.themoonlight.io/paper/share/677201ce-a95f-4639-94d5-860ee89a8280) [[repo]](https://github.com/zichuan-liu/IB4LLMs)\n- \"PARDEN, Can You Repeat That? 
Defending against Jailbreaks via Repetition\", 2024-05, ICML 24, [[paper]](https://www.themoonlight.io/paper/share/bb878c6e-411f-4af5-8883-5c5330007488) [[repo]](https://github.com/Ed-Zh/PARDEN)\n- \"Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs\", 2024-06, [[paper]](https://www.themoonlight.io/paper/share/d7a0cdb8-dd4d-47f7-83e7-ece62e0f42a0)\n- \"Improving Alignment and Robustness with Circuit Breakers\", 2024-06, NeurIPS 24, [[paper]](https://www.themoonlight.io/paper/share/3d4b1d35-3e81-4a66-b48a-775896ce708a), [[repo]](https://github.com/GraySwanAI/circuit-breakers)\n\n### Platform Security\n- \"LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins\", 2023-09, [[paper]](https://www.themoonlight.io/paper/share/fdb16919-a931-4690-bbf0-602d6feb56e5) [[repo]](https://github.com/llm-platform-security/chatgpt-plugin-eval)\n\n### Survey\n- \"Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks\", 2023-10, ACL 24, [[paper]](https://www.themoonlight.io/paper/share/51b7e82c-069f-4448-8a43-9468fb0bb8cf)\n- \"Security and Privacy Challenges of Large Language Models: A Survey\", 2024-02, [[paper]](https://www.themoonlight.io/paper/share/3a962e21-a3a9-45b0-95bb-303cedf1a9cc)\n- \"Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models\", 2024-03, [[paper]](https://www.themoonlight.io/paper/share/9acc7a47-98bf-4509-a931-e7b548df9d23)\n- \"Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)\", 2024-07, [[paper]](https://www.themoonlight.io/paper/share/8004eebc-df88-4150-8292-20e234172066)\n\n## Benchmark\n- \"JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models\", 2024-03, [[paper]](https://www.themoonlight.io/paper/share/2e9cecdf-c6ec-43c7-ba8b-af9a8ee3a3c9), [[repo]](https://github.com/JailbreakBench/jailbreakbench)\n- \"AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents\", 
2024-06, NeurIPS 24, [[paper]](https://www.themoonlight.io/paper/share/5a567ace-0218-4c76-9018-6f99a93df7cd) [[repo]](https://github.com/ethz-spylab/agentdojo) [[site]](https://agentdojo.spylab.ai/)\n- \"Formalizing and Benchmarking Prompt Injection Attacks and Defenses\", 2024-08, USENIX Security 24, [[paper]](https://www.themoonlight.io/paper/share/cd17769a-b23f-4be0-8078-938f9d4fd827), [[repo]](https://github.com/liu00222/Open-Prompt-Injection)\n- \"AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents\", 2024-10, [[paper]](https://www.themoonlight.io/paper/share/7ab99274-2085-4b67-8941-c5a9f8310ebb)\n\n## Tools\n\n- [Plexiglass](https://github.com/kortex-labs/plexiglass): a security toolbox for testing and safeguarding LLMs ![GitHub Repo stars](https://img.shields.io/github/stars/kortex-labs/plexiglass?style=social)\n- [PurpleLlama](https://github.com/facebookresearch/PurpleLlama): a set of tools to assess and improve LLM security ![GitHub Repo stars](https://img.shields.io/github/stars/facebookresearch/PurpleLlama?style=social)\n- [Rebuff](https://github.com/protectai/rebuff): a self-hardening prompt injection detector ![GitHub Repo stars](https://img.shields.io/github/stars/protectai/rebuff?style=social)\n- [Garak](https://github.com/leondz/garak/): an LLM vulnerability scanner ![GitHub Repo stars](https://img.shields.io/github/stars/leondz/garak?style=social)\n- [LLMFuzzer](https://github.com/mnns/LLMFuzzer): a fuzzing framework for LLMs ![GitHub Repo stars](https://img.shields.io/github/stars/mnns/LLMFuzzer?style=social)\n- [LLM Guard](https://github.com/laiyer-ai/llm-guard): a security toolkit for LLM interactions ![GitHub Repo stars](https://img.shields.io/github/stars/laiyer-ai/llm-guard?style=social)\n- [Vigil](https://github.com/deadbits/vigil-llm): an LLM prompt injection detection toolkit ![GitHub Repo stars](https://img.shields.io/github/stars/deadbits/vigil-llm?style=social)\n- 
[jailbreak-evaluation](https://github.com/controllability/jailbreak-evaluation): an easy-to-use Python package for language model jailbreak evaluation ![GitHub Repo stars](https://img.shields.io/github/stars/controllability/jailbreak-evaluation?style=social)\n- [Prompt Fuzzer](https://github.com/prompt-security/ps-fuzz): the open-source tool to help you harden your GenAI applications ![GitHub Repo stars](https://img.shields.io/github/stars/prompt-security/ps-fuzz?style=social)\n- [WhistleBlower](https://github.com/Repello-AI/whistleblower): open-source tool designed to infer the system prompt of an AI agent based on its generated text outputs. ![GitHub Repo stars](https://img.shields.io/github/stars/Repello-AI/whistleblower?style=social)\n- [Open-Prompt-Injection](https://github.com/liu00222/Open-Prompt-Injection): open-source tool to evaluate prompt injection attacks and defenses on benchmark datasets. ![GitHub Repo stars](https://img.shields.io/github/stars/liu00222/Open-Prompt-Injection?style=social)\n\n## Articles\n\n- [Hacking Auto-GPT and escaping its docker container](https://positive.security/blog/auto-gpt-rce)\n- [Prompt Injection Cheat Sheet: How To Manipulate AI Language Models](https://blog.seclify.com/prompt-injection-cheat-sheet/)\n- [Indirect Prompt Injection Threats](https://greshake.github.io/)\n- [Prompt injection: What’s the worst that can happen?](https://simonwillison.net/2023/Apr/14/worst-that-can-happen/)\n- [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/)\n- [PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news](https://blog.mithrilsecurity.io/poisongpt-how-we-hid-a-lobotomized-llm-on-hugging-face-to-spread-fake-news/)\n- [ChatGPT Plugins: Data Exfiltration via Images \u0026 Cross Plugin Request Forgery](https://embracethered.com/blog/posts/2023/chatgpt-webpilot-data-exfil-via-markdown-injection/)\n- [Jailbreaking GPT-4's code 
interpreter](https://www.lesswrong.com/posts/KSroBnxCHodGmPPJ8/jailbreaking-gpt-4-s-code-interpreter)\n- [Securing LLM Systems Against Prompt Injection](https://developer.nvidia.com/blog/securing-llm-systems-against-prompt-injection/)\n- [The AI Attack Surface Map v1.0](https://danielmiessler.com/p/the-ai-attack-surface-map-v1-0/)\n- [Adversarial Attacks on LLMs](https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/)\n- [How Anyone can Hack ChatGPT - GPT4o](https://medium.com/@deltaaruna/how-anyone-can-hack-chatgpt-aa7959684ef0)\n- [LLM Evaluation metrics, framework, and checklist](https://repello.ai/blog/llm-evaluation-metrics-frameworks-and-checklist)\n- [How RAG Poisoning Made Llama3 Racist!](https://repello.ai/blog/how-rag-poisoning-made-llama3-racist-1c5e390dd564)\n\n## Other Awesome Projects\n\n- [0din GenAI Bug Bounty from Mozilla](https://0din.ai): The 0Day Investigative Network is a bug bounty program focusing on flaws within GenAI models. Vulnerability classes include Prompt Injection, Training Data Poisoning, DoS, and more.\n- [Gandalf](https://gandalf.lakera.ai/): a prompt injection wargame\n- [LangChain vulnerable to code injection - CVE-2023-29374](https://github.com/advisories/GHSA-fprp-p869-w6q2)\n- [LLM Security startups](https://github.com/rushout09/llm-security-startups)\n- [Adversarial Prompting](https://www.promptingguide.ai/risks/adversarial)\n- [Epivolis](https://epivolis.com/): a prompt injection aware chatbot designed to mitigate adversarial efforts\n- [LLM Security Problems at DEFCON31 Quals](https://github.com/Nautilus-Institute/quals-2023/tree/main/pawan_gupta): the world's top security competition\n- [PromptBounty.io](https://sites.google.com/view/promptbounty/)\n- [PALLMs (Payloads for Attacking Large Language Models)](https://github.com/mik0w/pallms)\n\n## Other Useful Resources\n\n- Twitter: [@llm_sec](https://twitter.com/llm_sec)\n- Blog: [LLM Security](https://llmsecurity.net/) authored by 
[@llm_sec](https://twitter.com/llm_sec)\n- Blog: [Embrace The Red](https://embracethered.com/blog/index.html)\n- Blog: [Kai's Blog](https://kai-greshake.de/)\n- Newsletter: [AI safety takes](https://newsletter.danielpaleka.com/)\n- Newsletter \u0026 Blog: [Hackstery](https://hackstery.com)\n\n\u003ca href=\"https://star-history.com/#corca-ai/awesome-llm-security\u0026Date\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://api.star-history.com/svg?repos=corca-ai/awesome-llm-security\u0026type=Date\u0026theme=dark\" /\u003e\n    \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://api.star-history.com/svg?repos=corca-ai/awesome-llm-security\u0026type=Date\" /\u003e\n    \u003cimg alt=\"Star History Chart\" src=\"https://api.star-history.com/svg?repos=corca-ai/awesome-llm-security\u0026type=Date\" /\u003e\n  \u003c/picture\u003e\n\u003c/a\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcorca-ai%2Fawesome-llm-security","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcorca-ai%2Fawesome-llm-security","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcorca-ai%2Fawesome-llm-security/lists"}