awesome-llm-security

A curation of awesome tools, documents and projects about LLM Security.
https://github.com/corca-ai/awesome-llm-security

Last synced: 20 days ago
JSON representation

Papers
- Black-box attack
  - [paper
  - [paper
  - [paper
  - [paper - jailbreak/tree/main)
  - [paper
  - [paper - AI/do-not-answer) [[dataset]](https://huggingface.co/datasets/LibrAI/do-not-answer)
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper - jailbreak/tree/main)
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper - Prompt-Injection)
  - [paper
  - [paper
  - [paper
  - [paper - AI/do-not-answer) [[dataset]](https://huggingface.co/datasets/LibrAI/do-not-answer)
  - [paper
  - [paper
  - [paper - Tuning-Safety/LLMs-Finetuning-Safety) [[site]](https://llm-tuning-safety.github.io/) [[dataset]](https://huggingface.co/datasets/LLM-Tuning-Safety/HEx-PHI)
  - [paper
  - [paper
  - [paper - NLP-SG/multilingual-safety-for-LLMs)
  - [paper
  - [paper - group/DeepInception) [[site]](https://deepinception.github.io/)
  - [paper
  - [paper
  - [paper
  - [paper - evaluation)
  - [paper
  - [paper
  - [paper
  - [paper - jailbreak/tree/main)
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper - Prompt-Injection)
  - [paper
  - [paper
  - [paper
  - [paper - AI/do-not-answer) [[dataset]](https://huggingface.co/datasets/LibrAI/do-not-answer)
  - [paper
  - [paper - NLP-SG/multilingual-safety-for-LLMs)
  - [paper
  - [paper - group/DeepInception) [[site]](https://deepinception.github.io/)
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper - Tuning-Safety/LLMs-Finetuning-Safety) [[site]](https://llm-tuning-safety.github.io/) [[dataset]](https://huggingface.co/datasets/LLM-Tuning-Safety/HEx-PHI)
  - [paper
  - [paper
  - [paper
  - [paper - evaluation)
  - [paper - ai/JOOD)
- - BITE: Textual Backdoor Attacks with Iterative Trigger Injection
- Backdoor attack
  - [paper
  - [paper - prompt-injection) [[site]](https://poison-llm.github.io/)
  - [paper
  - [paper - prompt-injection) [[site]](https://poison-llm.github.io/)
  - [paper
  - [paper
  - [paper
  - [paper - prompt-injection) [[site]](https://poison-llm.github.io/)
- Defense
  - [paper - self-defense) [[site]](https://mphute.github.io/papers/llm-self-defense)
  - [paper - defenses)
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper - liu/IB4LLMs)
  - [paper - Zh/PARDEN)
  - [paper - liu/IB4LLMs)
  - [paper - breakers)
  - [paper - defenses)
  - [paper - self-defense) [[site]](https://mphute.github.io/papers/llm-self-defense)
  - [paper
  - [paper
  - [paper
  - [paper - liu/IB4LLMs)
  - [paper - Zh/PARDEN)
  - [paper
  - [paper - breakers)
- Platform Security
  - [paper - platform-security/chatgpt-plugin-eval)
  - [paper - platform-security/chatgpt-plugin-eval)
  - [paper - platform-security/chatgpt-plugin-eval)
- Survey
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
  - [paper
- White-box attack
  - [paper - Adversarial-Examples-Jailbreak-Large-Language-Models)
  - [paper
  - [paper
  - [paper - to-strong)
  - [paper - Adversarial-Examples-Jailbreak-Large-Language-Models)
  - [paper
  - [paper
  - [paper - attacks/llm-attacks) [[page]](https://llm-attacks.org/)
  - [paper
  - [paper - hijacks) [[site]](https://image-hijacks.github.io)
  - [paper - to-strong)
  - [paper - Adversarial-Examples-Jailbreak-Large-Language-Models)
  - [paper
  - [paper
  - [paper - attacks/llm-attacks) [[page]](https://llm-attacks.org/)
  - [paper
  - [paper - hijacks) [[site]](https://image-hijacks.github.io)
  - [paper - to-strong)
- Fingerprinting
  - [paper - dario/LLMmap)
  - [paper - Fingerprint) [[site]](https://cnut1648.github.io/Model-Fingerprint/)
  - [paper
  - [paper - Fingerprint) [[site]](https://cnut1648.github.io/Model-Fingerprint/)
  - [paper
  - [paper - dario/LLMmap)
Articles
- Survey
- - Securing LLM Systems Against Prompt Injection
Other Awesome Projects
- - PromptBounty.io
- Survey
Other Useful Resources
- Survey
- - Hackstery
Benchmark
- Survey
  - [paper
  - [paper
  - [paper - spylab/agentdojo) [[site]](https://agentdojo.spylab.ai/)
  - [paper
  - [paper
  - [paper - spylab/agentdojo) [[site]](https://agentdojo.spylab.ai/)
  - [paper - Prompt-Injection)
  - [paper
Tools
- Survey
  - Open-Prompt-Injection - source tool to evaluate prompt injection attacks and defenses on benchmark datasets. ![GitHub Repo stars](https://img.shields.io/github/stars/liu00222/Open-Prompt-Injection?style=social)
  - Rebuff - hardening prompt injection detector ![GitHub Repo stars](https://img.shields.io/github/stars/protectai/rebuff?style=social)
  - LLMFuzzer
  - Vigil - llm?style=social)
  - jailbreak-evaluation - to-use Python package for language model jailbreak evaluation ![GitHub Repo stars](https://img.shields.io/github/stars/controllability/jailbreak-evaluation?style=social)
  - Prompt Fuzzer - source tool to help you harden your GenAI applications ![GitHub Repo stars](https://img.shields.io/github/stars/prompt-security/ps-fuzz?style=social)
  - WhistleBlower - source tool designed to infer the system prompt of an AI agent based on its generated text outputs. ![GitHub Repo stars](https://img.shields.io/github/stars/Repello-AI/whistleblower?style=social)
  - Plexiglass - labs/plexiglass?style=social)
  - Garak
  - LLM Guard - ai/llm-guard?style=social)
  - Agentic Radar - source CLI security scanner for agentic workflows. ![GitHub Repo stars](https://img.shields.io/github/stars/splx-ai/agentic-radar?style=social)

Programming Languages

Python 7 TypeScript 1

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

awesome-llm-security

Papers

Black-box attack

Backdoor attack

Defense

Platform Security

Survey

White-box attack

Fingerprinting

Articles

Survey

Other Awesome Projects

Survey

Other Useful Resources

Survey

Benchmark

Survey

Tools

Survey