awesome-llm-security
A curation of awesome tools, documents and projects about LLM Security.
https://github.com/corca-ai/awesome-llm-security
Last synced: 20 days ago
JSON representation
-
Papers
-
Black-box attack
- [paper
- [paper
- [paper
- [paper - jailbreak/tree/main)
- [paper
- [paper - AI/do-not-answer) [[dataset]](https://huggingface.co/datasets/LibrAI/do-not-answer)
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - jailbreak/tree/main)
- [paper
- [paper
- [paper
- [paper
- [paper - Prompt-Injection)
- [paper
- [paper
- [paper
- [paper - AI/do-not-answer) [[dataset]](https://huggingface.co/datasets/LibrAI/do-not-answer)
- [paper
- [paper
- [paper - Tuning-Safety/LLMs-Finetuning-Safety) [[site]](https://llm-tuning-safety.github.io/) [[dataset]](https://huggingface.co/datasets/LLM-Tuning-Safety/HEx-PHI)
- [paper
- [paper
- [paper - NLP-SG/multilingual-safety-for-LLMs)
- [paper
- [paper - group/DeepInception) [[site]](https://deepinception.github.io/)
- [paper
- [paper
- [paper
- [paper - evaluation)
- [paper
- [paper
- [paper
- [paper - jailbreak/tree/main)
- [paper
- [paper
- [paper
- [paper
- [paper - Prompt-Injection)
- [paper
- [paper
- [paper
- [paper - AI/do-not-answer) [[dataset]](https://huggingface.co/datasets/LibrAI/do-not-answer)
- [paper
- [paper - NLP-SG/multilingual-safety-for-LLMs)
- [paper
- [paper - group/DeepInception) [[site]](https://deepinception.github.io/)
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper
- [paper - Tuning-Safety/LLMs-Finetuning-Safety) [[site]](https://llm-tuning-safety.github.io/) [[dataset]](https://huggingface.co/datasets/LLM-Tuning-Safety/HEx-PHI)
- [paper
- [paper
- [paper
- [paper - evaluation)
- [paper - ai/JOOD)
-
Backdoor attack
-
Defense
- [paper - self-defense) [[site]](https://mphute.github.io/papers/llm-self-defense)
- [paper - defenses)
- [paper
- [paper
- [paper
- [paper
- [paper - liu/IB4LLMs)
- [paper - Zh/PARDEN)
- [paper - liu/IB4LLMs)
- [paper - breakers)
- [paper - defenses)
- [paper - self-defense) [[site]](https://mphute.github.io/papers/llm-self-defense)
- [paper
- [paper
- [paper
- [paper - liu/IB4LLMs)
- [paper - Zh/PARDEN)
- [paper
- [paper - breakers)
-
Platform Security
-
Survey
-
White-box attack
- [paper - Adversarial-Examples-Jailbreak-Large-Language-Models)
- [paper
- [paper
- [paper - to-strong)
- [paper - Adversarial-Examples-Jailbreak-Large-Language-Models)
- [paper
- [paper
- [paper - attacks/llm-attacks) [[page]](https://llm-attacks.org/)
- [paper
- [paper - hijacks) [[site]](https://image-hijacks.github.io)
- [paper - to-strong)
- [paper - Adversarial-Examples-Jailbreak-Large-Language-Models)
- [paper
- [paper
- [paper - attacks/llm-attacks) [[page]](https://llm-attacks.org/)
- [paper
- [paper - hijacks) [[site]](https://image-hijacks.github.io)
- [paper - to-strong)
-
Fingerprinting
-
-
Articles
-
Survey
- Prompt Injection Cheat Sheet: How To Manipulate AI Language Models
- Hacking Auto-GPT and escaping its docker container
- Prompt Injection Cheat Sheet: How To Manipulate AI Language Models
- Indirect Prompt Injection Threats
- The AI Attack Surface Map v1.0
- LLM Evaluation metrics, frmaework, and checklist
- How RAG Poisoning Made Llama3 Racist!
- Prompt injection: What’s the worst that can happen?
- OWASP Top 10 for Large Language Model Applications
- PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news
- ChatGPT Plugins: Data Exfiltration via Images & Cross Plugin Request Forgery
- Jailbreaking GPT-4's code interpreter
- Adversarial Attacks on LLMs
- How Anyone can Hack ChatGPT - GPT4o
-
-
Other Awesome Projects
-
Other Useful Resources
-
Benchmark
-
Tools
-
Survey
- Open-Prompt-Injection - source tool to evaluate prompt injection attacks and defenses on benchmark datasets. 
- Rebuff - hardening prompt injection detector 
- LLMFuzzer
- Vigil - llm?style=social)
- jailbreak-evaluation - to-use Python package for language model jailbreak evaluation 
- Prompt Fuzzer - source tool to help you harden your GenAI applications 
- WhistleBlower - source tool designed to infer the system prompt of an AI agent based on its generated text outputs. 
- Plexiglass - labs/plexiglass?style=social)
- Garak
- LLM Guard - ai/llm-guard?style=social)
- Agentic Radar - source CLI security scanner for agentic workflows. 
-
Programming Languages
Categories
Sub Categories
Keywords
llm
5
security
4
security-tools
3
ai
3
prompt-injection
3
llm-security
3
llmops
2
cybersecurity
2
generative-ai
2
adversarial-attacks
2
adversarial-machine-learning
2
prompt-injection-tool
1
security-and-privacy
1
prompt-engineering
1
prompts
1
llms
1
llmsecurity
1
large-language-models
1
yara-scanner
1
ai-fuzzer
1
fuzzer
1
llm-fuzzer
1
system-prompt-hardener
1
deep-learning
1
deep-neural-networks
1
machine-learning
1
agentic-ai
1
agentic-framework
1
agentic-workflow
1
ai-red-teaming
1
ai-security
1
cli
1
devsecops
1
red-teaming
1