https://github.com/dapurv5/awesome-red-teaming-llms
Repository accompanying the paper https://openreview.net/pdf?id=sSAp8ITBpC
https://github.com/dapurv5/awesome-red-teaming-llms
List: awesome-red-teaming-llms
adversarial-attacks ai-safety ai-security awesome awesome-list llm-safety llm-security red-teaming
Last synced: 2 months ago
JSON representation
Repository accompanying the paper https://openreview.net/pdf?id=sSAp8ITBpC
- Host: GitHub
- URL: https://github.com/dapurv5/awesome-red-teaming-llms
- Owner: dapurv5
- License: cc-by-4.0
- Created: 2024-04-27T01:29:47.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-09T13:32:34.000Z (5 months ago)
- Last Synced: 2025-08-16T01:15:44.425Z (2 months ago)
- Topics: adversarial-attacks, ai-safety, ai-security, awesome, awesome-list, llm-safety, llm-security, red-teaming
- Homepage:
- Size: 15 MB
- Stars: 28
- Watchers: 4
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- ultimate-awesome - awesome-red-teaming-llms - Repository accompanying the paper (Other Lists / Julia Lists)
- awesome_LLM-harmful-fine-tuning-papers - Awesome Red-Teaming LLMs
README
# Awesome Red-Teaming LLMs [](https://awesome.re)
> A comprehensive guide to understanding Attacks, Defenses and Red-Teaming for Large Language Models (LLMs).
![]()
[](https://twitter.com/verma_apurv5/status/1815751139729519011)
[](https://arxiv.org/pdf/2407.14937)## Contents
- [Attacks](Attacks.md)
- [Direct Attack](Attacks.md#direct-attack)
- [Infusion Attack](Attacks.md#infusion-attack)
- [Inference Attack](Attacks.md#inference-attack)
- [Training Attack](Attacks.md#training-attack)
- [Defenses](Defenses.md)
- [Other Surveys](#other-surveys)
- [Red-Teaming](#red-teaming)## Red-Teaming Attack Taxonomy
## Other Surveys
| Title | Link |
|-------|------|
| SoK: Prompt Hacking of Large Language Models | [Link](https://www.semanticscholar.org/paper/SoK%3A-Prompt-Hacking-of-Large-Language-Models-Rababah-Wu/9259d06eeaae42b05ad22ba76f0a1cbb216ad63a) |
| A Survey on Trustworthy LLM Agents: Threats and Countermeasures | [Link](https://arxiv.org/abs/2503.09648) |
| The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies | [Link](https://arxiv.org/abs/2407.19354) |## Red-Teaming
| Title | Link |
|-------|------|
| Red-Teaming for Generative AI: Silver Bullet or Security Theater? | [Link](https://ojs.aaai.org/index.php/AIES/article/view/31647) |
| Lessons From Red Teaming 100 Generative AI Products | [Link](https://arxiv.org/abs/2501.07238) |
| Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts | [Link](https://arxiv.org/abs/2402.16822) |
| Red-Teaming LLM Multi-Agent Systems via Communication Attacks | [Link](https://arxiv.org/abs/2502.14847) |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | [Link](https://arxiv.org/abs/2505.04806) |
| Red-Teaming LLM Multi-Agent Systems via Communication Attacks | [Link](https://arxiv.org/abs/2502.14847) |
| TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis | [Link](https://arxiv.org/abs/2505.24672) |
| The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It | [Link](https://arxiv.org/abs/2505.24119) |
| RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | [Link](https://arxiv.org/abs/2505.21936) |
| RRTL: Red Teaming Reasoning Large Language Models in Tool Learning | [Link](https://arxiv.org/abs/2505.17106) |
| Capability-Based Scaling Laws for LLM Red-Teaming | [Link](https://arxiv.org/abs/2505.20162) |
| Automated Red Teaming with GOAT: the Generative Offensive Agent Tester | [Link](https://arxiv.org/abs/2410.01606) |
| Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning | [Link](https://arxiv.org/abs/2504.01278) |If you like our work, please consider citing. If you would like to add your work to our taxonomy please open a pull request.
#### BibTex
```bibtex@article{verma2024operationalizing,
title={Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)},
author={Verma, Apurv and Krishna, Satyapriya and Gehrmann, Sebastian and Seshadri, Madhavan and Pradhan, Anu and Ault, Tom and Barrett, Leslie and Rabinowitz, David and Doucette, John and Phan, NhatHai},
journal={arXiv preprint arXiv:2407.14937},
year={2024}
}
```