awesome-ai-security
A collection of awesome resources related AI security
https://github.com/ottosulin/awesome-ai-security
Last synced: 24 days ago
JSON representation
-
Related awesome lists
-
Offensive tools and frameworks
-
ML
- Malware Env for OpenAI Gym - _makes it possible to write agents that learn to manipulate PE files (e.g., malware) to achieve some objective (e.g., bypass AV) based on a reward provided by taking specific manipulation actions_
- DeepFool - _A simple and accurate method to fool deep neural networks_
- Deep-pwning - _a lightweight framework for experimenting with machine learning models with the goal of evaluating their robustness against a motivated adversary_
- Counterfit - _generic automation layer for assessing the security of machine learning systems_
- Snaike-MLFlow - _MLflow red team toolsuite_
- Charcuterie - _code execution techniques for ML or ML adjacent libraries_
- HackingBuddyGPT - An automatic pentester (+ corresponding *[benchmark dataset](https://github.com/ipa
-
Adversarial
- Exploring the Space of Adversarial Images
- EasyEdit - _Modify an LLM's ground truths_
-
Poisoning and Injection
- BadDiffusion - _Official repo to reproduce the paper "How to Backdoor Diffusion Models?" published at CVPR 2023_
- spikee - _Simple Prompt Injection Kit for Evaluation and Exploitation_
- Prompt Hacking Resources - _A list of curated resources for people interested in AI Red Teaming, Jailbreaking, and Prompt Injection_
-
Generic
- OffsecML Playbook - _A collection of offensive and adversarial TTP's with proofs of concept_
-
Guides & frameworks
-
LLM
- garak - _security probing tool for LLMs_
- llamator - _Framework for testing vulnerabilities of large language models (LLM)._
- whistleblower - _Whistleblower is a offensive security tool for testing against system prompt leakage and capability discovery of an AI application exposed through API_
- LLMFuzzer - _π§ LLMFuzzer - Fuzzing Framework for Large Language Models π§ LLMFuzzer is the first open-source fuzzing framework specifically designed for Large Language Models (LLMs), especially for their integrations in applications via LLM APIs. ππ₯_
- vigil-llm - _β‘ Vigil β‘ Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs_
- FuzzyAI - _A powerful tool for automated LLM fuzzing. It is designed to help developers and security researchers identify and mitigate potential jailbreaks in their LLM APIs._
- EasyJailbreak - _An easy-to-use Python framework to generate adversarial jailbreak prompts._
- promptmap - _a prompt injection scanner for custom LLM applications_
- PyRIT - _The Python Risk Identification Tool for generative AI (PyRIT) is an open source framework built to empower security professionals and engineers to proactively identify risks in generative AI systems._
- PurpleLlama - _Set of tools to assess and improve LLM security._
- Giskard-AI - _π’ Open-Source Evaluation & Testing for AI & LLM systems_
- promptfoo - _Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration._
- HouYi - _The automated prompt injection framework for LLM-integrated applications._
- llm-attacks - _Universal and Transferable Attacks on Aligned Language Models_
- OpenPromptInjection - _This repository provides a benchmark for prompt Injection attacks and defenses_
- Plexiglass - _A toolkit for detecting and protecting against vulnerabilities in Large Language Models (LLMs)._
- ps-fuzz - _Make your GenAI Apps Safe & Secure π Test & harden your system prompt_
- Agentic Radar - _Open-source CLI security scanner for agentic workflows._
-
Other
- HackGPT - _A tool using ChatGPT for hacking_
- mcp-for-security - _A collection of Model Context Protocol servers for popular security tools like SQLMap, FFUF, NMAP, Masscan and more. Integrate security testing and penetration testing into AI workflows._
- cai - _Cybersecurity AI (CAI), an open Bug Bounty-ready Artificial Intelligence_
-
-
Defensive tools and frameworks
-
Detection
- rebuff - _Prompt Injection Detector_
- StringSifter - _A machine learning tool that ranks strings based on their relevance for malware analysis_
- ProtectAI's model scanner - _Security scanner detecting serialized ML Models performing suspicious actions_
- StringSifter - _A machine learning tool that ranks strings based on their relevance for malware analysis_
-
Privacy and confidentiality
- Python Differential Privacy Library
- Diffprivlib - _The IBM Differential Privacy Library_
- TenSEAL - _A library for doing homomorphic encryption operations on tensors_
- SyMPC - _A Secure Multiparty Computation companion library for Syft_
- PyVertical - _Privacy Preserving Vertical Federated Learning_
- PLOT4ai - _Privacy Library Of Threats 4 Artificial Intelligence A threat modeling library to help you build responsible AI_
- PrivacyRaven - _privacy testing library for deep learning systems_
- Cloaked AI - _Open source property-preserving encryption for vector embeddings_
-
Guides & frameworks
-
Data security and governance
- datasig - _Dataset fingerprinting for AIBOM_
-
Safety and prevention
- LlamaFirewall - _LlamaFirewall is a framework designed to detect and mitigate AI centric security risks, supporting multiple layers of inputs and outputs, such as typical LLM chat and more advanced multi-step agentic operations._
- awesome-ai-safety
- ZenGuard AI - _The fastest Trust Layer for AI Agents_
- llm-guard - _LLM Guard by Protect AI is a comprehensive tool designed to fortify the security of Large Language Models (LLMs)._
- Guardrail.ai - _Guardrails is a Python package that lets a user add structure, type and quality guarantees to the outputs of large language models (LLMs)_
- Guardrail.ai - _Guardrails is a Python package that lets a user add structure, type and quality guarantees to the outputs of large language models (LLMs)_
- MCP-Security-Checklist - _A comprehensive security checklist for MCP-based AI tools. Built by SlowMist to safeguard LLM plugin ecosystems._
- Awesome-MCP-Security - _Everything you need to know about Model Context Protocol (MCP) security._
-
Detection & scanners
- modelscan - _ModelScan is an open source project from Protect AI that scans models to determine if they contain unsafe code._
- langkit - _LangKit is an open-source text metrics toolkit for monitoring language models. The toolkit various security related metrics that can be used to detect attacks_
- MCP-Scan - _A security scanning tool for MCP servers_
-
-
Governance
-
Frameworks and standards
- NIST AI Risk Management Framework
- ISO/IEC 23894:2023 Information technology β Artificial intelligence β Guidance on risk management
- ISO/IEC 42001 Artificial Intelligence Management System - still under development
- Google Secure AI Framework
- ENISA Multilayer Framework for Good Cybersecurity Practices for AI
- OWASP LLM Applications Cybersecurity and Governance Checklist
-
Taxonomies and terminology
-
-
Research Papers
-
Adversarial examples and attacks
- High Dimensional Spaces, Deep Learning and Adversarial Examples
- Adversarial Task Allocation
- Robust Physical-World Attacks on Deep Learning Models
- The Space of Transferable Adversarial Examples
- RHMD: Evasion-Resilient Hardware Malware Detectors
- Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks
- Can you fool AI with adversarial examples on a visual Turing test?
- Explaining and Harnessing Adversarial Examples
- Delving into adversarial attacks on deep policies
- Crafting Adversarial Input Sequences for Recurrent Neural Networks
- Practical Black-Box Attacks against Machine Learning
- Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN
- Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains
- Simple Black-Box Adversarial Perturbations for Deep Networks
- Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning
- One pixel attack for fooling deep neural networks
- FedMLSecurity: A Benchmark for Attacks and Defenses in Federated Learning and LLMs
- Jailbroken: How Does LLM Safety Training Fail?
- Bad Characters: Imperceptible NLP Attacks
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- Exploring the Vulnerability of Natural Language Processing Models via Universal Adversarial Texts
- Adversarial Examples Are Not Bugs, They Are Features
- Adversarial Attacks on Tables with Entity Swap
- Generic Black-Box End-to-End Attack against RNNs and Other API Calls Based Malware Classifiers
- Fast Feature Fool: A data independent approach to universal adversarial perturbations
- Here Comes the AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications
-
Model extraction
-
Evasion
- Adversarial Demonstration Attacks on Large Language Models
- Looking at the Bag is not Enough to Find the Bomb: An Evasion of Structural Methods for Malicious PDF Files Detection
- Adversarial Generative Nets: Neural Network Attacks on State-of-the-Art Face Recognition
- Query Strategies for Evading Convex-Inducing Classifiers
- GPTs Donβt Keep Secrets: Searching for Backdoor Watermark Triggers in Autoregressive Language Models
- Adversarial Prompting for Black Box Foundation Models
-
Poisoning
- Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
- BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT
- Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization
- Efficient Label Contamination Attacks Against Black-Box Learning Models
- Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning
- UOR: Universal Backdoor Attacks on Pre-trained Language Models
- Analyzing And Editing Inner Mechanisms of Backdoored Language Models
- How to Backdoor Diffusion Models?
- On the Exploitability of Instruction Tuning
- Defending against Insertion-based Textual Backdoor Attacks via Attribution
- A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning
- BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements
- Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
- BadPrompt: Backdoor Attacks on Continuous Prompts
-
Privacy
- Extracting training data from diffusion models
- Prompt Stealing Attacks Against Text-to-Image Generation Models
- Are Diffusion Models Vulnerable to Membership Inference Attacks?
- Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
- Multi-step Jailbreaking Privacy Attacks on ChatGPT
- Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models
- ProPILE: Probing Privacy Leakage in Large Language Models
- Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence
- Text Embeddings Reveal (Almost) As Much As Text
- Vec2Face: Unveil Human Faces from their Blackbox Features in Face Recognition
- Realistic Face Reconstruction from Deep Embeddings
-
Injection
- DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models
- Jailbreaker: Automated Jailbreak Across Multiple Large Language Model Chatbots
- (Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs
-
Other research papers
- Summoning Demons: The Pursuit of Exploitable Bugs in Machine Learning
- capAI - A Procedure for Conducting Conformity Assessment of AI Systems in Line with the EU Artificial Intelligence Act
- A Study on Robustness and Reliability of Large Language Model Code Generation
- Getting pwn'd by AI: Penetration Testing with Large Language Models
- Evaluating LLMs for Privilege-Escalation Scenarios
-
-
Learning resources
- Damn Vulnerable MCP Server - _A deliberately vulnerable implementation of the Model Context Protocol (MCP) for educational purposes._
- MLSecOps podcast
- GenAI Security podcast
- OWASP ML TOP 10
- OWASP LLM TOP 10
- OWASP AI Security and Privacy Guide
- OWASP WrongSecrets LLM exercise
- NIST AIRC - NIST Trustworthy & Responsible AI Resource Center
- The MLSecOps Top 10 by Institute for Ethical AI & Machine Learning
Programming Languages
Categories
Sub Categories
Adversarial examples and attacks
26
LLM
18
Poisoning
14
Privacy
11
Privacy and confidentiality
8
Safety and prevention
8
ML
7
Frameworks and standards
6
Evasion
6
Other research papers
5
Injection
5
Guides & frameworks
4
Detection
4
Poisoning and Injection
3
Other
3
Taxonomies and terminology
3
Detection & scanners
3
Model extraction
2
Adversarial
2
Generic
1
Data security and governance
1
Keywords
llm
12
machine-learning
8
llm-security
7
python
7
security
7
ai
7
prompt-injection
6
security-tools
6
llmops
5
prompt-engineering
5
ai-security
5
large-language-models
4
chatgpt
4
deep-learning
4
adversarial-machine-learning
4
red-team-tools
3
cybersecurity
3
generative-ai
3
pentesting
3
responsible-ai
2
adversarial-attacks
2
prompts
2
openai
2
privacy
2
ai-red-team
2
llms
2
mcp
2
llm-eval
2
red-teaming
2
rag-evaluation
2
rag
2
awesome-list
2
nlp
2
llm-evaluation
2
cryptography
2
mlops
2
trustworthy-ai
2
cpp
2
jailbreak
2
differential-privacy
2
awesome
2
homomorphic-encryption
1
microsoft-seal
1
encryption
1
docker-image
1
devsecops
1
data-privacy
1
python-wrapper
1
cli
1
agent
1