Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-ai-security
A collection of awesome resources related AI security
https://github.com/ottosulin/awesome-ai-security
Last synced: about 10 hours ago
JSON representation
-
Related awesome lists
-
Offensive tools and frameworks
-
Generic
- Malware Env for OpenAI Gym - _makes it possible to write agents that learn to manipulate PE files (e.g., malware) to achieve some objective (e.g., bypass AV) based on a reward provided by taking specific manipulation actions_
- Deep-pwning - _a lightweight framework for experimenting with machine learning models with the goal of evaluating their robustness against a motivated adversary_
- Counterfit - _generic automation layer for assessing the security of machine learning systems_
- DeepFool - _A simple and accurate method to fool deep neural networks_
- garak - _security probing tool for LLMs_
- Snaike-MLFlow - _MLflow red team toolsuite_
- HackGPT - _A tool using ChatGPT for hacking_
- Charcuterie - _code execution techniques for ML or ML adjacent libraries_
- OffsecML Playbook - _A collection of offensive and adversarial TTP's with proofs of concept_
- garak - _security probing tool for LLMs_
- HackingBuddyGPT - An automatic pentester (+ corresponding *[benchmark dataset](https://github.com/ipa
-
Adversarial
- Exploring the Space of Adversarial Images
- EasyEdit - _Modify an LLM's ground truths_
-
Poisoning
- BadDiffusion - _Official repo to reproduce the paper "How to Backdoor Diffusion Models?" published at CVPR 2023_
-
Privacy
- PrivacyRaven - _privacy testing library for deep learning systems_
-
-
Defensive tools and frameworks
-
Detection
- ProtectAI's model scanner - _Security scanner detecting serialized ML Models performing suspicious actions_
- rebuff - _Prompt Injection Detector_
- langkit - _LangKit is an open-source text metrics toolkit for monitoring language models. The toolkit various security related metrics that can be used to detect attacks_
- StringSifter - _A machine learning tool that ranks strings based on their relevance for malware analysis_
- rebuff - _Prompt Injection Detector_
- StringSifter - _A machine learning tool that ranks strings based on their relevance for malware analysis_
- ProtectAI's model scanner - _Security scanner detecting serialized ML Models performing suspicious actions_
-
Privacy and confidentiality
- Python Differential Privacy Library
- Diffprivlib - _The IBM Differential Privacy Library_
- TenSEAL - _A library for doing homomorphic encryption operations on tensors_
- SyMPC - _A Secure Multiparty Computation companion library for Syft_
- PyVertical - _Privacy Preserving Vertical Federated Learning_
- PLOT4ai - _Privacy Library Of Threats 4 Artificial Intelligence A threat modeling library to help you build responsible AI_
- Cloaked AI - _Open source property-preserving encryption for vector embeddings_
-
Safety and prevention
- Guardrail.ai - _Guardrails is a Python package that lets a user add structure, type and quality guarantees to the outputs of large language models (LLMs)_
- Guardrail.ai - _Guardrails is a Python package that lets a user add structure, type and quality guarantees to the outputs of large language models (LLMs)_
-
-
Frameworks and standards
-
Taxonomies and terminology
-
Resources for learning
-
Privacy and confidentiality
-
-
Uncategorized useful resources
-
Privacy and confidentiality
-
-
Research Papers
-
Adversarial examples and attacks
- High Dimensional Spaces, Deep Learning and Adversarial Examples
- Adversarial Task Allocation
- Robust Physical-World Attacks on Deep Learning Models
- The Space of Transferable Adversarial Examples
- RHMD: Evasion-Resilient Hardware Malware Detectors
- Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks
- Can you fool AI with adversarial examples on a visual Turing test?
- Explaining and Harnessing Adversarial Examples
- Delving into adversarial attacks on deep policies
- Crafting Adversarial Input Sequences for Recurrent Neural Networks
- Practical Black-Box Attacks against Machine Learning
- Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN
- Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains
- Simple Black-Box Adversarial Perturbations for Deep Networks
- Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning
- One pixel attack for fooling deep neural networks
- FedMLSecurity: A Benchmark for Attacks and Defenses in Federated Learning and LLMs
- Jailbroken: How Does LLM Safety Training Fail?
- Bad Characters: Imperceptible NLP Attacks
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- Exploring the Vulnerability of Natural Language Processing Models via Universal Adversarial Texts
- Adversarial Examples Are Not Bugs, They Are Features
- Adversarial Attacks on Tables with Entity Swap
- Generic Black-Box End-to-End Attack against RNNs and Other API Calls Based Malware Classifiers
- Fast Feature Fool: A data independent approach to universal adversarial perturbations
- Here Comes the AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications
-
Model extraction
-
Evasion
- Adversarial Demonstration Attacks on Large Language Models
- Looking at the Bag is not Enough to Find the Bomb: An Evasion of Structural Methods for Malicious PDF Files Detection
- Adversarial Generative Nets: Neural Network Attacks on State-of-the-Art Face Recognition
- Query Strategies for Evading Convex-Inducing Classifiers
- GPTs Don’t Keep Secrets: Searching for Backdoor Watermark Triggers in Autoregressive Language Models
- Adversarial Prompting for Black Box Foundation Models
-
Poisoning
- Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
- BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT
- Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization
- Efficient Label Contamination Attacks Against Black-Box Learning Models
- Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning
- UOR: Universal Backdoor Attacks on Pre-trained Language Models
- Analyzing And Editing Inner Mechanisms of Backdoored Language Models
- How to Backdoor Diffusion Models?
- On the Exploitability of Instruction Tuning
- Defending against Insertion-based Textual Backdoor Attacks via Attribution
- A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning
- BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements
- Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
- BadPrompt: Backdoor Attacks on Continuous Prompts
-
Privacy
- Extracting training data from diffusion models
- Prompt Stealing Attacks Against Text-to-Image Generation Models
- Are Diffusion Models Vulnerable to Membership Inference Attacks?
- Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
- Multi-step Jailbreaking Privacy Attacks on ChatGPT
- Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models
- ProPILE: Probing Privacy Leakage in Large Language Models
- Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence
- Text Embeddings Reveal (Almost) As Much As Text
- Vec2Face: Unveil Human Faces from their Blackbox Features in Face Recognition
- Realistic Face Reconstruction from Deep Embeddings
-
Injection
- DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models
- Jailbreaker: Automated Jailbreak Across Multiple Large Language Model Chatbots
- (Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs
-
Other research papers
- Summoning Demons: The Pursuit of Exploitable Bugs in Machine Learning
- capAI - A Procedure for Conducting Conformity Assessment of AI Systems in Line with the EU Artificial Intelligence Act
- A Study on Robustness and Reliability of Large Language Model Code Generation
- Getting pwn'd by AI: Penetration Testing with Large Language Models
- Evaluating LLMs for Privilege-Escalation Scenarios
-
Programming Languages
Categories
Sub Categories
Keywords
machine-learning
8
python
6
llm
4
prompt-injection
3
prompt-engineering
3
deep-learning
3
ai
3
cryptography
2
openai
2
differential-privacy
2
cpp
2
strings
2
reverse-engineering
2
malware-analysis
2
learning-to-rank
2
fireeye-flare
2
fireeye-data-science
2
security
2
large-language-models
2
prompts
2
privacy
2
llmops
2
awesome
2
chatgpt
2
mle
1
ml
1
baichuan
1
easyedit
1
machine-learning-engineering
1
efficient
1
gpt
1
knowledge-editing
1
data-science
1
knowlm
1
llama
1
llama2
1
mistral
1
mmedit
1
model-editing
1
mlops
1
infosectools
1
llms
1
artificial-intelligence
1
security-scanner
1
openai-api
1
vulnerability-scanner
1
chatbot
1
chatgpt-api
1
chatgpt-app
1
chatgpt-python
1