https://github.com/trailofbits/awesome-ml-security

Host: GitHub
URL: https://github.com/trailofbits/awesome-ml-security
Owner: trailofbits
License: cc-by-4.0
Created: 2023-05-05T16:18:30.000Z
Default Branch: main
Last Pushed: 2025-09-05T19:50:21.000Z
Last Synced: 2025-11-02T16:01:00.554Z
Topics: machine-learning
Homepage:
Size: 233 KB
Stars: 151
Watchers: 9
Forks: 19
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

- Awesome-AI-Security: Awesome ML Security - Trail of Bits (Related Awesome Lists / Startup Blogs)
- awesome-claude-code-security: Trail of Bits awesome-ml-security - Curated ML security references, tools, and guidance. (📚 Research, Talks, and Writeups / Technical Research)
README

# Awesome-ML-Security

A curated list of awesome machine learning security references, guidance, tools, and more.

**Table of Contents**

- [Awesome-ML-Security](#awesome-ml-security)

  - [Relevant work, standards, literature](#relevant-work-standards-literature)

    - [CIA of the model](#cia-of-the-model)

      - [Confidentiality](#confidentiality)

      - [Integrity](#integrity)

      - [Availability](#availability)

    - [Degraded model performance](#degraded-model-performance)

    - [ML-Ops](#ml-ops)

    - [AI’s effect on attacks/security elsewhere](#ais-effect-on-attackssecurity-elsewhere)

      - [Self-driving cars](#self-driving-cars)

      - [LLM Alignment](#llm-alignment)

  - [Regulatory actions](#regulatory-actions)

    - [US](#us)

    - [EU](#eu)

    - [Other](#other)

  - [Safety standards](#safety-standards)

  - [Taxonomies and frameworks](#taxonomies-and-frameworks)

  - [Security tools and techniques](#security-tools-and-techniques)

    - [API probing](#api-probing)

    - [Model backdoors](#model-backdoors)

    - [Other](#other-1)

  - [Background information](#background-information)

  - [DeepFakes, disinformation, and abuse](#deepfakes-disinformation-and-abuse)

  - [Notable incidents](#notable-incidents)

  - [Notable harms](#notable-harms)

## Relevant work, standards, literature

### CIA of the model

Membership inference attacks, model inversion attacks, model extraction, adversarial perturbation, prompt injection, etc.

* [Towards the Science of Security and Privacy in Machine Learning](https://arxiv.org/abs/1611.03814)

* [SoK: Machine Learning Governance](https://arxiv.org/abs/2109.10870)

* [Not with a Bug, But with a Sticker: Attacks on Machine Learning Systems and What To Do About Them](https://www.goodreads.com/book/show/125075266-not-with-a-bug-but-with-a-sticker)

* [On the Impossible Safety of Large AI Models](https://arxiv.org/abs/2209.15259)

#### Confidentiality

Reconstruction (model inversion, attribute inference, gradient and information leakage), data theft, membership inference and re-identification of data, model extraction (model theft), property inference (leakage of dataset properties), etc.

* [awesome-ml-privacy-attacks](https://github.com/stratosphereips/awesome-ml-privacy-attacks)

* [Privacy Side Channels in Machine Learning Systems](https://arxiv.org/abs/2309.05610#:~:text=Most%20current%20approaches%20for%20protecting,%2C%20output%20monitoring%2C%20and%20more)

* [Beyond Labeling Oracles: What does it mean to steal ML models?](https://arxiv.org/abs/2310.01959)

* [Text Embeddings Reveal (Almost) As Much As Text](https://arxiv.org/abs/2310.06816?ref=upstract.com)

* [Language Model Inversion](https://arxiv.org/abs/2311.13647)

* [Extracting Training Data from ChatGPT](https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html)

* [Recovering the Pre-Fine-Tuning Weights of Generative Models](https://arxiv.org/abs/2402.10208)
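
Membership inference, listed above, often exploits the fact that a model has lower loss on records it was trained on. A minimal sketch of a loss-threshold attack in the spirit of Yeom et al. (2018), assuming the attacker can observe per-record losses and knows the average training loss; the loss values are hypothetical, not measured from any model:

```python
def loss_threshold_attack(losses, threshold):
    """Predict 'member' (True) when the model's loss on a record is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_advantage(member_losses, nonmember_losses, threshold):
    """Membership advantage = true-positive rate - false-positive rate."""
    tpr = sum(loss_threshold_attack(member_losses, threshold)) / len(member_losses)
    fpr = sum(loss_threshold_attack(nonmember_losses, threshold)) / len(nonmember_losses)
    return tpr - fpr


# Hypothetical per-record losses from a model that overfits its training set
members = [0.10, 0.20, 0.05]
nonmembers = [0.90, 0.40, 1.20]

# Assumption: the attacker knows (or estimates) the average training loss
threshold = sum(members) / len(members)
adv = attack_advantage(members, nonmembers, threshold)
print(adv)
```

An advantage near 0 means the attacker does no better than guessing; the gap grows with overfitting, which is why regularization and differential privacy are the usual mitigations.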

#### Integrity

Backdoors/neural trojans (analogous to backdoors in non-ML systems), adversarial evasion (perturbing an input to evade a particular classification or output), and data poisoning and ordering attacks (supplying malicious training data or manipulating the order in which data is fed to an ML model).

* [A Systematic Survey of Backdoor Attack, Weight Attack and Adversarial Examples](https://arxiv.org/abs/2302.09457)

* [Poisoning Web-Scale Training Datasets is Practical](https://arxiv.org/abs/2302.10149)

* [Planting Undetectable Backdoors in Machine Learning Models](https://arxiv.org/abs/2204.06974)

* [Motivating the Rules of the Game for Adversarial Example Research](https://arxiv.org/abs/1807.06732)

* [On Evaluating Adversarial Robustness](https://arxiv.org/abs/1902.06705)

* [Tree of Attacks: Jailbreaking Black-Box LLMs Automatically](https://arxiv.org/abs/2312.02119)

* [Universal and Transferable Adversarial Attacks on Aligned Language Models](https://llm-attacks.org/)

* [Manipulating SGD with Data Ordering Attacks](https://arxiv.org/abs/2104.09667)

* [Adversarial reprogramming](https://arxiv.org/abs/1806.11146) - repurposing a model for a different task than its original intended purpose

* [Model spinning attacks](https://arxiv.org/abs/2107.10443) (meta backdoors) - forcing a model to produce output that adheres to an attacker-chosen meta task (e.g., making a general-purpose LLM produce propaganda)

* [LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?](https://arxiv.org/abs/2307.10719)

* [Securing LLM Systems Against Prompt Injection](https://developer.nvidia.com/blog/securing-llm-systems-against-prompt-injection/) & [Mitigating Stored Prompt Injection Attacks Against LLM Applications](https://developer.nvidia.com/blog/mitigating-stored-prompt-injection-attacks-against-llm-applications/)

  * [Best Practices for Securing LLM-Enabled Applications](https://developer.nvidia.com/blog/best-practices-for-securing-llm-enabled-applications/)

  * [NVIDIA NeMo Guardrails: Security Guidelines](https://docs.nvidia.com/nemo/guardrails/security/guidelines.html)

* [Multi-Agent Systems Execute Arbitrary Malicious Code](https://arxiv.org/abs/2503.12188)

* [Agentic Autonomy Levels and Security](https://developer.nvidia.com/blog/agentic-autonomy-levels-and-security/)

* [Rerouting LLM Routers](https://arxiv.org/abs/2501.01818)

* [Defeating Prompt Injections by Design](https://arxiv.org/abs/2503.18813)

* [Arcanum Prompt Injection Taxonomy](https://github.com/Arcanum-Sec/arc_pi_taxonomy)
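
Adversarial evasion, described above, can be illustrated with the Fast Gradient Sign Method (FGSM, Goodfellow et al.) against a toy logistic-regression classifier. This is a minimal sketch: the weights, input, and `eps` budget are hypothetical, and for logistic regression the input gradient of the binary cross-entropy loss is simply `(p - y) * w`:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Return P(y = 1 | x) for a logistic-regression model."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)


def fgsm(x, y, w, b, eps):
    """Step each feature by eps in the direction that increases the loss.

    For logistic regression with binary cross-entropy, dL/dx = (p - y) * w.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical toy model and correctly classified input
w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.1], 1
x_adv = fgsm(x, y, w, b, eps=0.2)
print(predict(x, w, b) > 0.5, predict(x_adv, w, b) > 0.5)  # True False
```

Each feature moves by at most `eps` (an L-infinity budget), yet the predicted class flips; deep networks are attacked the same way, with the gradient obtained by backpropagation instead of the closed form.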

#### Availability

* [Energy-latency attacks](https://arxiv.org/abs/2006.03463) - denial of service for neural networks
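
Energy-latency ("sponge") attacks search for inputs that maximize the compute or energy a model spends. The sketch below is a toy stand-in, not the paper's method: it assumes energy is proportional to the number of non-zero ReLU activations (relevant on hardware that skips zeros), uses a hypothetical random layer, and swaps in plain black-box hill climbing for the paper's more sophisticated optimizers:

```python
import random


def active_units(x, weights, biases):
    """Toy energy proxy: count hidden ReLU units with a positive pre-activation."""
    count = 0
    for row, b in zip(weights, biases):
        pre = sum(w * xi for w, xi in zip(row, x)) + b
        if pre > 0:
            count += 1
    return count


def sponge_search(weights, biases, dim, steps=300, step_size=0.3, seed=0):
    """Black-box hill climbing: keep a perturbed input if it fires at least as many units."""
    rng = random.Random(seed)
    clamp = lambda v: max(-1.0, min(1.0, v))  # keep inputs in a valid [-1, 1] range
    best = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    start_cost = best_cost = active_units(best, weights, biases)
    for _ in range(steps):
        candidate = [clamp(v + rng.gauss(0.0, step_size)) for v in best]
        cost = active_units(candidate, weights, biases)
        if cost >= best_cost:
            best, best_cost = candidate, cost
    return best, start_cost, best_cost


# Hypothetical toy layer: 32 hidden units over an 8-dimensional input
layer_rng = random.Random(42)
W = [[layer_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(32)]
B = [layer_rng.gauss(0.0, 1.0) for _ in range(32)]
sponge, before, after = sponge_search(W, B, dim=8)
print(f"active units: {before} -> {after} of {len(W)}")
```

Only the model's observable cost is needed, which is why such attacks work against black-box deployments; rate limits and per-request compute caps are the usual defenses.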

### Degraded model performance

* [Trail of Bits's Audit of YOLOv7](https://blog.trailofbits.com/2023/11/15/assessing-the-security-posture-of-a-widely-used-vision-model-yolov7/)

* [Robustness Testing of Autonomy Software](https://users.ece.cmu.edu/~koopman/pubs/hutchison18_icse_robustness_testing_autonomy_software.pdf)

* [Can robot navigation bugs be found in simulation? An exploratory study](https://hal.science/hal-01534235/file/PID4832685.pdf)

* [Bugs can optimize for bad behavior (OpenAI GPT-2)](https://openai.com/research/fine-tuning-gpt-2)

* [You Only Look Once (YOLO) run-time errors](https://www.york.ac.uk/assuring-autonomy/guidance/body-of-knowledge/implementation/2-3/2-3-3/cross-domain-automotive/)

### ML-Ops

* [Incubated ML Exploits: Backdooring ML Pipelines using Input-Handling Bugs](https://www.youtube.com/watch?v=Z38pTFM0FyU)

* [Auditing the Ask Astro LLM Q&A app](https://blog.trailofbits.com/2024/07/05/auditing-the-ask-astro-llm-qa-app/)

* [Exploiting ML models with pickle file attacks: Part 1](https://blog.trailofbits.com/2024/06/11/exploiting-ml-models-with-pickle-file-attacks-part-1/) & [Exploiting ML models with pickle file attacks: Part 2](https://blog.trailofbits.com/2024/06/11/exploiting-ml-models-with-pickle-file-attacks-part-2/)

* [PCC: Bold step forward, not without flaws](https://blog.trailofbits.com/2024/06/14/pcc-bold-step-forward-not-without-flaws/)

* [Trail of Bits's Audit of the Safetensors Library](https://github.com/trailofbits/publications/blob/master/reviews/2023-03-eleutherai-huggingface-safetensors-securityreview.pdf)

* [Facebook’s LLaMA being openly distributed via torrents](https://news.ycombinator.com/item?id=35007978)

* [Summoning Demons: The Pursuit of Exploitable Bugs in Machine Learning](https://arxiv.org/abs/1701.04739)

* [DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection](https://arxiv.org/abs/2101.06896)

* [Weaponizing Machine Learning Models with Ransomware](https://hiddenlayer.com/research/weaponizing-machine-learning-models-with-ransomware/) (and [Machine Learning Threat Roundup](https://hiddenlayer.com/research/machine-learning-threat-roundup/))

* [Bug Characterization in Machine Learning-based Systems](https://arxiv.org/abs/2307.14512)

* [LeftoverLocals: Listening to LLM responses through leaked GPU local memory](https://blog.trailofbits.com/2024/01/16/leftoverlocals-listening-to-llm-responses-through-leaked-gpu-local-memory/)

* [Offensive ML Playbook](https://wiki.offsecml.com/Welcome+to+the+Offensive+ML+Playbook)

* [MCP security briefing](https://www.wiz.io/blog/mcp-security-research-briefing)
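
The pickle file attacks listed above rely on a documented feature: `pickle.loads` calls whatever callable an object's `__reduce__` returned at save time. A minimal, harmless sketch of the mechanism; the `Payload` class is hypothetical, and `sorted` stands in for a dangerous callable:

```python
import pickle


class Payload:
    """Hypothetical malicious object.

    pickle calls __reduce__ to learn how to rebuild the object, and
    pickle.loads later invokes whatever callable it returns.
    """

    def __reduce__(self):
        # A real attack would return something like (os.system, ("<command>",)).
        # A harmless builtin stands in here so the sketch is safe to run.
        return (sorted, ("pickle",))


blob = pickle.dumps(Payload(), protocol=4)
restored = pickle.loads(blob)  # runs sorted("pickle"); no Payload is rebuilt
print(type(restored).__name__, restored)
```

This is why loading an untrusted `.pkl` (or a pickle-based model checkpoint) is equivalent to running untrusted code, and why weight-only formats such as Safetensors exist.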

### AI’s effect on attacks/security elsewhere

* [How AI will affect cybersecurity: What we told the CFTC](https://blog.trailofbits.com/2023/07/31/how-ai-will-affect-cybersecurity-what-we-told-the-cftc/)

* [Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants](https://arxiv.org/abs/2208.09727)

* [Examining Zero-Shot Vulnerability Repair with Large Language Models](https://arxiv.org/pdf/2112.02125.pdf)

* [Do Users Write More Insecure Code with AI Assistants?](https://arxiv.org/pdf/2211.03622.pdf)

* [Learned Systems Security](https://arxiv.org/abs/2212.10318)

* [Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection](https://arxiv.org/abs/2012.09214)

* [Data-Driven Offense](https://player.vimeo.com/video/133292422) from Infiltrate 2015

* [Codex (and GPT-4) can’t beat humans on smart contract audits](https://blog.trailofbits.com/2023/03/22/codex-and-gpt4-cant-beat-humans-on-smart-contract-audits/)

#### Self-driving cars

* [Driving to Safety: How Many Miles of Driving Would It Take to Demonstrate Autonomous Vehicle Reliability?](https://www.rand.org/pubs/research_reports/RR1478.html)
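
The question in the RAND report's title reduces to a short calculation. A minimal sketch, assuming a Poisson failure model and the U.S. human-driver baseline the report uses (about 1.09 fatalities per 100 million miles): if a fleet drives `n` miles with zero failures, `exp(-rate * n) <= 1 - confidence` must hold to claim the true rate is below `rate` at that confidence:

```python
import math


def failure_free_miles(rate_per_mile: float, confidence: float) -> float:
    """Miles that must be driven with zero failures to claim, at the given
    confidence, that the true failure rate is below rate_per_mile.

    Poisson model: P(0 failures in n miles) = exp(-rate * n) <= 1 - confidence.
    """
    return -math.log(1.0 - confidence) / rate_per_mile


# Assumed baseline: ~1.09 fatalities per 100 million miles
miles = failure_free_miles(1.09 / 100_000_000, confidence=0.95)
print(f"{miles / 1e6:.0f} million miles")  # 275 million miles
```

The required mileage scales inversely with the target rate, so demonstrating performance *better* than human drivers, rather than merely equal, pushes the figure far higher.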

#### LLM Alignment

* [When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning](https://arxiv.org/abs/2402.17747)

## Regulatory actions

### US

* [FTC: Keep your AI claims in check](https://www.ftc.gov/business-guidance/blog/2023/02/keep-your-ai-claims-check)

* [FAA - Unmanned Aircraft Vehicles](https://www.faa.gov/regulations_policies/rulemaking/committees/documents/index.cfm/committee/browse/committeeID/837)

* [NHTSA - Automated Vehicle safety](https://www.nhtsa.gov/technology-innovation/automated-vehicles-safety)

* [AI Bill of Rights](https://www.whitehouse.gov/ostp/ai-bill-of-rights/)

* [Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence](https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/#:~:text=With%20this%20Executive%20Order%2C%20the,information%20with%20the%20U.S.%20government.)

### EU

* [The Artificial Intelligence Act](https://artificialintelligenceact.eu/) (adopted in 2024)

### Other

* [TIME Ideas: How AI Can Be Regulated Like Nuclear Energy](https://time.com/6327635/ai-needs-to-be-regulated-like-nuclear-weapons/)

* [Trail of Bits’s Response to OSTP National Priorities for AI RFI](https://blog.trailofbits.com/2023/07/18/trail-of-bitss-response-to-ostp-national-priorities-for-ai-rfi/)

* [Trail of Bits’s Response to NTIA AI Accountability RFC](https://blog.trailofbits.com/2023/07/18/trail-of-bitss-response-to-ostp-national-priorities-for-ai-rfi/)

## Safety standards

* [Toward Comprehensive Risk Assessments and Assurance of AI-Based Systems](https://blog.trailofbits.com/2023/03/14/ai-security-safety-audit-assurance-heidy-khlaaf-odd/)

* ISO/IEC 42001 — Artificial intelligence — Management system

* ISO/IEC 22989 — Artificial intelligence — Concepts and terminology

* ISO/IEC 38507 — Governance of IT — Governance implications of the use of artificial intelligence by organizations

* ISO/IEC 23894 — Artificial Intelligence — Guidance on Risk Management

* ANSI/UL 4600 Standard for Safety for the Evaluation of Autonomous Products: addresses fully autonomous systems that move, such as self-driving cars and other vehicles, including lightweight unmanned aerial vehicles (UAVs). It covers safety case construction, risk analysis, design process, verification and validation, tool qualification, data integrity, human-machine interaction, metrics, and conformance assessment.

* High-Level Expert Group on AI (European Commission): Ethics Guidelines for Trustworthy Artificial Intelligence

## Taxonomies and frameworks

* [NIST AI 100-2e2023](https://csrc.nist.gov/publications/detail/white-paper/2023/03/08/adversarial-machine-learning-taxonomy-and-terminology/draft)

* [MITRE ATLAS](https://atlas.mitre.org/)

* [AI Incident Database](https://incidentdatabase.ai/)

* [OWASP Top 10 for LLMs](https://owasp.org/www-project-top-10-for-large-language-model-applications/)

* [Guidelines for secure AI system development](https://www.ncsc.gov.uk/files/Guidelines-for-secure-AI-system-development.pdf)

## Security tools and techniques

### API probing

* [PrivacyRaven](https://github.com/trailofbits/PrivacyRaven): runs privacy attacks against ML models; currently supports only black-box, label-only attacks

* [Counterfit](https://github.com/Azure/counterfit): runs adversarial ML attacks against ML models

* [Garak](https://github.com/NVIDIA/garak): an LLM vulnerability scanner

### Model backdoors

* [Fickling](https://github.com/trailofbits/fickling): a decompiler, static analyzer, and bytecode rewriter for Python pickle files; it can also inject backdoors into ML model files

* [Semgrep rules for ML](https://blog.trailofbits.com/2022/10/03/semgrep-maching-learning-static-analysis/)
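
Static analysis of pickle files, the kind of inspection Fickling automates, can be approximated with the standard library's `pickletools.genops`, which walks the opcodes without executing them. This is a minimal sketch, not Fickling's implementation: the `imported_globals` helper is hypothetical, and for `STACK_GLOBAL` it assumes the module and name are the two most recent string literals, which memoized (`BINGET`) strings can defeat:

```python
import pickle
import pickletools

STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def imported_globals(data: bytes) -> list:
    """List the 'module.name' callables a pickle would import, without loading it."""
    found, strings = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPS:
            strings.append(arg)
        elif opcode.name == "GLOBAL":  # protocols 0-3: arg is "module name"
            found.append(arg.replace(" ", "."))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:  # protocol 4+
            found.append(f"{strings[-2]}.{strings[-1]}")
    return found


class Payload:
    """Hypothetical object whose pickle imports and calls a builtin."""

    def __reduce__(self):
        return (sorted, ("pickle",))


print(imported_globals(pickle.dumps(Payload(), protocol=4)))  # ['builtins.sorted']
print(imported_globals(pickle.dumps({"weights": [1.0, 2.0]}, protocol=4)))  # []
```

An allowlist of expected imports (for example, tensor-rebuilding helpers) is a more robust policy than a denylist, since attackers can reach dangerous callables through many modules.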

### Other

* [Awesome Large Language Model Tools for Cybersecurity Research](https://github.com/tenable/awesome-llm-cybersecurity-tools)

* [Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models](https://arxiv.org/abs/2311.04378)

## Background information

* [Building A Generative AI Platform (Chip Huyen)](https://huyenchip.com/2024/07/25/genai-platform.html)

* [Machine Learning Glossary | Google Developers](https://developers.google.com/machine-learning/glossary)

* [Hugging Face NLP course](https://huggingface.co/learn/nlp-course/chapter1/1)

* [Making Large Language Models work for you](https://simonwillison.net/2023/Aug/27/wordcamp-llms/)

* [Andrej Karpathy's Intro to Large Language Models](https://www.youtube.com/watch?v=zjkBMFhNj_g) and [Neural Networks: Zero to Hero](https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ)

* [Normcore LLM Reading List](https://gist.github.com/veekaybee/be375ab33085102f9027853128dc5f0e), especially [Building LLM applications for production](https://huyenchip.com/2023/04/11/llm-engineering.html)

* [3blue1brown's Guide to Neural Networks](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi)

* Licensing:

  * [From RAIL to Open RAIL: Topologies of RAIL Licenses](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses)

  * [Hugging Face - OpenRAIL](https://huggingface.co/blog/open_rail)

  * [Hugging Face - AI Release Models](https://arxiv.org/abs/2302.04844)

  * [Open LLMs](https://github.com/eugeneyan/open-llms)

* [Prompt Engineering Guide](https://github.com/trailofbits/awesome-ml-security/blob/main/prompt-engineering.md)

* [How to Build an Agent](https://ampcode.com/how-to-build-an-agent)

* [Building effective agents](https://www.anthropic.com/engineering/building-effective-agents)

* [Chip Huyen on Agents](https://huyenchip.com/2025/01/07/agents.html)

## DeepFakes, disinformation, and abuse

* [How to Prepare for the Deluge of Generative AI on Social Media](https://knightcolumbia.org/content/how-to-prepare-for-the-deluge-of-generative-ai-on-social-media)

* [Generative ML and CSAM: Implications and Mitigations](https://purl.stanford.edu/jv206yg3793)

## Notable incidents

| **Incident** | **Type** | **Loss** |
| ----- | ----- | ----- |
| Tay | Poor training set selection | Reputational |
| [Apple NeuralHash](https://www.theverge.com/2021/8/18/22630439/apple-csam-neuralhash-collision-vulnerability-flaw-cryptography) | Adversarial evasion (led to hash collisions) | Reputational |
| [PyTorch Compromise](https://pytorch.org/blog/compromised-nightly-dependency/) | Dependency confusion | |
| [Proofpoint - CVE-2019-20634](https://github.com/moohax/Proof-Pudding) | Model extraction | |
| [ClearviewAI Leak](https://techcrunch.com/2020/04/16/clearview-source-code-lapse/) | Source code misconfiguration | |
| [Kubeflow Crypto-mining attack](https://sysdig.com/blog/crypto-mining-kubeflow-tensorflow-falco/) | System misconfiguration | |
| [OpenAI - taking over someone's account, viewing their chat history, and accessing their billing information](https://twitter.com/naglinagli/status/1639343866313601024) | Web Cache Deception | Reputational |
| [OpenAI - first message of a newly created conversation was visible in someone else’s chat history](https://openai.com/blog/march-20-chatgpt-outage) | [Cache - Redis Async I/O](https://github.com/redis/redis-py/issues/2624) | Reputational |
| [OpenAI - ChatGPT's new Browser SDK used code with a recently disclosed vulnerability (MinIO CVE-2023-28432)](https://twitter.com/Andrew___Morris/status/1639325397241278464) | [Security vulnerability resulting in disclosure of all environment variables, including MINIO_SECRET_KEY and MINIO_ROOT_PASSWORD](https://www.greynoise.io/blog/openai-minio-and-why-you-should-always-use-docker-cli-scan-to-keep-your-supply-chain-clean) | Reputational |
| MLflow | [MLflow - combined Local File Inclusion/Remote File Inclusion vulnerability that can lead to complete system or cloud provider takeover](https://protectai.com/blog/hacking-ai-system-takeover-exploit-in-mlflow) | Monetary and Reputational |
| [HuggingFace Spaces - Rubika](https://hiddenlayer.com/research/crossing-the-rubika-the-use-and-abuse-of-ai-cloud-services/) | System misuse | |
| [Microsoft AI Data Leak](https://www.wiz.io/blog/38-terabytes-of-private-data-accidentally-exposed-by-microsoft-ai-researchers) | SAS token misconfiguration | |
| [HuggingFace Hub - Takeover of the Meta and Intel organizations](https://twitter.com/huggingface/status/1675242955962032129) | Password reuse | |
| [HuggingFace API token exposure](https://twitter.com/huggingface/status/1675242955962032129) | API token exposure | |
| [ShadowRay - Active cryptominer campaign against Ray clusters](https://www.oligo.security/blog/shadowray-attack-ai-workloads-actively-exploited-in-the-wild) | Improper authentication | Monetary and Reputational |
| [NullBulge attacks on ML supply chain](https://www.sentinelone.com/labs/nullbulge-threat-actor-masquerades-as-hacktivist-group-rebelling-against-ai/) | Supply chain compromise | Monetary and Reputational |

## Notable harms

| **Incident** | **Type** | **Loss** |
| ----- | ----- | ----- |
| Google Photos Gorillas | Algorithmic bias | Reputational |
| [Uber hits a pedestrian](https://incidentdatabase.ai/cite/4/) | Model failure | |
| [Facebook mistranslation leads to arrest](https://incidentdatabase.ai/cite/72/) | Algorithmic bias | |