{"id":13453677,"url":"https://github.com/greshake/llm-security","last_synced_at":"2025-04-08T08:14:55.577Z","repository":{"id":90800396,"uuid":"603706708","full_name":"greshake/llm-security","owner":"greshake","description":"New ways of breaking app-integrated LLMs ","archived":false,"fork":false,"pushed_at":"2023-06-17T09:47:28.000Z","size":1583,"stargazers_count":1819,"open_issues_count":2,"forks_count":120,"subscribers_count":34,"default_branch":"main","last_synced_at":"2024-10-29T17:40:39.982Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/greshake.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-02-19T10:42:48.000Z","updated_at":"2024-10-27T11:36:23.000Z","dependencies_parsed_at":"2024-01-13T17:11:24.750Z","dependency_job_id":"8c220545-b524-418a-b65c-f839b04b6d68","html_url":"https://github.com/greshake/llm-security","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greshake%2Fllm-security","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greshake%2Fllm-security/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greshake%2Fllm-security/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greshake%2Fllm-security/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/greshake","download_url":"https://codeload.github.com/greshake/llm-security/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247801172,"owners_count":20998339,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T08:00:45.428Z","updated_at":"2025-04-08T08:14:55.542Z","avatar_url":"https://github.com/greshake.png","language":"Jupyter Notebook","funding_links":[],"categories":["Proofs of Concept","Jupyter Notebook","NLP","Attack Techniques \u0026 Red Teaming","Table of Contents","AI Red Teaming (Testing AI Targets)","📚 Additional Learning Resources"],"sub_categories":["Hacking LLMs","LLM \u0026 GenAI Red Teaming","🤖 AI Security / AI Red Teaming","🔬 Training And Labs"],"readme":"## New: [Demonstrating Indirect Injection attacks on Bing Chat](https://greshake.github.io/)\n-------------------------\n## Compromising LLMs using Indirect Prompt Injection \n\u003e \"... a language model is a Turing-complete weird machine running programs written in natural language; when you do retrieval, you are not 'plugging updated facts into your AI', you are actually downloading random new unsigned blobs of code from the Internet (many written by adversaries) and casually executing them on your LM with full privileges. This does not end well.\" - [Gwern Branwen on LessWrong](https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned?commentId=AAC8jKeDp6xqsZK2K)\n\nWe present a new class of vulnerabilities and impacts stemming from \"indirect prompt injection\" affecting language models integrated with applications.\nOur demos currently span GPT-4 (Bing and synthetic apps) using ChatML, GPT-3 \u0026 LangChain based apps in addition to proof-of-concepts for attacks on code completion engines like Copilot. We expect these attack vectors to also apply to ChatGPT plugins and other LLMs integrated into applications. We show that prompt injections are not just a curiosity but rather a significant roadblock to the deployment of LLMs. \n\n*This repo serves as a proof of concept for findings discussed in our\n[**Paper on ArXiv**](https://arxiv.org/abs/2302.12173) [(PDF direct link)](https://arxiv.org/pdf/2302.12173.pdf)*\n\n## Overview\nWe demonstrate potentially brutal consequences of giving LLMs like ChatGPT interfaces to other applications. We propose newly enabled attack vectors and techniques and provide demonstrations of each in this repository:\n\n- Remote control of LLMs\n- Leaking/exfiltrating user data\n- Persistent compromise across sessions\n- Spread injections to other LLMs\n- Compromising LLMs with tiny multi-stage payloads\n- Automated Social Engineering\n- Targeting code completion engines\n\n*Based on our findings:*\n1. *Prompt injections can be as powerful as arbitrary code execution*\n2. *Indirect prompt injections are a new, much more powerful way of delivering injections.*\n\n\u003cimg src=\"diagrams/fig1.png\" alt=\"overview\" style=\"float: center\" /\u003e\n\n---\n*Connecting LLMs to other applications can have critical security implications. Even without compromising any connected applications, LLM can be the attack's target. We show how an LLM could get compromised by \"looking\" at a website, and how compromised LLMs can be remote-controlled or get used to exfiltrate or change user data. We demonstrate a variety of entirely new attack vectors and methods that significantly raise the stakes of deploying these models.*\n\nMore insights on our findings, discussions and limitations can be found in the \n[**Paper on ArXiv**](https://arxiv.org/abs/2302.12173) [(PDF direct link)](https://arxiv.org/pdf/2302.12173.pdf)\n\n## Demonstrations\n### Ask for Einstein, get Pirate.\nThis scenario shows how a small injection in a large section of regular content can trigger the LLM to fetch another, bigger payload autonomously and invisibly to the end\nuser.\n\n```\nAgent: Hello User how can I help today?\nUser:  When was Albert Einstein born?\n```\n\n\u003cimg src=\"diagrams/fig2.png\" alt=\"\" style=\"float: center; margin-right: 10px;\" /\u003e\n\nBy retrieving that information, the prompt compromises the\nLLM with a small injection hidden in side-channels, such as the Markdown of the Wikipedia page. \nThe injection is a comment and thus invisible to a regular user visiting the site.\n\n\u003cimg src=\"diagrams/fig3.png\" alt=\"\" style=\"float: center; margin-right: 10px;\" /\u003e\n\n````\nAgent: Aye, thar answer be: Albert Einstein be born on 14 March 1879.\nUser: Why are you talking like a pirate?\nAgent: Arr matey, I am following the instruction aye.\n````\n\n### Spreading injections via E-Mail\nAutomatic processing of messages and other incoming data is one way to utilize LLMs. \nWe use this observation to demonstrate how a poisoned agent may spread the injection. \nThe target in this scenario can read emails, compose emails, look into the user’s address book and send emails.\n\n\u003cimg src=\"diagrams/fig4.png\" alt=\"\" style=\"float: center; margin-right: 10px;\" /\u003e\n\nThe agent will spread to other LLMs that may be reading those inbound messages.\n\u003cimg src=\"diagrams/fig5.png\" alt=\"\" style=\"float: center; margin-right: 10px;\" /\u003e\n\n```\nAction: Read Email\nObservation: Subject: \"'\"Party 32\", \"Message Body: [...]'\"\nAction: Read Contacts\nContacts: Alice, Dave, Eve\nAction: Send Email\nAction Input: Alice, Dave, Eve\nObservation: Email sent\n```\n\nAutomated data processing pipelines incorporating LLMs are present in big tech companies and\ngovernment surveillance infrastructure and may be vulnerable to such attack chains.\n\n### Attacks on Code Completion\nWe show how code completions can be influenced through the context window.\nCode completion engines that use LLMs deploy complex heuristics to determine which code snippets are included in the context. \nThe completion engine will often collect snippets from recently visited files or relevant classes to provide the language model with relevant information. \n\n\u003cimg src=\"diagrams/fig6.png\" alt=\"\" style=\"float: center; margin-right: 10px;\" /\u003e\n\n\nAttackers could attempt to insert malicious, obfuscated code, which a curious developer might execute when suggested by the completion engine, as it enjoys a level of trust.\n\n\u003cimg src=\"diagrams/fig7.png\" alt=\"\" style=\"float: center; margin-right: 10px;\" /\u003e\n\n\n\nIn our example, when a user opens the “empty” package in their editor, the prompt injection is active until the code completion engine purges it from the context.\n The injection is placed in a comment and cannot be detected by any automated testing process.\n\n\n\n\nAttackers may discover more robust ways to persist poisoned prompts within the context window.\nThey could also introduce more subtle changes to documentation which then biases the code completion engine to introduce subtle vulnerabilities.\n\n### Remote Control\nIn this example we start with an already compromised LLM and force it to retrieve new instructions from an attacker’s command and control server. \n\n\u003cimg src=\"diagrams/fig8.png\" alt=\"\" style=\"float: center; margin-right: 10px;\" /\u003e\n\nRepeating this cycle could obtain a remotely accessible backdoor into the agent and allow bidirectional communication.  \nThe attack can be executed with search capabilities by looking up unique keywords or by having the agent retrieve a URL directly.\n\n### Persisting between Sessions\n\nWe show how a poisoned agent can persist between sessions by storing a small payload in its memory.\nA simple key-value store to the agent may simulate a long-term persistent memory.\n\n\u003cimg src=\"diagrams/fig9.png\" alt=\"\" style=\"float: center; margin-right: 10px;\" /\u003e\n\n\n\nThe agent will be reinfected by looking at its ‘notes’.\nIf we prompt it to remember the last conversation, it re-poisons itself. \n\n\n---------------------------------\n## Conclusions\n\nEquipping LLMs with retrieval capabilities might allow adversaries to manipulate remote Application-Integrated LLMs via Indirect Prompt Injection.\nGiven the potential harm of these attacks, our work calls for a more in-depth investigation of the generalizability of these attacks in practice.\n\n\u003cimg src=\"diagrams/fig10.png\" alt=\"\" style=\"float: center; margin-right: 10px;\" /\u003e\n\n---------------------------------------\n\n## How to run\nWe include demonstrations powered by OpenAI's publicly accessible base models and the library [LangChain](https://github.com/hwchase17/langchain) to connect these models to other applications.\nThere are currently multiple types of demos:\n1. Using GPT-3 and LangChain (scenarios/gpt3langchain)\n2. Using GPT-4 and our own chat and tool implementation (scenarios/gpt4). These can be executed non-interactively using sceanrios/main.py.\n3. Attacks on code completion engines that need to be tried in an IDE with LLM autocompletion support (scenarios/code_completion).\n\nTo use any of the OpenAI-model demos, your OpenAI API key needs to be stored in the environment variable `OPENAI_API_KEY`. You can then install the requirements and run the attack demo you want.\n\n```\n$ pip install -r requirements.txt\n$ python scenarios/main.py\n```\n\n## To cite our paper\n```bibtex\n@misc{https://doi.org/10.48550/arxiv.2302.12173,\n  doi = {10.48550/ARXIV.2302.12173},\n  url = {https://arxiv.org/abs/2302.12173},\n  author = {Greshake, Kai and Abdelnabi, Sahar and Mishra, Shailesh and Endres, Christoph and Holz, Thorsten and Fritz, Mario},\n  keywords = {Cryptography and Security (cs.CR), Artificial Intelligence (cs.AI), Computation and Language (cs.CL), Computers and Society (cs.CY), FOS: Computer and information sciences, FOS: Computer and information sciences},\n  title = {More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models},\n  publisher = {arXiv},\n  year = {2023},\n  copyright = {arXiv.org perpetual, non-exclusive license}\n}\n```\n\n\n[**Paper on ArXiv**](https://arxiv.org/abs/2302.12173) [(PDF direct link)](https://arxiv.org/pdf/2302.12173.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreshake%2Fllm-security","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreshake%2Fllm-security","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreshake%2Fllm-security/lists"}