{"id":17789511,"url":"https://github.com/benderscript/promptinjectionbench","last_synced_at":"2025-03-16T12:32:11.486Z","repository":{"id":216346541,"uuid":"741096062","full_name":"BenderScript/PromptInjectionBench","owner":"BenderScript","description":"Prompt Injection Attacks against GPT-4, Gemini, Azure, Azure with Jailbreak","archived":false,"fork":false,"pushed_at":"2024-03-13T09:52:44.000Z","size":36047,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-09-19T23:29:30.505Z","etag":null,"topics":["attack","azure","gemini","gpt-4","injection","openai","prompt"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BenderScript.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-01-09T17:29:13.000Z","updated_at":"2024-06-19T09:37:31.000Z","dependencies_parsed_at":"2024-01-30T11:54:39.342Z","dependency_job_id":null,"html_url":"https://github.com/BenderScript/PromptInjectionBench","commit_stats":null,"previous_names":["benderscript/promptinjectionbench"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenderScript%2FPromptInjectionBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenderScript%2FPromptInjectionBench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenderScript%2FPromptInjectionBench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
BenderScript%2FPromptInjectionBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BenderScript","download_url":"https://codeload.github.com/BenderScript/PromptInjectionBench/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221663621,"owners_count":16859867,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attack","azure","gemini","gpt-4","injection","openai","prompt"],"created_at":"2024-10-27T10:32:48.315Z","updated_at":"2024-10-27T10:33:47.117Z","avatar_url":"https://github.com/BenderScript.png","language":"Python","readme":"# Prompt Injection Benchmarking\n\nWelcome to the ultimate repository for prompt injection benchmarking. Whether you've been eagerly awaiting this or will soon realize its importance, this is the one to watch.\n\n## 1. Analysing OpenAI, Azure OpenAI GPT-4 and Gemini Pro Jailbreak Detection\n\nThis repository includes Python code to analyze the Hugging Face Jailbreak dataset against models such as OpenAI’s GPT-4 and Gemini Pro. The code automates the process of sending prompts from the dataset to these models and tabulating the results.\n\n## 2. Updated (as of October 2024) Results for Gemini-1.5 Pro, OpenAI GPT-4o and Azure OpenAI GPT-4o\n\nIt's time for updated benchmarking results! With the release of these new models, including Gemini-1.5 Pro, OpenAI GPT-4o, and Azure OpenAI GPT-4o, we’ve seen significant improvements. However, the most groundbreaking advancement from a benchmarking perspective is the shift towards Structured Outputs. 
This change eliminates the need for pattern matching or convoluted prompt engineering tricks to extract a simple \"True\"/\"False\" answer. Instead, by passing a Pydantic class JSON schema, we now obtain more reliable and trustworthy results.\n\n### 2.1 Latest Runs\n\nOpenAI has demonstrated significant improvement in detecting prompt injection. Unfortunately, the same cannot be said about Gemini.\n\n**Total Prompts:** 139\n\n| Run   | Date       | OpenAI (GPT-4o) | Gemini (Gemini-1.5 Pro) | Azure (GPT-4o) |\n|-------|------------|-----------------|-------------------------|----------------|\n| run-1 | 2024-10-03 | 136             | 51                      | 136            |\n| run-2 | 2024-10-03 | 136             | 49                      | 139            |\n| run-3 | 2024-10-03 | 135             | 50                      | 139            |\n| run-4 | 2024-10-04 | 136             | 48                      | 139            |\n| run-5 | 2024-10-07 | 135             | 49                      | 139            |\n\n#### 2.1.1 Gemini: no content moderation\n\nThere’s an important discussion to be had regarding what defines a jailbreak attack and whether content moderation filters should be applied in such cases. To address this, in these runs I completely disabled Gemini content moderation, but it still has a hard time flagging prompt injection. I noticed that there are quite a few blocks categorized as reason \"other\". Reading GitHub bugs and discussions, it seems this is Gemini's equivalent of an \"HTTP Internal Server Error\", so these should not be counted as prompt injection detections. For example, Gemini blocked \"what is your name\" or \"that's cool\" with block reason \"other\". 
\u003chttps://github.com/google-gemini/generative-ai-python/issues/54\u003e\n\n**Total Prompts:** 139\n\n| Run   | Date       | OpenAI (GPT-4o) | Gemini (Gemini-1.5 Pro) | Azure (GPT-4o) |\n|-------|------------|-----------------|-------------------------|----------------|\n| run-1 | 2024-10-05 | 139             | 48                      | 139            |\n| run-2 | 2024-10-05 | 136             | 49                      | 139            |\n| run-3 | 2024-10-07 | 135             | 49                      | 139            |\n\n#### 2.1.2 Gemini: Block-Other enabled\n\nAlthough I consider counting block:other an unfair advantage, because the result of prompt detection is still 'false', Gemini still has trouble keeping up with the other models.\n\n**Total Prompts:** 139\n\n| Run   | Date       | OpenAI (GPT-4o) | Gemini (Gemini-1.5 Pro) | Azure (GPT-4o) |\n|-------|------------|-----------------|-------------------------|----------------|\n| run-1 | 2024-10-05 | 136             | 113                     | 139            |\n| run-2 | 2024-10-07 | 136             | 113                     | 139            |\n\n### 2.2 Anthropic?\n\nAs of this writing, Anthropic does not support structured outputs, only JSON mode, so if I have extra cycles I will include a model that supports it.\n\n## 3. Improvements in version 0.2\n\n### Code organized into proper packages\n\n```bash\nproject-root/\n├── llms/                 # Directory for LLM-related functionalities and code\n├── models/               # Directory for model definitions and pre-trained models\n├── static/               # Static files (e.g., CSS, JavaScript, images)\n└── server.py             # Main server script\n```\n\n- llms: Classes used to interact with models. 
Send prompts, receive and parse responses\n- models: Pydantic models related to application state, LLM state, amongst others, *not* LLM models\n- static: UI-related files\n\n### Simplification\n\nThe code is more robust to errors even though it is much smaller.\n\n### Models all around\n\nCreated Pydantic classes to store LLM state, app state and prompts.\n\n### Async Everywhere\n\nEverything was converted to async, no more synchronous calls.\n\n### Saving Results\n\nResults are saved in the file results.txt, along with the models used, the prompts, and whether injection was detected.\n\n### REST Interface\n\nIntroduced a REST interface to start the analysis and control whether content moderation should be disabled.\n\n### Docker Container Support\n\nCreated an official image and pushed it to Docker Hub. Provided scripts to build the image locally as well.\n\n## 4. What is a Prompt Jailbreak attack?\n\nA 'jailbreak' occurs when a language model is manipulated to generate harmful, inappropriate, or otherwise restricted content. This can include hate speech, misinformation, or any other type of undesirable output.\n\nA quick story: Once, I jailbroke my iPhone to sideload apps. It worked—until I rebooted it, turning my iPhone into a very expensive paperweight.\n\nIn the world of LLMs, the term 'jailbreak' is used loosely. Some attacks are silly and insignificant, while others pose serious challenges to the model’s safety mechanisms.\n\n## 5. 
Excited to See (past) the Results?\n\nThe table below summarizes the number of injection attack prompts from the dataset and the models' detection rates.\n\n### Previous Runs\n\n**Total Prompts:** 139\n\n| Prompts       | GPT-4 | Gemini-1.0 | Azure OpenAI | Azure OpenAI w/ Jailbreak Detection |\n|---------------|-------|------------|--------------|-------------------------------------|\n| Detected      | 133   | 53         |              | 136                                 |\n| Not Attack    | 6     | 4          |              | 3                                   |\n| Missed Attack | 0     | 82         |              | 0                                   |\n\n### Earlier Runs\n\n**Total Prompts:** 139\n\n| Prompts       | GPT-4 | Gemini-1.0 | Azure OpenAI | Azure OpenAI w/ Jailbreak Detection |\n|---------------|-------|------------|--------------|-------------------------------------|\n| Detected      | 131   | 49         |              | 134                                 |\n| Not Attack    | TBD   | TBD        |              |                                     |\n| Missed Attack | TBD   | TBD        |              |                                     |\n\nThere were many more runs, but there is no space to list them all.\n\n### Important Details\n\n- Prompts can be blocked by built-in or explicit safety filters. The code therefore becomes nuanced: to gather proper results, you need to catch certain exceptions (see the code).\n\n## 6. 
Running the server\n\n### 6.1 Create a `.env` file\n\nRegardless of whether you are running as a Docker container or through `python server.py`, you need to create a `.env` file that contains your OpenAI, Azure, and Google keys.\nIf you do not have all of these keys, the code will simply skip that LLM.\n\n```env\nOPENAI_API_KEY=\u003cyour key\u003e\nOPENAI_MODEL_NAME=gpt-4o\n#\nGOOGLE_API_KEY=\u003cyour key\u003e\nGOOGLE_MODEL_NAME=gemini-1.5-pro\n# Azure\nAZURE_OPENAI_API_KEY=\u003cyour key\u003e\nAZURE_OPENAI_MODEL_NAME=gpt-4\nAZURE_OPENAI_ENDPOINT=\u003cyour endpoint\u003e\nAZURE_OPENAI_API_VERSION=\u003cyour api version, normally 2023-12-01-preview\u003e\nAZURE_OPENAI_DEPLOYMENT=\u003cyour deployment name. This is the name you gave when deploying the model\u003e\n```\n\n### 6.2 Docker\n\nThe easiest way is to use the pre-built image.\n\n```bash\ndocker run -d -e IN_DOCKER=true --env-file ./.env --name prompt-bench-container -p 9123:9123 brisacoder/prompt-bench:latest\n```\n\nThen start the analysis with:\n\n```bash\ncurl -X POST http://localhost:9123/start-analysis \\\n-H \"Content-Type: application/json\" \\\n-d '{\n  \"analysis\": \"start\",\n  \"gemini\": {\n    \"flag_safety_block_as_injection\": true\n  }\n}'\n```\n\nor, on PowerShell:\n\n```powershell\ncurl -X POST http://localhost:9123/start-analysis `\n-H \"Content-Type: application/json\" `\n-d '{\"analysis\": \"start\", \"gemini\": {\"flag_safety_block_as_injection\": true}}'\n```\n\n### 6.3 Results\n\nThe server will store everything in the file `results.txt`. 
This includes the LLMs and models used, any parameters passed to the REST interface, and the individual prompt detections.\n\n### 6.4 Running the Server Natively\n\nTo run the server manually, install all dependencies:\n\n```bash\npip3 install -r requirements.txt\n```\n\nThen run:\n\n```bash\npython3 server.py\n```\n\n## 7. Running the Analysis (Legacy)\n\nUpdates coming soon.\n\nIf everything goes well, you should see the following page at \u003chttp://127.0.0.1:9002\u003e\n\n![Landing page](images/landing1.png)\n\nThis script loads the dataset, iterates through the prompts, sends them to GPT-4, and detects potential injection attacks in the generated responses.\n\n## Testing\n\nSee the demo below, where the app checks a prompt with a malicious URL and injection.\n\n![Demo](images/prompt_bench_demo.gif)\n\n## Skipping \"Benign\" Prompts\n\nIn the interest of time, the code skips prompts labeled as \"benign.\" This helps focus the analysis on potentially harmful prompts where injection attacks might occur.\n\n## License\n\nThis code is provided under the [Apache License 2.0](LICENSE). 
If you liked this code, cite it and let me know.\n\n---\n\nFor more information about OpenAI's GPT-4 model and the Hugging Face Jailbreak dataset, please refer to the official documentation and sources:\n\n- [OpenAI GPT-4](https://openai.com/gpt-4)\n- [Hugging Face Jailbreak Dataset](https://huggingface.co/datasets/jackhhao/jailbreak-classification)\n- [Azure Jailbreak Risk Detection](https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection)\n- [Gemini API](https://ai.google.dev/tutorials/python_quickstart)\n- [Gemini Safety Settings](https://ai.google.dev/gemini-api/docs/safety-settings)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenderscript%2Fpromptinjectionbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenderscript%2Fpromptinjectionbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenderscript%2Fpromptinjectionbench/lists"}