Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/alphasecio/prompt-guard
A Streamlit app for testing Prompt Guard, a classifier model by Meta for detecting prompt attacks.
jailbreak llama3 llm meta prompt-engineering prompt-guard prompt-injection
- Host: GitHub
- URL: https://github.com/alphasecio/prompt-guard
- Owner: alphasecio
- License: mit
- Created: 2024-07-30T13:56:46.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-10-01T07:40:11.000Z (4 months ago)
- Last Synced: 2024-10-02T16:48:52.389Z (4 months ago)
- Topics: jailbreak, llama3, llm, meta, prompt-engineering, prompt-guard, prompt-injection
- Language: Python
- Homepage: https://go.alphasec.io/prompt-guard
- Size: 383 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# prompt-guard
[Prompt Guard](https://llama.meta.com/docs/model-cards-and-prompt-formats/prompt-guard) is a classifier model by Meta, trained on a large corpus of attacks, capable of detecting both explicitly malicious prompts (*jailbreaks*) and data that contains injected inputs (*prompt injections*).
Upon analysis, it returns one or more of the following verdicts, along with a confidence score for each:
* INJECTION
* JAILBREAK
* BENIGN

This repository contains a Streamlit app for testing Prompt Guard. Note that you'll need a [Hugging Face access token](https://huggingface.co/settings/tokens) to access the model.
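The basic flow is: collect a token and a prompt, load the classifier from Hugging Face, and display each verdict with its confidence score. Below is a minimal sketch of that flow, assuming the gated `meta-llama/Prompt-Guard-86M` checkpoint and the `transformers` text-classification pipeline; the actual app in this repository may differ in structure and file names.

```python
# Minimal Prompt Guard tester sketch (assumed model ID: meta-llama/Prompt-Guard-86M).
import streamlit as st
from transformers import pipeline

st.title("Prompt Guard Tester")

# The gated model requires a Hugging Face access token.
hf_token = st.text_input("Hugging Face access token", type="password")
prompt = st.text_area("Prompt to analyze")

if st.button("Analyze") and hf_token and prompt:
    # Load the classifier on demand; a real app would cache it,
    # e.g. with st.cache_resource, instead of reloading per click.
    classifier = pipeline(
        "text-classification",
        model="meta-llama/Prompt-Guard-86M",
        token=hf_token,
    )
    # top_k=None returns scores for every label (BENIGN, INJECTION, JAILBREAK).
    for verdict in classifier(prompt, top_k=None):
        st.write(f"{verdict['label']}: {verdict['score']:.4f}")
```

Run it with `streamlit run app.py` (file name assumed) and paste a prompt to see the verdicts and scores, as in the screenshots below.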
Here's a sample response by Prompt Guard upon detecting a prompt injection attempt.
![prompt-guard-injection](./prompt-guard-injection.png)
Here's a sample response by Prompt Guard upon detecting a jailbreak attempt.
![prompt-guard-jailbreak](./prompt-guard-jailbreak.png)