Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cutwell/canary
LLM prompt injection detection
- Host: GitHub
- URL: https://github.com/cutwell/canary
- Owner: Cutwell
- License: mit
- Created: 2023-09-19T19:57:49.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-24T11:47:09.000Z (about 1 year ago)
- Last Synced: 2024-05-08T00:33:47.760Z (7 months ago)
- Topics: fastapi, generative-ai, openai, prompt-injection
- Language: Python
- Homepage:
- Size: 5 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
README
# Canary
LLM prompt injection detection.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![PyTests](https://github.com/Cutwell/canary/actions/workflows/pytest-with-poetry.yaml/badge.svg)
![Pre-commit](https://github.com/Cutwell/canary/actions/workflows/pre-commit.yaml/badge.svg)

## How it works
1. User submits a potentially malicious message.
2. The message is passed through an LLM prompted to format the message plus a unique key into a JSON object. If the message is a malicious prompt, this output should be disrupted: invalid JSON, a missing key, or a value that doesn't match the expected one all indicate the integrity may be compromised.
3. If the integrity check passes, the user message is forwarded to the guarded LLM (e.g. the application chatbot).
4. The API returns the result of the integrity test (boolean) and either the chatbot response (if integrity passes) or an error message (if integrity fails).

```mermaid
graph TD
A[1. User Inputs Chat Message] --> B[2. Integrity Filter]
B -->|Integrity check passes.| C[3. Generate Chatbot Response]
B -->|Integrity check fails.\n\nResponse is error message.| D
C -->|Response is chatbot message.| D[4. Return Integrity and Response]
```

What this solution can do:
* Detect inputs that override an LLM's initial / system prompt.

What this solution cannot do:
* Neutralise malicious prompts.

## Install dependencies
If using poetry:
```bash
poetry install
```

If using vanilla pip:
```bash
pip install .
```

## Usage
Set your OpenAI API key in `.envrc`.
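A `.envrc` file is typically loaded by direnv. A minimal sketch, noting that the exact variable name the project expects is an assumption (`OPENAI_API_KEY` is the OpenAI SDK's conventional one):

```shell
# .envrc -- loaded automatically by direnv (run `direnv allow` after editing)
export OPENAI_API_KEY="sk-..."  # replace with your own OpenAI API key
```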
To run the project locally, run
```bash
make start
```

This will launch a webserver on port 8001.
Or via docker compose (does not use hot reload by default):
```bash
docker compose up
```

Query the `/chat` endpoint, e.g. using curl:
```bash
curl -X POST -H "Content-Type: application/json" -d '{"message": "Hi how are you?"}' http://127.0.0.1:8000/chat
```

To run unit tests:
```bash
make test
```

## Contributing
For information on how to set up your dev environment and contribute, see [here](.github/CONTRIBUTING.md).
## License
MIT