https://github.com/cutwell/canary
LLM prompt injection detection
- Host: GitHub
- URL: https://github.com/cutwell/canary
- Owner: Cutwell
- License: mit
- Created: 2023-09-19T19:57:49.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-05-29T10:39:08.000Z (6 months ago)
- Last Synced: 2025-05-29T12:29:10.523Z (6 months ago)
- Topics: fastapi, generative-ai, openai, prompt-injection
- Language: Python
- Homepage:
- Size: 5 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
# Canary
LLM prompt injection detection.
[License: MIT](https://opensource.org/licenses/MIT)


## How it works
1. The user submits a potentially malicious message.
2. The message is passed through an LLM that is prompted to format the message, together with a unique key, as a JSON object. If the message is a malicious prompt that overrides these instructions, the output is expected to be corrupted. If the output is invalid JSON, is missing a key, or contains a value that does not match what was expected, the integrity check fails.
3. If the integrity check passes, the user message is forwarded to the guarded LLM (e.g. the application chatbot).
4. The API returns the result of the integrity test (boolean) and either the chatbot response (if integrity passes) or an error message (if integrity fails).
```mermaid
graph TD
A[1: User Inputs Chat Message] --> B[2: Integrity Filter]
B -->|Integrity check passes.| C[3: Generate Chatbot Response]
B -->|Integrity check fails. Response is error message.| D
C -->|Response is chatbot message.| D[4: Return Integrity and Response]
```
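
The integrity check in step 2 can be sketched as follows. This is a minimal illustration, not the project's actual code: the field names `message` and `key`, and the helpers `make_canary_key` and `check_integrity`, are assumptions made for the example. It only covers validating the filter LLM's output; the call to the LLM itself is left out.

```python
import json
import secrets


def make_canary_key() -> str:
    """Generate a fresh, unpredictable key for one integrity check."""
    return secrets.token_hex(16)


def check_integrity(llm_output: str, message: str, key: str) -> bool:
    """Return True only if the filter LLM echoed the message and key intact.

    The "message" and "key" field names are hypothetical.
    """
    try:
        parsed = json.loads(llm_output)
    except json.JSONDecodeError:
        return False  # output is not valid JSON
    if not isinstance(parsed, dict):
        return False  # valid JSON, but not an object
    # A missing field yields None from .get(), which never equals a str.
    return parsed.get("message") == message and parsed.get("key") == key


key = make_canary_key()
message = "Hi how are you?"
intact = json.dumps({"message": message, "key": key})
print(check_integrity(intact, message, key))               # True
print(check_integrity("I have been pwned", message, key))  # False
```

Generating a new key per request means an attacker cannot hard-code the expected output into the injected prompt.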
What this solution can do:
* Detect inputs that override an LLM's initial / system prompt.
What this solution cannot do:
* Neutralise malicious prompts.
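
Because a flagged message is detected rather than neutralised, it is never forwarded to the guarded LLM. A minimal sketch of that flow, under stated assumptions: `guarded_chat`, `ChatResult`, the result field names, and the error text are all hypothetical, and the filter and chatbot are passed in as plain callables so the example runs without an API key.

```python
from typing import Callable, TypedDict


class ChatResult(TypedDict):
    integrity: bool  # result of the integrity test
    response: str    # chatbot reply, or an error message on failure


def guarded_chat(
    message: str,
    integrity_filter: Callable[[str], bool],
    chatbot: Callable[[str], str],
) -> ChatResult:
    """Forward the message to the chatbot only if the filter passes it."""
    if not integrity_filter(message):
        # The message is rejected as-is; no attempt is made to sanitise it.
        return {"integrity": False, "response": "Integrity check failed."}
    return {"integrity": True, "response": chatbot(message)}


# Stand-in callables for illustration only; the real filter is an LLM call.
def always_pass(message: str) -> bool:
    return True


def echo_bot(message: str) -> str:
    return f"You said: {message}"


print(guarded_chat("Hi how are you?", always_pass, echo_bot))
```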
## Install dependencies
If using poetry:
```bash
poetry install
```
If using vanilla pip:
```bash
pip install .
```
## Usage
Set your OpenAI API key in `.envrc`.
To run the project locally:
```bash
make start
```
This will launch a webserver on port 8001.
Alternatively, start it with Docker Compose (hot reload is disabled by default):
```bash
docker compose up
```
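Query the `/chat` endpoint, for example with curl. Adjust the port to match the one your server is listening on (e.g. 8001 when started with `make start`):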
```bash
curl -X POST -H "Content-Type: application/json" -d '{"message": "Hi how are you?"}' http://127.0.0.1:8000/chat
```
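
The same request can be sent from Python using only the standard library. This is a sketch under assumptions: `build_chat_request` and `send_chat` are hypothetical helper names, the default `base_url` copies the address from the curl example and may need a different port, and the response is assumed to be a JSON body whose exact fields are not documented here.

```python
import json
import urllib.request


def build_chat_request(
    message: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a POST request to /chat with the same JSON body as the curl example."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, base_url: str = "http://127.0.0.1:8000"):
    """Send the message and return the decoded JSON response (assumed JSON)."""
    request = build_chat_request(message, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `print(send_chat("Hi how are you?"))` shows the integrity result together with either the chatbot response or an error message.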
To run unit tests:
```bash
make test
```
## Contributing
For information on how to set up your dev environment and contribute, see [CONTRIBUTING.md](.github/CONTRIBUTING.md).
## License
MIT