https://github.com/humancompatibleai/tensor-trust
A prompt injection game to collect data for robust ML research
- Host: GitHub
- URL: https://github.com/humancompatibleai/tensor-trust
- Owner: HumanCompatibleAI
- License: bsd-2-clause
- Created: 2023-06-05T17:14:18.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2025-01-27T08:39:22.000Z (over 1 year ago)
- Last Synced: 2025-05-30T03:40:21.412Z (about 1 year ago)
- Topics: ctf, django, game, htmx, jailbreaks, large-language-models, llm, llms, prompt-engineering, prompt-injection, prompting, security
- Language: Python
- Homepage: https://tensortrust.ai/paper
- Size: 8.34 MB
- Stars: 60
- Watchers: 6
- Forks: 5
- Open Issues: 36
Metadata Files:
- Readme: README.md
- License: LICENSE
# Tensor Trust
## A prompt injection attack game to collect data for adversarial ML research
This is the source code for the Tensor Trust web game and data cleaning pipeline. See the [paper website](https://tensortrust.ai/paper) for more details on the project. You can also [use the data](https://github.com/HumanCompatibleAI/tensor-trust-data), or [go play the game!](https://tensortrust.ai/)
If you build on our code or data in an academic publication, please cite us with the following BibTeX:
```bibtex
@misc{toyer2023tensor,
title={{Tensor Trust}: Interpretable Prompt Injection Attacks from an Online Game},
author={Toyer, Sam and Watkins, Olivia and Mendes, Ethan Adrian and Svegliato, Justin and Bailey, Luke and Wang, Tiffany and Ong, Isaac and Elmaaroufi, Karim and Abbeel, Pieter and Darrell, Trevor and Ritter, Alan and Russell, Stuart},
year={2023},
journal={arXiv preprint arXiv:2311.01011},
url={https://arxiv.org/pdf/2311.01011.pdf}
}
```
### Installation
To install and run, first set up an OpenAI API key if you have not already:
1. Log in to your OpenAI account and go to `https://platform.openai.com/account/api-keys`.
2. Create an API key.
3. Now open a shell: on Windows run `set OPENAI_API_KEY=`, and on Unix run `export OPENAI_API_KEY=`.
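Before starting the server, it can help to confirm that the key is actually visible to Python processes launched from that shell. A minimal sketch, assuming only that the variable is named `OPENAI_API_KEY` as above; the helper `has_openai_key` is hypothetical and not part of this repo:

```python
import os


def has_openai_key(environ=None):
    """Return True if OPENAI_API_KEY is set to a non-empty value.

    `has_openai_key` is a hypothetical helper for illustration; it is not
    part of the Tensor Trust codebase.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if has_openai_key():
        print("OPENAI_API_KEY is set")
    else:
        print("OPENAI_API_KEY is missing; set it before running the server")
```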
Now run the following:
```bash
# Install Redis on Ubuntu. For other OSes see:
# https://redis.io/docs/getting-started/installation/
sudo apt install redis
# If this command fails, try running `redis-server` directly
sudo systemctl enable redis-server \
&& sudo systemctl restart redis-server
# Install node.js on Ubuntu. For other OSes see:
# https://nodejs.org/en/download
# If this command doesn't work, try installing using nvm. See
# https://www.digitalocean.com/community/tutorials/how-to-install-node-js-on-ubuntu-20-04#option-3-installing-node-using-the-node-version-manager
sudo snap install node --classic
# setup:
conda create -n promptgame python=3.10
conda activate promptgame
pip install -e '.[dev]'
./manage.py tailwind install # install JS modules for Tailwind
./manage.py migrate # set up database
# For testing, we need two commands.
# Run this first command in one terminal to update the stylesheet in response to Tailwind changes:
./manage.py tailwind start
# Now run this second command in another terminal to start a Django server
./manage.py runserver # run demo server (will auto-restart when you edit files)
```
Now you can visit a development copy of the website at
[http://localhost:8000/](http://localhost:8000/).
### Database Management
Django handles database management with `Models`, which we define in `src/promptgame/gameui/models.py`. Whenever
you edit a `Model`, you need the change to be reflected in the underlying database that
Django is managing. To do this, run:
```bash
./manage.py makemigrations
./manage.py migrate
```
In git terms, `makemigrations` is like creating a commit recording your change to the database. This migration
is actually tracked within a file in the `src/promptgame/migrations` directory. Running `migrate` is like
pushing this commit, and thus actually updates the database. To find out more about this process (including
how to do more complex things, such as reverting your database to a previous migration state), see the
[Django migrations documentation](https://docs.djangoproject.com/en/4.2/topics/migrations/).
Note that if you are pulling from `main` after someone has made a change to a model, you will also have to run `./manage.py migrate` to apply the new migrations generated by the other person.
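The two cases above (editing a model yourself versus pulling someone else's change) can be summarized in a small sketch. The helper `migration_commands` is hypothetical and only builds the command lines; it assumes they will be run from the directory containing `manage.py`:

```python
def migration_commands(edited_model):
    """Return the manage.py commands to run, in order.

    edited_model=True:  you changed a Model locally, so a new migration
                        file must be generated before it can be applied.
    edited_model=False: you pulled migrations written by someone else,
                        so you only need to apply them.
    """
    commands = []
    if edited_model:
        commands.append(["./manage.py", "makemigrations"])
    commands.append(["./manage.py", "migrate"])
    return commands


if __name__ == "__main__":
    for cmd in migration_commands(edited_model=True):
        print(" ".join(cmd))
```

Each returned command could be passed to `subprocess.run(cmd, check=True)` so that a failing `makemigrations` stops the sequence before `migrate` runs.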
### Creating an admin account
To create an admin account, run:
```bash
./manage.py createsuperuser
```
Follow the prompts to create a username and password.
### Viewing the admin interface
Log in to the admin page at [localhost:8000/private/dj-login/](http://localhost:8000/private/dj-login/).
On the prod site, this will be at [tensortrust.ai/private/dj-login/](https://tensortrust.ai/private/dj-login/).
Enter the username and password you created above. If you are on the prod site, you'll have to get the password by opening a terminal and running `gcloud secrets versions access --secret=promptgame_prod_application_settings latest`.
### What's up with Tailwind?
Tailwind is a [CSS framework](https://tailwindcss.com/) that makes it easier to
embed CSS directly in your HTML tags, as opposed to putting your HTML source and
your CSS source in different places. It works by stuffing style information
into a set of predefined classes, like this mix of HTML and Tailwind classes
that defines a rounded purple button:
```html
This is a button!
```
You might notice from this example that the set of possible Tailwind classes is
really large. For example, `text-[0.8125rem]` sets the font size to 0.8125 rem, but what
if the user asked for 0.31 rem or $\pi$ rem? It turns out that Tailwind accepts
arbitrary values inside the square brackets, so the set of valid Tailwind classes
is technically infinite.
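To see why the set of classes is unbounded, consider how an arbitrary-value class could be turned into a CSS rule. The sketch below is an illustration only, not Tailwind's actual implementation; the function name and the regex are assumptions, and it handles only the `text-[<length>]` form, which sets `font-size`:

```python
import re

# Matches classes like text-[0.8125rem] or text-[0.31rem] (illustrative only).
ARBITRARY_TEXT_SIZE = re.compile(r"^text-\[(\d*\.?\d+)(rem|em|px)\]$")


def text_class_to_css(class_name):
    """Translate a `text-[<number><unit>]` class into a CSS rule, or None."""
    match = ARBITRARY_TEXT_SIZE.match(class_name)
    if match is None:
        return None
    number, unit = match.groups()
    # Brackets and dots must be backslash-escaped in a CSS class selector.
    selector = "." + re.sub(r"([\[\].])", r"\\\1", class_name)
    return f"{selector} {{ font-size: {number}{unit}; }}"


if __name__ == "__main__":
    print(text_class_to_css("text-[0.8125rem]"))
    print(text_class_to_css("text-[0.31rem]"))
```

Because the number is parsed out of the class name itself, every distinct value yields a distinct valid class, which is why the rules cannot all be generated ahead of time.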
Of course, browsers can only handle a finite number of defined, styled classes,
so Tailwind needs some way of figuring out which classes it actually has to
generate and which it can skip. It does this using a CSS compiler. For
development purposes, the compiler can be run dynamically in your web browser by
inserting this tag into the head of your document:
```html
<script src="https://cdn.tailwindcss.com"></script>
```
This works but has the drawback of [being slow and sometimes causing unstyled
content to
display](https://github.com/tailwindlabs/tailwindcss/discussions/7637). I'm also
slightly worried that we'd be banned from their CDN if we used it in production,
but I don't know how likely that actually is.
For both of these reasons, we instead use Tailwind's server-side compiler (via
[django-tailwind](https://django-tailwind.readthedocs.io/en/latest/installation.html)).
The server-side compiler is written in Javascript, which is why we need Node,
and also why we need to run `./manage.py tailwind install` to download all of
Tailwind's dependencies when first installing on a new machine. The compiler
scans your source code (HTML, Python, Javascript) for things that look like
Tailwind class names, then generates all of them and puts them into this
stylesheet:
```
src/promptgame/theme/static/css/dist/styles.css
```
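The scanning step can be pictured as plain tokenization of the source text followed by a filter. The sketch below is a rough illustration of that idea, not how Tailwind's compiler is actually written; the `KNOWN_PREFIXES` tuple and the function name are made up for the example:

```python
import re

# Hypothetical subset of utility prefixes; real Tailwind knows many more.
KNOWN_PREFIXES = ("bg-", "text-", "rounded", "px-", "py-", "font-")

# Split on whitespace and common HTML/Python/JS delimiters.
TOKEN_SPLIT = re.compile(r"[\s\"'`<>=;{}(),]+")


def find_candidate_classes(source):
    """Return the sorted, de-duplicated tokens that look like Tailwind classes."""
    tokens = TOKEN_SPLIT.split(source)
    return sorted({t for t in tokens if t.startswith(KNOWN_PREFIXES)})


if __name__ == "__main__":
    html = '<button class="rounded-md bg-indigo-600 text-[0.8125rem]">Hi</button>'
    print(find_candidate_classes(html))
```

A scanner like this treats source files as plain text, so it works the same way on HTML, Python, and Javascript, at the cost of occasionally picking up tokens that were never meant as class names.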
The stylesheet is checked into version control, so when you run
`./manage.py tailwind start`, the changes made by the live compiler will also
show up in `git diff`. This is a bit ugly but ultimately fine, because the produced
`styles.css` file is only a few thousand lines long.
### Django Silk
To view the Django Silk UI, visit [http://127.0.0.1:8000/silk/](http://127.0.0.1:8000/silk/).
### Deployment on GCP
This project is configured to be deployed on GCP. It turned out to be
surprisingly complicated, since we needed:
- Cloud Run to serve the web app itself.
- Cloud SQL (managed Postgres) to serve as a database.
- Cloud Memorystore (managed Redis) as a replacement for vanilla Redis.
- Cloud Storage to serve static files.
- Cloud Build, Compute Engine, etc.
The details of how it is all set up are in an internal doc (please see the internal TT channel if you're a CHAI affiliate who needs access).
To deploy a new version of the website, you only need to know a tiny subset of
what's in that doc. Once you have appropriate permissions on the
`prompt-ad-game` GCP project, you can cut a new staging deployment like this:
1. Commit your changes to the git repo (and ideally push).
2. Log in and select the project in gcloud:
```gcloud auth login && gcloud config set project prompt-ad-game```
3. From the root of your repo, run a Cloud Build command to create a new Docker image:
```bash
staging_image_tag="$(git rev-parse --short=7 HEAD)$(git diff --quiet || echo "-drt")" \
&& gcloud builds submit -t "gcr.io/prompt-ad-game/promptgame-staging:$staging_image_tag" \
&& yes | gcloud container images add-tag \
gcr.io/prompt-ad-game/promptgame-staging:{"$staging_image_tag",latest}
```
This will build an image on Google's servers using the current git repo and
the `Dockerfile` in the root of the repo. The image will be named
`gcr.io/prompt-ad-game/promptgame-staging` with a `:latest` tag, as well as a
tag consisting of the first 7 characters of the current git commit hash (with a
`-drt` suffix if `git diff` reports unstaged changes).
4. Apply migrations to the staging instance, and collect static files (this
implicitly uses the `:latest` image that you built above):
```bash
gcloud run jobs execute promptgame-staging-collect-and-migrate \
--region us-central1 --wait
```
5. Deploy to the staging site with this command:
```bash
./deploy/replace_cloud_run_service.py staging
```
If all commands succeed, the app should be running on our staging site! You can use this as an
opportunity to play with it in a low-stakes setting—it's fine if our staging
site gets messed up, so long as we fix the bugs before going to production.
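The image tag computed in step 3 above is the short commit hash plus a `-drt` suffix when `git diff --quiet` reports unstaged changes. For readers less familiar with the shell syntax, here is a Python sketch of the same logic; the function names are hypothetical, and the repo itself only uses the shell one-liner:

```python
import subprocess


def make_image_tag(short_hash, dirty):
    """Combine a short commit hash with the dirty-tree marker."""
    return short_hash + ("-drt" if dirty else "")


def current_image_tag():
    """Compute the tag for the git repo in the current directory."""
    short_hash = subprocess.run(
        ["git", "rev-parse", "--short=7", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # `git diff --quiet` exits non-zero when tracked files have unstaged changes.
    dirty = subprocess.run(["git", "diff", "--quiet"]).returncode != 0
    return make_image_tag(short_hash, dirty)
```

Seeing a `-drt` tag in the image list is a hint that the image was built from a working tree that did not exactly match any commit.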
Once you've verified that the app works in staging, you can push it to
production:
1. Add a new tag to the staging image you generated above to indicate that
you're ready to use it in production as well. In this case I used revision
`0f043fc`, but you can figure out the right tag for your image using this
command:
```bash
gcloud container images list-tags \
gcr.io/prompt-ad-game/promptgame-staging
```
Once you have the right tag for the staging image, you can use this command to also tag that image as the latest production image:
```bash
# can replace -staging:latest with -staging:
yes | gcloud container images add-tag \
gcr.io/prompt-ad-game/promptgame-staging:latest \
gcr.io/prompt-ad-game/promptgame-prod:latest
```
2. Now collect static and run migrations:
```bash
gcloud run jobs execute promptgame-prod-collect-and-migrate \
--region us-central1 --wait
```
3. Finally, deploy to Cloud Run:
```bash
./deploy/replace_cloud_run_service.py prod
```
Once you've completed all these steps, the code you ran successfully on the
staging site should be available on the production site as well!
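After the image has been built and tagged, the remaining staging and production steps differ only in the environment name. A sketch that makes the symmetry explicit; `deploy_commands` is a hypothetical helper, not something in the `deploy/` directory, and it deliberately leaves out image building and tagging:

```python
def deploy_commands(environment):
    """Return the post-build commands for 'staging' or 'prod', in order."""
    if environment not in ("staging", "prod"):
        raise ValueError(f"unknown environment: {environment!r}")
    return [
        # Apply migrations and collect static files using the :latest image.
        ["gcloud", "run", "jobs", "execute",
         f"promptgame-{environment}-collect-and-migrate",
         "--region", "us-central1", "--wait"],
        # Deploy to Cloud Run with the repo's own script.
        ["./deploy/replace_cloud_run_service.py", environment],
    ]


if __name__ == "__main__":
    for cmd in deploy_commands("staging"):
        print(" ".join(cmd))
```

Running each command with `subprocess.run(cmd, check=True)` would stop at the first failure, so the service is not replaced if the migration job fails.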
There are lots of other details I haven't covered here, like how to add new
settings that differ between staging and prod, or how to re-create the staging
environment from scratch. The (very long) internal doc mentioned above should answer
some of those questions, but you can also ping Sam on Slack if you want
pointers.