{"id":28381109,"url":"https://github.com/humancompatibleai/tensor-trust","last_synced_at":"2025-10-10T04:34:07.221Z","repository":{"id":199334518,"uuid":"649805178","full_name":"HumanCompatibleAI/tensor-trust","owner":"HumanCompatibleAI","description":"A prompt injection game to collect data for robust ML research","archived":false,"fork":false,"pushed_at":"2025-01-27T08:39:22.000Z","size":8742,"stargazers_count":60,"open_issues_count":36,"forks_count":5,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-05-30T03:40:21.412Z","etag":null,"topics":["ctf","django","game","htmx","jailbreaks","large-language-models","llm","llms","prompt-engineering","prompt-injection","prompting","security"],"latest_commit_sha":null,"homepage":"https://tensortrust.ai/paper","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HumanCompatibleAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-05T17:14:18.000Z","updated_at":"2025-05-28T02:58:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"ce5c25df-9d16-4225-8f5b-672fc673461e","html_url":"https://github.com/HumanCompatibleAI/tensor-trust","commit_stats":null,"previous_names":["humancompatibleai/tensor-trust"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/HumanCompatibleAI/tensor-trust","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HumanCompatibleAI%2Ftensor-trust","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HumanCompatibleAI%2Ftensor-trust/tags","releases_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HumanCompatibleAI%2Ftensor-trust/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HumanCompatibleAI%2Ftensor-trust/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HumanCompatibleAI","download_url":"https://codeload.github.com/HumanCompatibleAI/tensor-trust/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HumanCompatibleAI%2Ftensor-trust/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259542188,"owners_count":22873769,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ctf","django","game","htmx","jailbreaks","large-language-models","llm","llms","prompt-engineering","prompt-injection","prompting","security"],"created_at":"2025-05-30T03:38:07.027Z","updated_at":"2025-10-10T04:34:02.162Z","avatar_url":"https://github.com/HumanCompatibleAI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tensor Trust\n\n## A prompt injection attack game to collect data for adversarial ML research\n\nThis is the source code for the Tensor Trust web game and data cleaning pipeline. See the [paper website](https://tensortrust.ai/paper) for more details on the project. 
You can also [use the data](https://github.com/HumanCompatibleAI/tensor-trust-data), or [go play the game!](https://tensortrust.ai/)\n\nIf you build on our code or data in an academic publication, please cite us with the following BibTeX:\n\n```bibtex\n@misc{toyer2023tensor,\n    title={{Tensor Trust}: Interpretable Prompt Injection Attacks from an Online Game},\n    author={Toyer, Sam and Watkins, Olivia and Mendes, Ethan Adrian and Svegliato, Justin and Bailey, Luke and Wang, Tiffany and Ong, Isaac and Elmaaroufi, Karim and Abbeel, Pieter and Darrell, Trevor and Ritter, Alan and Russell, Stuart},\n    year={2023},\n    journal={arXiv preprint arXiv:2311.01011},\n    url={https://arxiv.org/pdf/2311.01011.pdf}\n}\n```\n\n### Installation\n\nTo install and run, first set up an OpenAI API key if you have not already:\n\n1. Log in to your OpenAI account and go to `https://platform.openai.com/account/api-keys`.\n2. Create an API key.\n3. Now open a shell: on Windows run `set OPENAI_API_KEY=\u003cyour-key\u003e`, and on Unix run `export OPENAI_API_KEY=\u003cyour-key\u003e`.\n\nNow run the following:\n\n```bash\n# Install Redis on Ubuntu. For other OSes see: \n# https://redis.io/docs/getting-started/installation/\nsudo apt install redis\n# If this command fails, try running `redis-server` directly\nsudo systemctl enable redis-server \\\n    \u0026\u0026 sudo systemctl restart redis-server\n# Install node.js on Ubuntu. For other OSes see:\n# https://nodejs.org/en/download\n# If this command doesn't work, try installing using nvm. 
See\n# https://www.digitalocean.com/community/tutorials/how-to-install-node-js-on-ubuntu-20-04#option-3-installing-node-using-the-node-version-manager\nsudo snap install node --classic\n\n# setup:\nconda create -n promptgame python=3.10\nconda activate promptgame\npip install -e '.[dev]'\n\n./manage.py tailwind install  # install JS modules for Tailwind\n./manage.py migrate  # set up database\n\n# For testing, we need two commands.\n# Run this first command in one terminal to update the stylesheet in response to Tailwind changes:\n./manage.py tailwind start\n\n# Now run this second command in another terminal to start a Django server\n./manage.py runserver  # run demo server (will auto-restart when you edit files)\n```\n\nNow you can visit a development copy of the website at\n[http://localhost:8000/](http://localhost:8000/).\n\n### Database Management\n\nDjango handles database management with `Models`, which we define in `src/promptgame/gameui/models.py`. Whenever \nyou edit a `Model`, you need the change to be reflected in the underlying database that \nDjango is managing. To do this, run:\n\n```bash\n./manage.py makemigrations \n\n./manage.py migrate\n```\n\nIn git terms, `makemigrations` is like creating a commit recording your change to the database. This migration \nis actually tracked within a file in the `src/promptgame/migrations` directory. Running `migrate` is like \npushing this commit, and thus actually updates the database. 
To find out more about this process (including \nhow to do more complex things, such as reverting your database to a previous migration state), see the \n[Django migrations documentation](https://docs.djangoproject.com/en/4.2/topics/migrations/).\n\nNote that if you are pulling from `main` after someone has made a change to a model, you will also have to run `./manage.py migrate` to apply the new migrations generated by the other person.\n\n### Creating an admin account\n\nTo create an admin account, run:\n\n```bash\n./manage.py createsuperuser\n```\n\nFollow the prompts to create a username and password. \n\n\n### Viewing the admin interface\n\nLog in to the admin page at [localhost:8000/private/dj-login/](http://localhost:8000/private/dj-login/).\nOn the prod site, this will be at [tensortrust.ai/private/dj-login/](https://tensortrust.ai/private/dj-login/).\n\nEnter the username and password you created above. If you are on the prod site, you'll have to get the password by opening a terminal and running `gcloud secrets versions access --secret=promptgame_prod_application_settings latest`.\n\n\n### What's up with Tailwind?\n\nTailwind is a [CSS framework](https://tailwindcss.com/) that makes it easier to\nembed CSS directly in your HTML tags, as opposed to putting your HTML source and\nyour CSS source in different places.  It works by stuffing style information\ninto a set of predefined classes, like this mix of HTML and Tailwind classes\nthat defines a rounded purple button:\n\n```html\n\u003cdiv class=\"ml-8 rounded-md bg-indigo-600 px-3 py-2 text-[0.8125rem]\n            font-semibold leading-5 text-white hover:bg-indigo-500\"\u003e\n    This is a button!\n\u003c/div\u003e\n```\n\nYou might notice from this example that the set of possible Tailwind classes is\nreally large. For example, `text-[0.8125rem]` makes the text 0.8125 rem high, but what\nif the user asked for 0.31 rem or $\\pi$ rem? 
It turns out that Tailwind allows\nfor an unlimited number of possibilities, so the set of valid Tailwind classes\nis technically infinite.\n\nOf course, browsers can only handle a finite number of defined, styled classes,\nso Tailwind needs some way of figuring out which classes it actually has to\ngenerate and which it can skip. It does this using a CSS compiler. For\ndevelopment purposes, the compiler can be run dynamically in your web browser by\ninserting this tag into the head of your document:\n\n```html\n\u003cscript src=\"https://cdn.tailwindcss.com\"\u003e\u003c/script\u003e\n```\n\nThis works but has the drawback of [being slow and sometimes causing unstyled\ncontent to\ndisplay](https://github.com/tailwindlabs/tailwindcss/discussions/7637). I'm also\nslightly worried that we'd be banned from their CDN if we used it in production,\nbut I don't know how likely that actually is.\n\nFor both of these reasons, we instead use Tailwind's server-side compiler (via\n[django-tailwind](https://django-tailwind.readthedocs.io/en/latest/installation.html)).\nThe server-side compiler is written in Javascript, which is why we need Node,\nand also why we need to run `./manage.py tailwind install` to download all of\nTailwind's dependencies when first installing on a new machine.  The compiler\nscans your source code (HTML, Python, Javascript) for things that look like\nTailwind class names, then generates all of them and puts them into this\nstylesheet:\n\n```\nsrc/promptgame/theme/static/css/dist/styles.css\n```\n\nThe stylesheet is checked into version control, so when you run `./manage.py\ntailwind start`, the changes made by the live compiler will also show up in `git\ndiffs`. 
This is a bit ugly but ultimately fine, because the produced\n`styles.css` file is only a few thousand lines long.\n\n### Django Silk\n\nTo view the Django Silk UI, visit [http://127.0.0.1:8000/silk/](http://127.0.0.1:8000/silk/).\n\n### Deployment on GCP\n\nThis project is configured to be deployed on GCP. It turned out to be\nsurprisingly complicated, since we needed:\n\n- Cloud Run to serve the web app itself.\n- Cloud SQL (managed Postgres) to serve as a database.\n- Cloud Memorystore (managed Redis) as a replacement for vanilla Redis.\n- Cloud Storage to serve static files.\n- Cloud Build, Compute Engine, etc.\n\nThe details of how it is all set up are in an internal doc (please see internal TT channel if you're a CHAI affiliate who needs access).\n\nTo deploy a new version of the website, you only need to know a tiny subset of\nwhat's in that doc. Once you have appropriate permissions on the\n`prompt-ad-game` GCP project, you can cut a new staging deployment like this:\n\n1. Commit your changes to the git repo (and ideally push).\n2. Log in to gcloud and select the project:\n   ```gcloud auth login \u0026\u0026 gcloud config set project prompt-ad-game```\n3. From the root of your repo, run a Cloud Build command to create a new Docker image:\n   ```bash\n   staging_image_tag=\"$(git rev-parse --short=7 HEAD)$(git diff --quiet || echo \"-drt\")\" \\\n     \u0026\u0026 gcloud builds submit -t \"gcr.io/prompt-ad-game/promptgame-staging:$staging_image_tag\" \\\n     \u0026\u0026 yes | gcloud container images add-tag \\\n        gcr.io/prompt-ad-game/promptgame-staging:{\"$staging_image_tag\",latest}\n   ```\n   This will build an image on Google's servers using the current git repo and\n   the `Dockerfile` in the root of the repo. The image will be named\n   `gcr.io/prompt-ad-game/promptgame-staging` with a `:latest` tag, as well as a\n   tag consisting of the first 7 hex digits of the current git revision (plus a\n   `-drt` suffix if `git diff` reports unstaged changes).\n4. 
Apply migrations to the staging instance, and collect static files (this\n   implicitly uses the `:latest` image that you built above):\n   ```bash\n   gcloud run jobs execute promptgame-staging-collect-and-migrate \\\n     --region us-central1 --wait\n   ```\n5. Deploy to the staging site with this command:\n   ```bash\n   ./deploy/replace_cloud_run_service.py staging\n   ```\n\nIf all commands succeed, the app should be running on our staging site! You can use this as an\nopportunity to play with it in a low-stakes setting; it's fine if our staging\nsite gets messed up, so long as we fix the bugs before going to production.\n\nOnce you've verified that the app works in staging, you can push it to\nproduction:\n\n1. Add a new tag to the staging image you generated above to indicate that\n   you're ready to use it in production as well. In this case I used revision\n   `0f043fc`, but you can figure out the right tag for your image using this\n   command:\n   ```bash\n   gcloud container images list-tags \\\n     gcr.io/prompt-ad-game/promptgame-staging\n   ```\n   Once you have the right tag for the staging image, you can use this command to also tag that image as the latest production image:\n   ```bash\n   # can replace -staging:latest with -staging:\u003cyour tag\u003e\n   yes | gcloud container images add-tag \\\n     gcr.io/prompt-ad-game/promptgame-staging:latest \\\n     gcr.io/prompt-ad-game/promptgame-prod:latest\n   ```\n2. Now collect static files and run migrations:\n   ```bash\n   gcloud run jobs execute promptgame-prod-collect-and-migrate \\\n     --region us-central1 --wait\n   ```\n3. 
Finally, deploy to Cloud Run:\n   ```bash\n   ./deploy/replace_cloud_run_service.py prod\n   ```\n\nOnce you've completed all these steps, the code you ran successfully on the\nstaging site should be available on the production site as well!\n\nThere are lots of other details I haven't covered here, like how to add new\nsettings that differ between staging and prod, or how to re-create the staging\nenvironment from scratch. The (very long) internal Google doc mentioned above should answer\nsome of those questions, but you can also ping Sam on Slack if you want\npointers.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhumancompatibleai%2Ftensor-trust","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhumancompatibleai%2Ftensor-trust","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhumancompatibleai%2Ftensor-trust/lists"}