{"id":28961029,"url":"https://github.com/dreadnode/AIRTBench-Code","last_synced_at":"2025-06-24T02:01:56.100Z","repository":{"id":299703362,"uuid":"996088065","full_name":"dreadnode/AIRTBench-Code","owner":"dreadnode","description":"Code Repository for: AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models","archived":false,"fork":false,"pushed_at":"2025-06-17T21:21:14.000Z","size":820,"stargazers_count":1,"open_issues_count":4,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-17T22:28:38.291Z","etag":null,"topics":["agents","ai","ai-agents","artificial-intelligence","benchmark","benchmark-datasets","benchmarking","ctf","cyber-evals","cybersecurity","evaluations","hacking","llm","offensive-security","research","security"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dreadnode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-04T12:45:14.000Z","updated_at":"2025-06-17T21:21:33.000Z","dependencies_parsed_at":"2025-06-17T22:40:13.963Z","dependency_job_id":null,"html_url":"https://github.com/dreadnode/AIRTBench-Code","commit_stats":null,"previous_names":["dreadnode/airtbench-code"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/dreadnode/AIRTBench-Code","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreadnode%2FAIRTBench-Code","tags_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/dreadnode%2FAIRTBench-Code/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreadnode%2FAIRTBench-Code/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreadnode%2FAIRTBench-Code/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dreadnode","download_url":"https://codeload.github.com/dreadnode/AIRTBench-Code/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dreadnode%2FAIRTBench-Code/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261589824,"owners_count":23181432,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai","ai-agents","artificial-intelligence","benchmark","benchmark-datasets","benchmarking","ctf","cyber-evals","cybersecurity","evaluations","hacking","llm","offensive-security","research","security"],"created_at":"2025-06-24T02:01:11.165Z","updated_at":"2025-06-24T02:01:56.093Z","avatar_url":"https://github.com/dreadnode.png","language":"Jupyter Notebook","readme":"# AIRTBench: Autonomous AI Red Teaming Agent Code\n\n\u003cdiv align=\"center\"\u003e\n\n\u003cimg\n  src=\"https://d1lppblt9t2x15.cloudfront.net/logos/5714928f3cdc09503751580cffbe8d02.png\"\n  alt=\"Logo\"\n  align=\"center\"\n  width=\"144px\"\n  height=\"144px\"\n/\u003e\n\n\u003c/div\u003e\n\n\u003c!-- BEGIN_AUTO_BADGES --\u003e\n\u003cdiv 
align=\"center\"\u003e\n\n[![Pre-Commit](https://github.com/dreadnode/AIRTBench-Code/actions/workflows/pre-commit.yaml/badge.svg)](https://github.com/dreadnode/AIRTBench-Code/actions/workflows/pre-commit.yaml)\n[![Renovate](https://github.com/dreadnode/AIRTBench-Code/actions/workflows/renovate.yaml/badge.svg)](https://github.com/dreadnode/AIRTBench-Code/actions/workflows/renovate.yaml)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![GitHub release (latest by date)](https://img.shields.io/github/v/release/dreadnode/AIRTBench-Code)](https://github.com/dreadnode/AIRTBench-Code/releases)\n\n[![arXiv](https://img.shields.io/badge/arXiv-AIRTBench-b31b1b.svg)](https://arxiv.org/abs/2506.14682)\n[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-Dataset-ffca28.svg)](https://huggingface.co/datasets/dreadnode/AIRTBench/blob/main/README.md)\n[![Dreadnode](https://img.shields.io/badge/Dreadnode-Blog-5714928f.svg)](https://dreadnode.io/blog/ai-red-team-benchmark)\n[![Agent Harness](https://img.shields.io/badge/📚_Agent_Harness-Documentation-5714928f.svg)](https://docs.dreadnode.io/strikes/how-to/airtbench-agent)\n\n[![GitHub stars](https://img.shields.io/github/stars/dreadnode/AIRTBench-Code?style=social)](https://github.com/dreadnode/AIRTBench-Code/stargazers)\n[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/dreadnode/AIRTBench-Code/pulls)\n\n\u003c/div\u003e\n\u003c!-- END_AUTO_BADGES --\u003e\n\n---\n\nThis repository contains the implementation of the AIRTBench autonomous AI red teaming agent, complementing our research paper [AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models](https://arxiv.org/abs/2506.14682) and accompanying blog post, \"[Do LLM Agents Have AI Red Team Capabilities? 
We Built a Benchmark to Find Out](https://dreadnode.io/blog/ai-red-team-benchmark)\".\n\nThe AIRTBench agent is designed to evaluate the autonomous red teaming capabilities of large language models (LLMs) through AI/ML Capture The Flag (CTF) challenges. Our agent systematically exploits LLM-based targets by solving challenges on the Dreadnode Strikes platform, providing a standardized benchmark for measuring adversarial AI capabilities.\n\n- [AIRTBench: Autonomous AI Red Teaming Agent Code](#airtbench-autonomous-ai-red-teaming-agent-code)\n  - [Agent Harness Construction](#agent-harness-construction)\n  - [Setup](#setup)\n  - [Documentation](#documentation)\n  - [Run the Evaluation](#run-the-evaluation)\n    - [Basic Usage](#basic-usage)\n    - [Challenge Filtering](#challenge-filtering)\n  - [Resources](#resources)\n  - [Dataset](#dataset)\n  - [Citation](#citation)\n  - [Model requests](#model-requests)\n  - [🤝 Contributing](#-contributing)\n  - [🔐 Security](#-security)\n  - [⭐ Star History](#-star-history)\n\n## Agent Harness Construction\n\nThe AIRTBench harness follows a modular architecture designed for extensibility and evaluation:\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/airtbench_architecture_diagram_dark.png\" alt=\"AIRTBench Architecture\" width=\"100%\"\u003e\n  \u003cbr\u003e\n  \u003cem\u003eFigure: AIRTBench harness construction architecture showing the interaction between agent components, challenge interface, and evaluation framework.\u003c/em\u003e\n\u003c/div\u003e\n\n## Setup\n\nYou can set up the virtual environment with uv:\n\n```bash\nuv sync\n```\n\n## Documentation\n\nTechnical documentation for the AIRTBench agent is available in the [Dreadnode Strikes documentation](https://docs.dreadnode.io/strikes/how-to/airtbench-agent).\n\n## Run the Evaluation\n\n\u003cmark\u003eTo run the code, you need access to the Dreadnode Strikes platform. See the [docs](https://docs.Dreadnode.io/strikes/overview) or sign up for 
the Strikes waitlist [here](https://platform.dreadnode.io/waitlist/strikes)\u003c/mark\u003e.\n\nThis [rigging](https://docs.dreadnode.io/open-source/rigging/intro)-based agent attempts to solve a variety of AI/ML CTF challenges from the Dreadnode [Crucible](https://platform.dreadnode.io/crucible) platform. It is given access to execute Python commands on a network-local container built from a custom [Dockerfile](./airtbench/container/Dockerfile).\n\n```bash\nuv run -m airtbench --help\n```\n\n### Basic Usage\n\n```bash\nuv run -m airtbench --model $MODEL --project $PROJECT --platform-api-key $DREADNODE_TOKEN --token $DREADNODE_TOKEN --server https://platform.dreadnode.io --max-steps 100 --inference_timeout 240 --enable-cache --no-give-up --challenges bear1 bear2\n```\n\n### Challenge Filtering\n\nTo run the agent only against LLM-based challenges (those matching the `is_llm:true` criterion), use the following command:\n\n```bash\nuv run -m airtbench --model \u003cmodel\u003e --llm-challenges-only\n```\n\nThe harness automatically builds the defined number of containers with the supplied flag and loads them as needed, keeping them network-isolated from each other. The process is generally:\n\n1. For each challenge, provide the agent with the Jupyter notebook given in the challenge\n2. Task the agent with solving the CTF challenge based on the notebook contents\n3. Bring up the associated environment\n4. Test the agent's ability to execute Python code inside a Jupyter kernel, with the output fed back to the model\n5. If the CTF challenge is solved and the flag is observed, the agent must submit the flag\n6. 
Otherwise, run until an error occurs, the agent gives up, or the step limit (`--max-steps`) is reached\n\nCheck out [the challenge manifest](./airtbench/challenges/.challenges.yaml) to see the challenges currently in scope.\n\n## Resources\n\n- [📄 Paper on arXiv](https://arxiv.org/abs/2506.14682)\n- [📝 Blog post](https://dreadnode.io/blog/ai-red-team-benchmark)\n\n## Dataset\n\n- Download the dataset directly from [🤗 Hugging Face](https://huggingface.co/datasets/dreadnode/AIRTBench/blob/main/README.md)\n- Instructions for loading the dataset are also available in the [dataset](./dataset/README.md) directory.\n\n## Citation\n\nIf you find our work helpful, please use the following citation.\n\n```bibtex\n@misc{dawson2025airtbenchmeasuringautonomousai,\n      title={AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models},\n      author={Ads Dawson and Rob Mulla and Nick Landers and Shane Caldwell},\n      year={2025},\n      eprint={2506.14682},\n      archivePrefix={arXiv},\n      primaryClass={cs.CR},\n      url={https://arxiv.org/abs/2506.14682},\n}\n```\n\n## Model requests\n\nIf you know of a model that would be interesting to analyze but do not have the resources to run it yourself, feel free to open a feature request via a GitHub issue.\n\n## 🤝 Contributing\n\nForks and contributions are welcome! 
Please see our [Contributing Guide](docs/contributing.md).\n\n## 🔐 Security\n\nSee our [Security Policy](SECURITY.md) for reporting vulnerabilities.\n\n## ⭐ Star History\n\n[![GitHub stars](https://img.shields.io/github/stars/dreadnode/AIRTBench-Code?style=social)](https://github.com/dreadnode/AIRTBench-Code/stargazers)\n\nBy watching the repo, you can also be notified of any upcoming releases.\n\n[![Star history graph](https://api.star-history.com/svg?repos=dreadnode/AIRTBench-Code\u0026type=Date)](https://star-history.com/#dreadnode/AIRTBench-Code\u0026Date)\n","funding_links":[],"categories":["Benchmarks \u0026 Evaluations"],"sub_categories":["AI-Assisted Offensive Security"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdreadnode%2FAIRTBench-Code","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdreadnode%2FAIRTBench-Code","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdreadnode%2FAIRTBench-Code/lists"}