{"id":41947826,"url":"https://github.com/facebookresearch/aira-dojo","last_synced_at":"2026-02-04T17:01:27.929Z","repository":{"id":302814146,"uuid":"1003075563","full_name":"facebookresearch/aira-dojo","owner":"facebookresearch","description":"AIRA-dojo: a framework for developing and evaluating AI research agents","archived":false,"fork":false,"pushed_at":"2025-09-22T22:43:32.000Z","size":240,"stargazers_count":92,"open_issues_count":1,"forks_count":12,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-23T00:23:00.371Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-16T15:27:10.000Z","updated_at":"2025-09-16T10:05:41.000Z","dependencies_parsed_at":"2025-07-04T11:51:15.661Z","dependency_job_id":null,"html_url":"https://github.com/facebookresearch/aira-dojo","commit_stats":null,"previous_names":["facebookresearch/aira-dojo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/facebookresearch/aira-dojo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Faira-dojo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Faira-dojo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Faira-dojo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/facebookresearch%2Faira-dojo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/aira-dojo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Faira-dojo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29091317,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-04T03:31:03.593Z","status":"ssl_error","status_checked_at":"2026-02-04T03:29:50.742Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-25T20:00:26.085Z","updated_at":"2026-02-04T17:01:27.920Z","avatar_url":"https://github.com/facebookresearch.png","language":"Python","funding_links":[],"categories":["AutoML Agents"],"sub_categories":[],"readme":"# `aira-dojo`: AI Research Agent DOJO \n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://creativecommons.org/licenses/by-nc/4.0/deed.en\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-CC--BY--NC%204.0-lightgrey\"/\u003e\u003c/a\u003e\n  \u003ca href=\"https://arxiv.org/abs/2507.02554\"\u003e\n  \u003cimg alt=\"Documentation\" src=\"https://img.shields.io/badge/arXiv-2507.02554-b31b1b.svg\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n`aira-dojo` is a 
scalable and customizable framework for AI research agents, designed to accelerate hill-climbing on research capabilities toward a fully automated AI research scientist.\nThe framework provides a general abstraction for tasks and agents, implements the MLE-bench task, and includes the state-of-the-art agents introduced in our paper, “AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench.” Additionally, it features an isolated code execution environment that integrates smoothly with job schedulers like Slurm.\nThe framework enabled 1,000 agents to run in parallel for up to 120 hours, uncovering valuable insights and results detailed in the paper.\n\n## 📚 Documentation\n\nThe following documentation is available to help you get started with `aira-dojo`:\n\n- [Installation Guide](./docs/INSTALLATION.md) - Detailed setup instructions\n- [Project Structure](./docs/PROJECT_STRUCTURE.md) - Overview of the codebase organization\n- [Task Development Guide](./docs/TASK_DEVELOPMENT.md) - How to create new tasks\n- [Solver Development Guide](./docs/SOLVER_DEVELOPMENT.md) - How to implement new solvers\n- [Running Experiments](./docs/RUNNING_EXPERIMENTS.md) - How to run experiments with `aira-dojo`\n- [Building Superimage](./docs/BUILD_SUPERIMAGE.md) - Instructions for building the superimage container\n\n## Terminology\n\n**Task**: A specific problem or challenge that the AI agent (solver) is designed to solve. Each task has a defined execution environment, solver action space, and evaluation function.\n\n**Solver**: An AI agent that attempts to solve a given task. 
A solver is composed of:\n- **Operators**: Functions that are used to generate new solutions (e.g., a call to an LLM with a specific prompt and some context).\n- **Search Policy**: The method used to explore the solution space and orchestrate the execution of operators (e.g., greedy search, evolutionary search, Monte Carlo Tree Search).\n\n**Run**: A single execution in which a **solver** (an AI agent) attempts to solve a given **task**.\n\n**Runner**: A component used to parallelize runs. It manages and orchestrates multiple solver-task pairs concurrently, allowing large-scale experiments and rapid iteration across a portfolio of tasks and solvers.\n\nThe diagram below gives a high-level overview of the key components of the framework and how they interact.\n\u003cp align=\"center\"\u003e\n      \u003cbr/\u003e\n      \u003cimg width=\"800\" alt=\"image\" src=\"assets/diagram.png\" /\u003e\n      \u003cbr/\u003e\n\u003c/p\u003e\n\n## Quick Start\n\n### **1. Clone the Repository**\n```bash\ngit clone https://github.com/facebookresearch/aira-dojo\ncd aira-dojo\n```\n\n### **2. Create the conda environment**\n```bash\nconda env create -f environment.yaml\nconda activate aira-dojo\n```\n\n### **3. Install aira-dojo via pip**\n```bash\npip install -e .\n```\n\n### **4. Set up Environment Variables**\n```bash\ncp .env_default .env\n# Edit .env with your specific configuration\n```\nNote that the `.env` file is ignored by git to avoid accidentally pushing tokens to GitHub.\n\n### **5. 
Change LLM Client Configs**\nIf you are using different endpoints, you should change them accordingly in `dojo/configs/run/solver/client`.\nExamples:\n- **Changing Azure endpoint for 4o:**\n\n  Go to [`src/dojo/configs/run/solver/client/litellm_4o.yaml`](./src/dojo/configs/solver/client/litellm_4o.yaml) and change the `base_url` to your Azure endpoint:\n  ```yaml\n    ...\n    base_url: https://azure-services-endpoint-here.azure-api.net #\u003c---- Set to your Azure endpoint\n    ...\n  ```\n- **Changing to OpenAI endpoint for 4o:**\n\n  Go to [`src/dojo/configs/run/solver/client/litellm_4o.yaml`](./src/dojo/configs/solver/client/litellm_4o.yaml) and change the `base_url` and `use_azure_client` to the following:\n  ```yaml\n    ...\n    base_url: null  # litellm will use the OpenAI endpoint by default\n    use_azure_client: False\n    ...\n  ```\n  Finally, in `.env`, set your primary key to your OpenAI key:\n  ```yaml\n  PRIMARY_KEY=\"sk-...\" # \u003c---- Set to your OpenAI key\n  ```\n\nNote: To run the examples in the \"Example Usage\" section of this README, you must set up the following models:\n- `o3`: Set the `base_url` in [`src/dojo/configs/solver/client/litellm_o3.yaml`](./src/dojo/configs/solver/client/litellm_o3.yaml) and set the `PRIMARY_KEY_O3` in `.env`.\n- `gpt-4o`: Set the `base_url` in [`src/dojo/configs/solver/client/litellm_4o.yaml`](./src/dojo/configs/solver/client/litellm_4o.yaml) and set the `PRIMARY_KEY` in `.env`.\n\n### **6. Build a superimage with Apptainer**\nFollow the steps in [`docs/BUILD_SUPERIMAGE.md`](./docs/BUILD_SUPERIMAGE.md) to build your superimage. This is necessary to run tasks that use Jupyter as the interpreter.\n\n### **7. Install mle-bench and run your first task**\nFollow the steps in [`src/dojo/tasks/mlebench/README.md`](./src/dojo/tasks/mlebench/README.md) to install mle-bench and run your first task.\n\n### **8. 
Setting up wandb**\nLog in with the following command:\n```bash\n  wandb login\n```\nIt will ask for your API key, which you can get by going into \"User settings\" (click top right of screen) and scrolling down.\n\n## Example Usage\n\n### Single-Run Example\n```bash\n# Runs AIRA_GREEDY on a single MLE-bench task\npython -m dojo.main_run +_exp=run_example logger.use_wandb=False\n```\n\nSee the config [run_example.yaml](./src/dojo/configs/_exp/run_example.yaml) for details.\n\n### Parallel-Run (Runner) Example\n```bash\n# Runs AIRA_GREEDY on our quick-dev set of MLE-bench tasks\npython -m dojo.main_runner_job_array +_exp=runner_example logger.use_wandb=False launcher.debug=True\n```\n\nSee the config [runner_example.yaml](./src/dojo/configs/_exp/runner_example.yaml) for details.\n\n### Hydra Multi Parallel-Run Example\n```bash\n# Runs AIRA_GREEDY on our quick-dev set of MLE-bench tasks\npython -m dojo.main_runner_job_array +_exp=runner_multi_example logger.use_wandb=False launcher.debug=True\n```\n\nSee the config [runner_multi_example.yaml](./src/dojo/configs/_exp/runner_multi_example.yaml) for details.\n\n### Running AIRA\u003csub\u003eGREEDY\u003c/sub\u003e, AIDE\u003csub\u003eGREEDY\u003c/sub\u003e, AIRA\u003csub\u003eMCTS\u003c/sub\u003e and AIRA\u003csub\u003eEVO\u003c/sub\u003e on MLE-bench lite\n\nNote: Make sure you replace `\u003c\u003c\u003cDEFAULT_SLURM_ACCOUNT\u003e\u003e\u003e`, `\u003c\u003c\u003cDEFAULT_SLURM_QOS\u003e\u003e\u003e`, and `\u003c\u003c\u003cDEFAULT_SLURM_PARTITION\u003e\u003e\u003e` with your actual Slurm account, QoS, and partition settings in your `.env` before running these commands.\n\n```bash\n# Runs AIRA_GREEDY on MLE-bench lite tasks\npython -m dojo.main_runner_job_array +_exp=mlebench/aira_greedy_o3 logger.use_wandb=False launcher.debug=False\n\n# Runs AIDE_GREEDY on MLE-bench lite tasks\npython -m dojo.main_runner_job_array +_exp=mlebench/aide_greedy_o3 logger.use_wandb=False launcher.debug=False\n\n# Runs AIRA_MCTS on MLE-bench lite 
tasks\npython -m dojo.main_runner_job_array +_exp=mlebench/aira_mcts_o3 logger.use_wandb=False launcher.debug=False\n\n# Runs AIRA_EVO on MLE-bench lite tasks\npython -m dojo.main_runner_job_array +_exp=mlebench/aira_evo_o3 logger.use_wandb=False launcher.debug=False\n```\n\n### Analyze and Visualize Results\nTo visualize results, check out [src/dojo/ui/README](./src/dojo/ui/README.md). To learn how to load and extract the best node of each experiment, check out [notebooks/analyze_results.ipynb](./notebooks/analyze_results.ipynb).\n\n## Citation\n\nIf you find this work useful, please consider citing:\n\n```\n@article{toledo2025airesearchagentsmachine,\n    title={AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench},\n    author={Edan Toledo and Karen Hambardzumyan and Martin Josifoski and Rishi Hazra and Nicolas Baldwin and Alexis Audran-Reiss and Michael Kuchnik and Despoina Magka and Minqi Jiang and Alisia Maria Lupidi and Andrei Lupu and Roberta Raileanu and Kelvin Niu and Tatiana Shavrina and Jean-Christophe Gagnon-Audet and Michael Shvartsman and Shagun Sodhani and Alexander H. Miller and Abhishek Charnalia and Derek Dunfield and Carole-Jean Wu and Pontus Stenetorp and Nicola Cancedda and Jakob Nicolaus Foerster and Yoram Bachrach},\n    year={2025},\n    journal={arXiv},\n    url={https://arxiv.org/abs/2507.02554}\n}\n```\n\n## License\n\nThis code is made available under a [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license, as found in the [LICENSE](LICENSE) file. Some portions of the project are subject to separate license terms outlined in [THIRD_PARTY_LICENSES.md](THIRD_PARTY_LICENSES.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Faira-dojo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2Faira-dojo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Faira-dojo/lists"}