<h3 align="center">
    🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉<br>
     <font color="red"><b>Check out our latest progress</b></font>: <a href="https://digirl-agent.github.io/DigiQ-agent.github.io/"><b>DigiQ</b></a>, a new offline RL algorithm for Android, and <a href="https://yanqval.github.io/PAE/"><b>Proposer-Agent-Evaluator</b></a>, autonomous skill discovery for web agents.
<br>
    🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉
</h3>

<p align="center">
    <img src="./assets/digirl-logo-text.png" alt="logo" width="50%">
</p>

<h3 align="center">
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
<br>
<b>Oral @ <a href="https://icml-fm-wild.github.io/">FM Wild</a>, ICML</b>
    <br>
    <b>NeurIPS 2024</b>
</h3>

<p align="center">
| <a href="https://digirl-agent.github.io/"><b>Website | Demo | Results</b></a> | <a href="https://arxiv.org/abs/2406.11896"><b>Paper</b></a> | <a href="https://drive.google.com/drive/folders/14Iu6lAHePQ2qG0ghYkVG1RG6RUu7e2Hz?usp=sharing"><b>Checkpoints | Data</b></a> |
</p>

---

Research code for the preprint "DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning".

[Hao Bai*](https://jackgethome.com), [Yifei Zhou*](https://<username>02.github.io/), [Mert Cemri](https://scholar.google.com/citations?user=sMEFwf8AAAAJ&hl=en), [Jiayi Pan](https://www.jiayipan.me/), [Alane Suhr](https://www.alanesuhr.com/), [Sergey Levine](https://people.eecs.berkeley.edu/~svlevine/), [Aviral Kumar](https://aviralkumar2907.github.io/)<br>
UC Berkeley, UIUC, Google DeepMind
<br>
*Equal contribution, alphabetic order; work done at UC Berkeley

<p align="center">
    <img src="./assets/digirl-diagram.png" alt="digirl-diagram" width="70%">
</p>

## 🍩 Features

### Environment Features

- Auto-adaptive error handling support.
- Multi-machine [parallel emulation](multimachine/README.md) support.
- Checkpoint resuming support.
- Trajectory video recording support.

### Approach Features

- Two training algorithms from the paper:
  - DigiRL (automatic curriculum + doubly robust estimator filtering).
  - Filtered Behavior Cloning (reward-based filtering).
- Three training modes:
  - Offline-only training: the baseline approach. Use the AutoUI checkpoint to collect data (we provide this data), then train on these pre-collected, sub-optimal trajectories. This mode only allows evaluation using the checkpoint; there is no interactive training.
  - Online-only training: the traditional RL approach. The AutoUI checkpoint interacts with the environment and learns online at the same time. This mode allows interactive training.
  - Offline-to-online training: the most powerful approach as evaluated in the paper. The AutoUI checkpoint first learns from the pre-collected data, then, starting from the resulting checkpoint, interacts with the environment and learns online. This mode allows interactive training.
- Two agents:
  - [AutoUI](https://arxiv.org/abs/2309.11436): we support both training (2 algorithms × 3 training modes) and evaluation.
  - [CogAgent](https://arxiv.org/abs/2312.08914): currently evaluation only; no training pipeline is supported.
- Two [Android-in-the-Wild](https://arxiv.org/abs/2307.10088) task sets:
  - AitW General: general browsing, opening apps.
  - AitW Web Shopping: shopping on popular shopping websites.
  - It would also be interesting to explore the [other AitW subsets](https://github.com/google-research/google-research/tree/master/android_in_the_wild) or other task sets. If you have good candidates, please propose one in an issue.
- DDP Multi-GPU training:
  - We support `accelerate` for multi-GPU training. You can turn off this feature if you only have 1 GPU.
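The reward-based filtering behind Filtered Behavior Cloning can be illustrated with a minimal sketch. It rests on two hypothetical assumptions: each trajectory is a list of step dictionaries with a `reward` key, and a trajectory is kept when its summed reward reaches a threshold, for example a binary success reward of 1. The field names and threshold are illustrative only. They do not reflect the repository's data format or its actual implementation.

```python
"""Minimal sketch of reward-based filtering for behavior cloning.

Hypothetical data layout (not the DigiRL repo's format): a trajectory is a
list of step dicts, each with a "reward" key plus whatever the policy needs
(e.g. "obs" and "action").
"""
from typing import Any, Dict, List

Step = Dict[str, Any]
Trajectory = List[Step]


def trajectory_return(traj: Trajectory) -> float:
    """Undiscounted return: the sum of per-step rewards (missing rewards count as 0)."""
    return sum((float(step.get("reward", 0.0)) for step in traj), 0.0)


def filter_for_bc(trajectories: List[Trajectory], threshold: float = 1.0) -> List[Step]:
    """Keep only trajectories whose return reaches `threshold`, then flatten
    them into a list of steps to use as supervised (imitation) targets."""
    kept = [traj for traj in trajectories if trajectory_return(traj) >= threshold]
    return [step for traj in kept for step in traj]


if __name__ == "__main__":
    # Toy data: the first trajectory ends in success (reward 1), the second fails.
    data = [
        [{"obs": "home screen", "action": "tap", "reward": 0.0},
         {"obs": "app open", "action": "type", "reward": 1.0}],
        [{"obs": "home screen", "action": "scroll", "reward": 0.0}],
    ]
    bc_steps = filter_for_bc(data)
    print(len(bc_steps))  # 2: only the successful trajectory's steps remain
```

The filtered steps would then feed an ordinary supervised imitation loss. DigiRL replaces this fixed filter with its curriculum and doubly robust estimator filtering.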
AutoUI running the DigiRL algorithm takes only **12GB** of GPU memory, so a single GPU is enough. We provide multi-GPU training in case you want to experiment with something larger.


## 🚀 Quick Start
### Dependencies

First, create a [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) environment and install all pip package requirements.

```bash
conda create -n digirl python=3.10
conda activate digirl

git clone https://github.com/DigiRL-agent/digirl.git
cd digirl
pip install -e .
```

### Environment Setup

To set up the Android environment that DigiRL and filtered BC interact with, refer to [the environment README](./env_setup/README.md). Before moving on, you should be able to view [this screenshot](./env_setup/screenshot.png) by running [this script](./env_setup/screenshot.py).

### Model checkpoint and Datasets

We start from the released SFT checkpoint of the AutoUI model:

- [AutoUI SFT checkpoint](https://huggingface.co/cooelf/Auto-UI)

Download `Auto-UI-Base.zip` and unzip it into a directory:

```bash
cd <path_to_autoui_dir>
wget https://huggingface.co/cooelf/Auto-UI/resolve/main/Auto-UI-Base.zip
unzip Auto-UI-Base.zip
# wait...
ls Auto-UI-Base
# config.json             pytorch_model.bin        tokenizer.json         training_args.bin
# generation_config.json  special_tokens_map.json  tokenizer_config.json
```

We provide trajectories pre-collected with this SFT checkpoint:

- [Trajectories of SFT'ed AutoUI](https://drive.google.com/drive/folders/1ud1XyzCfh0257CixxdgLjjpX59jYbhfU?usp=sharing)

The Google Drive folder contains 4 files, with the stats below. You can use `gdown` to download the files you need.

| File Name | #Trajectories | Horizon | File Size |
|-----------|---------------|---------|-----------|
| `general-off2on-zeroshot-trajectories.pt` | 608 | 10 | 95.5M |
| `general-offline-zeroshot-trajectories.pt` | 1552 | 10 | 243.9M |
| `webshop-off2on-zeroshot-trajectories.pt` | 528 | 20 | 115.2M |
| `webshop-offline-zeroshot-trajectories.pt` | 1296 | 20 | 297.5M |

Here `general`/`webshop` denote the AitW General/Web Shopping subset. `off2on`/`offline` indicate whether the data is used for offline-to-online learning or offline-only learning. For a fair comparison, offline learning should use roughly the same amount of data that offline-to-online learning ends up using.

Store these files in a directory:

```bash
mkdir ~/data && cd ~/data
# copy the .pt files here
```

If you want to use our final offline-to-online checkpoints to reproduce the scores in the paper, you can also download them from Google Drive. We release the first offline-to-online run (`run1` in the paper) for each algorithm in each environment:

- [AutoUI DigiRL & online filtered BC checkpoints](https://drive.google.com/drive/folders/13jkIgWQ6JCcaTsfG_AWdgxE1qO4c2imJ?usp=sharing)

The Google Drive folder also contains 4 files:

| File Name | Index in Paper | Test Set Score | File Size |
|-----------|----------------|----------------|-----------|
| `general-off2on-digirl.zip` | `run1` | 70.8 | 1.9G |
| `general-off2on-filteredbc.zip` | `run1` | 59.4 | 1.9G |
| `webshop-off2on-digirl.zip` | `run1` | 75.0 | 1.9G |
| `webshop-off2on-filteredbc.zip` | `run1` | 55.2 | 1.9G |

The checkpoints are also available on [Hugging Face](https://huggingface.co/collections/JackBAI/digirl-checkpoints-6682ea42bdfb5af9bfc5f29f).

Note that these checkpoints only allow evaluation, because we release only the AutoUI checkpoint, not the optimizer states.

### Modify Configurations

Next, set `huggingface_token`, `wandb_token`, `gemini_token`, etc. in `scripts/config/main/default.yaml`. You need to fill in **all entries** in this file that are left blank or set to `<username>`.
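Missing values are easy to overlook, so a small check before launching a run can help. What follows is a minimal sketch of a hypothetical helper; it is not part of the repository. It walks the nested dict that a YAML loader such as PyYAML's `yaml.safe_load` would return for `scripts/config/main/default.yaml`. It then reports values that are `None` (how PyYAML reads a key left blank), empty strings, or strings that still contain `<username>`. The example values are made up.

```python
"""Hypothetical helper (not part of the DigiRL repo): list config entries that
are still blank or hold the "<username>" placeholder."""
from typing import Any, Iterator, Tuple

PLACEHOLDER = "<username>"


def find_unfilled(cfg: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Recursively yield (dotted_key, value) for values that are None, an empty
    string, or a string containing PLACEHOLDER."""
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from find_unfilled(value, path)
    elif isinstance(cfg, list):
        for i, value in enumerate(cfg):
            yield from find_unfilled(value, f"{prefix}[{i}]")
    elif cfg is None or (isinstance(cfg, str) and (cfg == "" or PLACEHOLDER in cfg)):
        yield prefix, cfg


if __name__ == "__main__":
    # Toy values. With PyYAML installed you could instead pass
    # yaml.safe_load(open("scripts/config/main/default.yaml")).
    example = {
        "huggingface_token": "",
        "wandb_token": "abc123",
        "gemini_token": None,
        "save_path": "/home/<username>/logs",
    }
    for key, value in find_unfilled(example):
        print(f"still to fill: {key} = {value!r}")
```

An empty result means every entry has a value. It does not check that the values are valid tokens or paths.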
`default.yaml` holds the default configuration, and you also need to set up the sub-configuration for your experiment. For example, if you want to run the online algorithm, also check what to modify in `scripts/config/main/digirl_online`. Feel free to customize your configs and play with the code!

**Note: to load existing checkpoints, modify `save_path` instead of `policy_lm`.** That is, `policy_lm` should still be the path to the AutoUI checkpoint.

### Run Experiments

Once the config is set up the way you like, run experiments with the following commands:

```bash
cd scripts
python run.py --config-path config/main --config-name digirl_online
```

`run.py` is the entry point of the program. Pass a different config name to run a different experiment. The config files are in the `scripts/config/` directory.

### Main Results Reproduction

To reproduce the results in Table 1 of our paper, first download the corresponding checkpoints as described above. The training-set results are obtained by randomly sampling tasks, so we recommend reproducing the test-set results instead. These are obtained by sequentially sampling the first 96 trajectories.

To do this, adjust the [`eval_only.yaml`](https://github.com/DigiRL-agent/digirl/blob/master/scripts/config/main/eval_only.yaml) config file and its parent [`default.yaml`](https://github.com/DigiRL-agent/digirl/blob/master/scripts/config/main/default.yaml) config file to your experiment settings. For instance, you can modify these configs for reproduction:

1. `default.yaml`
    1. Set `task_split: "test"` and `eval_sample_mode: "sequential"`.
    2. Don't forget to increase `max_steps` to `20` if `task_set` is set to `webshop`. Webshop tasks usually need more steps to complete than general tasks.
2. `eval_only.yaml`
    1. Make sure `rollout_size` (in `default.yaml`) * `eval_iterations` (in `eval_only.yaml`) = 96. For example, `rollout_size (16) * eval_iterations (6) = 96`.

### (Optional) CogAgent server

We run CogAgent behind a Gradio-based API. You set up a CogAgent inference service on a server, and our code queries that API. To set up CogAgent, refer to the GitHub page of the [AutoEval](https://github.com/Berkeley-NLP/Agent-Eval-Refine/blob/main/exps/android_exp/README.md) project by [Jiayi Pan](https://www.jiayipan.me/).

Put the server link in `scripts/config/cogagent/default.yaml`. Hosting CogAgent for inference requires at least one GPU with 48GB of memory.

### (Optional) Multi-machine Parallel Emulation

If you want to launch large-scale emulation (say, more than 32 emulators running at the same time), you'll need multiple machines collecting trajectories at the same time. Refer to the [multimachine-training README](multimachine/README.md) for details.

### (Optional) Multi-GPU DDP Training

We use `accelerate` for multi-GPU DDP training. To enable it, specify the number of GPUs on your machine in the [accelerate config](scripts/config/accelerate_config/default_config.yaml). If your model is extremely large, multi-machine DDP training is also possible, but we currently don't support it.

To launch with DDP, replace `python run.py` with `accelerate launch --config_file <config_file> run.py`. For example:

```bash
accelerate launch --config_file config/accelerate_config/default_config.yaml run.py --config-path config/main --config-name digirl_off2on
```

If you've set this up successfully, training should run noticeably faster.

## Troubleshooting (IMPORTANT)

1. If you frequently get the `Error in environment reset` error, try increasing the timeout at [this line](https://github.com/DigiRL-agent/digirl/blob/5b77663c3c3f19932cdb9ceb6fe0474c7b28a0b7/digirl/environment/env_utils.py#L59).
2. If you frequently get the `429 resource exhausted` error, try adding a `sleep()` call inside the `call_gemini()` function [here](https://github.com/DigiRL-agent/digirl/blob/3896fda9d2e31081234f8b716e9049f6a2d6a7f8/digirl/environment/android/evaluate.py#L161). FYI, `sleep(2)` works well with a free-tier Gemini API key.
3. If you see AVD copying errors (starting with `shutil.Error`), you can safely ignore them unless the destination directory is empty.

## 🌟 Contribution

We welcome contributions from the open-source community. If you have invented an algorithm or added support for other base models, please open a PR or issue. Example topics:

- [ ] Other algorithms, such as PPO or one you invented.
- [ ] Other base models, such as LLaVA.
- [ ] Other task sets, such as WebArena.
- [ ] Potential sub-optimal implementations.

## 📄 License

All content of this work, including the codebase, data, and model checkpoints, is released under the [Apache License v2.0](https://github.com/DigiRL-agent/digirl/blob/master/LICENSE).

## 📚 Citation

Consider citing our paper!

```bibtex
@article{bai2024digirl,
  title={DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning},
  author={Bai, Hao and Zhou, Yifei and Cemri, Mert and Pan, Jiayi and Suhr, Alane and Levine, Sergey and Kumar, Aviral},
  journal={arXiv preprint arXiv:2406.11896},
  year={2024}
}
```