{"id":26061720,"url":"https://github.com/ltzheng/agent-studio","last_synced_at":"2025-04-05T19:08:58.466Z","repository":{"id":227237834,"uuid":"738466288","full_name":"ltzheng/agent-studio","owner":"ltzheng","description":"[ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents","archived":false,"fork":false,"pushed_at":"2025-02-05T18:24:57.000Z","size":18255,"stargazers_count":190,"open_issues_count":2,"forks_count":17,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-02-05T19:43:59.075Z","etag":null,"topics":["ai-agents","benchmark","environment","language-model"],"latest_commit_sha":null,"homepage":"https://ltzheng.github.io/agent-studio/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ltzheng.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-03T09:43:31.000Z","updated_at":"2025-02-05T18:38:23.000Z","dependencies_parsed_at":"2024-04-29T06:36:34.244Z","dependency_job_id":"454fc60c-078d-4458-a8fe-13f6375021ae","html_url":"https://github.com/ltzheng/agent-studio","commit_stats":null,"previous_names":["ltzheng/agent-studio","skyworkai/agent-studio","computer-agents/agent-studio"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ltzheng%2Fagent-studio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ltzheng%2Fagent-studio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ltzheng%2Fagent-studio/releases
","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ltzheng%2Fagent-studio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ltzheng","download_url":"https://codeload.github.com/ltzheng/agent-studio/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247386263,"owners_count":20930618,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","benchmark","environment","language-model"],"created_at":"2025-03-08T15:13:14.432Z","updated_at":"2025-04-05T19:08:58.434Z","avatar_url":"https://github.com/ltzheng.png","language":"Python","funding_links":[],"categories":["🔍 Paper List"],"sub_categories":["Datasets and Benchmarks"],"readme":"\u003ch1 align=\"center\"\u003e\nAgentStudio\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href='https://arxiv.org/abs/2403.17918'\u003e\u003cimg src='https://img.shields.io/badge/arXiv-2403.17918-b31b1b.svg'\u003e\u003c/a\u003e\n\u003ca href='https://ltzheng.github.io/agent-studio'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e\n\u003ca href=\"https://www.python.org/downloads/release/python-3117/\"\u003e\u003cimg alt=\"Python 3.11\" src=\"https://img.shields.io/badge/python-3.11-blue.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/psf/black\"\u003e\u003cimg alt=\"Code style: black\" src=\"https://img.shields.io/badge/code%20style-black-000000.svg\"\u003e\u003c/a\u003e\n\u003c!-- \u003ca href=\"https://mypy-lang.org/\"\u003e\u003cimg 
src=\"https://www.mypy-lang.org/static/mypy_badge.svg\" alt=\"Checked with mypy\"\u003e\u003c/a\u003e --\u003e\n\u003ca href=\"https://www.gnu.org/licenses/agpl-3.0\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-AGPL%20v3-blue.svg\" alt=\"License: AGPL v3\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pre-commit.com/\"\u003e\u003cimg src=\"https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=white\" alt=\"pre-commit\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n![](docs/assets/overview.png)\n\nAgentStudio is **a trinity of environments, tools, and benchmarks** for general virtual agents to interact with any computer software. AgentStudio targets the desiderata for robust, general, and open-ended virtual agents by providing:\n1. **A lightweight, interactive environment** with highly **generic observation and action spaces**, e.g., video observations and GUI/API actions\n2. **Tools for creating online benchmark tasks, annotating GUI elements, and labeling actions in videos**\n3. **Online benchmark tasks** that evaluate both GUI interactions and function calling with **auto-evaluation** and language feedback\n4. 
**Three benchmark datasets**: GroundUI, IDMBench, and CriticBench, for fundamental agent abilities, including GUI grounding, learning from videos, and success detection\n\n\nComparisons with existing work:\n\n![](docs/assets/comparison.png)\n\n## News\n\n- **Oct 3, 2024**: Released the \u003ca href='https://arxiv.org/abs/2403.17918'\u003earXiv paper v2\u003c/a\u003e and a full version of AgentStudio, including comprehensive documentation, complete tasks, and datasets.\n- **Aug 18, 2024**: Major update to clean up the codebase and datasets.\n- **Mar 30, 2024**: Released the beta version of AgentStudio.\n\n## Installation\n\nInstall requirements:\n\n```bash\napt-get install gnome-screenshot xclip xdotool  # If using Ubuntu 22.04\nconda create --name agent-studio python=3.11 -y\nconda activate agent-studio\npip install -e '.[client]'\n```\n\nStore all confidential API keys (e.g., OpenAI, Claude, and Gemini API keys) in `agent_studio/config/api_key.json`. An example config is provided in `agent_studio/config/api_key_template.json`.\n\n## AgentStudio Overall Benchmark Tasks\n\n![](docs/assets/agent_space.jpg)\n\nAgentStudio provides the most generic observation and action spaces, which significantly expand the task space and allow agents to be developed and evaluated in real-world settings. We introduce a benchmark suite consisting of 205 tasks. These tasks span API usage, such as the terminal and Gmail, and GUI software, such as VS Code. See [eval_online_benchmarks/README.md](eval_online_benchmarks/README.md) for details. The task-related files are available at our \u003ca href='https://ltzheng.github.io/agent-studio'\u003eproject page\u003c/a\u003e.\n\n## AgentStudio Datasets Decompose Agent Abilities\n\nTo gain deeper insights into agent capabilities beyond the overall performance measured by online benchmark tasks, we develop three datasets using AgentStudio: GroundUI, IDMBench, and CriticBench. 
These datasets target general UI grounding, learning from videos, and success detection. More details are provided in [eval_agent_desiderata/README.md](eval_agent_desiderata/README.md). All data are available at our \u003ca href='https://ltzheng.github.io/agent-studio'\u003eproject page\u003c/a\u003e.\n\n## AgentStudio Tools\n\nTo facilitate the development and evaluation of agents within the AgentStudio environment, we provide three tools for:\n- Benchmark task creation and validation\n- Step-level GUI element annotation\n- Trajectory-level video-action recording and refinement\n\nThese tools, combined with the realistic environment of AgentStudio, contribute to the generation of rich, structured data for training and evaluating agents. Please refer to [docs/annotate_ground_ui.md](docs/annotate_ground_ui.md) for the GUI annotation tool, [agent_studio/recorder/README.md](agent_studio/recorder/README.md) for the video-action annotation tool, and [eval_online_benchmarks/README.md](eval_online_benchmarks/README.md) for the task creation/validation.\n\n## Contributing\n\nContributions and feedback from everyone on how to make this into a better tool are more than welcome. 
Please check out [CONTRIBUTING.md](CONTRIBUTING.md) for how to get involved.\n\n## Acknowledgement\n\nWe would like to thank the following projects for their inspiration and contributions to the open-source community: [Open Interpreter](https://github.com/KillianLucas/open-interpreter), [WebArena](https://github.com/web-arena-x/webarena), [Cradle](https://baai-agents.github.io/Cradle/), [Synapse](https://ltzheng.github.io/Synapse/), [SeeClick](https://github.com/njucckevin/SeeClick), [ScreenAgent](https://github.com/niuzaisheng/ScreenAgent), [OSWorld](https://github.com/xlang-ai/OSWorld), etc.\n\n## Citation\n\nIf you find AgentStudio useful, please cite our paper:\n\n```bibtex\n@article{zheng2024agentstudio,\n  title={AgentStudio: A Toolkit for Building General Virtual Agents},\n  author={Longtao Zheng and Zhiyuan Huang and Zhenghai Xue and Xinrun Wang and Bo An and Shuicheng Yan},\n  journal={arXiv preprint arXiv:2403.17918},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fltzheng%2Fagent-studio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fltzheng%2Fagent-studio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fltzheng%2Fagent-studio/lists"}