{"id":19279774,"url":"https://github.com/showlab/computer_use_ootb","last_synced_at":"2025-05-14T15:07:18.502Z","repository":{"id":259431174,"uuid":"877242296","full_name":"showlab/computer_use_ootb","owner":"showlab","description":"Out-of-the-box (OOTB) GUI Agent for Windows and macOS","archived":false,"fork":false,"pushed_at":"2025-03-27T01:35:08.000Z","size":48587,"stargazers_count":1492,"open_issues_count":35,"forks_count":149,"subscribers_count":20,"default_branch":"main","last_synced_at":"2025-04-09T18:20:35.613Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/showlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-23T10:30:42.000Z","updated_at":"2025-04-09T08:02:31.000Z","dependencies_parsed_at":"2024-10-25T10:23:20.829Z","dependency_job_id":"7c3d88cb-43e2-492e-9247-3a4e393ab7aa","html_url":"https://github.com/showlab/computer_use_ootb","commit_stats":null,"previous_names":["showlab/computer_use_ootb"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2Fcomputer_use_ootb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2Fcomputer_use_ootb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2Fcomputer_use_ootb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2Fcomputer_use_ootb/manifests","owner_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/owners/showlab","download_url":"https://codeload.github.com/showlab/computer_use_ootb/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254169583,"owners_count":22026213,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T21:16:03.439Z","updated_at":"2025-05-14T15:07:18.475Z","avatar_url":"https://github.com/showlab.png","language":"Python","funding_links":[],"categories":["Python","Projects","GUI \u0026 Computer Control AI Agents"],"sub_categories":["Frameworks \u0026 Models","Desktop Automation"],"readme":"\u003ch2 align=\"center\"\u003e\n    \u003ca href=\"https://computer-use-ootb.github.io\"\u003e\n        \u003cimg src=\"./assets/ootb_logo.png\" alt=\"Logo\" style=\"display: block; margin: 0 auto; filter: invert(1) brightness(2);\"\u003e\n    \u003c/a\u003e\n\u003c/h2\u003e\n\n\n\u003ch5 align=\"center\"\u003e If you like our project, please give us a star ⭐ on GitHub for the latest update.\u003c/h5\u003e\n\n\u003ch5 align=center\u003e\n\n[![arXiv](https://img.shields.io/badge/Arxiv-2411.10323-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2411.10323)\n[![Project 
Page](https://img.shields.io/badge/Project_Page-GUI_Agent-blue)](https://computer-use-ootb.github.io)\n[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fshowlab%2Fcomputer_use_ootb\u0026count_bg=%2379C83D\u0026title_bg=%23555555\u0026icon=\u0026icon_color=%23E7E7E7\u0026title=hits\u0026edge_flat=false)](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fshowlab%2Fcomputer_use_ootb\u0026count_bg=%2379C83D\u0026title_bg=%23555555\u0026icon=\u0026icon_color=%23E7E7E7\u0026title=hits\u0026edge_flat=false)\n\n\n\u003c/h5\u003e\n\n## \u003cimg src=\"./assets/ootb_icon.png\" alt=\"Star\" style=\"height:25px; vertical-align:middle; filter: invert(1) brightness(2);\"\u003e  Overview\n**Computer Use \u003cspan style=\"color:rgb(106, 158, 210)\"\u003eO\u003c/span\u003e\u003cspan style=\"color:rgb(111, 163, 82)\"\u003eO\u003c/span\u003e\u003cspan style=\"color:rgb(209, 100, 94)\"\u003eT\u003c/span\u003e\u003cspan style=\"color:rgb(238, 171, 106)\"\u003eB\u003c/span\u003e**\u003cimg src=\"./assets/ootb_icon.png\" alt=\"Star\" style=\"height:20px; vertical-align:middle; filter: invert(1) brightness(2);\"\u003e is an out-of-the-box (OOTB) solution for Desktop GUI Agent, including API-based (**Claude 3.5 Computer Use**) and locally-running models (**\u003cspan style=\"color:rgb(106, 158, 210)\"\u003eS\u003c/span\u003e\u003cspan style=\"color:rgb(111, 163, 82)\"\u003eh\u003c/span\u003e\u003cspan style=\"color:rgb(209, 100, 94)\"\u003eo\u003c/span\u003e\u003cspan style=\"color:rgb(238, 171, 106)\"\u003ew\u003c/span\u003eUI**, **UI-TARS**). \n\n**No Docker** is required, and it supports both **Windows** and **macOS**. OOTB provides a user-friendly interface based on Gradio.🎨\n\nVisit our study on GUI Agent of Claude 3.5 Computer Use [[project page]](https://computer-use-ootb.github.io). 🌐\n\n## Update\n- **[2025/02/08]** We've added the support for [**UI-TARS**](https://github.com/bytedance/UI-TARS). 
Follow [Cloud Deployment](https://github.com/bytedance/UI-TARS?tab=readme-ov-file#cloud-deployment) or [VLLM deployment](https://github.com/bytedance/UI-TARS?tab=readme-ov-file#local-deployment-vllm) to implement UI-TARS and run it locally in OOTB.\n- **Major Update! [2024/12/04]** **Local Run🔥** is now live! Say hello to [**\u003cspan style=\"color:rgb(106, 158, 210)\"\u003eS\u003c/span\u003e\u003cspan style=\"color:rgb(111, 163, 82)\"\u003eh\u003c/span\u003e\u003cspan style=\"color:rgb(209, 100, 94)\"\u003eo\u003c/span\u003e\u003cspan style=\"color:rgb(238, 171, 106)\"\u003ew\u003c/span\u003eUI**](https://github.com/showlab/ShowUI), an open-source 2B vision-language-action (VLA) model for GUI Agent. Now compatible with `\"gpt-4o + ShowUI\" (~200x cheaper)`*  \u0026 `\"Qwen2-VL + ShowUI\" (~30x cheaper)`* for only few cents for each task💰! \u003cspan style=\"color: grey; font-size: small;\"\u003e*compared to Claude Computer Use\u003c/span\u003e.\n- **[2024/11/20]** We've added some examples to help you get hands-on experience with Claude 3.5 Computer Use.\n- **[2024/11/19]** Forget about the single-display limit set by Anthropic - you can now use **multiple displays** 🎉!\n- **[2024/11/18]** We've released a deep analysis of Claude 3.5 Computer Use: [https://arxiv.org/abs/2411.10323](https://arxiv.org/abs/2411.10323).\n- **[2024/11/11]** Forget about the low-resolution display limit set by Anthropic — you can now use *any resolution you like* and still keep the **screenshot token cost low** 🎉!\n- **[2024/11/11]** Now both **Windows** and **macOS** platforms are supported 🎉!\n- **[2024/10/25]** Now you can **Remotely Control** your computer 💻 through your mobile device 📱 — **No Mobile App Installation** required! 
Give it a try and have fun 🎉.\n\n\n## Demo Video\n\nhttps://github.com/user-attachments/assets/f50b7611-2350-4712-af9e-3d31e30020ee\n\n\u003cdiv style=\"display: flex; justify-content: space-around;\"\u003e\n  \u003ca href=\"https://youtu.be/Ychd-t24HZw\" target=\"_blank\" style=\"margin-right: 10px;\"\u003e\n    \u003cimg src=\"https://img.youtube.com/vi/Ychd-t24HZw/maxresdefault.jpg\" alt=\"Watch the video\" width=\"48%\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://youtu.be/cvgPBazxLFM\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.youtube.com/vi/cvgPBazxLFM/maxresdefault.jpg\" alt=\"Watch the video\" width=\"48%\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\n## 🚀 Getting Started\n\n### 0. Prerequisites\n- Install Miniconda on your system through this [link](https://www.anaconda.com/download?utm_source=anacondadocs\u0026utm_medium=documentation\u0026utm_campaign=download\u0026utm_content=topnavalldocs) (**Python version: \u003e= 3.12**).\n- Hardware Requirements (optional, for ShowUI local-run):\n    - **Windows (CUDA-enabled):** A compatible NVIDIA GPU with CUDA support, \u003e=6GB GPU memory\n    - **macOS (Apple Silicon):** M1 chip (or newer), \u003e=16GB unified RAM\n\n\n### 1. Clone the Repository 📂\nOpen the Conda Terminal. (After Miniconda is installed, it appears in the Start menu.)\nRun the following commands in the **Conda Terminal**.\n```bash\ngit clone https://github.com/showlab/computer_use_ootb.git\ncd computer_use_ootb\n```\n\n### 2.1 Install Dependencies 🔧\n```bash\npip install -r requirements.txt\n```\n\n### 2.2 (Optional) Get Prepared for **\u003cspan style=\"color:rgb(106, 158, 210)\"\u003eS\u003c/span\u003e\u003cspan style=\"color:rgb(111, 163, 82)\"\u003eh\u003c/span\u003e\u003cspan style=\"color:rgb(209, 100, 94)\"\u003eo\u003c/span\u003e\u003cspan style=\"color:rgb(238, 171, 106)\"\u003ew\u003c/span\u003eUI** Local-Run\n\n1. Download all files of the ShowUI-2B model via the following command. 
Ensure the `ShowUI-2B` folder is under the `computer_use_ootb` folder.\n\n    ```bash\n    python install_tools/install_showui.py\n    ```\n\n2. Make sure to install the correct GPU version of PyTorch (CUDA, MPS, etc.) on your machine. See [install guide and verification](https://pytorch.org/get-started/locally/).\n\n3. Get API Keys for [GPT-4o](https://platform.openai.com/docs/quickstart) or [Qwen-VL](https://help.aliyun.com/zh/dashscope/developer-reference/acquisition-and-configuration-of-api-key). For mainland China users, a Qwen API free trial for the first 1 million tokens is [available](https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-qianwen-vl-plus-api).\n\n### 2.3 (Optional) Get Prepared for **UI-TARS** Local-Run\n\n1. Follow the [Cloud Deployment](https://github.com/bytedance/UI-TARS?tab=readme-ov-file#cloud-deployment) or [VLLM deployment](https://github.com/bytedance/UI-TARS?tab=readme-ov-file#local-deployment-vllm) guide to deploy your UI-TARS server.\n\n2. Test your UI-TARS server with the script `.\\install_tools\\test_ui-tars_server.py`.\n\n### 2.4 (Optional) Deploy a Qwen Model as Planner on an SSH Server\n1. Clone this project on your SSH server.\n\n2. On that server, run `python computer_use_demo/remote_inference.py`.\n\n### 3. Start the Interface ▶️\n\n**Start the OOTB interface:**\n```bash\npython app.py\n```\nIf you successfully start the interface, you will see two URLs in the terminal:\n```bash\n* Running on local URL:  http://127.0.0.1:7860\n* Running on public URL: https://xxxxxxxxxxxxxxxx.gradio.live (Do not share this link with others, or they will be able to control your computer.)\n```\n\n\n\u003e \u003cu\u003eFor convenience\u003c/u\u003e, we recommend running one or more of the following commands to set your API keys as environment variables before starting the interface. Then you don’t need to manually pass the keys on each run. 
On Windows PowerShell (use the `set` command instead if you are on cmd): \n\u003e ```powershell\n\u003e $env:ANTHROPIC_API_KEY=\"sk-xxxxx\"  # Replace with your own key\n\u003e $env:QWEN_API_KEY=\"sk-xxxxx\"\n\u003e $env:OPENAI_API_KEY=\"sk-xxxxx\"\n\u003e ```\n\u003e On macOS/Linux, replace `$env:ANTHROPIC_API_KEY` with `export ANTHROPIC_API_KEY` in the commands above, and do the same for the other keys. \n\n\n### 4. Control Your Computer from Any Device That Can Access the Internet\n- **Computer to be controlled**: The one with the software installed.\n- **Device sending commands**: The one that opens the website.\n  \nOpen the website at http://localhost:7860/ (if you're controlling the computer itself) or https://xxxxxxxxxxxxxxxxx.gradio.live in your mobile browser for remote control.\n\nEnter the Anthropic API key (you can obtain it through this [website](https://console.anthropic.com/settings/keys)), then give commands to let the AI perform your tasks.\n\n### ShowUI Advanced Settings\n\nWe provide a 4-bit quantized ShowUI-2B model for cost-efficient inference (currently **only supports CUDA devices**). To download the 4-bit quantized ShowUI-2B model:\n```bash\npython install_tools/install_showui-awq-4bit.py\n```\nThen, enable the quantized setting in the 'ShowUI Advanced Settings' dropdown menu.\n\nWe also provide a slider to quickly adjust the `max_pixel` parameter in the ShowUI model. 
This controls the visual input size of the model and greatly affects the memory and inference speed.\n\n## 📊 GUI Agent Model Zoo\n\nNow, OOTB supports customizing the GUI Agent via the following models:\n\n- **Unified Model**: Unified planner \u0026 actor, can both make the high-level planning and take the low-level control.\n- **Planner**: General-purpose LLMs, for handling the high-level planning and decision-making.\n- **Actor**: Vision-language-action models, for handling the low-level control and action command generation.\n\n\n\u003cdiv align=\"center\"\u003e\n  \u003cb\u003eSupported GUI Agent Models, OOTB\u003c/b\u003e\n\n\u003c/div\u003e\n\u003ctable align=\"center\"\u003e\n  \u003ctbody\u003e\n    \u003ctr align=\"center\" valign=\"bottom\"\u003e\n      \u003ctd\u003e\n        \u003cb\u003e[API] Unified Model\u003c/b\u003e\n      \u003c/td\u003e\n      \u003ctd\u003e\n        \u003cb\u003e[API] Planner\u003c/b\u003e\n      \u003c/td\u003e\n      \u003ctd\u003e\n        \u003cb\u003e[Local] Planner\u003c/b\u003e\n      \u003c/td\u003e\n      \u003ctd\u003e\n        \u003cb\u003e[API] Actor\u003c/b\u003e\n      \u003c/td\u003e\n      \u003ctd\u003e\n        \u003cb\u003e[Local] Actor\u003c/b\u003e\n      \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr valign=\"top\"\u003e\n      \u003ctd\u003e\n        \u003cul\u003e\n            \u003cli\u003e\u003ca href=\"\"\u003eClaude 3.5 Sonnet\u003c/a\u003e\u003c/li\u003e\n      \u003c/ul\u003e\n      \u003c/td\u003e\n      \u003ctd\u003e\n        \u003cul\u003e\n          \u003cli\u003e\u003ca href=\"\"\u003eGPT-4o\u003c/a\u003e\u003c/li\u003e\n          \u003cli\u003e\u003ca href=\"\"\u003eQwen2-VL-Max\u003c/a\u003e\u003c/li\u003e\n          \u003cli\u003e\u003ca href=\"\"\u003eQwen2-VL-2B(ssh)\u003c/a\u003e\u003c/li\u003e\n          \u003cli\u003e\u003ca href=\"\"\u003eQwen2-VL-7B(ssh)\u003c/a\u003e\u003c/li\u003e\n          \u003cli\u003e\u003ca href=\"\"\u003eQwen2.5-VL-7B(ssh)\u003c/a\u003e\u003c/li\u003e\n  
        \u003cli\u003e\u003ca href=\"\"\u003eDeepseek V3 (soon)\u003c/a\u003e\u003c/li\u003e\n        \u003c/ul\u003e\n      \u003c/td\u003e\n      \u003ctd\u003e\n        \u003cul\u003e\n          \u003cli\u003e\u003ca href=\"\"\u003eQwen2-VL-2B\u003c/a\u003e\u003c/li\u003e\n          \u003cli\u003e\u003ca href=\"\"\u003eQwen2-VL-7B\u003c/a\u003e\u003c/li\u003e\n        \u003c/ul\u003e\n      \u003c/td\u003e\n        \u003ctd\u003e\n        \u003cul\u003e\n          \u003cli\u003e\u003ca href=\"https://github.com/showlab/ShowUI\"\u003eShowUI\u003c/a\u003e\u003c/li\u003e\n          \u003cli\u003e\u003ca href=\"https://huggingface.co/bytedance-research/UI-TARS-7B-DPO\"\u003eUI-TARS-7B/72B-DPO (soon)\u003c/a\u003e\u003c/li\u003e \n        \u003c/ul\u003e\n      \u003c/td\u003e\n      \u003ctd\u003e\n        \u003cul\u003e\n          \u003cli\u003e\u003ca href=\"https://github.com/showlab/ShowUI\"\u003eShowUI\u003c/a\u003e\u003c/li\u003e\n          \u003cli\u003e\u003ca href=\"https://huggingface.co/bytedance-research/UI-TARS-7B-DPO\"\u003eUI-TARS-7B/72B-DPO\u003c/a\u003e\u003c/li\u003e\n        \u003c/ul\u003e\n      \u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n\u003e [API] models call remotely hosted LLMs through an API, \nwhile [Local] models run inference on your own device with no API costs.\n\n\n\n## 🖥️ Supported Systems\n- **Windows** (Claude ✅, ShowUI ✅)\n- **macOS** (Claude ✅, ShowUI ✅)\n\n## 👓 OOTB Interface\n\u003cdiv style=\"display: flex; align-items: center; gap: 10px;\"\u003e\n  \u003cfigure style=\"text-align: center;\"\u003e\n    \u003cimg src=\"./assets/gradio_interface.png\" alt=\"Desktop Interface\" style=\"width: auto; object-fit: contain;\"\u003e\n  \u003c/figure\u003e\n\u003c/div\u003e\n\n\n## ⚠️ Risks\n- **Potentially Dangerous Operations by the Model**: The models' performance is still limited, and they may generate unintended or potentially harmful outputs. 
We recommend continuously monitoring the AI's actions. \n- **Cost Control**: Each task may cost a few dollars for Claude 3.5 Computer Use.💸\n\n## 📅 Roadmap\n- [ ] **Explore available features**\n  - [ ] The Claude API seems to be unstable when solving tasks. We are investigating possible causes: resolutions, types of actions required, OS platforms, or planning mechanisms. Any thoughts or comments are welcome.\n- [ ] **Interface Design**\n  - [x] **Support for Gradio** ✨\n  - [ ] **Simpler Installation**\n  - [ ] **More Features**... 🚀\n- [ ] **Platform**\n  - [x] **Windows**\n  - [x] **macOS**\n  - [x] **Mobile** (Send command)\n  - [ ] **Mobile** (Be controlled)\n- [ ] **Support for More MLLMs**\n  - [x] **Claude 3.5 Sonnet** 🎵\n  - [x] **GPT-4o**\n  - [x] **Qwen2-VL**\n  - [ ] **Local MLLMs**\n  - [ ] ...\n- [ ] **Improved Prompting Strategy**\n  - [ ] Optimize prompts for cost-efficiency. 💡\n- [x] **Improved Inference Speed**\n  - [x] Support int4 Quantization.\n\n## Join Discussion\nYou are welcome to join the discussion and help us keep improving the user experience of Computer Use - OOTB. 
Reach us using this [**Discord Channel**](https://discord.gg/vMMJTSew37) or the WeChat QR code below!\n\n\u003cdiv style=\"display: flex; flex-direction: row; justify-content: space-around;\"\u003e\n\n\u003c!-- \u003cimg src=\"./assets/wechat_2.jpg\" alt=\"gradio_interface\" width=\"30%\"\u003e --\u003e\n\u003cimg src=\"./assets/wechat_3.jpg\" alt=\"gradio_interface\" width=\"30%\"\u003e\n\n\u003c/div\u003e\n\n\u003cdiv style=\"height: 30px;\"\u003e\u003c/div\u003e\n\n\u003chr\u003e\n\u003ca href=\"https://computer-use-ootb.github.io\"\u003e\n\u003cimg src=\"./assets/ootb_logo.png\" alt=\"Logo\" width=\"30%\" style=\"display: block; margin: 0 auto; filter: invert(1) brightness(2);\"\u003e\n\u003c/a\u003e\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshowlab%2Fcomputer_use_ootb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshowlab%2Fcomputer_use_ootb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshowlab%2Fcomputer_use_ootb/lists"}