{"id":24502459,"url":"https://github.com/showlab/ShowUI","last_synced_at":"2025-10-02T18:31:31.382Z","repository":{"id":263101321,"uuid":"881201918","full_name":"showlab/ShowUI","owner":"showlab","description":"Open-source, End-to-end, Vision-Language-Action model for GUI Agent \u0026 Computer Use.","archived":false,"fork":false,"pushed_at":"2025-01-17T06:53:42.000Z","size":24110,"stargazers_count":841,"open_issues_count":5,"forks_count":46,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-01-17T07:18:43.397Z","etag":null,"topics":["agent","computer-use","gui-agent","vision-language-action","vision-language-model"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2411.17465","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/showlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-31T04:56:39.000Z","updated_at":"2025-01-17T06:53:43.000Z","dependencies_parsed_at":"2024-11-16T07:27:01.784Z","dependency_job_id":"e778d15b-08dc-43df-8ff3-e03b71ec95f5","html_url":"https://github.com/showlab/ShowUI","commit_stats":null,"previous_names":["showlab/showui"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2FShowUI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2FShowUI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2FShowUI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2FShowUI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/showlab","download_url":"https://codeload.github.com/showlab/ShowUI/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235033794,"owners_count":18925498,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","computer-use","gui-agent","vision-language-action","vision-language-model"],"created_at":"2025-01-21T23:02:09.453Z","updated_at":"2025-10-02T18:31:31.376Z","avatar_url":"https://github.com/showlab.png","language":"Jupyter Notebook","readme":"# ShowUI\nOpen-source, End-to-end, Lightweight, Vision-Language-Action model for GUI Agent \u0026 Computer Use.\n\nShowUI 是一款开源的、端到端、轻量级的视觉-语言-动作模型，专为 GUI 智能体设计。\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/showui.jpg\" alt=\"ShowUI\" width=\"480\"\u003e\n\u003cp\u003e\n\n\u003cp align=\"center\"\u003e\n        \u0026nbsp\u0026nbsp 📑 \u003ca href=\"https://arxiv.org/abs/2411.17465\"\u003ePaper\u003c/a\u003e \u0026nbsp\u0026nbsp \n        | 🤗 \u003ca href=\"https://huggingface.co/showlab/ShowUI-2B\"\u003eHugging Models\u003c/a\u003e\u0026nbsp\u0026nbsp \n        | \u0026nbsp\u0026nbsp 🤗 \u003ca href=\"https://huggingface.co/spaces/showlab/ShowUI\"\u003eSpaces Demo\u003c/a\u003e \u0026nbsp\u0026nbsp \n        | \u0026nbsp\u0026nbsp 📝 \u003ca href=\"./assets/slide.pdf\"\u003eSlides\u003c/a\u003e \u0026nbsp\u0026nbsp \n        | \u0026nbsp\u0026nbsp 🕹️ \u003ca href=\"https://openbayes.com/console/public/tutorials/I8euxlahBAm\"\u003eOpenBayes贝式计算 Demo\u003c/a\u003e \n\u003cbr\u003e\n🤗 \u003ca href=\"https://huggingface.co/datasets/showlab/ShowUI-desktop-8K\"\u003eDatasets\u003c/a\u003e\u0026nbsp\u0026nbsp | \u0026nbsp\u0026nbsp💬 \u003ca href=\"https://x.com/_akhaliq/status/1864387028856537400\"\u003eX (Twitter)\u003c/a\u003e\u0026nbsp\u0026nbsp\n| \u0026nbsp\u0026nbsp 🖥️ \u003ca href=\"https://github.com/showlab/computer_use_ootb\"\u003eComputer Use\u003c/a\u003e \u0026nbsp\u0026nbsp \u003c/a\u003e \n|  \u0026nbsp\u0026nbsp 📖 \u003ca href=\"https://github.com/showlab/Awesome-GUI-Agent\"\u003eGUI Paper List\u003c/a\u003e \u0026nbsp\u0026nbsp \u003c/a\u003e\n| \u0026nbsp\u0026nbsp 🤖 \u003ca href=\"https://www.modelscope.cn/models/AI-ModelScope/ShowUI-2B\"\u003eModelScope\u003c/a\u003e \n\u003c/p\u003e\n\n\u003c!-- [![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fshowlab%2FShowUI\u0026count_bg=%2379C83D\u0026title_bg=%23555555\u0026icon=\u0026icon_color=%23E7E7E7\u0026title=hits\u0026edge_flat=false)](https://hits.seeyoufarm.com) --\u003e\n\n\u003e [**ShowUI: One Vision-Language-Action Model for GUI Visual Agent**](https://arxiv.org/abs/2411.17465)\u003cbr\u003e\n\u003e [Kevin Qinghong Lin](https://qinghonglin.github.io/), [Linjie Li](https://scholar.google.com/citations?user=WR875gYAAAAJ\u0026hl=en), [Difei Gao](https://scholar.google.com/citations?user=No9OsocAAAAJ\u0026hl=en), [Zhengyuan Yang](https://zyang-ur.github.io/), [Shiwei Wu](https://scholar.google.com/citations?user=qWOFgUcAAAAJ), [Zechen Bai](https://www.baizechen.site/), [Weixian Lei](), [Lijuan Wang](https://scholar.google.com/citations?user=cDcWXuIAAAAJ\u0026hl=en), [Mike Zheng Shou](https://scholar.google.com/citations?user=h1-3lSoAAAAJ\u0026hl=en)\n\u003e \u003cbr\u003eShow Lab @ National University of Singapore, Microsoft\u003cbr\u003e\n\n## 🔥 Update\n- [x] [2025.3.2] Support fine-tuning and inference of the lastest base model **Qwen2.5-VL**.\n- [x] [2025.2.27] ShowUI has been accepted to **CVPR 2025**.\n- [x] [2025.2.13] Support **vllm** inference.\n- [x] [2025.1.20] Support Navigation tasks: Mind2Web, AITW, Miniwob training and evaluator.\n- [x] [2025.1.17] Support **API Calling** via Gradio Client, simply run `python3 api.py`.\n- [x] [2025.1.5] Release the [`ShowUI-web`](https://huggingface.co/datasets/showlab/ShowUI-web) dataset.\n- [x] [2024.12.28] Update GPT-4o annotation recaptioning scripts.\n- [x] [2024.12.27] Update training codes and instructions.\n- [x] [2024.12.23] Update `showui` for UI-guided token selection implementation.\n- [x] [2024.12.15] ShowUI received **Outstanding Paper Award** at [NeurIPS2024 Open-World Agents workshop](https://sites.google.com/view/open-world-agents/schedule).\n- [x] [2024.12.9] Support int8 Quantization.\n- [x] [2024.12.5] **Major Update: ShowUI is integrated into [OOTB](https://github.com/showlab/computer_use_ootb?tab=readme-ov-file) for local run!**\n- [x] [2024.12.1] We support iterative refinement to improve grounding accuracy. Try it at [HF Spaces demo](https://huggingface.co/spaces/showlab/ShowUI).\n- [x] [2024.11.27] We release the [arXiv paper](https://arxiv.org/abs/2411.17465), [HF Spaces demo](https://huggingface.co/spaces/showlab/ShowUI) and [`ShowUI-desktop`](https://huggingface.co/datasets/showlab/ShowUI-desktop).\n- [x] [2024.11.16] [`showlab/ShowUI-2B`](https://huggingface.co/showlab/ShowUI-2B) is available at huggingface.\n\n\n## 🤖 vllm Inference\nSee [inference_vllm.ipynb](inference_vllm.ipynb) for vllm inference.\n\u003e To leverage multiple GPUs for faster inference, you can adjust the gpu_num parameter\n\n## ⚡ API Calling\nRun `python3 api.py` by providing a screenshot and a query.\n\u003e Since we are based on huggingface gradio client, you don't need a GPU to deploy the model locally 🤗\n\n## 🖥️ Computer Use\nSee [Computer Use OOTB](https://github.com/showlab/computer_use_ootb?tab=readme-ov-file) for using ShowUI to control your PC.\n\nhttps://github.com/user-attachments/assets/f50b7611-2350-4712-af9e-3d31e30020ee\n\n## ⭐ Quick Start\nSee [Quick Start](QUICK_START.md) for local model usage.\n\n## 🤗 Local Gradio\nSee [Gradio](GRADIO.md) for installation.\n\n## 🚀 Training\nOur Training codebases supports:\n- [x] Grounding and Navigation training: Mind2Web, AITW, Miniwob\n- [x] Self-customized model: ShowUI, Qwen2VL, Qwen2.5VL\n- [x] Efficient Training: DeepSpeed, BF16, QLoRA, SDPA / FlashAttention2, Liger-Kernel\n- [x] Multiple datasets mixed training\n- [x] Interleaved data streaming\n- [x] Image randomly resize (crop, pad)\n- [x] Wandb training monitor\n- [x] Multi-GPUs, Multi-nodes training \n\nSee [Train](TRAIN.md) for training set up.\n\n## 🕹️ UI-Guided Token Selection\nTry [`test.ipynb`](test.ipynb), which seamless support for Qwen2VL models.\n\n\u003cdiv style=\"display: flex; justify-content: space-between;\"\u003e\n  \u003cimg src=\"examples/chrome.png\" alt=\"(a) Screenshot patch number: 1296\" style=\"width: 48%;\"/\u003e\n  \u003cimg src=\"examples/demo.png\" alt=\"(b) By applying UI-graph, UI Component number: 167\" style=\"width: 48%;\"/\u003e\n\u003c/div\u003e\n\n## ✍️ Annotate your own data\nTry [`recaption.ipynb`](recaption.ipynb), where we provide instructions on how to recaption the original annotations using GPT-4o.\n\n## ❤ Acknowledgement\nWe extend our gratitude to [SeeClick](https://github.com/njucckevin/SeeClick) for providing their codes and datasets.\n\nSpecial thanks to [Siyuan](https://x.com/who_s_yuan) for assistance with the Gradio demo and OOTB support.\n\n## 🎓 BibTeX\nIf you find our work helpful, please kindly consider citing our paper.\n\n```\n@misc{lin2024showui,\n      title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent}, \n      author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},\n      year={2024},\n      eprint={2411.17465},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https://arxiv.org/abs/2411.17465}, \n}\n```\n\nIf you like our project, please give us a star ⭐ on GitHub for the latest update.\n\n[![Star History Chart](https://api.star-history.com/svg?repos=showlab/ShowUI\u0026type=Timeline\u0026width=600\u0026height=300)](https://star-history.com/#showlab/ShowUI\u0026Timeline)\n","funding_links":[],"categories":["AI Agent Frameworks","Python","Web Automation and UI Interaction","AI开源项目"],"sub_categories":["Browser Automation Agents","UI Interaction","AI Agent"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshowlab%2FShowUI","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshowlab%2FShowUI","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshowlab%2FShowUI/lists"}