https://github.com/bytedance/ui-tars-desktop
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
- Host: GitHub
- URL: https://github.com/bytedance/ui-tars-desktop
- Owner: bytedance
- License: apache-2.0
- Created: 2025-01-19T09:04:43.000Z (14 days ago)
- Default Branch: main
- Last Pushed: 2025-01-28T07:41:08.000Z (5 days ago)
- Last Synced: 2025-01-28T08:20:51.347Z (5 days ago)
- Topics: agent, browser-use, computer-use, electron, gui-agents, vision, vite, vlm
- Language: TypeScript
- Homepage:
- Size: 22.7 MB
- Stars: 2,116
- Watchers: 40
- Forks: 130
- Open Issues: 32
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
README
# UI-TARS Desktop
UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Model)](https://github.com/bytedance/UI-TARS) that allows you to control your computer using natural language.
Paper | Hugging Face Models | ModelScope | Desktop Application | Midscene (use in browser)

### Important Announcement: GGUF Model Performance
The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.
**Alternative Solution**:
You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)** (if you have enough GPU resources) instead. We appreciate your understanding and patience as we work to ensure the best possible experience.
## Updates
- 01.25: We updated the **[Cloud Deployment](#cloud-deployment)** section in the Chinese guide, [GUI Model Deployment Guide (Chinese)](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb), with new information about the ModelScope platform. You can now use the ModelScope platform for deployment.
## Showcases
| Instruction | Video |
| :---: | :---: |
| Get the current weather in SF using the web browser | |
| Send a tweet with the content "hello world" | |

## Features
- Natural language control powered by a Vision-Language Model
- Screenshot and visual recognition support
- Precise mouse and keyboard control
- Cross-platform support (Windows/macOS)
- Real-time feedback and status display
- Private and secure, with fully local processing

## Quick Start
### Download
You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) of UI-TARS Desktop from the releases page.
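If you prefer the command line, the latest release assets can also be listed through the GitHub API; the sketch below only assumes a standard `curl` and `grep`, and the asset names it prints depend on what the release actually contains.

```bash
# List downloadable assets of the latest UI-TARS Desktop release via the GitHub API.
curl -s https://api.github.com/repos/bytedance/UI-TARS-desktop/releases/latest \
  | grep "browser_download_url"
```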
### Install
#### macOS
1. Drag the **UI TARS** application into the **Applications** folder.
2. Grant **UI TARS** the required permissions in macOS (a command-line shortcut is sketched after this list):
   - System Settings -> Privacy & Security -> **Accessibility**
   - System Settings -> Privacy & Security -> **Screen Recording**
3. Open the **UI TARS** application; you will see its main interface.
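As a convenience, the two permission panes can often be opened directly from a terminal on recent macOS versions; the `x-apple.systempreferences` deep links below are an assumption and may differ across macOS releases, in which case navigate to the panes manually as described above.

```bash
# Jump to the Accessibility pane, where UI TARS must be enabled.
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"

# Jump to the Screen Recording pane.
open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"
```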
#### Windows
Simply run the application; you will see its main interface.
### Deployment
#### Cloud Deployment
We recommend using HuggingFace Inference Endpoints for fast deployment.
We provide two deployment guides for reference:
- English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)
- Chinese version: [GUI Model Deployment Guide (Chinese)](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)
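Once an endpoint is deployed, it is typically reached through an OpenAI-compatible chat completions route (see the note under "Input your API information" below). The snippet is only a sketch: `ENDPOINT_URL`, `HF_TOKEN`, and the served model name `ui-tars` are placeholder assumptions that depend on how you configured the endpoint.

```bash
# Hypothetical smoke test of a deployed UI-TARS endpoint; replace the placeholders.
ENDPOINT_URL="https://<your-endpoint>.endpoints.huggingface.cloud/v1/chat/completions"
HF_TOKEN="<your-access-token>"

curl "$ENDPOINT_URL" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ui-tars",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```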
#### Local Deployment [vLLM]
We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
```bash
pip install -U transformers
VLLM_VERSION=0.6.6
CUDA_VERSION=cu124
pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
```
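A quick sanity check that the installed version satisfies the `vllm>=0.6.1` requirement (a minimal sketch, assuming `vllm` is importable in the active environment):

```bash
python -c "import vllm; print(vllm.__version__)"
```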
##### Download the Model
We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model, depending on your hardware configuration (a download sketch follows this list):

- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)
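To fetch one of these checkpoints locally before serving it, the Hugging Face CLI can be used; the sketch below assumes the `huggingface_hub` CLI is installed and uses the 7B-DPO repository as an example.

```bash
pip install -U "huggingface_hub[cli]"

# Download the 7B-DPO checkpoint into a local directory that vLLM can serve.
huggingface-cli download bytedance-research/UI-TARS-7B-DPO --local-dir ./UI-TARS-7B-DPO
```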
##### Start an OpenAI API Service

Run the command below to start an OpenAI-compatible API service:

```bash
python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
```
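Once the server is running, it exposes the usual OpenAI-style routes (by default on port 8000). The request below is a minimal sketch of the vision message format; the port, the prompt text, and the base64 screenshot placeholder are assumptions.

```bash
# Hypothetical request to the local ui-tars service; replace <BASE64_SCREENSHOT> with real image data.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ui-tars",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "Describe this screenshot."},
            {"type": "image_url",
             "image_url": {"url": "data:image/png;base64,<BASE64_SCREENSHOT>"}}
          ]
        }]
      }'
```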
##### Input your API information

> **Note**: The VLM Base URL is an OpenAI-compatible API endpoint (see the [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
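Before entering the Base URL in the application settings, you can confirm that it points at an OpenAI-compatible server; the check below assumes the local vLLM deployment from the previous step.

```bash
# Should list the served model (e.g. "ui-tars") if the Base URL is correct.
curl http://localhost:8000/v1/models
```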
## Development
Just two simple steps to run the application:
```bash
pnpm install
pnpm run dev
```

> **Note**: On macOS, you need to grant permissions to the terminal app (e.g., iTerm2, Terminal) you use to run these commands.
### Testing
```bash
# Unit test
pnpm run test
# E2E test
pnpm run test:e2e
```

## System Requirements
- Node.js >= 20
- Supported Operating Systems:
  - Windows 10/11
  - macOS 10.15+
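A quick way to verify the toolchain meets these requirements (a sketch assuming Node.js and pnpm are already on your PATH):

```bash
node --version   # should print v20.x or later
pnpm --version   # pnpm drives the development and test commands above
```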
## License

UI-TARS Desktop is licensed under the Apache License 2.0.
## Citation
If you find our paper and code useful in your research, please consider giving us a star :star: and a citation :pencil:

```BibTeX
@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}
```