# UI-TARS Desktop

UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Model)](https://github.com/bytedance/UI-TARS) that allows you to control your computer using natural language.


πŸ“‘ Paper | πŸ€— Hugging Face Models | πŸ€– ModelScope

πŸ–₯️ Desktop Application | πŸ‘“ Midscene (use in browser)

### ⚠️ Important Announcement: GGUF Model Performance

The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.

πŸ’‘ **Alternative Solution**:
You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)** (if you have sufficient GPU resources) instead.

We appreciate your understanding and patience as we work to ensure the best possible experience.

## Updates

- πŸš€ 01.25: We updated the **[Cloud Deployment](#cloud-deployment)** section of the Chinese guide ([GUIζ¨‘εž‹ιƒ¨η½²ζ•™η¨‹](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)) with new information about the ModelScope platform. You can now use ModelScope for deployment.

## Showcases

| Instruction | Video |
| :---: | :---: |
| Get the current weather in SF using the web browser | |
| Send a tweet with the content "hello world" | |

## Features

- πŸ€– Natural language control powered by Vision-Language Model
- πŸ–₯️ Screenshot and visual recognition support
- 🎯 Precise mouse and keyboard control
- πŸ’» Cross-platform support (Windows/macOS)
- πŸ”„ Real-time feedback and status display
- πŸ” Private and secure - fully local processing

## Quick Start

### Download

You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) of UI-TARS Desktop from our releases page.

### Install

#### macOS

1. Drag **UI TARS** application into the **Applications** folder

2. Grant the required permissions to **UI TARS** in macOS:
   - System Settings -> Privacy & Security -> **Accessibility**
   - System Settings -> Privacy & Security -> **Screen Recording**

3. Open the **UI TARS** application; you should see the main interface.

#### Windows

Simply run the application; you should see the main interface.

### Deployment

#### Cloud Deployment
We recommend using Hugging Face Inference Endpoints for fast deployment.
We provide two deployment guides for reference:

English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)

Chinese version: [GUIζ¨‘εž‹ιƒ¨η½²ζ•™η¨‹](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)

#### Local Deployment [vLLM]
We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
```bash
pip install -U transformers
VLLM_VERSION=0.6.6
CUDA_VERSION=cu124
pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
```
##### Download the Model
We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):

- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)
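
As a rough sketch, you can also fetch the weights programmatically with the `huggingface_hub` package instead of cloning the repository; the local directory below is an arbitrary example path, and the chosen repo ID is just one of the variants listed above:

```python
# Sketch: download UI-TARS weights locally with huggingface_hub
# (pip install huggingface_hub). The local_dir is an example path.
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id="bytedance-research/UI-TARS-7B-DPO",  # pick the size/variant that fits your hardware
    local_dir="./UI-TARS-7B-DPO",
)
print(f"Model downloaded to: {model_path}")
```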

##### Start an OpenAI API Service
Run the command below to start an OpenAI-compatible API service:

```bash
python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
```
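
Once the server is up, you can sanity-check it by listing the served models through the OpenAI-compatible `/v1/models` endpoint. The host and port below assume vLLM's defaults (localhost:8000); adjust them if you changed the server settings:

```python
# Sketch: verify the vLLM OpenAI-compatible server is reachable.
# Assumes the default host/port; change the URL if you customized them.
import requests

resp = requests.get("http://localhost:8000/v1/models")
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])  # should include "ui-tars"
```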

##### Input your API information

> **Note**: The VLM Base URL is an OpenAI-compatible API endpoint (see the [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
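
For reference, a minimal sketch of what such an OpenAI-compatible request looks like, using the official `openai` Python client against the local vLLM endpoint started above. The base URL, dummy API key, screenshot path, and instruction text are placeholders for your own setup:

```python
# Sketch: send a screenshot plus an instruction to an OpenAI-compatible endpoint.
# Base URL, API key, and screenshot path are placeholders, not fixed values.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="ui-tars",  # must match --served-model-name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Get the current weather in SF using the web browser."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```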

## Development

Running the application takes just two steps:

```bash
pnpm install
pnpm run dev
```

> **Note**: On macOS, you need to grant permissions to the app (e.g., iTerm2, Terminal) you are using to run these commands.

### Testing

```bash
# Unit test
pnpm run test
# E2E test
pnpm run test:e2e
```

## System Requirements

- Node.js >= 20
- Supported operating systems:
  - Windows 10/11
  - macOS 10.15+

## License

UI-TARS Desktop is licensed under the Apache License 2.0.

## Citation
If you find our paper and code useful in your research, please consider giving us a star :star: and a citation :pencil:

```BibTeX
@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}
```