https://github.com/bytedance/ui-tars-desktop

A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
agent browser-use computer-use electron gui-agents mcp mcp-server vision vite vlm


> [!IMPORTANT]
> **\[2025-03-18\]** We released a **technical preview** version of a new desktop app - [Agent TARS](./apps/agent-tars/README.md), a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.


# UI-TARS Desktop

UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Model)](https://github.com/bytedance/UI-TARS) that allows you to control your computer using natural language.


📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope

🖥️ Desktop Application | 👓 Midscene (use in browser)

## Showcases

| Instruction | Video |
| :---: | :---: |
| Get the current weather in SF using the web browser | |
| Send a tweet with the content "hello world" | |

## News

- **\[2025-02-20\]** - 📦 Introduced the [UI-TARS SDK](./docs/sdk.md), a powerful cross-platform toolkit for building GUI automation agents.
- **\[2025-01-23\]** - 🚀 We updated the **[Cloud Deployment](./docs/deployment.md#cloud-deployment)** section in the Chinese version, [GUI Model Deployment Tutorial](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb), with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.

## Features

- 🤖 Natural language control powered by Vision-Language Model
- 🖥️ Screenshot and visual recognition support
- 🎯 Precise mouse and keyboard control
- 💻 Cross-platform support (Windows/macOS)
- 🔄 Real-time feedback and status display
- 🔐 Private and secure - fully local processing
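
The features above describe a perceive-and-act loop: capture a screenshot, let the model choose the next mouse or keyboard action, execute it, and repeat until the task is done. The following is a minimal conceptual sketch of such a loop, not the UI-TARS Desktop implementation or the `@ui-tars/sdk` API. Every name in it (`Action`, `Model`, `Operator`, `runAgent`, the stubs) is hypothetical, and the model call is kept synchronous for brevity where a real one would be asynchronous.

```typescript
// Hypothetical types for illustration only; these are NOT the UI-TARS SDK API.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "finished" };

interface Model {
  // Given the instruction, the latest screenshot, and prior actions, pick the next action.
  predict(instruction: string, screenshot: string, history: Action[]): Action;
}

interface Operator {
  screenshot(): string; // e.g. a base64-encoded image
  execute(action: Action): void; // perform the mouse or keyboard action
}

// Screenshot -> predict -> execute, until the model reports "finished"
// or the step budget runs out. Returns every action the model proposed.
function runAgent(
  instruction: string,
  model: Model,
  operator: Operator,
  maxSteps = 10,
): Action[] {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = operator.screenshot();
    const action = model.predict(instruction, shot, history);
    history.push(action);
    if (action.kind === "finished") break;
    operator.execute(action);
  }
  return history;
}

// Stub model and operator so the sketch runs without a real VLM or desktop.
const scripted: Action[] = [
  { kind: "click", x: 100, y: 200 },
  { kind: "type", text: "hello world" },
  { kind: "finished" },
];
const stubModel: Model = {
  predict: (_instruction, _screenshot, history) =>
    scripted[history.length] ?? { kind: "finished" },
};
const executed: Action[] = [];
const stubOperator: Operator = {
  screenshot: () => "fake-screenshot",
  execute: (action) => {
    executed.push(action);
  },
};

const trace = runAgent('Type "hello world"', stubModel, stubOperator);
console.log(trace.map((a) => a.kind).join(" -> "));
```

The `maxSteps` budget is a design choice in this sketch: it keeps the loop from running indefinitely if the model never reports that it has finished.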

## Quick Start

See [Quick Start](./docs/quick-start.md).

## Deployment

See [Deployment](./docs/deployment.md).

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md).

## SDK (Experimental)

See [@ui-tars/sdk](./docs/sdk.md).

## License

UI-TARS Desktop is licensed under the Apache License 2.0.

## Citation
If you find our paper and code useful in your research, please consider giving the project a star :star: and citing the paper :pencil:

```BibTeX
@article{qin2025ui,
title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
journal={arXiv preprint arXiv:2501.12326},
year={2025}
}
```