https://github.com/bytedance/ui-tars-desktop
A GUI Agent application based on UI-TARS(Vision-Language Model) that allows you to control your computer using natural language.
agent browser-use computer-use electron gui-agents mcp mcp-server vision vite vlm
- Host: GitHub
- URL: https://github.com/bytedance/ui-tars-desktop
- Owner: bytedance
- License: apache-2.0
- Created: 2025-01-19T09:04:43.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-04-21T08:41:20.000Z (1 day ago)
- Last Synced: 2025-04-21T09:36:25.233Z (1 day ago)
- Topics: agent, browser-use, computer-use, electron, gui-agents, mcp, mcp-server, vision, vite, vlm
- Language: TypeScript
- Homepage: https://agent-tars.com
- Size: 42.3 MB
- Stars: 12,072
- Watchers: 115
- Forks: 953
- Open Issues: 133
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
README
> [!IMPORTANT]
>
> **\[2025-03-18\]** We released a **technical preview** of a new desktop app, [Agent TARS](./apps/agent-tars/README.md), a multimodal AI agent that operates the browser by visually interpreting web pages and integrates with command lines and file systems.
# UI-TARS Desktop
UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Model)](https://github.com/bytedance/UI-TARS) that allows you to control your computer using natural language.
Paper | Hugging Face Models | Discord | ModelScope

Desktop Application | Midscene (use in browser)

## Showcases
| Instruction | Video |
| :---: | :---: |
| Get the current weather in SF using the web browser | |
| Send a tweet with the content "hello world" | |

## News
- **\[2025-02-20\]** - Introduced [UI TARS SDK](./docs/sdk.md), a powerful cross-platform toolkit for building GUI automation agents.
- **\[2025-01-23\]** - We updated the **[Cloud Deployment](./docs/deployment.md#cloud-deployment)** section of the Chinese-language guide, [GUI Model Deployment Tutorial](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb), with new information about the ModelScope platform. You can now use the ModelScope platform for deployment.

## Features
- Natural language control powered by Vision-Language Model
- Screenshot and visual recognition support
- Precise mouse and keyboard control
- Cross-platform support (Windows/macOS)
- Real-time feedback and status display
- Private and secure - fully local processing

## Quick Start
See [Quick Start](./docs/quick-start.md).
## Deployment
See [Deployment](./docs/deployment.md).
## Contributing
See [CONTRIBUTING.md](./CONTRIBUTING.md).
## SDK (Experimental)
See [@ui-tars/sdk](./docs/sdk.md)
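
The linked SDK documentation is the authoritative reference. As a rough illustration of the loop that the Features list implies (take a screenshot, let the vision-language model choose an action, carry it out with mouse or keyboard), here is a minimal TypeScript sketch. Every name in it (`Action`, `Operator`, `VisionModel`, `runAgent`, `maxSteps`) is hypothetical and chosen for this example; none of it is the `@ui-tars/sdk` API.

```typescript
// Hypothetical types for illustration only; this is NOT the @ui-tars/sdk API.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "finished" };

// Assumed abstraction over the device: captures the screen and performs actions.
interface Operator {
  screenshot(): Promise<string>; // e.g. a base64-encoded image
  execute(action: Action): Promise<void>;
}

// Assumed abstraction over the vision-language model.
interface VisionModel {
  predict(
    instruction: string,
    screenshot: string,
    history: Action[],
  ): Promise<Action>;
}

// Repeats screenshot -> predict -> execute until the model reports
// "finished" or the step budget runs out. Returns every predicted action.
async function runAgent(
  instruction: string,
  model: VisionModel,
  operator: Operator,
  maxSteps = 10,
): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = await operator.screenshot();
    const action = await model.predict(instruction, shot, history);
    history.push(action);
    if (action.kind === "finished") break; // nothing left to execute
    await operator.execute(action);
  }
  return history;
}
```

The step budget is a design choice of this sketch: it keeps a model that never emits a terminal action from looping forever. In a real setup the two interfaces would be backed by an actual model endpoint and an operating-system or browser driver.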
## License
UI-TARS Desktop is licensed under the Apache License 2.0.
## Citation
If you find our paper and code useful in your research, please consider giving the project a star :star: and citing the paper :pencil:

```BibTeX
@article{qin2025ui,
title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
journal={arXiv preprint arXiv:2501.12326},
year={2025}
}
```