{"id":47584632,"url":"https://github.com/xlang-ai/OpenCUA","last_synced_at":"2026-04-16T04:01:02.113Z","repository":{"id":308371720,"uuid":"1006147470","full_name":"xlang-ai/OpenCUA","owner":"xlang-ai","description":"OpenCUA: Open Foundations for Computer-Use Agents","archived":false,"fork":false,"pushed_at":"2026-02-04T08:33:41.000Z","size":48976,"stargazers_count":672,"open_issues_count":15,"forks_count":86,"subscribers_count":7,"default_branch":"main","last_synced_at":"2026-02-13T08:39:56.849Z","etag":null,"topics":["benchmark","computer-use-agent","dataset","foundation-models","gui","vision-language-model"],"latest_commit_sha":null,"homepage":"https://opencua.xlang.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xlang-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-21T15:47:01.000Z","updated_at":"2026-02-12T05:28:51.000Z","dependencies_parsed_at":"2025-10-05T05:42:13.455Z","dependency_job_id":null,"html_url":"https://github.com/xlang-ai/OpenCUA","commit_stats":null,"previous_names":["xlang-ai/opencua"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/xlang-ai/OpenCUA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xlang-ai%2FOpenCUA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xlang-ai%2FOpenCUA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xlang-ai%2FOpenCUA/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xlang-ai%2FOpenCUA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xlang-ai","download_url":"https://codeload.github.com/xlang-ai/OpenCUA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xlang-ai%2FOpenCUA/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31870516,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T15:24:51.572Z","status":"online","status_checked_at":"2026-04-16T02:00:06.042Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","computer-use-agent","dataset","foundation-models","gui","vision-language-model"],"created_at":"2026-04-01T06:00:24.931Z","updated_at":"2026-04-16T04:01:02.099Z","avatar_url":"https://github.com/xlang-ai.png","language":"Python","funding_links":[],"categories":["Training Data Tools","Agent Harnessing and Evaluation","Python"],"sub_categories":["Benchmark Reality Check (real-world tool use)"],"readme":"\n\u003ch1 style=\"\n  font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Helvetica,Arial,sans-serif;\n  font-size:48px;\n  font-weight:700;\n  line-height:1.25;\n  text-align:center;\n  margin:0 0 24px;\"\u003e\n  OpenCUA: Open Foundations for Computer-Use Agents\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n\u0026nbsp\u0026nbsp🌐 \u003ca 
href=\"https://opencua.xlang.ai/\"\u003eWebsite\u003c/a\u003e\u0026nbsp\u0026nbsp | \u0026nbsp\u0026nbsp📑 \u003ca href=\"https://arxiv.org/abs/2508.09123\"\u003ePaper\u003c/a\u003e\u0026nbsp\u0026nbsp | \u0026nbsp\u0026nbsp🤗 \u003ca href=\"https://huggingface.co/datasets/xlangai/AgentNet\"\u003eDataset\u003c/a\u003e\u0026nbsp\u0026nbsp | \u0026nbsp\u0026nbsp🔎 \u003ca href=\"https://agentnet_data_viewer.xlang.ai/\"\u003eData Viewer\u003c/a\u003e\u0026nbsp\u0026nbsp | \u0026nbsp\u0026nbsp🤖 \u003ca href=\"https://huggingface.co/collections/xlangai/opencua-open-foundations-for-computer-use-agents-6882014ebecdbbe46074a68d\"\u003eModel\u003c/a\u003e\u0026nbsp\u0026nbsp | \u0026nbsp\u0026nbsp🔧  \u003ca href=\"https://agentnet-tool.xlang.ai/\"\u003eTool\u003c/a\u003e\u0026nbsp\u0026nbsp | \u0026nbsp\u0026nbsp🎮  \u003ca href=\"https://huggingface.co/spaces/xlangai/OpenCUA-demo\"\u003eModel Demo\u003c/a\u003e\u0026nbsp\u0026nbsp \n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/images/main_fig.png\" width=\"600\" alt=\"OpenCUA-7B Performance Scaling\"\u003e\n\u003c/div\u003e\n\n\u003cdiv style=\"max-width:900px;margin:0 auto;\"\u003e\n\n## 📢 Updates\n- 2026-01-17: 🎉 **vLLM now fully supports OpenCUA-7B, OpenCUA-32B, and OpenCUA-72B!** Thanks to the [Meituan EvoCUA Team](https://github.com/meituan/EvoCUA) for their contributions to vLLM integration. See [vLLM Serve](model/README.md) for usage instructions.\n\n- 2025-12-17: You can now view AgentNet dataset trajectories online via [AgentNet Data Viewer](https://agentnet_data_viewer.xlang.ai/), or use the code in `data/vis/` to visualize your own trajectory data. See [vis/README.md](./data/vis/README.md) for usage instructions. 
We also summarize the AgentNet metadata in [Metadata json](https://huggingface.co/datasets/xlangai/AgentNet/blob/main/meta_data_merged.jsonl).\n\n\n- 2025-11-28: vLLM support for OpenCUA is available in [[Model] Add OpenCUA-7B support #29068](https://github.com/vllm-project/vllm/pull/29068). Many thanks to [lim4349](https://github.com/lim4349)!\n  \n- 2025-10-12:  \u003cspan style=\"font-weight:bold\"\u003e[OpenCUA-7B-exl2](https://huggingface.co/sujitvasanth/OpenCUA-7B-exl2) is now live!\u003c/span\u003e ⚡️  \n  Thanks to [Sujit Vasanth](https://huggingface.co/sujitvasanth) for producing a quantized **exllamav2** version of OpenCUA-7B, enabling much faster inference with lower VRAM usage.  \n\n\n- 2025-10-03: \u003cspan style=\"color:red; font-weight:bold\"\u003eNew OpenCUA model!\u003c/span\u003e🔥 \n[OpenCUA-72B](https://huggingface.co/xlangai/OpenCUA-72B-preview) now ranks #1 on the [OSWorld-Verified leaderboard](https://os-world.github.io/). It also has strong grounding ability: 37.3% (SOTA) on UI-Vision and 60.8% on ScreenSpot-Pro.\n- 2025-08-13: We released our [paper](https://arxiv.org/abs/2508.09123) and [project page](https://opencua.xlang.ai/). 
Check it out!\n\n# Introduction\n\u003cdiv style=\"\n  max-width: 880px;              /* adjust the overall width as needed */\n  margin: 0 auto;               /* center the container */\n  text-align: justify;          /* key setting: justify both edges */\n  text-justify: inter-word;     /* improve justification of English text */\n  line-height: 1.6;\"\u003e\n  \n\u003cb\u003eOpenCUA\u003c/b\u003e is a comprehensive open-source framework for scaling CUA data and foundation models, consisting of: \n- \u003cb\u003e[AgentNet](https://huggingface.co/datasets/xlangai/AgentNet)\u003c/b\u003e: the first large-scale computer-use task dataset spanning 3 operating systems and 200+ applications and websites; \n- **[AgentNetTool](https://agentnet-tool.xlang.ai/)**: an annotation infrastructure that seamlessly captures human computer-use demonstrations; \n- \u003cb\u003e[AgentNetBench](https://github.com/xlang-ai/OpenCUA/tree/main/AgentNetBench)\u003c/b\u003e: an offline evaluator that benchmarks model-predicted low-level actions against ground-truth trajectories;\n- **[OpenCUA Models](https://huggingface.co/collections/xlangai/opencua-open-foundations-for-computer-use-agents-6882014ebecdbbe46074a68d)**: end-to-end computer-use foundation models that can produce executable actions in computer environments, with strong planning and grounding capabilities.\n\n\nWith the help of the OpenCUA framework, our end-to-end agent models demonstrate strong performance across CUA benchmarks. In particular, \u003cb\u003eOpenCUA-72B\u003c/b\u003e achieves an average success rate of **45.0%** on [OSWorld-Verified](https://os-world.github.io/), \nestablishing a new state-of-the-art (SOTA) among open-source models. 
\n\n\u003c/div\u003e\n\n\n##  🚀 Quick Start of OpenCUA Models\n\u003cdiv style=\"border-left: 6px solid #f28c28; background: #fff8e6; padding: 12px 16px; margin: 16px 0;\"\u003e\n  \u003cstrong\u003e⚠️ Important for Qwen-based Models (OpenCUA-7B, OpenCUA-32B, OpenCUA-72B):\u003c/strong\u003e\n  \n  To align with our training infrastructure, we have modified the model in two places:\n  \u003cul style=\"margin-top: 8px;\"\u003e\n    \u003cli\u003e1. Multimodal Rotary Position Embedding (M-RoPE) has been replaced with 1D RoPE.\u003c/li\u003e\n    \u003cli\u003e2. The tokenizer and chat template are the same as those of Kimi-VL.\u003c/li\u003e\n    \u003cli\u003eDo not use the default transformers and vllm classes to load the model. The tokenizer and chat template should be aligned when training the models.\u003c/li\u003e\n  \u003c/ul\u003e\n\u003c/div\u003e\n\n\n### Installation \u0026 Download\n\nFirst, install the required transformers dependencies:\n\n```bash\nconda create -n opencua python=3.10\nconda activate opencua\npip install -r requirement.txt\n```\n\nDownload the model weights from Hugging Face:\n```python\nfrom huggingface_hub import snapshot_download\n\n# Download the full model repository into a local folder\nsnapshot_download(\n    repo_id=\"xlangai/OpenCUA-7B\",\n    local_dir=\"OpenCUA-7B\",\n    local_dir_use_symlinks=False\n)\n```\n\n### 🚀 vLLM Serve\n\nWe recommend using vLLM for production deployment. 
Requires **vllm\u003e=0.12.0** with `--trust-remote-code`.\n\n```bash\n# OpenCUA-7B (single GPU)\nvllm serve xlangai/OpenCUA-7B \\\n  --trust-remote-code \\\n  --served-model-name opencua-7b \\\n  --host 0.0.0.0 \\\n  --port 8000\n\n# OpenCUA-32B (4 GPUs, tensor parallel)\nvllm serve xlangai/OpenCUA-32B \\\n  --trust-remote-code \\\n  --tensor-parallel-size 4 \\\n  --served-model-name opencua-32b \\\n  --host 0.0.0.0 \\\n  --port 8000\n\n# OpenCUA-72B with data parallelism (tp=2, dp=4 for 4 instances on 8 GPUs)\nvllm serve xlangai/OpenCUA-72B \\\n  --trust-remote-code \\\n  --tensor-parallel-size 2 \\\n  --data-parallel-size 4 \\\n  --gpu-memory-utilization 0.85 \\\n  --host 0.0.0.0 \\\n  --port 8000\n```\n\nAdjust `--tensor-parallel-size`, `--data-parallel-size`, and `--gpu-memory-utilization` based on your hardware configuration.\n\nFor more examples and inference code, see [model/inference/vllm_inference.py](./model/inference/vllm_inference.py).\n\n### 🎯 GUI Grounding\n\nFirst, start the vLLM server (using OpenCUA-7B as example):\n```bash\nvllm serve xlangai/OpenCUA-7B \\\n  --trust-remote-code \\\n  --served-model-name opencua-7b \\\n  --host 0.0.0.0 \\\n  --port 8000\n```\n\nThen run the grounding examples:\n```\ncd ./model/inference/\npython vllm_inference.py\n```\n\nOr with HuggingFace Transformers (no server required):\n```\npython huggingface_inference.py\n```\n\n### 🖥️ Computer Use Agent\n**[OpenCUAAgent](https://github.com/xlang-ai/OSWorld/blob/main/mm_agents/opencua_agent.py)** is developed in the [OSWorld](https://github.com/xlang-ai/OSWorld) environment based on OpenCUA models. It iteratively perceives the environment via screenshots, produces reflective long CoT as inner monologue, and predicts the next action to be executed. 
OpenCUAAgent uses 3 images in total and the L2 CoT format by default.\n\nCommand for running OpenCUA-7B and OpenCUA-32B in OSWorld:\n```\n    python run_multienv_opencua.py \\\n        --headless \\\n        --observation_type screenshot \\\n        --model OpenCUA-32B \\\n        --result_dir ./results --test_all_meta_path evaluation_examples/test_all_no_gdrive.json \\\n        --max_steps 100 \\\n        --num_envs 30 \\\n        --coordinate_type qwen25\n```\n\n---\n\n## Performance\n\n### Online Agent Evaluation\nOpenCUA models achieve strong performance on **[OSWorld-Verified](https://os-world.github.io/)**. \nOpenCUA-72B achieves the best performance among all open-source models, with an average success rate of 45.0% at 100 steps, outperforming prior baselines by large margins. \nOpenCUA-32B reaches 34.8%, and both models close the gap to proprietary Claude models.\n\u003cdiv align=\"center\"\u003e\n\n| **Model**                        | **15 Steps** | **50 Steps** | **100 Steps** |\n|-------------------------------|:--------:|:--------:|:---------:|\n| **Proprietary**               |          |          |           |\n| OpenAI CUA                    | 26.0     | 31.3     | 31.4      |\n| Seed 1.5-VL                   | 27.9     | —        | 34.1      |\n| Claude 3.7 Sonnet             | 27.1     | 35.8     | 35.9      |\n| Claude 4 Sonnet               | 31.2     | 43.9     | 41.5      |\n| **Open-Source**               |          |          |           |\n| Qwen 2.5-VL-32B-Instruct      | 3.0      | —        | 3.9       |\n| Qwen 2.5-VL-72B-Instruct      | 4.4      | —        | 5.0       |\n| Kimi-VL-A3B                   | 9.7      | —        | 10.3      |\n| UI-TARS-72B-DPO               | 24.0     | 25.8     | 27.1      |\n| UI-TARS-1.5-7B                | 24.5     | 27.3     | 27.4      |\n| OpenCUA-7B *(Ours)*           | 24.3     | 27.9     | 26.6      |\n| OpenCUA-32B *(Ours)*          | 29.7     | 34.1     | 34.8      |\n| **OpenCUA-72B** *(Ours)*      | **39.0** | **44.9** | **45.0**  
|\n\u003c/div\u003e\n\n*OpenCUA scores are the mean of 3 independent runs.*\n\n### GUI Grounding Performance\n\u003cdiv align=\"center\"\u003e\n\n| **Model** | **OSWorld-G** | **ScreenSpot-V2** | **ScreenSpot-Pro** | **UI-Vision** |\n|-------|-----------|---------------|----------------| ---------- |\n| Qwen2.5-VL-7B   | 31.4 | 88.8 | 27.6 |  0.85 |\n| Qwen2.5-VL-32B  | 46.5 | 87.0 | 39.4 | - |\n| UI-TARS-72B     | 57.1 | 90.3 | 38.1 | 25.5 |\n| **OpenCUA-7B**  | 55.3 | 92.3 | 50.0 | 29.7 |\n| **OpenCUA-32B** | **59.6** | **93.4** | 55.3 | 33.3 |\n| **OpenCUA-72B** | 59.2 | 92.9 | **60.8** | **37.3** |\n\u003c/div\u003e\n\n\n### AgentNetBench (Offline Evaluation)\n\u003cdiv align=\"center\"\u003e\n\n| **Model** | **Coordinate Actions** | **Content Actions** | **Function Actions** | **Average** |\n|-------|-------------------|-----------------|------------------|---------|\n| Qwen2.5-VL-7B | 50.7 | 40.8 | 3.1 | 48.0 |\n| Qwen2.5-VL-32B | 66.6 | 47.2 | 41.5 | 64.8 |\n| Qwen2.5-VL-72B | 67.2 | 52.6 | 50.5 | 67.0 |\n| OpenAI CUA          | 71.7 | 57.3 | **80.0** | 73.1 |\n| **OpenCUA-7B**  | 79.0 | 62.0 | 44.3 | 75.2 |\n| **OpenCUA-32B** | **81.9** | 66.1 | 55.7 | **79.1** |\n\u003c/div\u003e\n\n---\n\n## AgentNet Dataset - Large-Scale Computer-Use Dataset\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/images/domain_distribution.png\" width=\"400\" alt=\"AgentNet Dataset Domain Distribution\"\u003e\n\u003c/div\u003e\n\nAgentNet is the first large-scale desktop computer-use agent trajectory dataset, containing 22.6K human-annotated computer-use tasks across Windows, macOS, and Ubuntu systems. 
\n\n👉 **[AgentNet Huggingface Dataset](https://huggingface.co/datasets/xlangai/AgentNet)**\n\nDownload the dataset here:\n```\npip install -U huggingface_hub\nhuggingface-cli download xlangai/AgentNet --repo-type dataset --local-dir ./AgentNet\n```\n\nUse the following commands to unzip the files (for example, the Ubuntu data):\n```\ncd path_to_your_zip_files\n\n# Merge all the zips\nzip -s 0 images.zip --out images-full.zip\n\n# Unzip\nunzip images-full.zip -d path_to_your_target_dir\n```\n\nCollecting computer-use agent training data requires 3 steps:\n- Record a human demonstration of a computer-use task with [AgentNetTool](https://agentnet-tool.xlang.ai/);\n- Preprocess the demonstration using [Action Reduction \u0026 State-Action Matching](./data/data-processor);\n- For each step, [synthesize reflective long CoT](./data/cot-generator).\n\n\n### 1 AgentNetTool – Annotation \u0026 Verification Tool\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/images/agn_tool_fig.png\" width=\"700\" alt=\"AgentNet Tool\"\u003e\n\u003c/div\u003e\n\n\nOur **AgentNetTool** is a cross-platform GUI recorder that runs unobtrusively on annotators’ machines. It captures synchronized **screen video**, **mouse/keyboard events**, and **accessibility trees**, then provides an in-browser UI for reviewing, trimming, and submitting demonstrations. AgentNetTool is available on Windows, macOS, and Ubuntu. \n\n👉 **[AgentNetTool Document](https://agentnet-tool.xlang.ai/)**\n\n\n\n### 2 DataProcessor – Action Reduction \u0026 State–Action Matching\nRaw demonstrations can contain thousands of low-level events that are too dense for model training.  \nThe **DataProcessor** module (`./data/data-process/`) performs two key steps:\n\n1. **Action Reduction** — merges granular signals into concise, semantically meaningful PyAutoGUI actions (e.g., collapsing mouse moves → click, coalescing scrolls, grouping key-press sequences into text or hotkeys).  \n2. 
**State–Action Matching** — aligns every reduced action with the *last visually distinct frame* **before** the action begins, avoiding future-information leakage and yielding compact state–action pairs.\n\nThese processed trajectories underlie all downstream training and evaluation.\n\n---\n\n### 3 CoTGenerator – Synthesizing Reflective Long Chain-of-Thought Inner Monologue\nTo boost robustness and interpretability, we augment each trajectory with **reflective long Chain-of-Thought (CoT) reasoning**.  \nThe **CoTGenerator** pipeline (`./data/cot-generator/`) synthesizes step-level reflections that:\n\n* reflect on the previous action,\n* explain *why* an action is chosen given the current observation and history,  \n* note potential alternative actions, and  \n* forecast the expected next state.\n\nEmpirically, models trained with these rich CoTs scale better with data and generalize across unseen applications.\n\n\n## AgentNetBench\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/images/AgentNetBench.png\" width=\"800\" alt=\"AgentNetBench\"\u003e\n\u003c/div\u003e\n\n\n**AgentNetBench** (`./AgentNetBench/`) provides a realistic offline evaluator for OS agent trajectories. It compares model-predicted low-level actions (click, moveTo, write, press, scroll, terminate, etc.) 
against ground-truth human actions and reports detailed metrics.\n\n👉 See **[AgentNetBench/README.md](./evaluation/agentnetbench/README.md)** for usage instructions.\n\n## TODO\n- [x] **vLLM Support** ✅\n  - vLLM now fully supports OpenCUA-7B, OpenCUA-32B, and OpenCUA-72B.\n  - See the [vLLM Serve](#vllm-serve) section for usage instructions.\n  - Thanks to the Meituan EvoCUA Team for their contributions!\n\n- [ ] **Training Code**\n  - OpenCUA models are developed based on the training infrastructure of the Kimi Team.\n  - We are currently developing the training pipeline based on open-source infrastructure.\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=xlang-ai/OpenCUA\u0026type=date\u0026legend=top-left)](https://www.star-history.com/#xlang-ai/OpenCUA\u0026type=date\u0026legend=top-left)\n\n## Acknowledgements\n\u003cp\u003e\nWe thank Yu Su, Caiming Xiong, and the anonymous reviewers for their insightful discussions and valuable feedback.\nWe are grateful to Moonshot AI for providing training infrastructure and annotated data.\nWe also sincerely appreciate Hao Yang, Zhengtao Wang, and Yanxu Chen from the Kimi Team for their strong infrastructure support and helpful guidance.\nWe thank Chong Peng, Taofeng Xue, and Qiumian Huang from the \u003ca href=\"https://github.com/meituan/EvoCUA\" target=\"_blank\"\u003eMeituan EvoCUA Team\u003c/a\u003e for their contributions to vLLM integration.\nThe development of our tool is based on the open-source projects \u003ca href=\"https://github.com/TheDuckAI/DuckTrack\" target=\"_blank\"\u003eDuckTrack\u003c/a\u003e and \u003ca href=\"https://github.com/OpenAdaptAI/OpenAdapt\" target=\"_blank\"\u003eOpenAdapt\u003c/a\u003e.\nWe are very grateful for their commitment to the open-source community. 
Finally, we extend our deepest thanks to all annotators for their tremendous effort and contributions to this project.\n\u003c/p\u003e\n\n## Research and Commercial Use\n\nOpenCUA (including the model, dataset, tools, and code) may be used for **research, educational, and commercial purposes** under the **MIT License** (see `LICENSE`).\n\n### Citation and Acknowledgement\nIf you use **OpenCUA models** and/or the **AgentNet dataset** in any **report, technical report, publication, thesis, presentation, blog post, documentation, or other publicly shared material**, we **kindly ask** that you include an explicit acknowledgement in the main text and cite the OpenCUA paper.\n\n### Prohibited Uses\n- The model, dataset, tool, and code may **not** be used for any purpose or activity that violates applicable laws or regulations in any jurisdiction\n- Use for illegal, unethical, or harmful activities is strictly prohibited\n\n### Disclaimer\n- The authors, contributors, and copyright holders are **not responsible** for any illegal, unethical, or harmful use of the Software, nor for any direct or indirect damages resulting from such use\n- Use of the \"OpenCUA\" name, logo, or trademarks does **not** imply any endorsement or affiliation unless separate written permission is obtained\n- Users are solely responsible for ensuring their use complies with applicable laws and regulations\n\n## Citation\n\nIf you use OpenCUA in your research, please cite our work:\n\n```bibtex\n@misc{wang2025opencuaopenfoundationscomputeruse,\n      title={OpenCUA: Open Foundations for Computer-Use Agents}, \n      author={Xinyuan Wang and Bowen Wang and Dunjie Lu and Junlin Yang and Tianbao Xie and Junli Wang and Jiaqi Deng and Xiaole Guo and Yiheng Xu and Chen Henry Wu and Zhennan Shen and Zhuokai Li and Ryan Li and Xiaochuan Li and Junda Chen and Boyuan Zheng and Peihang Li and Fangyu Lei and Ruisheng Cao and Yeqiao Fu and Dongchan Shin and Martin Shin and Jiarui Hu and Yuyan Wang and Jixuan 
Chen and Yuxiao Ye and Danyang Zhang and Dikang Du and Hao Hu and Huarong Chen and Zaida Zhou and Haotian Yao and Ziwei Chen and Qizheng Gu and Yipu Wang and Heng Wang and Diyi Yang and Victor Zhong and Flood Sung and Y. Charles and Zhilin Yang and Tao Yu},\n      year={2025},\n      eprint={2508.09123},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https://arxiv.org/abs/2508.09123}, \n}\n```\n\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxlang-ai%2FOpenCUA","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxlang-ai%2FOpenCUA","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxlang-ai%2FOpenCUA/lists"}