An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with computer-use

A curated list of projects in awesome lists tagged with computer-use.

https://github.com/bytedance/UI-TARS-desktop

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

agent agent-tars browser-use computer-use gui-agent gui-operator mcp mcp-server multimodal tars ui-tars vision vlm

Last synced: 06 Oct 2025

https://github.com/trycua/cua

Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).

agent ai-agent apple computer-use computer-use-agent containerization cua desktop-automation hacktoberfest lume macos manus operator swift virtualization virtualization-framework windows windows-sandbox

Last synced: 26 Apr 2026

https://github.com/bytedance/ui-tars-desktop

A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

agent browser-use computer-use electron gui-agents mcp mcp-server vision vite vlm

Last synced: 09 Sep 2025

https://github.com/web-infra-dev/midscene

AI-powered, vision-driven UI automation for every platform.

ai ai-test browser-use computer-use gpt-operator javascript phone-use testing

Last synced: 21 Apr 2026

https://github.com/upsonic/upsonic

The most reliable AI agent framework that supports MCP.

agent agent-framework claude computer-use llms mcp model-context-protocol openai rag reliability

Last synced: 09 Apr 2026

https://github.com/nanobrowser/nanobrowser

Open-Source Chrome extension for AI-powered web automation. Run multi-agent workflows using your own LLM API key. Alternative to OpenAI Operator.

agent ai ai-agents ai-tools automation browser-extension browser-use chrome-extension computer-use gpt-operator javascript manus multi-agent openai opensource operator web-agent web-automation

Last synced: 13 May 2025

https://github.com/a9t9/rpa

Ui.Vision Open-Source RPA Software with Computer Vision, OCR, Anthropic Computer Use/LLM. Selenium IDE import/export.

anthropic anthropic-claude browser-automation browser-extension computer-use data-driven-tests imacros selenium-ide web-automation web-scraping

Last synced: 16 May 2025

https://github.com/openadaptai/openadapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

agents ai-agents ai-agents-framework anthropic computer-use generative-process-automation google-gemini gpt4o huggingface large-action-model large-language-models large-multimodal-models omniparser openai process-automation process-mining python segment-anything transformers ultralytics

Last synced: 04 Mar 2026

https://github.com/showlab/showui

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

agent computer-use gui-agent vision-language-action vision-language-model

Last synced: 13 Sep 2025

https://github.com/OpenAdaptAI/OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

agents ai-agents ai-agents-framework anthropic computer-use generative-process-automation google-gemini gpt4o huggingface large-action-model large-language-models large-multimodal-models omniparser openai process-automation process-mining python segment-anything transformers ultralytics

Last synced: 05 Apr 2025

https://github.com/OpenCoworkAI/open-cowork

Open-source AI agent desktop app for Windows & macOS. One-click install Claude Code, MCP tools, and Skills — with sandbox isolation, multi-model support, and Feishu/Slack integration.

ai-agent ai-coding ai-tools anthropic claude-code coding-agent computer-use desktop-app electron mcp multi-model open-cowork sandbox skills

Last synced: 27 Apr 2026

https://github.com/thudm/cogagent

An open-sourced end-to-end VLM-based GUI Agent

agent computer-use glm gui-agent vlm

Last synced: 15 May 2025

https://github.com/showlab/ShowUI

Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

agent computer-use gui-agent vision-language-action vision-language-model

Last synced: 02 Oct 2025

https://github.com/cuga-project/cuga-agent

CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware features.

computer-use enterprise generalist-agent mcp

Last synced: 10 Mar 2026

https://github.com/microsoft/windowsagentarena

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

agentic ai ai-agent ai-benchmark ai-research computer computer-use desktop-agent windows

Last synced: 15 May 2025

https://github.com/e2b-dev/open-computer-use

Secure AI computer use powered by E2B Desktop Sandbox

agent ai anthropic claude computer-use llm

Last synced: 18 Jun 2025

https://microsoft.github.io/WindowsAgentArena/

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

agentic ai ai-agent ai-benchmark ai-research computer computer-use desktop-agent windows

Last synced: 23 Feb 2025

https://github.com/microsoft/WindowsAgentArena

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

agentic ai ai-agent ai-benchmark ai-research computer computer-use desktop-agent windows

Last synced: 11 Sep 2025

https://github.com/suitedaces/computer-agent

Desktop app to control your computer with AI, written in Rust

ai ai-tools anthropic claude-3-5-sonnet claude-4-5-sonnet computer-use gui react reactjs rust typescript

Last synced: 16 Feb 2026

https://github.com/bandarlabs/clickclickclick

A framework to enable autonomous Android and computer use using any LLM (local or remote)

agents ai-agents-framework android-automation antrophic computer-use framework gemini generative-ai molmo ollama openai operator python

Last synced: 16 May 2025

https://github.com/celestoai/smolvm

Open-source AI sandbox infrastructure for code execution, browser use, and AI agents.

agent-runtime browser-agent browser-use computer-use openclaw sandbox

Last synced: 27 Apr 2026

https://github.com/inclusionAI/AWorld

Build, evaluate and run General Multi-Agent Assistance with ease

agent-swarm agentic-ai computer-use gym-environment mcp mcp-server phone-use world-model

Last synced: 02 May 2025

https://github.com/aditya-nadkarni/spongecake

Spongecake is the easiest way to launch computer use agents.

ai-agents ai-agents-framework automation computer-use docker llm openai python

Last synced: 13 Mar 2026

https://github.com/baryhuang/mcp-remote-macos-use

The only general AI agent that does NOT require an extra API key, giving you full control of your local and remote macOS from the Claude Desktop app

claude-desktop computer-use general-agent macos macos-use mcp-server

Last synced: 12 Jan 2026

https://github.com/inclusionai/aworld

Build, evaluate and run General Multi-Agent Assistance with ease

agent-swarm agentic-ai computer-use gym-environment mcp mcp-server phone-use world-model

Last synced: 03 Feb 2026

https://github.com/jeffrey-zang/opus

for when your fingers are greasy 🪄

computer-use electron opus react tailwind

Last synced: 30 Jun 2025

https://github.com/open-compass/mmbench-gui

Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android and Web.

benchmark-framework computer-use gui-agent vision-language-model

Last synced: 15 Sep 2025

https://github.com/arcboxlabs/arcbox

Run AI agents on real and isolated machines — own kernel, filesystem, and network — with <200ms boot. Local first, OCI compatible, pure Rust.

ai-agents computer-use containers docker firecracker microvm rust sandbox virtual-machine virtualization

Last synced: 05 Apr 2026

https://github.com/lvqq/intelli-browser

✨ Use natural language to control your browser, powered by LLM and playwright

anthropic claude claude-3-5-sonnet computer-use e2e e2e-tests playwright

Last synced: 27 Feb 2026

https://github.com/bytebot-ai/bytebot

A containerized framework for computer use agents with a virtual desktop environment.

ai-agents anthropic computer-use docker llm openai qemu

Last synced: 01 Apr 2025

https://github.com/cyberdesk-hq/cyberdesk

Open source virtual desktops for AI agents

ai-agents computer-use fastapi hono kubernetes nextjs terraform virtual-machine

Last synced: 10 Sep 2025

https://github.com/browser-use/contact-use

✉️ Use the power of browser-use to contact any person or organization... by any means necessary

ai browser-agent browser-use computer-use contact-info-scraper operator sales sales-automation scraping scraping-bot

Last synced: 04 Oct 2025

https://github.com/sh3ll3x3c/native-devtools-mcp

MCP server for native app testing — screenshot, OCR, click, type, find_text, template matching. macOS, Windows & Android. Works with Claude, Cursor, and any MCP client.

accessibility adb ai-agent android claude claude-code computer-use cursor e2e-testing macos mcp mobile-automation mobile-testing model-context-protocol ocr rpa screenshot template-matching ui-automation windows

Last synced: 04 Mar 2026

https://github.com/pnmartinez/simple-computer-use

Open source implementation for computer use, using light OCR models and LLMs. Get Android app in link below.

automation computer-use ocr ollama

Last synced: 21 Jun 2025

https://github.com/ginsing1226/screenclaw

Screenshot + percentage coordinate grids enabling any multimodal LLM for seamless, non-blocking RPA/Computer Use.

agent claude-code computer-use openclaw python rpa skills vision-language-model

Last synced: 13 Apr 2026

https://github.com/SALT-NLP/PopupAttack

Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups

attack claude-3-5-sonnet computer-use llm-agent pop-up vision-language-model

Last synced: 23 Feb 2025

https://github.com/ghostwright/ghost-os

Full computer-use for AI agents. Self-learning workflows. Native macOS. No screenshots required.

accessibility ai-agents automation claude-code computer-use llm-tools macos mcp recipes swift

Last synced: 10 Mar 2026

https://github.com/philfung/awesome-computer-use

Curated resources about automated GUI computer-use via LLMs. Highly opinionated, focus is on quality vs quantity.

anthropic anthropic-claude computer-use computer-vision gpt-4-vision gui-agents llm rpa rpa-robotic-process-automation tool-use vision

Last synced: 27 Jan 2026

https://github.com/iris-networks/iris

This is the crud backend for our QA test application

ai automation computer-use qa-automation-test

Last synced: 10 Oct 2025

https://github.com/sawyerhood/computer-use-extension

This is OpenAI's computer use hooked up to a Chrome extension.

ai chrome-extension computer-use llm openai

Last synced: 11 Mar 2026

https://github.com/zubax/bro

An LLM computer-using agent (CUA) designed to autonomously perform mundane tasks related to business operations and administration, such as doing accounting, filing paperwork, and submitting applications. The accountant is not your bro, but Bro is.

agent agentic-ai automation computer-use computer-use-agent llm no-code-automation nocode office-automation

Last synced: 10 Oct 2025

https://github.com/ab498/computer-control-mcp

MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies.

automation computer-use mcp

Last synced: 29 Jun 2025

https://github.com/webhiveos/webhive

Meet WebHive, the AI-powered browser that takes care of tasks for you. No more endless clicks, tell it what you need, and it gets it done.

agent agent-framework assistant chagpt chatgpt-app chatgpt-operator claude computer-use gca gpt gpt-4o langchain llms mcp model-context-protocol openai

Last synced: 13 Apr 2025

https://github.com/archivebox/abx-spec-behaviors

🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser environments, puppeteer, playwright, extensions, AI tools, and many other contexts with minimal adjustment.

abx archivebox automation behaviors browser browsertrix-behaviors claude computer-use crawling digipres ecosystem greasemonkey playwright plugins puppeteer rfp scraping specification tampermonkey tool-use

Last synced: 01 Sep 2025

https://github.com/justmalhar/claude-ubuntu-os

Claude Computer Use API with Ubuntu that enables Claude to interact with and automate desktop environments. It allows seamless command execution through VNC or noVNC, enhancing productivity with secure, containerized workflows with Github Codespaces.

agents ai anthropic claude computer-use github-codespaces-cde large-language-models ubuntu vnc-viewer

Last synced: 13 Apr 2025

https://github.com/AB498/computer-control-mcp

MCP server that provides computer control capabilities, like mouse, keyboard, OCR, etc. using PyAutoGUI, RapidOCR, ONNXRuntime. Similar to 'computer-use' by Anthropic. With Zero External Dependencies.

automation computer-use mcp

Last synced: 17 Jun 2025

https://github.com/e2b-dev/computer-use-app

A web playground for secure, open-source computer use. Powered by E2B.

ai computer-use llama3 open-source qwen

Last synced: 15 Apr 2025

https://github.com/pmbstyle/fara-agent

A local browser automation agent based on Microsoft Fara-7B model optimized for LM Studio inference.

ai-agents browser-automation browser-use computer-use fara lm-studio playwright vision-agents

Last synced: 16 Apr 2026

https://github.com/lx-0/computer-use-nodejs-demo

🤖 LLM-powered computer control through local and Docker environments. Features VNC integration, automated interactions, and a chat interface for natural language system control.

ai computer-use docker function-calling llm

Last synced: 16 Aug 2025

https://github.com/iris-networks/gpt-agent

Fully self-hosted ChatGPT agent alternative

browser-use chatgpt-agent computer-use cua

Last synced: 26 Jul 2025

https://github.com/haltakov/browsafex

Web interface for the Gemini 2.5 Computer Use model

agent ai computer-use computer-use-agent

Last synced: 16 Jan 2026

https://github.com/aadya940/orbit

Building Blocks to automate desktop workflows end-to-end using AI

agentic-ai ai ai-agent automation browser-automation computer-use cua desktop-automation operating-system python

Last synced: 23 Apr 2026

https://github.com/nicholasoxford/computer-use-mac-demo

Anthropic's computer use controlling a MacBook

anthropic claude computer-use

Last synced: 08 May 2025

https://github.com/pnmartinez/computer-use-android-app

🎤📱 Control your desktop PC with voice from an Android app! This is an Android client for the Simple Computer Use. Install Simple Computer Use in link below.

automation computer-use ocr ollama voice

Last synced: 30 Aug 2025

https://github.com/presidio-oss/factif-ai

AI-powered computer control for automated testing. FactifAI uses vision models (Claude, GPT-4o, Gemini) to interact with applications naturally - clicking, typing, and verifying results just like a human would.

anthropic automated-testing automation bedrock claude computer-use docker-vnc factif-ai gpt-4o hai human-ai omniparser puppeteer testing

Last synced: 26 Sep 2025

https://github.com/openadaptai/openadapt-ml

OpenAdapt’s open-source ML toolkit for training and evaluating general multimodal GUI-action models.

computer-use gui-automation machine-learning openadapt python vlm

Last synced: 03 Mar 2026

https://github.com/anonymitaet/gacua_preview

The World's First Out-of-the-Box Computer Use Agent Powered by Gemini-CLI @openmule

agent ai computer-use gacua mcp

Last synced: 10 Oct 2025

https://github.com/vcaesar/robotgo-pro

RobotGo-Pro: multi-language, native, cross-platform RPA, GUI automation, auto testing, and computer use

ai auto-test automation computer-use javascript js lua opencv python robot rpa

Last synced: 13 Jan 2026

https://github.com/rajaniraiyn/ccu

Anthropic's Computer Use tools within VSCode

ai anthropic anthropic-claude claude computer-use llm vscode vscodeextension

Last synced: 27 Mar 2025

https://github.com/nottelabs/open-operator-evals

Open-source benchmark evaluating the performance of web operators/agents

ai-agents ai-tools browser-automation browser-use computer-use cua llm notte web-agent

Last synced: 24 Dec 2025

https://github.com/plyght/superctrl

superctrl is an AI automation daemon for macOS.

computer-use macos openai superwhisper voice-control

Last synced: 01 Mar 2026

https://github.com/techgniouss/pdagent

Your PC in your pocket — a Telegram bot for remote control, Gemini AI automation, and developer tools.

ai-agent automation computer-use computer-vision gemini-ai python python-telegram-bot remote-control rpa telegram-bot tesseract-ocr ui-automation windows

Last synced: 15 Apr 2026

https://github.com/emrek0ca/elyan

Local-first AI operator with zero-permission sandboxing, evidence-backed execution, and investor-ready docs.

agent ai automation computer-use sandbox

Last synced: 26 Apr 2026

https://github.com/osmandkitay/odk-shell

ODK: An open-source AI shell to control your computer with natural language.

ai computer-use local-models python rust tauri

Last synced: 14 Apr 2026

https://github.com/agent-sandbox/agent-sandbox

Agent-sandbox is an enterprise-grade, AI-first, cloud-native runtime environment for AI agents. It allows agents to securely run untrusted LLM-generated code, browser use, computer use, shell commands, etc., with stateful, long-running, multi-session, and multi-tenant support.

agent agent-sandbox ai-infra ai-sandbox browser-use code-executor computer-use container mcp sandbox

Last synced: 13 Jan 2026

https://github.com/samestrin/chromium-screenshots

Vision AI "Cortex" for Agents. A Playwright-based MCP Server & API that captures screenshots with ground-truth DOM extraction and full auth state injection. Containerized.

ai-agents automation computer-use docker-image dom-extraction headless-chrome llm-tools mcp-server ocr playwright-python python-fastapi scraping screenshot-api vision-ai zero-drift

Last synced: 13 Jan 2026

https://github.com/mubashir1osmani/m4

Build custom ASICs and FPGAs using LLMs.

ai chip-design computer-use gpu hardware-designs llm

Last synced: 22 Aug 2025

https://github.com/mihonarium/food_ordering_agent

Use an LLM agent to automate ordering food and other items from Deliveroo, Uber Eats, DoorDash, etc.

agent-based agents amazon api assistant assistant-chat-bots assistants-api computer-use computer-using-agent computeruse deliveroo doordash home-assistant llm llm-agents ubereats

Last synced: 29 Jul 2025

https://github.com/montbrain/vadgr-computer-use

MCP server for desktop automation. Accessibility-first (UIA/AT-SPI/AX) with vision fallback. Local, on-device, CPU-friendly.

accessibility agent automation computer-use mcp

Last synced: 26 Apr 2026

https://github.com/syedazharmbnr1/computer-use-mcp

macOS Computer Use MCP Server - 33 tools for screen control via Model Context Protocol. Works with Claude Code, Cursor, LM Studio, Ollama, llama.cpp, MLX, and all MCP clients.

ai-tools automation claude-code computer-use cursor llama-cpp lm-studio macos mcp mcp-server mlx model-context-protocol ollama

Last synced: 27 Apr 2026

https://github.com/passerby2049/macpilot-mcp

MCP server that lets Claude Code drive macOS apps — screenshots, accessibility tree, clicks, keyboard, and menu bar. Inspired by Lakr233/ComputerUse.

accessibility agent-tool anthropic automation claude claude-code computer-use macos mcp mcp-server model-context-protocol screencapturekit swift ui-automation

Last synced: 26 Apr 2026

https://github.com/phact/agentsitter

A babysitter for your AI agents

agents ai browser-use computer-use

Last synced: 19 Apr 2026

https://github.com/gabrielvuksani/wotann

WOTANN — The All-Father of AI agent harnesses. One install. Every model. Every channel. Full autonomy.

agent-framework ai-agent anthropic claude computer-use free-tier harness llm mcp norse ollama openai provider-agnostic swift tauri typescript

Last synced: 20 Apr 2026