{"id":49292372,"url":"https://github.com/montbrain/vadgr-computer-use","last_synced_at":"2026-04-26T01:01:29.389Z","repository":{"id":352493336,"uuid":"1215350293","full_name":"MONTBRAIN/vadgr-computer-use","owner":"MONTBRAIN","description":"MCP server for desktop automation. Accessibility-first (UIA/AT-SPI/AX) with vision fallback. Local, on-device, CPU-friendly.","archived":false,"fork":false,"pushed_at":"2026-04-19T20:32:22.000Z","size":19566,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-19T22:13:17.621Z","etag":null,"topics":["accessibility","agent","automation","computer-use","mcp"],"latest_commit_sha":null,"homepage":"https://github.com/MONTBRAIN/vadgr","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MONTBRAIN.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-19T19:55:51.000Z","updated_at":"2026-04-19T20:32:25.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/MONTBRAIN/vadgr-computer-use","commit_stats":null,"previous_names":["montbrain/vadgr-computer-use"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/MONTBRAIN/vadgr-computer-use","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MONTBRAIN%2Fvadgr-computer-use","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MONTBRAIN%2Fvadgr-computer-use/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MONTBRAIN%2Fvadgr-computer-use/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MONTBRAIN%2Fvadgr-computer-use/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MONTBRAIN","download_url":"https://codeload.github.com/MONTBRAIN/vadgr-computer-use/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MONTBRAIN%2Fvadgr-computer-use/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32282187,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T18:29:39.964Z","status":"ssl_error","status_checked_at":"2026-04-25T18:29:32.149Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accessibility","agent","automation","computer-use","mcp"],"created_at":"2026-04-26T01:01:06.196Z","updated_at":"2026-04-26T01:01:29.374Z","avatar_url":"https://github.com/MONTBRAIN.png","language":"Python","readme":"# vadgr-computer-use\n\nLocal MCP server for desktop automation. 13 tools for capture, mouse, keyboard, and platform introspection. The calling agent takes a screenshot, reasons over the pixels, and drives mouse/keyboard through the server.\n\nTested with **Claude Code**, **Codex CLI**, and **Gemini CLI** (same server, same tools, same prompt).\n\n\u003e **Platforms:** works on **Linux (X11 and Wayland incl. GNOME)**, **Windows native**, and **WSL2**. **macOS support is a work in progress** and not usable yet. See [Platform support](#platform-support) for detail.\n\n---\n\n## Install\n\n```bash\npip install vadgr-computer-use\n```\n\nThat ships a console script called `vadgr-cua`. Verify:\n\n```bash\nvadgr-cua doctor\n# {\"daemon_running\": false, \"windows_python\": null, \"port\": 19542, ...}\n```\n\nOn WSL2, the bridge daemon auto-launches the first time a tool is called. On other platforms it's a no-op; direct backends handle everything.\n\n---\n\n## Wire it into your agent\n\nPick your client. The server command is `vadgr-cua --transport stdio` in every case. Each agent launches that stdio process itself, so it needs the full path to the binary unless `vadgr-cua` is already on the agent's `PATH`.\n\nFirst, find the path:\n\n```bash\nwhich vadgr-cua\n# global install: /home/you/.local/bin/vadgr-cua\n# venv install:  /path/to/.venv/bin/vadgr-cua\n```\n\nSubstitute that path in each config below.\n\n### Claude Code\n\nProject-level (`.mcp.json` at the repo root you want to automate from):\n\n```json\n{\n  \"mcpServers\": {\n    \"vadgr-computer-use\": {\n      \"type\": \"stdio\",\n      \"command\": \"/path/to/vadgr-cua\",\n      \"args\": [\"--transport\", \"stdio\"]\n    }\n  }\n}\n```\n\nUser-level (add to `~/.claude.json` under `mcpServers` with the same shape).\n\nVerify: `claude mcp list` should print `vadgr-computer-use: ... ✓ Connected`.\n\n### Codex CLI\n\nAdd to `~/.codex/config.toml`:\n\n```toml\n[mcp_servers.vadgr-computer-use]\ncommand = \"/path/to/vadgr-cua\"\nargs = [\"--transport\", \"stdio\"]\n```\n\nVerify: `codex mcp list` should list `vadgr-computer-use` with status `enabled`.\n\n### Gemini CLI\n\n```bash\ngemini mcp add --scope user --trust \\\n  vadgr-computer-use /path/to/vadgr-cua \\\n  -- --transport stdio\n```\n\nThat writes `~/.gemini/settings.json`. Verify by running an interactive session: Gemini shows MCP tool calls inline.\n\n---\n\n## Try it\n\nOnce the wire-up is done, any of these commands launch the client, which starts `vadgr-cua --transport stdio` in the background via MCP, and drives your desktop. Same prompt, same tools: pick the client you already use.\n\n**Sanity check (focus + Ctrl+A):**\n\n```\nTake a screenshot, tell me in one sentence what application is in focus,\nthen press Ctrl+A and take another screenshot to confirm the action.\n```\n\n### Claude Code\n\nInteractive (most common):\n\n```bash\nclaude --dangerously-skip-permissions\n# then paste the prompt at the \u003e cursor\n```\n\nHeadless one-shot:\n\n```bash\nclaude --dangerously-skip-permissions -p \\\n  \"Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again.\"\n```\n\n### Codex CLI\n\nHeadless one-shot (the usual way to drive Codex):\n\n```bash\ncodex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check \\\n  \"Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again.\"\n```\n\nExpected output (abbreviated):\n\n```\nmcp: vadgr-computer-use/screenshot (completed)\nmcp: vadgr-computer-use/key_press (completed)\nmcp: vadgr-computer-use/screenshot (completed)\nThe focused app is \u003c...\u003e; Ctrl+A selected its content.\n```\n\n### Gemini CLI\n\nWorks end-to-end, but pixel grounding on full-screen shots is weaker than Claude/Codex: first-attempt clicks on small targets can miss by 20-60 px (the model usually recovers via `screenshot_region` crops). **Pass the model explicitly**, since the default may silently fall back to an older Gemini on some accounts:\n\n```bash\ngemini -m gemini-3.1-pro-preview -p \\\n  \"Use only vadgr-computer-use tools. Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again.\" \\\n  -y --allowed-mcp-server-names vadgr-computer-use\n```\n\n---\n\n## Fuller example: play a song on YouTube Music (Codex)\n\nA Chrome window is already open with a \"YouTube Music\" tab. One call:\n\n```bash\ncodex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check \\\n  \"Use only vadgr-computer-use MCP tools. In the already-open Chrome,\n   switch to the YouTube Music tab, search 'Space Oddity David Bowie',\n   and play the first result.\"\n```\n\nReal transcript (trimmed):\n\n```\nmcp: vadgr-computer-use/screenshot (completed)\nmcp: vadgr-computer-use/click (completed)        # YouTube Music tab\nmcp: vadgr-computer-use/click (completed)        # search box\nmcp: vadgr-computer-use/type_text (completed)\nmcp: vadgr-computer-use/key_press (completed)    # enter\nmcp: vadgr-computer-use/click (completed)        # first result\nmcp: vadgr-computer-use/click (completed)        # dismiss ad overlay\nmcp: vadgr-computer-use/screenshot (completed)   # verify now-playing bar\nYes, \"Space Oddity\" by David Bowie is now playing.\n```\n\n---\n\n## How it works\n\nThe LLM owns the \"where to click\" decision; the server owns \"how to click it precisely\". No other abstraction in between.\n\n## Platform support\n\n| Platform | Screenshots | Mouse / keyboard | Install notes |\n|----------|-------------|------------------|----------------|\n| Linux / X11 | `mss` | `xdotool` | `apt install xdotool` (or distro equivalent) |\n| Linux / Wayland (GNOME) | `gnome-screenshot` | Mutter RemoteDesktop via `jeepney` | nothing extra; pre-installed on stock GNOME, deps pulled by pip |\n| Linux / Wayland (Sway, Hyprland, wlroots) | `grim` | `evdev` | `apt install grim`; `sudo usermod -aG input $USER` then re-login |\n| Windows native | Win32 GDI | SendInput | nothing extra |\n| WSL2 → Windows host | TCP bridge daemon (`mss` on Windows) | TCP bridge daemon (Win32 `SendInput`) | bridge daemon auto-launches |\n| macOS | `screencapture` | `osascript` / `cliclick` | WIP, not functional yet |\n\n`pip install vadgr-computer-use` pulls `jeepney` and `evdev` automatically on Linux (both are pure-Python or shipped as wheels, no `libdbus-1-dev` or compilation needed). Foreground-window detection on Wayland uses AT-SPI2 if available; install with `pip install vadgr-computer-use[linux-atspi]` to enable it.\n\nIf the WSL2 daemon can't start (e.g. no Windows Python available), the server falls back to a slower PowerShell path. See [Daemon management](#daemon-management-wsl2) below.\n\n## MCP tools (13)\n\nCapture (2)\n- `screenshot()`: full screen, downscaled to `CU_MAX_WIDTH` (auto-picks 1024 / 1280 / 1366).\n- `screenshot_region(x, y, w, h)`: cropped region.\n\nInput (8)\n- `click(x, y)` / `double_click(x, y)` / `right_click(x, y)`\n- `move_mouse(x, y)` / `drag(start_x, start_y, end_x, end_y, duration=0.5)`\n- `scroll(x, y, amount)`: positive = up, negative = down\n- `type_text(text)` / `key_press(keys)`: keys like `ctrl+s`, `alt+tab`, `enter`\n\nPlatform info (3)\n- `get_platform()` / `get_platform_info()` / `get_screen_size()`\n\n## Daemon management (WSL2)\n\nMost users never touch this. For when you do:\n\n```bash\nvadgr-cua doctor           # JSON: platform, Windows Python, daemon state, port, hash\nvadgr-cua install-daemon   # Eager deploy + launch\nvadgr-cua stop-daemon      # Kill the running daemon\nvadgr-cua restart-daemon   # Stop then start\n```\n\nThe daemon file is deployed to `%USERPROFILE%\\vadgr\\daemon.py` and listens on TCP `127.0.0.1:19542`. After `pip install -U vadgr-computer-use`, the next MCP session detects the version-hash drift via a `ping` handshake and redeploys the daemon automatically.\n\n## Library usage\n\n```python\nfrom computer_use import ComputerUseEngine\n\nengine = ComputerUseEngine()\nshot = engine.screenshot()\nengine.click(500, 300)\nengine.type_text(\"hello\")\n```\n\nThe library is just the input/capture primitives, no LLM or agent loop inside. To drive it with a model, point an MCP client (Claude Code, Codex, Gemini, or your own) at the `vadgr-cua` server as shown above.\n\n## Environment\n\n| Variable | Purpose |\n|----------|---------|\n| `CU_MAX_WIDTH` | Override screenshot downscale target (default: auto 1024/1280/1366) |\n| `CUE_BRIDGE_PORT` | Override WSL2 bridge daemon TCP port (default: 19542) |\n| `VADGR_DEBUG` | Set to `1` to dump screenshots to `\u003cpackage\u003e/.debug/` |\n\n## Tests\n\n```bash\npip install -e \".[dev]\"\npytest computer_use/tests -q\n```\n\n## License\n\nApache 2.0. See `LICENSE`.\n\n## Part of Vadgr\n\n- [vadgr](https://github.com/MONTBRAIN/vadgr): workflow engine (brain)\n- **[vadgr-computer-use](https://github.com/MONTBRAIN/vadgr-computer-use)**: desktop automation MCP (eyes)\n- [vadgr-agent-os](https://github.com/MONTBRAIN/vadgr-agent-os): containerized agent runtime\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmontbrain%2Fvadgr-computer-use","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmontbrain%2Fvadgr-computer-use","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmontbrain%2Fvadgr-computer-use/lists"}