https://github.com/montbrain/vadgr-computer-use

MCP server for desktop automation. Accessibility-first (UIA/AT-SPI/AX) with vision fallback. Local, on-device, CPU-friendly.

accessibility agent automation computer-use mcp

Host: GitHub
URL: https://github.com/montbrain/vadgr-computer-use
Owner: MONTBRAIN
License: apache-2.0
Created: 2026-04-19T19:55:51.000Z (about 2 months ago)
Default Branch: master
Last Pushed: 2026-04-19T20:32:22.000Z (about 2 months ago)
Last Synced: 2026-04-19T22:13:17.621Z (about 2 months ago)
Topics: accessibility, agent, automation, computer-use, mcp
Language: Python
Homepage: https://github.com/MONTBRAIN/vadgr
Size: 18.7 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Notice: NOTICE

# vadgr-computer-use

Local MCP server for desktop automation. 13 tools for capture, mouse, keyboard, and platform introspection. The calling agent takes a screenshot, reasons over the pixels, and drives mouse/keyboard through the server.

Tested with **Claude Code**, **Codex CLI**, and **Gemini CLI** (same server, same tools, same prompt).

> **Platforms:** works on **Linux (X11 and Wayland incl. GNOME)**, **Windows native**, and **WSL2**. **macOS support is a work in progress** and not usable yet. See [Platform support](#platform-support) for detail.

---

## Install

```bash
pip install vadgr-computer-use
```

That ships a console script called `vadgr-cua`. Verify:

```bash
vadgr-cua doctor
# {"daemon_running": false, "windows_python": null, "port": 19542, ...}
```
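If you want to check the install from a script, the report can be parsed as JSON. The sketch below assumes `vadgr-cua doctor` prints a single JSON object on stdout and relies only on the three keys visible in the sample output; `parse_doctor` and `run_doctor` are hypothetical helper names, not part of the package.

```python
import json
import subprocess


def parse_doctor(raw: str) -> dict:
    """Pick out the fields shown in the sample `vadgr-cua doctor` output."""
    report = json.loads(raw)
    # Only these three keys appear in the README sample; any other
    # fields in the real report are ignored by this sketch.
    return {
        "daemon_running": bool(report.get("daemon_running", False)),
        "windows_python": report.get("windows_python"),
        "port": report.get("port"),
    }


def run_doctor(binary: str = "vadgr-cua") -> dict:
    """Run the CLI and parse its output (needs the package installed)."""
    done = subprocess.run(
        [binary, "doctor"], capture_output=True, text=True, check=True
    )
    return parse_doctor(done.stdout)


if __name__ == "__main__":
    sample = '{"daemon_running": false, "windows_python": null, "port": 19542}'
    print(parse_doctor(sample))
```

`parse_doctor` is kept separate from the subprocess call so it can be exercised without the CLI present.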

On WSL2, the bridge daemon auto-launches the first time a tool is called. On other platforms it's a no-op; direct backends handle everything.

---

## Wire it into your agent

Pick your client. The server command is `vadgr-cua --transport stdio` in every case. Each agent launches that stdio process itself, so it needs the full path to the binary unless `vadgr-cua` is already on the agent's `PATH`.

First, find the path:

```bash
which vadgr-cua
# global install: /home/you/.local/bin/vadgr-cua
# venv install:   /path/to/.venv/bin/vadgr-cua
```

Substitute that path in each config below.

### Claude Code

Project-level (`.mcp.json` at the repo root you want to automate from):

```json
{
  "mcpServers": {
    "vadgr-computer-use": {
      "type": "stdio",
      "command": "/path/to/vadgr-cua",
      "args": ["--transport", "stdio"]
    }
  }
}
```

User-level: add an entry of the same shape under `mcpServers` in `~/.claude.json`.

Verify: `claude mcp list` should print `vadgr-computer-use: ... ✓ Connected`.
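The project-level file can also be generated rather than hand-edited. This is a minimal sketch that resolves the binary with `shutil.which` and writes the same JSON shape as above; `build_mcp_config` and `write_mcp_config` are hypothetical helpers, and the sketch overwrites any existing `.mcp.json` instead of merging into it.

```python
import json
import shutil
from pathlib import Path
from typing import Optional


def build_mcp_config(command: str) -> dict:
    """Return the project-level config shape shown in the README."""
    return {
        "mcpServers": {
            "vadgr-computer-use": {
                "type": "stdio",
                "command": command,
                "args": ["--transport", "stdio"],
            }
        }
    }


def write_mcp_config(target_dir: Path, command: Optional[str] = None) -> Path:
    """Write .mcp.json into target_dir, replacing any existing file."""
    # Fall back to PATH lookup when no explicit path is given.
    command = command or shutil.which("vadgr-cua")
    if command is None:
        raise FileNotFoundError("vadgr-cua not found on PATH; pass the path explicitly")
    path = target_dir / ".mcp.json"
    path.write_text(json.dumps(build_mcp_config(command), indent=2) + "\n")
    return path


if __name__ == "__main__":
    print(json.dumps(build_mcp_config("/path/to/vadgr-cua"), indent=2))
```

If the repo already has other MCP servers configured, load the existing file and add the `vadgr-computer-use` key instead of calling `write_mcp_config`.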

### Codex CLI

Add to `~/.codex/config.toml`:

```toml
[mcp_servers.vadgr-computer-use]
command = "/path/to/vadgr-cua"
args = ["--transport", "stdio"]
```

Verify: `codex mcp list` should list `vadgr-computer-use` with status `enabled`.

### Gemini CLI

```bash
gemini mcp add --scope user --trust \
  vadgr-computer-use /path/to/vadgr-cua \
  -- --transport stdio
```

That writes `~/.gemini/settings.json`. Verify by running an interactive session: Gemini shows MCP tool calls inline.

---

## Try it

Once the wire-up is done, each command below launches its client, which starts `vadgr-cua --transport stdio` in the background via MCP and drives your desktop. The prompt and tools are the same everywhere, so pick the client you already use.

**Sanity check (focus + Ctrl+A):**

```
Take a screenshot, tell me in one sentence what application is in focus,
then press Ctrl+A and take another screenshot to confirm the action.
```

### Claude Code

Interactive (most common):

```bash
claude --dangerously-skip-permissions
# then paste the prompt at the > cursor
```

Headless one-shot:

```bash
claude --dangerously-skip-permissions -p \
  "Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again."
```

### Codex CLI

Headless one-shot (the usual way to drive Codex):

```bash
codex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check \
  "Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again."
```

Expected output (abbreviated):

```
mcp: vadgr-computer-use/screenshot (completed)
mcp: vadgr-computer-use/key_press (completed)
mcp: vadgr-computer-use/screenshot (completed)
The focused app is <...>; Ctrl+A selected its content.
```

### Gemini CLI

Works end-to-end, but pixel grounding on full-screen shots is weaker than Claude/Codex: first-attempt clicks on small targets can miss by 20-60 px (the model usually recovers via `screenshot_region` crops). **Pass the model explicitly**, since the default may silently fall back to an older Gemini on some accounts:

```bash
gemini -m gemini-3.1-pro-preview -p \
  "Use only vadgr-computer-use tools. Take a screenshot, tell me what app is in focus, then press Ctrl+A and screenshot again." \
  -y --allowed-mcp-server-names vadgr-computer-use
```
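The recovery path mentioned above, zooming in with `screenshot_region` after a missed click, comes down to choosing a crop rectangle around the suspected target. The sketch below shows that arithmetic only. `crop_around` is a hypothetical helper, and it assumes the `(x, y, w, h)` arguments share a coordinate space with the screen size passed in, which the README does not spell out.

```python
def crop_around(cx, cy, screen_w, screen_h, size=300):
    """Return (x, y, w, h) for a crop near (cx, cy), clamped to the screen.

    The crop is size x size where the screen allows it, and it is shifted
    (not shrunk) when the target sits close to an edge.
    """
    w = min(size, screen_w)
    h = min(size, screen_h)
    # Center the box on the target, then push it back inside the screen.
    x = min(max(cx - w // 2, 0), screen_w - w)
    y = min(max(cy - h // 2, 0), screen_h - h)
    return x, y, w, h


if __name__ == "__main__":
    print(crop_around(960, 540, 1920, 1080))
    print(crop_around(1915, 1075, 1920, 1080))
```

A 300 px box comfortably covers the 20-60 px miss range quoted above, so the target should still be inside the crop after a first-attempt miss.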

---

## Fuller example: play a song on YouTube Music (Codex)

A Chrome window is already open with a "YouTube Music" tab. One call:

```bash
codex exec --dangerously-bypass-approvals-and-sandbox --skip-git-repo-check \
  "Use only vadgr-computer-use MCP tools. In the already-open Chrome,
   switch to the YouTube Music tab, search 'Space Oddity David Bowie',
   and play the first result."
```

Real transcript (trimmed):

```
mcp: vadgr-computer-use/screenshot (completed)
mcp: vadgr-computer-use/click (completed)        # YouTube Music tab
mcp: vadgr-computer-use/click (completed)        # search box
mcp: vadgr-computer-use/type_text (completed)
mcp: vadgr-computer-use/key_press (completed)    # enter
mcp: vadgr-computer-use/click (completed)        # first result
mcp: vadgr-computer-use/click (completed)        # dismiss ad overlay
mcp: vadgr-computer-use/screenshot (completed)   # verify now-playing bar
Yes, "Space Oddity" by David Bowie is now playing.
```
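When comparing runs across clients, it can help to reduce a transcript like this to its sequence of tool calls. The sketch below assumes the `mcp: <server>/<tool> (<status>)` line shape seen in the trimmed Codex output; real output may carry extra fields, and other clients format tool calls differently. `tool_calls` is a hypothetical helper.

```python
import re

# Matches lines such as: "mcp: vadgr-computer-use/click (completed)   # note"
TOOL_LINE = re.compile(
    r"^mcp:\s+(?P<server>[\w-]+)/(?P<tool>\w+)\s+\((?P<status>\w+)\)"
)


def tool_calls(transcript, server="vadgr-computer-use"):
    """Return [(tool, status), ...] for one server, in transcript order."""
    calls = []
    for line in transcript.splitlines():
        match = TOOL_LINE.match(line.strip())
        # Skip prose lines and calls that belong to other MCP servers.
        if match and match.group("server") == server:
            calls.append((match.group("tool"), match.group("status")))
    return calls


if __name__ == "__main__":
    sample = (
        "mcp: vadgr-computer-use/screenshot (completed)\n"
        "mcp: vadgr-computer-use/click (completed)        # search box\n"
        'Yes, "Space Oddity" by David Bowie is now playing.\n'
    )
    print(tool_calls(sample))
```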

---

## How it works

The LLM owns the "where to click" decision; the server owns "how to click it precisely". No other abstraction in between.

## Platform support

| Platform | Screenshots | Mouse / keyboard | Install notes |
|----------|-------------|------------------|----------------|
| Linux / X11 | `mss` | `xdotool` | `apt install xdotool` (or distro equivalent) |
| Linux / Wayland (GNOME) | `gnome-screenshot` | Mutter RemoteDesktop via `jeepney` | nothing extra; pre-installed on stock GNOME, deps pulled by pip |
| Linux / Wayland (Sway, Hyprland, wlroots) | `grim` | `evdev` | `apt install grim`; `sudo usermod -aG input $USER` then re-login |
| Windows native | Win32 GDI | SendInput | nothing extra |
| WSL2 → Windows host | TCP bridge daemon (`mss` on Windows) | TCP bridge daemon (Win32 `SendInput`) | bridge daemon auto-launches |
| macOS | `screencapture` | `osascript` / `cliclick` | WIP, not functional yet |

`pip install vadgr-computer-use` pulls `jeepney` and `evdev` automatically on Linux (both are pure-Python or shipped as wheels, no `libdbus-1-dev` or compilation needed). Foreground-window detection on Wayland uses AT-SPI2 if available; install with `pip install vadgr-computer-use[linux-atspi]` to enable it.

If the WSL2 daemon can't start (e.g. no Windows Python available), the server falls back to a slower PowerShell path. See [Daemon management](#daemon-management-wsl2) below.

## MCP tools (13)

Capture (2)

- `screenshot()`: full screen, downscaled to `CU_MAX_WIDTH` (auto-picks 1024 / 1280 / 1366).

- `screenshot_region(x, y, w, h)`: cropped region.

Input (8)

- `click(x, y)` / `double_click(x, y)` / `right_click(x, y)`

- `move_mouse(x, y)` / `drag(start_x, start_y, end_x, end_y, duration=0.5)`

- `scroll(x, y, amount)`: positive = up, negative = down

- `type_text(text)` / `key_press(keys)`: keys like `ctrl+s`, `alt+tab`, `enter`

Platform info (3)

- `get_platform()` / `get_platform_info()` / `get_screen_size()`
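Because `screenshot()` returns a downscaled image, a pixel position in the image and a position on the physical screen differ by a scale factor. The README does not say which coordinate space `click(x, y)` expects or whether the server converts for you, so treat the following as an illustration of the arithmetic under an assumed aspect-preserving linear downscale, not as the server's behavior. `scaled_size` and `to_screen` are hypothetical names.

```python
def scaled_size(screen_w, screen_h, max_width):
    """Image size after an aspect-preserving downscale to max_width."""
    if screen_w <= max_width:
        # Assumption: screens narrower than the target are not upscaled.
        return screen_w, screen_h
    scale = max_width / screen_w
    return max_width, round(screen_h * scale)


def to_screen(x, y, screen_w, screen_h, max_width):
    """Map a point in the downscaled image back to screen pixels."""
    img_w, img_h = scaled_size(screen_w, screen_h, max_width)
    return round(x * screen_w / img_w), round(y * screen_h / img_h)


if __name__ == "__main__":
    print(scaled_size(2560, 1440, 1280))
    print(to_screen(640, 360, 2560, 1440, 1280))
```

On a 2560x1440 display with a 1280 px target, one image pixel covers a 2x2 block of screen pixels, which is one reason small targets are harder to hit from a full-screen shot than from a `screenshot_region` crop.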

## Daemon management (WSL2)

Most users never need these commands. When you do:

```bash
vadgr-cua doctor           # JSON: platform, Windows Python, daemon state, port, hash
vadgr-cua install-daemon   # Eager deploy + launch
vadgr-cua stop-daemon      # Kill the running daemon
vadgr-cua restart-daemon   # Stop then start
```

The daemon file is deployed to `%USERPROFILE%\vadgr\daemon.py` and listens on TCP `127.0.0.1:19542`. After `pip install -U vadgr-computer-use`, the next MCP session detects the version-hash drift via a `ping` handshake and redeploys the daemon automatically.
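For a quick check that something is listening on the daemon port, a plain TCP connect is enough. This sketch does not speak the daemon's `ping` protocol, whose wire format is not documented here; it only tests whether the port accepts connections. `daemon_port_open` is a hypothetical helper, and whether `127.0.0.1` on the WSL2 side reaches the Windows loopback depends on your WSL networking mode.

```python
import socket


def daemon_port_open(host="127.0.0.1", port=19542, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "open" if daemon_port_open() else "closed"
    print(f"bridge port 19542 is {state}")
```

`vadgr-cua doctor` remains the authoritative check, since it also reports the Windows Python and the version hash.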

## Library usage

```python
from computer_use import ComputerUseEngine

engine = ComputerUseEngine()

shot = engine.screenshot()   # capture the full screen
engine.click(500, 300)       # click at x=500, y=300
engine.type_text("hello")    # type into whatever now has focus
```

The library is just the input/capture primitives, no LLM or agent loop inside. To drive it with a model, point an MCP client (Claude Code, Codex, Gemini, or your own) at the `vadgr-cua` server as shown above.
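Since the engine is a plain object with primitive methods, small helpers can be written against its shape and dry-run without touching the desktop. The sketch below uses only the three methods from the snippet above (`screenshot`, `click`, `type_text`); the `Engine` protocol, `click_and_type`, and `RecordingEngine` are hypothetical names introduced for illustration, and the return types of the real methods are not assumed.

```python
from typing import Any, Protocol


class Engine(Protocol):
    """The subset of ComputerUseEngine used in the README snippet."""

    def screenshot(self) -> Any: ...
    def click(self, x: int, y: int) -> Any: ...
    def type_text(self, text: str) -> Any: ...


def click_and_type(engine: Engine, x: int, y: int, text: str) -> Any:
    """Click a field to focus it, type into it, return a fresh screenshot."""
    engine.click(x, y)
    engine.type_text(text)
    return engine.screenshot()


class RecordingEngine:
    """Stand-in for dry runs: records calls instead of driving the desktop."""

    def __init__(self) -> None:
        self.calls = []

    def screenshot(self) -> None:
        self.calls.append(("screenshot",))

    def click(self, x: int, y: int) -> None:
        self.calls.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.calls.append(("type_text", text))


if __name__ == "__main__":
    fake = RecordingEngine()
    click_and_type(fake, 500, 300, "hello")
    print(fake.calls)
```

Passing a real `ComputerUseEngine()` in place of `RecordingEngine()` would perform the same sequence on the live desktop.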

## Environment

| Variable | Purpose |
|----------|---------|
| `CU_MAX_WIDTH` | Override screenshot downscale target (default: auto 1024/1280/1366) |
| `CUE_BRIDGE_PORT` | Override WSL2 bridge daemon TCP port (default: 19542) |
| `VADGR_DEBUG` | Set to `1` to dump screenshots to `/.debug/` |
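
A wrapper script that launches the server may want to read the same variables. This is a sketch of how they could be interpreted based on the table alone; how the server itself parses them (for example, whether it accepts values other than `1` for `VADGR_DEBUG`) is an assumption, and `read_settings` is a hypothetical helper.

```python
import os


def read_settings(env=None):
    """Interpret the documented variables, applying the table's defaults."""
    env = os.environ if env is None else env
    max_width = env.get("CU_MAX_WIDTH")
    return {
        # None means "let the server auto-pick 1024 / 1280 / 1366".
        "max_width": int(max_width) if max_width else None,
        "bridge_port": int(env.get("CUE_BRIDGE_PORT", "19542")),
        # Assumption: only the literal string "1" enables debug dumps.
        "debug": env.get("VADGR_DEBUG") == "1",
    }


if __name__ == "__main__":
    print(read_settings({}))
```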

## Tests

```bash
pip install -e ".[dev]"
pytest computer_use/tests -q
```

## License

Apache 2.0. See `LICENSE`.

## Part of Vadgr

- [vadgr](https://github.com/MONTBRAIN/vadgr): workflow engine (brain)

- **[vadgr-computer-use](https://github.com/MONTBRAIN/vadgr-computer-use)**: desktop automation MCP (eyes)

- [vadgr-agent-os](https://github.com/MONTBRAIN/vadgr-agent-os): containerized agent runtime
