{"id":50069968,"url":"https://github.com/vitalops/opendesk","last_synced_at":"2026-05-22T02:35:41.669Z","repository":{"id":356620975,"uuid":"1232239020","full_name":"vitalops/opendesk","owner":"vitalops","description":"Control 1 or more machines using computer use tools that integrates with your agents","archived":false,"fork":false,"pushed_at":"2026-05-18T17:43:50.000Z","size":9365,"stargazers_count":65,"open_issues_count":1,"forks_count":15,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-05-18T18:36:22.856Z","etag":null,"topics":["claude-code","computer-use","desktop-control","mcp"],"latest_commit_sha":null,"homepage":"https://vitalops.ai/opendesk/docs/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vitalops.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-07T18:20:29.000Z","updated_at":"2026-05-18T17:02:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/vitalops/opendesk","commit_stats":null,"previous_names":["vitalops/opencua"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vitalops/opendesk","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vitalops%2Fopendesk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vitalops%2Fopendesk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vitalops%2Fopendesk/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vitalops%2Fopendesk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vitalops","download_url":"https://codeload.github.com/vitalops/opendesk/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vitalops%2Fopendesk/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33325786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-21T12:23:38.849Z","status":"online","status_checked_at":"2026-05-22T02:00:06.671Z","response_time":265,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude-code","computer-use","desktop-control","mcp"],"created_at":"2026-05-22T02:35:36.739Z","updated_at":"2026-05-22T02:35:41.660Z","avatar_url":"https://github.com/vitalops.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# opendesk\n\n**Give any AI agent eyes and hands on your desktop.**\n\nOpendesk is a computer use framework that lets AI agents navigate your computer just like a human would — screenshots, mouse, keyboard, UI interaction, OCR, workflow recording, scheduling, and remote machine control.\n\n**macOS · Linux · 
Windows**\n\n[![PyPI](https://img.shields.io/pypi/v/opendesk?label=pypi%20opendesk)](https://pypi.org/project/opendesk/)\n[![npm](https://img.shields.io/npm/v/@vitalops/opendesk-sdk?label=npm%20opendesk-sdk)](https://www.npmjs.com/package/@vitalops/opendesk-sdk)\n[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n[![Docs](https://img.shields.io/badge/docs-vitalops.github.io-blue)](https://vitalops.github.io/opendesk/docs/)\n\n\u003c/div\u003e\n\n---\n\n![opendesk demo](docs/opendesk_demo.gif)\n\n---\n\n## SDKs\n\n| Language | Location | Package | Install |\n|----------|----------|---------|---------|\n| Python | [`python/`](python/) | `opendesk` (PyPI) | `pip install 'opendesk[core,mcp]'` |\n| JavaScript / TypeScript | [`js/`](js/) | `@vitalops/opendesk-sdk` (npm) | `npm install @vitalops/opendesk-sdk` |\n\nMore SDKs can be added to this repo following the same pattern.\n\n---\n\n## MCP install\n\nopendesk works as an MCP server with any MCP-compatible client — Claude Code, Claude Desktop, Cursor, Windsurf, Continue, or any custom tool.\n\n### Python\n\n```bash\npip install 'opendesk[core,mcp]'\nopendesk install        # shortcut for Claude Code\n```\n\n\u003e Requires Python 3.10+\n\n### JavaScript / TypeScript\n\n```bash\nnpm install @vitalops/opendesk-sdk\nnpx opendesk-js install        # shortcut for Claude Code\n```\n\n### Other MCP clients (Cursor, Windsurf, Continue, custom)\n\nPoint your client at the `opendesk-mcp` binary:\n\n```json\n{\n  \"mcpServers\": {\n    \"opendesk\": { \"command\": \"opendesk-mcp\" }\n  }\n}\n```\n\nFor JS:\n\n```json\n{\n  \"mcpServers\": {\n    \"opendesk\": {\n      \"command\": \"node\",\n      \"args\": [\"/path/to/node_modules/@vitalops/opendesk-sdk/bin/opendesk-mcp.js\"]\n    }\n  }\n}\n```\n\nOnce connected, try:\n\n```\nTake a screenshot of my screen\nClick the Chrome icon\nOpen Spotify and play lo-fi beats\nShow me the audit log\nReplay everything from this session\n```\n\n---\n\n## SDK 
usage\n\nUse opendesk programmatically in your own agent or app.\n\n### Python\n\n```python\nfrom opendesk import create_registry, allow_all_context\n\nregistry = create_registry()\nctx = allow_all_context()\n\nresult = await registry.get(\"screenshot\").execute(ctx, ...)\n```\n\n### JavaScript / TypeScript\n\n```typescript\nimport { OpenDeskClient } from \"@vitalops/opendesk-sdk\";\n\nconst client = new OpenDeskClient();\nawait client.screenshot({ marks: true });\nawait client.ui({ action: \"click\", app: \"Safari\", title: \"Go\" });\n```\n\n---\n\n## Architecture\n\nopendesk is built in independently-importable layers:\n\n```\n┌──────────────────────────────────────────────────────────────┐\n│  Integrations   MCP  ·  Claude Code  ·  OpenAI  ·  LangChain │\n├──────────────────────────────────────────────────────────────┤\n│  Tools          screenshot · mouse · keyboard · ui ·         │\n│                 clipboard · ocr · learn · schedule · audit   │\n├──────────────────────────────────────────────────────────────┤\n│  Computer       LocalComputer  ·  RemoteComputer  (ABC)      │\n├──────────────────────────────────────────────────────────────┤\n│  Remote         server · client · discovery (mDNS)           │\n├──────────────────────────────────────────────────────────────┤\n│  Protocol       frames · codec (msgpack) · peer · transports │\n│                 auth (X25519 + AEAD, pairing)                │\n└──────────────────────────────────────────────────────────────┘\n```\n\n| Layer | What it does |\n|-------|-------------|\n| **Computer** | The capability surface of a computer (observe / act / subscribe). `LocalComputer` drives the local machine; `RemoteComputer` forwards every call over the wire to a paired peer. Tools and integrations target this ABC — they never know whether the machine is local or remote. |\n| **Tools** | One class per capability, agent-friendly Pydantic schemas. Calls into the active `Computer` on the `ToolContext`. 
|\n| **Integrations** | Thin adapters for MCP, Anthropic, OpenAI, LangChain — add one tool, get all four. |\n| **Remote** | `opendesk serve` / `opendesk pair`, mDNS discovery, client helper. |\n| **Protocol** | Five-frame wire protocol (msgpack binary, no base64 ever), WebSocket transport, mutual X25519 + ChaCha20-Poly1305 auth and encryption. |\n| **Automation** | `learn` + `schedule` backed by pynput recording, JSON storage, APScheduler daemon. |\n\nFull details → [docs/architecture.md](docs/architecture.md)\n\n---\n\n## Tools\n\n| Tool | What it does |\n|------|-------------|\n| `screenshot` | Capture the screen with numbered boxes on every clickable element (Set-of-Marks) |\n| `ui` | Click and type by element name — no coordinates needed |\n| `mouse` | Pixel-level mouse control for anything `ui` can't reach |\n| `keyboard` | Type text, press keys, send hotkeys |\n| `app` | Open, close, and focus applications |\n| `clipboard` | Read and write the system clipboard |\n| `ocr` | Extract text from any region of the screen |\n| `learn` | Record a workflow once, replay it anytime |\n| `schedule` | Run any task or learned procedure on a timer |\n\nFull reference → [docs/tools.md](docs/tools.md)\n\n---\n\n## Automation\n\nRecord a task once, replay it forever, or put it on a schedule.\n\n**Record**\n```\n\"Start recording task expense-form\"\n```\nPerform the workflow yourself. 
The agent captures every click, keystroke, and screenshot.\n\n**Replay**\n```\n\"Stop recording\"\n\"Replay expense-form\"\n```\nThe agent re-executes using the current screen state — no hardcoded coordinates.\n\n**Schedule**\n```\n\"Every morning at 9am, open my email in Chrome, take a screenshot, and summarize what's there\"\n\"Schedule expense-form every friday at 5pm\"\n```\n```bash\nopendesk scheduler start\n```\n\nSupported timing: `every 30m` · `every 2h` · `every day at 09:00` · `every friday at 17:00` · raw cron\n\nFull guide → [docs/automation.md](docs/automation.md)\n\n---\n\n## Remote computer use\n\nControl another machine from your agent — same tools, same MCP server, the\n`Computer` abstraction just lives on the other end of an encrypted WebSocket.\n\n**On the machine being controlled** (one time):\n\n```bash\npip install 'opendesk[core,remote]'\nopendesk pair        # prints a 6-digit code, listens\n```\n\n**On the controller** (one time):\n\n```bash\npip install 'opendesk[remote]'\nopendesk discover                          # list opendesk peers on the LAN\nopendesk pair-with \u003chost\u003e \u003ccode\u003e --name mini\n```\n\n**After pairing**, the controlled machine runs the long-lived server:\n\n```bash\nopendesk serve            # accepts paired peers only\n```\n\n…and the controller drives it through the existing MCP server (Claude Code,\nClaude Desktop, Cursor — anything that speaks MCP). The agent gets new admin\ntools — `opendesk_peers`, `opendesk_use`, `opendesk_status` — and every\nexisting tool accepts an optional `peer:` argument:\n\n```\nscreenshot                       → controls the local machine\nscreenshot peer=mini             → controls the paired remote\nopendesk_use mini                → make mini the default for this session\nscreenshot                       → [on mini] ...\n```\n\nWith exactly one paired peer the agent doesn't have to specify anything —\nit becomes the implicit default. 
With multiple peers, the agent must pick\nexplicitly (no silent fallback).\n\n**One controller at a time.** Pair as many machines as you like, but only\none drives the desktop at a time — a second peer trying to connect while\none is active gets a clean `BUSY` error. The same peer reconnecting bumps\nthe previous session (no waiting out a stale TCP connection). Two ways to free the\nslot from the controlled machine:\n\n- `opendesk disconnect` — **cooperative**. Server asks the controller to\n  leave via a `session.evicted` PUSH; a cooperative client (the in-tree\n  `RemoteComputer`) suppresses its auto-reconnect and raises\n  `SessionEvicted`. Trust is preserved.\n- `opendesk unpair \u003cname\u003e` — **enforced**. Revokes trust + closes the\n  session; next reconnect fails authentication.\n\n**Security model:** pairing exchanges long-lived X25519 keypairs via a 6-digit\ncode-authenticated handshake (PBKDF2-stretched, ~CPU-month to brute force).\nSubsequent connections use mutual static-key authentication. Every frame is\nChaCha20-Poly1305 AEAD-encrypted with per-direction counters. 
No CA-signed\ncertificates required — the keys ARE the trust.\n\nFull guide → [docs/remote.md](docs/remote.md)\n\n---\n\n## Installation options\n\n```bash\npip install opendesk                              # core framework only\npip install 'opendesk[core,mcp]'                  # + screen capture + MCP server (recommended)\npip install 'opendesk[core,mcp,remote]'           # + control another machine over LAN\npip install 'opendesk[core,mcp,learn]'            # + task recording and replay\npip install 'opendesk[core,mcp,learn,schedule]'   # + scheduled tasks\npip install 'opendesk[all]'                       # everything\n```\n\n---\n\n## Platform support\n\n| Feature | macOS | Linux | Windows |\n|---------|:-----:|:-----:|:-------:|\n| Screenshot | ✓ | ✓ | ✓ |\n| Mouse \u0026 keyboard | ✓ | ✓ | ✓ |\n| UI element access | AppleScript | AT-SPI2 | UI Automation |\n| Clipboard | pbcopy/pbpaste | xclip/xsel | pyperclip |\n| OCR | Vision / tesseract | tesseract | WinRT / tesseract |\n| App control | `open -a` | `xdg-open` | `start` |\n| Task recording | ✓ | ✓ | ✓ |\n| Scheduled tasks | ✓ | ✓ | ✓ |\n| Remote control (LAN) | ✓ | ✓ | ✓ |\n| LAN discovery (mDNS) | ✓ | ✓ | ✓ |\n\n---\n\n## System permissions\n\n### macOS\n- **System Settings → Privacy \u0026 Security → Screen Recording** — enable for your terminal\n- **System Settings → Privacy \u0026 Security → Accessibility** — enable for mouse/keyboard control\n\n### Linux\n```bash\nsudo apt install xclip xdotool python3-atspi\n```\n\n### Windows\nNo extra permissions needed — opendesk uses Win32 APIs by default.\n\nSee [docs/permissions.md](docs/permissions.md) for full setup guide.\n\n---\n\n## Integrations\n\n### Claude Code\n```bash\nopendesk install        # registers opendesk-mcp globally\nopendesk uninstall      # removes the registration\n```\n\n### Claude Desktop\n\nAdd to your config file:\n- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`\n- **Windows**: 
`%APPDATA%\\Claude\\claude_desktop_config.json`\n- **Linux**: `~/.config/Claude/claude_desktop_config.json`\n\n```json\n{\n  \"mcpServers\": {\n    \"opendesk\": { \"command\": \"opendesk-mcp\" }\n  }\n}\n```\n\n### Python API\n\n```python\nimport asyncio\nfrom opendesk import create_registry, allow_all_context\n\nasync def main():\n    registry = create_registry()\n    ctx = allow_all_context()\n\n    result = await registry.get(\"screenshot\").execute(\n        ctx, registry.get(\"screenshot\").Params(marks=True)\n    )\n    print(result.output)\n\nasyncio.run(main())\n```\n\nWorks with Anthropic SDK, OpenAI, and LangChain — see [docs/integrations.md](docs/integrations.md)\n\n### On-device models (Ollama, LM Studio, vLLM, llama.cpp)\n\nAny OpenAI-compatible local server works out of the box:\n\n```python\nfrom openai import OpenAI\nfrom opendesk.integrations.openai_compat import OpenAIAdapter\n\nclient = OpenAI(base_url=\"http://localhost:11434/v1\", api_key=\"ollama\")\nadapter = OpenAIAdapter()\nresult = await adapter.run_loop(client, model=\"qwen2.5:72b\", messages=messages)\n```\n\n---\n\n## Citation\n\nIf you use opendesk in your research or project, please cite it:\n\n```bibtex\n@software{opendesk,\n  author  = {Abraham, Abhigith Neil and Rahman, Fariz and Rahman, Fadil},\n  title   = {opendesk: Open Desktop Automation Framework},\n  year    = {2026},\n  url     = {https://github.com/vitalops/opendesk},\n  version = {0.2.0},\n  license = {MIT}\n}\n```\n\nA `CITATION.cff` is included — GitHub's \"Cite this repository\" button will pick it up automatically.\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvitalops%2Fopendesk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvitalops%2Fopendesk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvitalops%2Fopendesk/lists"}