{"id":50055957,"url":"https://github.com/code-yeongyu/macos-cua","last_synced_at":"2026-05-21T13:13:54.574Z","repository":{"id":357749817,"uuid":"1238355517","full_name":"code-yeongyu/macos-cua","owner":"code-yeongyu","description":"Native macOS computer-use control with Codex-style background per-PID mouse + scroll + keyboard via SkyLight SPIs (TypeScript + Swift helper)","archived":false,"fork":false,"pushed_at":"2026-05-21T10:03:41.000Z","size":1412,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-05-21T12:17:33.887Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/code-yeongyu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-14T03:39:53.000Z","updated_at":"2026-05-21T10:03:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/code-yeongyu/macos-cua","commit_stats":null,"previous_names":["code-yeongyu/macos-cua"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/code-yeongyu/macos-cua","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-yeongyu%2Fmacos-cua","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-yeongyu%2Fmacos-cua/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-yeongyu%2Fmacos-cua/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-yeongyu%2Fmacos-cua/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/code-yeongyu","download_url":"https://codeload.github.com/code-yeongyu/macos-cua/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code-yeongyu%2Fmacos-cua/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33301803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-21T12:23:38.849Z","status":"ssl_error","status_checked_at":"2026-05-21T12:22:11.673Z","response_time":62,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-21T13:13:50.375Z","updated_at":"2026-05-21T13:13:54.559Z","avatar_url":"https://github.com/code-yeongyu.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# macos-cua\n\nNative macOS computer-use control, designed for the OpenAI computer-use action vocabulary. 
Host-native (CGEvent / ScreenCaptureKit-class) speed, no VM sandbox required.\n\n[![license: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n[![Node.js \u003e=20](https://img.shields.io/badge/node-%3E%3D20-brightgreen.svg)](package.json)\n\n## Why this exists\n\nOpenAI Codex Computer Use is fast because it runs on the host with macOS-native APIs (ScreenCaptureKit, CoreGraphics, local MCP stdio). By contrast, [trycua/cua](https://github.com/trycua/cua) is portable but slow because of the multi-hop VM/HTTP/PIL pipeline: Python agent loop, 500 ms post-action screenshot delay, HTTP/WebSocket JSON to a guest FastAPI server, PIL encode, base64 SSE, client decode/re-encode. Codex removes the VM boundary and repeated image serialization; cua keeps it for sandbox isolation.\n\n`macos-cua` is the Codex-style local path with cua's clean platform abstraction, written in strict TypeScript. It gives you the same app-oriented `list_apps / get_app_state / click / type_text / press_key / scroll / drag` vocabulary that models expect, but executes directly on your Mac through native macOS APIs: `screencapture`/`sips` for screenshot capture, `koffi`-bound CoreGraphics for global input, Accessibility for app state/actions, and SkyLight/AppKit FFI for app-targeted window sessions. No Docker, no QEMU, no VNC, no helper binary, no cloud API key.\n\nThe design trade-off is documented in [`codex-cua-comparison.md`](./codex-cua-comparison.md). If you need strong VM isolation, use cua. 
If you need low-latency host-native control, use this.\n\n| | Codex | cua | macos-cua |\n|---|---|---|---|\n| Runs on | Host Mac | VM / container / cloud | Host Mac |\n| Needs VM | No | Yes (default) | No |\n| Needs API key | OpenAI only | Optional `CUA_API_KEY` for cloud | No |\n| Screenshot path | Native ScreenCaptureKit / IOSurface | PIL `ImageGrab` in guest | Native `screencapture` + `sips` fallback |\n| Input path | Native CGEvent / Apple Events | `pynput` in guest | CoreGraphics CGEvent via koffi + SkyLight/AppKit FFI for app-targeted windows |\n| Transport | Local MCP stdio | HTTP/WebSocket JSON + SSE | Local process / MCP stdio / pi extension |\n| Post-action delay | None reported | 500 ms default | None |\n| Isolation | macOS permissions + app scoping | VM / container sandbox | macOS permissions only |\n\n## Quickstart\n\n```bash\ngit clone \u003crepo\u003e\ncd macos-cua\npnpm install\npnpm --filter @macos-cua/core build\npnpm --filter @macos-cua/cli build\n./packages/cli/dist/cli.js --version\n./packages/cli/dist/cli.js screenshot -o /tmp/shot.png\n```\n\nExpected output:\n\n```text\n0.1.0\nScreenshot saved to /tmp/shot.png\n```\n\nIf the PNG is 0 bytes or black, grant Screen Recording permission to your terminal in **System Settings → Privacy \u0026 Security → Screen Recording**. 
See [`skills/macos-cua/references/installation.md`](./skills/macos-cua/references/installation.md) for the full permission walkthrough.\n\n## The four surfaces\n\n### CLI\n\nThe `macos-cua` binary is a thin `commander.js` wrapper over `MacOSHostComputer`.\n\n```bash\n# Screenshot (full screen or region)\nmacos-cua screenshot -o shot.png\nmacos-cua screenshot -o shot.png -x 100 -y 200 -w 800 -h 600\n\n# Click and type\nmacos-cua click -x 500 -y 300\nmacos-cua type \"Hello, world\"\n\n# Key chord\nmacos-cua key cmd --modifiers cmd,shift\n\n# Query state\nmacos-cua cursor\nmacos-cua screen\n```\n\nSample output:\n\n```text\nScreenshot saved to shot.png\nClicked at 500,300\nTyped: Hello, world\nPressed: command+shift+cmd\n1200,800\n2560x1600\n```\n\n### Per-PID targeting\n\nBy default, input events go to the globally focused application. If you want the agent to drive a specific app, call `get_app_state` for that app first or pass `--target-pid \u003cpid\u003e` after the app has a visible window. The host implementation caches the app window session and routes mouse, drag, keyboard, text, and scroll events through CoreGraphics plus SkyLight/AppKit FFI. If no visible target window is known, targeted input fails loudly instead of falling back to global cursor-moving input.\n\nExample: send a URL to Safari while Terminal stays focused:\n\n```bash\n# 1. get Safari's PID\nSAFARI_PID=$(pgrep -x Safari)\n\n# 2. 
focus Safari's address bar, type the URL, and press Return\n# each CLI call primes the visible target window before dispatch\nmacos-cua --target-pid \"$SAFARI_PID\" key l -m cmd\nmacos-cua --target-pid \"$SAFARI_PID\" type \"https://example.com\"\nmacos-cua --target-pid \"$SAFARI_PID\" key Return\n\n# click/scroll/drag Safari content while Slack stays frontmost\nmacos-cua --target-pid \"$SAFARI_PID\" click -x 500 -y 300\nmacos-cua --target-pid \"$SAFARI_PID\" scroll --direction down --amount 5\nmacos-cua --target-pid \"$SAFARI_PID\" drag --from-x 100 --from-y 100 --to-x 300 --to-y 300\n```\n\nIf `--target-pid` is used before a target window has been discovered, the command fails with a clear app-session error instead of falling back to the global path.\n\n### Per-PID mouse/scroll/keyboard architecture\n\n- **Global input** (no `--target-pid`) stays on the koffi CoreGraphics HID-tap path and remains backward compatible.\n- **Targeted mouse/drag** resolves a visible app window, creates AppKit-backed `CGEvent`s when a window is known, stamps target-window fields plus SkyLight field 40, activates the window without raising it, and posts through SkyLight plus the window owner's process serial number.\n- **Targeted keyboard** requires a remembered app window and uses SkyLight `SLSEventAuthenticationMessage` before `SLEventPostToPid`.\n- **Targeted text** uses per-character Unicode CGEvent payloads routed through the remembered app session.\n- **Targeted scroll** requires the same remembered app window and refuses to fall back to the global event tap.\n\n### MCP server\n\nSpawn the stdio MCP server and wire it to Claude Desktop, VS Code, or any MCP client:\n\n```bash\n# Build\npnpm --filter @macos-cua/mcp build\n\n# Run\n./packages/mcp/dist/server.js\n```\n\nClaude Desktop `claude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"macos-cua\": {\n      \"command\": \"node\",\n      \"args\": [\"/absolute/path/to/packages/mcp/dist/server.js\"]\n    }\n  
}\n}\n```\n\nVS Code `settings.json` (MCP extension):\n\n```json\n{\n  \"mcp.servers\": {\n    \"macos-cua\": {\n      \"type\": \"stdio\",\n      \"command\": \"node\",\n      \"args\": [\"/absolute/path/to/packages/mcp/dist/server.js\"]\n    }\n  }\n}\n```\n\nThe server exposes 9 Codex Computer Use tools. See the [Action surface](#action-surface) table below.\n\n### pi-extension\n\nInstall into a [pi coding agent](https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent) session:\n\n```bash\npi install file://./packages/pi-extension\n```\n\nLoading the extension auto-enables native computer-use for Anthropic Messages and OpenAI Responses models. Anthropic requests receive the `computer-use-2025-01-24` native `computer` tool plus the required beta header/body fields and a short system prompt. OpenAI Responses requests receive only `{ \"type\": \"computer\" }` in `payload.tools` — no headers, no `extra_body`, and no extra system prompt. No configuration is required; advanced users can opt out of both providers with `MACOS_CUA_DISABLE_COMPUTER_USE_BETA=1` (`true`, `yes`, and `on` also work).\n\nThe extension resolves the host display in logical macOS points, captures model-facing screenshots at a 1280px long edge (1280x720 on 16:9 displays), declares those dimensions to Anthropic, and unscales returned model coordinates back to logical points before dispatching clicks, moves, and drags. 
OpenAI Responses uses the same screenshot invariant: model coordinates are always in the image space the model received, while `MacOSHostComputer` still receives logical points.\n\nThe extension also registers Codex-compatible Computer Use tools:\n\n| Tool | Purpose |\n|---|---|\n| `list_apps` | List running apps |\n| `get_app_state` | Capture screenshot + accessibility tree for an app |\n| `click` | Click by element index or screenshot coordinate |\n| `perform_secondary_action` | Invoke an accessibility action by element index |\n| `set_value` | Set a settable accessibility element value |\n| `drag` | Drag between screenshot coordinates |\n| `scroll` | Scroll an app by pages |\n| `type_text` | Type literal text |\n| `press_key` | Press a key or key chord |\n\nThe extension default-exports a pi extension factory and keeps these tools available even when native computer-use auto-activation is disabled.\n\n### Programmatic API\n\nImport `MacOSHostComputer` from `@macos-cua/core` and drive macOS directly:\n\n```typescript\nimport { MacOSHostComputer } from \"@macos-cua/core\";\n\nconst computer = new MacOSHostComputer();\n\nconst { data, width, height } = await computer.screenshot();\nawait computer.click({ x: 500, y: 300 });\nawait computer.type(\"Hello from TypeScript\");\nawait computer.key(\"Return\", { modifiers: [\"command\"] });\nawait computer.scroll({ direction: \"down\", amount: 10 });\nawait computer.drag({ from: { x: 100, y: 200 }, to: { x: 300, y: 400 } });\n\nconst pos = await computer.getCursorPosition();\nconst size = await computer.getScreenSize();\n\nawait computer.close();\n```\n\nAll methods return Promises. 
The API is intentionally identical to the OpenAI `Computer` abstraction so you can drop it into an agent loop without translation.\n\n## Action surface\n\nEvery action exposed by the CLI, MCP server, and pi-extension:\n\n| Action | Parameters | Returns | What it does |\n|---|---|---|---|\n| `screenshot` | `targetSize?: { width, height }` | PNG `Buffer` + dimensions | Full-screen capture via `screencapture` and `sips` |\n| `click` | `x: number`, `y: number` | void | Single click via CoreGraphics `CGEventCreateMouseEvent` / `CGEventPost` |\n| `double_click` | `x: number`, `y: number` | void | Double click via CoreGraphics `CGEventCreateMouseEvent` / `CGEventPost` |\n| `type` | `text: string` | void | Type literal text via CoreGraphics `CGEventCreateKeyboardEvent` |\n| `key` | `key: string`, `modifiers?: string[]` | void | Key press with optional cmd/alt/ctrl/shift modifiers via CoreGraphics |\n| `scroll` | `direction: \"up\" \\| \"down\" \\| \"left\" \\| \"right\"`, `amount: number` | void | Scroll wheel event via CoreGraphics `CGEventCreateScrollWheelEvent` |\n| `drag` | `fromX, fromY, toX, toY` | void | Mouse down, move, up via CoreGraphics `CGEventCreateMouseEvent` |\n| `cursor_position` | none | `{ x, y }` | Current mouse coordinates via `CGEventGetLocation` |\n| `screen_size` | none | `{ width, height }` | Logical desktop bounds via Finder, with `system_profiler` fallback |\n\n## Permissions\n\nmacOS gates screen capture, input synthesis, and app lookup behind separate permission dialogs. The first time you run `screenshot` or `click`, macOS may prompt automatically. If it does not, grant the permissions manually:\n\n1. **System Settings → Privacy \u0026 Security → Screen Recording** — toggle your terminal/IDE ON.\n2. **System Settings → Privacy \u0026 Security → Accessibility** — toggle the same terminal/IDE ON.\n3. 
**System Settings → Privacy \u0026 Security → Automation** — allow the terminal/IDE to send Apple Events to System Events if you use `--target-bundle-id` or permission helpers that query System Events.\n4. Restart the terminal (some apps cache the permission state at launch).\n\nPermissions are granted per binary. If you switch from iTerm2 to Ghostty, you must grant them again for the new app.\n\nFull walkthrough: [`skills/macos-cua/references/installation.md`](./skills/macos-cua/references/installation.md).\n\n## Architecture\n\n```text\n+----------------------------------------------------------+\n|  Agent / CLI / MCP client / pi session                   |\n|  +----------------------------------------------------+  |\n|  |  @macos-cua/core                                   |  |\n|  |   ComputerInterface (abstract)                     |  |\n|  |   +-- HostComputer  (macOS implemented)            |  |\n|  |   +-- VMComputer    (stub: QEMU/Lume/VirtualBox)   |  |\n|  |   +-- CloudComputer (stub: cloud provider)         |  |\n|  +----------------------------------------------------+  |\n|                    |                                     |\n|  +-----------------+------------------+                  |\n|  |                 |                  |                  |\n|  v                 v                  v                  |\n|  CLI            MCP server       pi-extension            |\n|  commander.js   @modelcontext    registerTool factory    |\n|                 protocol/sdk     default export          |\n|  +-----------------+------------------+                  |\n|                    |                                     |\n|  v                 v                  v                  |\n|  screencapture    koffi/CGEvent    SkyLight/AppKit FFI   |\n|  (screenshots)    (global input)   (targeted sessions)   |\n+----------------------------------------------------------+\n```\n\n| Package | Path | Role |\n|---|---|---|\n| `@macos-cua/core` | [`packages/core`](./packages/core) | `ComputerInterface` + platform 
abstractions (`HostComputer`, `VMComputer`, `CloudComputer`) + `MacOSHostComputer` implementation |\n| `@macos-cua/cli` | [`packages/cli`](./packages/cli) | `commander.js` binary (`macos-cua`) |\n| `@macos-cua/mcp` | [`packages/mcp`](./packages/mcp) | MCP stdio server (`macos-cua-mcp`) exposing Codex Computer Use tools |\n| `@macos-cua/pi-extension` | [`packages/pi-extension`](./packages/pi-extension) | Pi coding-agent extension with Codex-compatible Computer Use tools |\n| `skills/macos-cua` | [`skills/macos-cua`](./skills/macos-cua) | OpenCode-style skill definition + installation reference |\n\n## Roadmap\n\n| Feature | Status | Notes |\n|---|---|---|\n| macOS host-native screenshot | Implemented | `screencapture` + `sips` capture and resize |\n| macOS host-native input | Implemented | Native CoreGraphics CGEvent via koffi for global input; SkyLight/AppKit FFI for targeted app windows |\n| QEMU runtime | Interface stub | [`packages/core/src/platform/vm.ts`](./packages/core/src/platform/vm.ts) |\n| Lume runtime | Interface stub | Apple Virtualization.Framework VM |\n| VirtualBox / Parallels runtime | Interface stub | Planned |\n| Cloud provider runtime | Interface stub | [`packages/core/src/platform/cloud.ts`](./packages/core/src/platform/cloud.ts) |\n| ScreenCaptureKit capture | Planned | Current implementation uses the system screenshot fallback while the TypeScript FFI path stays helper-free |\n| SkyLight authenticated targeted input | Implemented | TypeScript FFI uses `SLEventPostToPid`, focus-without-raise, AppKit-backed mouse events, and keyboard auth messages |\n| Accessibility API queries | Implemented | `AXUIElement` tree extraction, `set_value`, and secondary actions |\n\n## Development\n\n```bash\n# Install dependencies\npnpm install\n\n# Type check + lint + test\npnpm check\n\n# Test only\npnpm test\n\n# Build all packages\npnpm build\n```\n\nPer-package builds:\n\n```bash\npnpm --filter @macos-cua/core build\npnpm --filter @macos-cua/cli build\npnpm 
--filter @macos-cua/mcp build\npnpm --filter @macos-cua/pi-extension build\n```\n\nStandards: ultra-strict TypeScript, ESM with `.js` imports, Biome formatting, Vitest, tabs, line width 120. See [`AGENTS.md`](./AGENTS.md) for the full conventions.\n\n## Comparison vs cua / codex\n\n| Dimension | cua | Codex | macos-cua |\n|---|---|---|---|\n| Language | Python | Rust + proprietary plugin | TypeScript |\n| Sandbox | VM / container / cloud | Host macOS (permission-scoped) | Host macOS (permission-scoped) |\n| Screenshot latency | ~500 ms + encode + transport | Native frame interval + local IPC | Native one-shot screenshot fallback |\n| Input latency | HTTP → guest → pynput | Native CGEvent / Apple Events | Native CoreGraphics CGEvent via koffi (~microseconds per event) |\n| Portability | Linux, macOS, Windows, Android, cloud | macOS only | macOS only (stubs for VM/cloud) |\n| Open source | Full SDK | Plugin host OSS, Computer Use plugin proprietary | Fully open source |\n| Agent integration | Any Python agent | Codex desktop only | CLI, MCP, pi-extension, or any TS agent |\n\nFull analysis: [`codex-cua-comparison.md`](./codex-cua-comparison.md).\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\n## Related\n\n- [trycua/cua](https://github.com/trycua/cua) — upstream portable computer-use SDK (Python, VM-based)\n- [OpenAI Codex](https://github.com/openai/codex) — Codex desktop app with proprietary Computer Use plugin\n- [pi-mono](https://github.com/badlogic/pi-mono) — the pi coding-agent runtime\n- [pi-cua-integration](https://github.com/code-yeongyu/pi-cua-integration) — pi extension that wraps cua sandboxes (the model for this README)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-yeongyu%2Fmacos-cua","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcode-yeongyu%2Fmacos-cua","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-yeongyu%2Fmacos-cua/lists"}