{"id":39031714,"url":"https://github.com/sh3ll3x3c/native-devtools-mcp","last_synced_at":"2026-03-04T11:07:18.896Z","repository":{"id":333003286,"uuid":"1134117514","full_name":"sh3ll3x3c/native-devtools-mcp","owner":"sh3ll3x3c","description":"MCP server for native app testing — screenshot, OCR, click, type, find_text, template matching. macOS, Windows \u0026 Android. Works with Claude, Cursor, and any MCP client.","archived":false,"fork":false,"pushed_at":"2026-02-22T11:59:46.000Z","size":11750,"stargazers_count":26,"open_issues_count":1,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2026-02-22T17:19:51.486Z","etag":null,"topics":["accessibility","adb","ai-agent","android","claude","claude-code","computer-use","cursor","e2e-testing","macos","mcp","mobile-automation","mobile-testing","model-context-protocol","ocr","rpa","screenshot","template-matching","ui-automation","windows"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/native-devtools-mcp","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sh3ll3x3c.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY_AUDIT.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-01-14T09:19:53.000Z","updated_at":"2026-02-22T11:59:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/sh3ll3x3c/native-devtools-mcp","commit_stats":null,"previous_names":["sh3ll3x3c/native-devtools-mcp"],"tags_count":22,"template":false
,"template_full_name":null,"purl":"pkg:github/sh3ll3x3c/native-devtools-mcp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sh3ll3x3c%2Fnative-devtools-mcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sh3ll3x3c%2Fnative-devtools-mcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sh3ll3x3c%2Fnative-devtools-mcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sh3ll3x3c%2Fnative-devtools-mcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sh3ll3x3c","download_url":"https://codeload.github.com/sh3ll3x3c/native-devtools-mcp/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sh3ll3x3c%2Fnative-devtools-mcp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30078524,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T08:01:56.766Z","status":"ssl_error","status_checked_at":"2026-03-04T08:00:42.919Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accessibility","adb","ai-agent","android","claude","claude-code","computer-use","cursor","e2e-testing","macos","mcp","mobile-automation","mobile-testing","model-context-protocol","ocr","rpa","screenshot","template-matching","ui-automation","windows"],"created_at":"2026-01-17T17:40:31.402Z","updated_at":"2026-03-04T11:07:18.886Z","avatar_url":"https://github.com/sh3ll3x3c.png","language":"Rust","readme":"# native-devtools-mcp\n\n\u003cdiv align=\"center\"\u003e\n\n![Version](https://img.shields.io/npm/v/native-devtools-mcp?style=flat-square)\n![License](https://img.shields.io/npm/l/native-devtools-mcp?style=flat-square)\n![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Windows%20%7C%20Android-blue?style=flat-square)\n![Downloads](https://img.shields.io/npm/dt/native-devtools-mcp?style=flat-square)\n\n**Give your AI agent \"eyes\" and \"hands\" for native desktop and mobile applications.**\n\nA Model Context Protocol (MCP) server that provides **Computer Use** capabilities: screenshots, OCR, input simulation, and window management — for **native desktop apps** and **Android devices**, not just browsers.\n\n**Works with:** [Claude Desktop](https://claude.ai/download) • [Claude Code](https://docs.anthropic.com/en/docs/claude-code) • [Cursor](https://cursor.com) • Any MCP-compatible client\n\n[//]: # \"Search keywords: MCP, MCP server, Model Context Protocol, computer use, desktop automation, UI automation, native app testing, test automation, e2e testing, RPA, screenshots, OCR, 
template matching, accessibility, mouse, keyboard, screen reading, macOS, Windows, Android, ADB, mobile testing, Claude, Claude Code, Cursor, AI agent, native-devtools-mcp\"\n\n[Features](#-features) • [Installation](#-installation) • [Getting Started](#-getting-started) • [Security \u0026 Trust](#-security--trust) • [For AI Agents](#-for-ai-agents-llms) • [Android](#-android-support)\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003e\u003cstrong\u003emacOS\u003c/strong\u003e\u003c/td\u003e\n\u003ctd align=\"center\"\u003e\u003cstrong\u003eWindows\u003c/strong\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003cimg src=\"demo.gif\" width=\"450\" alt=\"macOS Demo\"\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cimg src=\"windows-demo-1.gif\" width=\"450\" alt=\"Windows Demo\"\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\u003c/div\u003e\n\n---\n\n## 🚀 Features\n\n- **👀 Computer Vision:** Capture screenshots of screens, windows, or specific regions. Includes built-in OCR (text recognition) to \"read\" the screen.\n- **🖱️ Input Simulation:** Click, drag, scroll, and type text naturally. Supports global coordinates and window-relative actions.\n- **🪟 Window Management:** List open windows, find applications, and bring them to focus.\n- **🧩 Template Matching:** Find non-text UI elements (icons, shapes) using `load_image` + `find_image`, returning precise click coordinates.\n- **🔒 Local \u0026 Private:** 100% local execution. No screenshots or data are ever sent to external servers.\n- **📱 Android Support:** Connect to Android devices over ADB for screenshots, input simulation, UI element search, and app management — all from the same MCP server.\n- **🔌 Dual-Mode Interaction:**\n    1.  **Visual/Native:** Works with *any* app via screenshots \u0026 coordinates (Universal).\n    2.  
**AppDebugKit:** Deep integration for supported apps to inspect the UI tree (DOM-like structure).\n\n## 🤖 For AI Agents (LLMs)\n\nThis MCP server is designed to be **highly discoverable and usable** by AI models (Claude, Gemini, GPT).\n\n- **[📄 Read `AGENTS.md`](./AGENTS.md):** A compact, token-optimized technical reference designed specifically for ingestion by LLMs. It contains intent definitions, schema examples, and reasoning patterns.\n\n**Core Capabilities for System Prompts:**\n1.  `take_screenshot`: The \"eyes\". Returns images + layout metadata + text locations (OCR).\n2.  `click` / `type_text`: The \"hands\". Interacts with the system based on visual feedback.\n3.  `find_text`: A shortcut to find text on screen and get its coordinates immediately. Uses the platform **accessibility API** (macOS Accessibility / Windows UI Automation) for precise element-level matching, with OCR fallback.\n4.  `element_at_point`: Inspect the accessibility element at the given screen coordinates — returns name, role, label, value, bounds, pid, and app_name. Note: privacy-focused Electron apps (e.g., Signal) may restrict their AX tree, returning only a container — use `take_screenshot` with OCR as a fallback.\n5.  
`load_image` / `find_image`: Template matching for non-text UI elements (icons, shapes), returning screen coordinates for clicking.\n\n## 📦 Installation\n\nThe install steps are identical on macOS and Windows.\n\n### Option 1: Run with `npx` (no install needed)\n\n```bash\nnpx -y native-devtools-mcp\n```\n\n### Option 2: Global install\n\n```bash\nnpm install -g native-devtools-mcp\n```\n\n### Option 3: Build from source (Rust)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand build instructions\u003c/summary\u003e\n\n**Using the build script** (clones, builds, and runs setup):\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash\n```\n\n**Or manually:**\n\n```bash\ngit clone https://github.com/sh3ll3x3c/native-devtools-mcp\ncd native-devtools-mcp\ncargo build --release\n# Binary: ./target/release/native-devtools-mcp\n```\n\n\u003c/details\u003e\n\n## 🏁 Getting Started\n\nAfter installing, run the setup wizard:\n\n```bash\nnpx native-devtools-mcp setup\n```\n\nThis will:\n1. **Check permissions** (macOS) — verifies Accessibility and Screen Recording, opens System Settings if needed\n2. **Detect your MCP clients** — finds Claude Desktop, Claude Code, Cursor\n3. **Write the configuration** — generates the correct JSON config and offers to write it for you\n\nThen restart your MCP client and you're ready to go.\n\n\u003e **Claude Desktop on macOS** requires the signed app bundle (Gatekeeper blocks npx). Download `NativeDevtools-X.X.X.dmg` from [GitHub Releases](https://github.com/sh3ll3x3c/native-devtools-mcp/releases), drag to `/Applications`, then run setup — it will detect the app and configure Claude Desktop to use it.\n\n\u003e **VS Code, Windsurf, and other clients:** `setup` doesn't auto-detect these yet. 
Run `setup` for the permission checks, then see the manual configuration below for the JSON config snippet.\n\n\u003e **Claude Code tip:** To avoid approving every tool call (clicks, screenshots), add this to `.claude/settings.local.json`:\n\u003e ```json\n\u003e { \"permissions\": { \"allow\": [\"mcp__native-devtools__*\"] } }\n\u003e ```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eManual configuration (without setup)\u003c/strong\u003e\u003c/summary\u003e\n\n#### macOS — Claude Desktop\n\nConfig file: `~/Library/Application Support/Claude/claude_desktop_config.json`\n\n```json\n{\n  \"mcpServers\": {\n    \"native-devtools\": {\n      \"command\": \"/Applications/NativeDevtools.app/Contents/MacOS/native-devtools-mcp\"\n    }\n  }\n}\n```\n\n#### Windows — Claude Desktop\n\nConfig file: `%APPDATA%\\Claude\\claude_desktop_config.json`\n\n#### Claude Code, Cursor, and other MCP clients\n\n```json\n{\n  \"mcpServers\": {\n    \"native-devtools\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"native-devtools-mcp\"]\n    }\n  }\n}\n```\n\nRequires Node.js 18+.\n\n\u003c/details\u003e\n\n## 🔐 Security \u0026 Trust\n\nThis tool requires Accessibility and Screen Recording permissions — that's a lot of trust. Here's how to verify it deserves it.\n\n### Verify your binary\n\n```bash\nnative-devtools-mcp verify\n```\n\nComputes the SHA-256 hash of the running binary and checks it against the official checksums published on the [GitHub Releases](https://github.com/sh3ll3x3c/native-devtools-mcp/releases) page. If the hash matches, you're running an unmodified official build.\n\n### Build from source\n\nDon't trust pre-built binaries? Build it yourself:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash\n```\n\nThe script clones the repo, optionally opens it for review before building, compiles the release binary, and runs setup. 
See [`scripts/build-from-source.sh`](scripts/build-from-source.sh).\n\n### Audit the code\n\n[`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) documents exactly which permissions are used and where they appear in the source code, and includes an LLM audit prompt you can paste into any AI model to perform an independent security review.\n\n### What this server does NOT do\n\n- **No unsolicited network access** — the server never phones home. Network is only used when the MCP client explicitly invokes `app_connect` (WebSocket to a local debug server) or when you run the `verify` subcommand (fetches checksums from GitHub)\n- **No file scanning** — does not read or index your files. The only file reads are `load_image` (reads a path the MCP client explicitly provides) and short-lived temp files for screenshots (deleted immediately after capture)\n- **No background persistence** — exits when the MCP client disconnects\n- **No data exfiltration** — screenshots are returned to the MCP client via stdout, never stored or transmitted elsewhere\n\n## 🔍 Two Approaches to Interaction\n\nWe provide two ways for agents to interact, allowing them to choose the best tool for the job.\n\n### 1. The \"Visual\" Approach (Universal)\n**Best for:** Nearly any app (Electron, Qt, games, browsers).\n*   **How it works:** The agent takes a screenshot, analyzes it visually (or uses OCR), and clicks at coordinates.\n*   **Tools:** `take_screenshot`, `find_text`, `click`, `type_text` (plus `load_image` / `find_image` for icons and shapes).\n*   **Example:** \"Click the button that looks like a gear icon.\" → use `find_image` with a gear template.\n\n### 2. 
The \"Structural\" Approach (AppDebugKit)\n**Best for:** Apps specifically instrumented with our AppDebugKit library (mostly for developers testing their own apps).\n*   **How it works:** The agent connects to a debug port and queries the UI tree (similar to an HTML DOM).\n*   **Tools:** `app_connect`, `app_query`, `app_click`.\n*   **Example:** `app_click(element_id=\"submit-button\")`.\n\n## 🧩 Template Matching (find_image)\n\nUse `find_image` when the target is **not text** (icons, toggles, custom controls) and OCR or `find_text` cannot identify it.\n\n**Typical flow:**\n1. `take_screenshot(app_name=\"MyApp\")` → `screenshot_id`\n2. `load_image(path=\"/path/to/icon.png\")` → `template_id`\n3. `find_image(screenshot_id=\"...\", template_id=\"...\")` → `matches` with `screen_x/screen_y`\n4. `click(x=..., y=...)`\n\n**Fast vs Accurate:**\n- **fast** (default): uses downscaling and early exit for speed.\n- **accurate**: uses full resolution, a wider scale search, and a smaller stride for more thorough matching.\n\nOptional inputs like `mask_id`, `search_region`, `scales`, and `rotations` can improve precision and performance.\n\n## 📱 Android Support\n\nAndroid support is built in. The MCP server communicates with Android devices over ADB (USB or Wi-Fi), providing screenshots, input simulation, UI element search, and app management.\n\n### Prerequisites\n\n1. **ADB installed** on the host machine (`brew install android-platform-tools` on macOS, or install the [Android SDK Platform-Tools](https://developer.android.com/tools/releases/platform-tools))\n2. **USB debugging enabled** on the Android device (Settings \u003e Developer options \u003e USB debugging)\n3. 
**ADB server running** — starts automatically when you run `adb devices`\n\n### Android tools\n\nAll Android tools are prefixed with `android_`. Most of them appear dynamically after you connect to a device (`android_list_devices` is always available):\n\n| Tool | Description |\n|------|-------------|\n| `android_list_devices` | List all ADB-connected devices (always available) |\n| `android_connect` | Connect to a device by serial number |\n| `android_disconnect` | Disconnect from the current device |\n| `android_screenshot` | Capture the device screen |\n| `android_find_text` | Find UI elements by text (via uiautomator) |\n| `android_click` | Tap at screen coordinates |\n| `android_swipe` | Swipe between two points |\n| `android_type_text` | Type text on the device |\n| `android_press_key` | Press a key (e.g., `KEYCODE_HOME`, `KEYCODE_BACK`) |\n| `android_launch_app` | Launch an app by package name |\n| `android_list_apps` | List installed packages |\n| `android_get_display_info` | Get screen resolution and density |\n| `android_get_current_activity` | Get the current foreground activity |\n\n### Typical workflow\n\n```\nandroid_list_devices          → find your device serial\nandroid_connect(serial=\"...\")  → connect (unlocks android_* tools)\nandroid_screenshot            → see what's on screen\nandroid_find_text(text=\"OK\")  → locate a button\nandroid_click(x=..., y=...)   → tap it\n```\n\n### Known issues\n\n\u003e **MIUI / HyperOS (Xiaomi, Redmi, POCO devices):** Input injection (`android_click`, `android_type_text`, `android_press_key`, `android_swipe`) and `android_find_text` (via uiautomator) require an additional security toggle:\n\u003e\n\u003e **Settings \u003e Developer options \u003e USB debugging (Security settings)** — enable this toggle. MIUI may require you to sign in with a Mi account to enable it.\n\u003e\n\u003e Without this, you'll see `INJECT_EVENTS permission` errors for input tools and `could not get idle state` errors for `android_find_text`. 
Screenshot and device info tools work without this toggle.\n\n\u003e **Wireless ADB:** To connect without a USB cable, first connect via USB and run:\n\u003e ```bash\n\u003e adb tcpip 5555\n\u003e adb connect \u003cphone-ip\u003e:5555\n\u003e ```\n\u003e Then use the `\u003cphone-ip\u003e:5555` serial in `android_connect`.\n\n### Smoke tests\n\nSmoke tests verify all Android tools against a real connected device. They are `#[ignore]`d by default and must be run explicitly:\n\n```bash\ncargo test --test android_smoke_tests -- --ignored --test-threads=1\n```\n\nTests must run sequentially (`--test-threads=1`) since they share a single physical device. The device must be unlocked and awake.\n\n## 🏗️ Architecture\n\n```mermaid\ngraph TD\n    Client[Claude / LLM Client] \u003c--\u003e|JSON-RPC 2.0| Server[native-devtools-mcp]\n    Server --\u003e|Direct API| Sys[System APIs]\n    Server --\u003e|WebSocket| Debug[AppDebugKit]\n    Server --\u003e|ADB Protocol| Android[Android Device]\n\n    subgraph \"Your Machine\"\n        Sys --\u003e|Screen/OCR| macOS[CoreGraphics / Vision]\n        Sys --\u003e|Input| Win[Win32 / SendInput]\n        Sys --\u003e|Text Search| UIA[UI Automation]\n        Debug -.-\u003e|Inspect| App[Target App]\n    end\n\n    subgraph \"Android Device (USB/Wi-Fi)\"\n        Android --\u003e|screencap| Screen[Screenshots]\n        Android --\u003e|input| Input[Tap / Swipe / Type]\n        Android --\u003e|uiautomator| UITree[UI Hierarchy]\n    end\n```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003e🔧 Technical Details (Under the Hood)\u003c/strong\u003e\u003c/summary\u003e\n\n| OS | Feature | API Used |\n|----|---------|----------|\n| **macOS** | Screenshots | `screencapture` (CLI) |\n| | Input | `CGEvent` (CoreGraphics) |\n| | Text Search (`find_text`) | `Accessibility API` (primary), Vision OCR (fallback) |\n| | Element Inspection (`element_at_point`) | `AXUIElementCopyElementAtPosition` + AX tree walk fallback (Accessibility API) |\n| 
| OCR | `VNRecognizeTextRequest` (Vision Framework) |\n| **Windows** | Screenshots | `BitBlt` (GDI) |\n| | Input | `SendInput` (Win32) |\n| | Text Search (`find_text`) | `UI Automation` (primary), WinRT OCR (fallback) |\n| | Element Inspection (`element_at_point`) | `IUIAutomation::ElementFromPoint` (UI Automation) |\n| | OCR | `Windows.Media.Ocr` (WinRT) |\n| **Android** | Screenshots | `screencap` / ADB framebuffer |\n| | Input | `adb shell input` (tap, swipe, text, keyevent) |\n| | Text Search (`android_find_text`) | `uiautomator dump` (accessibility tree) |\n| | Device Communication | `adb_client` crate (native Rust ADB protocol) |\n\n### Screenshot Coordinate Precision\n\nScreenshots include metadata for accurate coordinate conversion:\n\n- `screenshot_origin_x/y`: Screen-space origin of the captured area (in points)\n- `screenshot_scale`: Display scale factor (e.g., 2.0 for Retina displays)\n- `screenshot_pixel_width/height`: Actual pixel dimensions of the image\n- `screenshot_window_id`: Window ID (for window captures)\n\n**Coordinate conversion:**\n```\nscreen_x = screenshot_origin_x + (pixel_x / screenshot_scale)\nscreen_y = screenshot_origin_y + (pixel_y / screenshot_scale)\n```\n\n**Implementation notes:**\n- **Window captures** (macOS): Uses `screencapture -o`, which excludes the window shadow. The captured image dimensions match `kCGWindowBounds × scale` exactly, ensuring that click coordinates derived from screenshots land on the intended UI elements.\n- **Region captures**: Origin coordinates are aligned to integers to match the actual captured area.\n\n\u003c/details\u003e\n\n## ⚠️ Operational Safety\n\n*   **Hands Off:** When the agent is \"driving\" (clicking/typing), **do not move your mouse or type**.\n    *   *Why?* Real hardware inputs can conflict with the simulated ones, causing clicks to land in the wrong place.\n*   **Focus Matters:** Ensure the window you want the agent to use is visible. 
If a popup steals focus, the agent might type into the wrong window unless it checks first.\n\n## 🪟 Windows Notes\n\nWorks out of the box on **Windows 10/11**.\n*   Uses standard Win32 APIs (GDI, SendInput).\n*   `find_text` uses **UI Automation (UIA)** as the primary search mechanism, querying the accessibility tree for element names. This is the same accessibility-first approach used on macOS (with the Accessibility API). It falls back to OCR automatically when UIA finds no matches.\n*   OCR uses the built-in Windows Media OCR engine (offline).\n*   **Note:** The server cannot interact with \"Run as Administrator\" windows unless it is also running as Administrator.\n\n## 📜 License\n\nMIT © [sh3ll3x3c](https://github.com/sh3ll3x3c)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsh3ll3x3c%2Fnative-devtools-mcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsh3ll3x3c%2Fnative-devtools-mcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsh3ll3x3c%2Fnative-devtools-mcp/lists"}