{"id":45579130,"url":"https://github.com/lahfir/agent-desktop","last_synced_at":"2026-06-28T05:00:48.937Z","repository":{"id":339462340,"uuid":"1161958643","full_name":"lahfir/agent-desktop","owner":"lahfir","description":"Native desktop automation CLI for AI agents. Control any application through OS accessibility trees with structured JSON output and deterministic element refs.","archived":false,"fork":false,"pushed_at":"2026-06-24T04:46:46.000Z","size":6089,"stargazers_count":885,"open_issues_count":3,"forks_count":53,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-06-24T06:25:44.267Z","etag":null,"topics":["accessibility","accessibility-api","ai-agents","automation","cli","desktop-automation","macos","mcp","rust"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lahfir.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["lahfir"]}},"created_at":"2026-02-19T18:00:49.000Z","updated_at":"2026-06-23T13:39:09.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/lahfir/agent-desktop","commit_stats":null,"previous_names":["lahfir/agent-desktop"],"tags_count":20,"template":false,"template_full_name":null,"purl":"pkg:github/lahfir/agent-desktop","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lahfir%2Fagent-desktop","tags_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/lahfir%2Fagent-desktop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lahfir%2Fagent-desktop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lahfir%2Fagent-desktop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lahfir","download_url":"https://codeload.github.com/lahfir/agent-desktop/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lahfir%2Fagent-desktop/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34877471,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-28T02:00:05.809Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accessibility","accessibility-api","ai-agents","automation","cli","desktop-automation","macos","mcp","rust"],"created_at":"2026-02-23T11:31:27.613Z","updated_at":"2026-06-28T05:00:48.930Z","avatar_url":"https://github.com/lahfir.png","language":"Rust","funding_links":["https://github.com/sponsors/lahfir"],"categories":["Desktop Agents"],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eAGENT DESKTOP\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eOBSERVE. DECIDE. 
ACT.\u003c/strong\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/lahfir/agent-desktop/actions/workflows/ci.yml?query=branch%3Amain\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/lahfir/agent-desktop/ci.yml?branch=main\u0026style=for-the-badge\" alt=\"CI status\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/lahfir/agent-desktop/releases\"\u003e\u003cimg src=\"https://img.shields.io/github/v/release/lahfir/agent-desktop?include_prereleases\u0026style=for-the-badge\" alt=\"GitHub release\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://www.npmjs.com/package/agent-desktop\"\u003e\u003cimg src=\"https://img.shields.io/npm/v/agent-desktop?label=npm\u0026style=for-the-badge\" alt=\"npm version\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://clawhub.ai/lahfir/agent-desktop\"\u003e\u003cimg src=\"https://img.shields.io/badge/ClawHub-skill-f97316?style=for-the-badge\" alt=\"ClawHub skill\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://skills.sh/lahfir/agent-desktop/agent-desktop\"\u003e\u003cimg src=\"https://img.shields.io/badge/skills.sh-listed-8b5cf6?style=for-the-badge\" alt=\"skills.sh listing\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-Apache--2.0-blue.svg?style=for-the-badge\" alt=\"Apache-2.0 License\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/Tutorial.gif\" alt=\"agent-desktop tutorial demo\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n**agent-desktop** is a native desktop automation CLI designed for AI agents, built with Rust. 
It gives structured access to any application through OS accessibility trees — no screenshots, no pixel matching, no browser required.\n\n## Architecture\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/architecture.png\" alt=\"agent-desktop architecture diagram\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/example.png\" alt=\"agent-desktop real-world example — Slack accessibility tree with 97% token savings\" width=\"900\" /\u003e\n\u003c/p\u003e\n\n\u003ca href=\"https://star-history.com/#lahfir/agent-desktop\u0026Date\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://api.star-history.com/svg?repos=lahfir/agent-desktop\u0026type=Date\u0026theme=dark\"\u003e\n    \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://api.star-history.com/svg?repos=lahfir/agent-desktop\u0026type=Date\"\u003e\n    \u003cimg alt=\"Star history for lahfir/agent-desktop\" src=\"https://api.star-history.com/svg?repos=lahfir/agent-desktop\u0026type=Date\"\u003e\n  \u003c/picture\u003e\n\u003c/a\u003e\n\n## Key Features\n\n- **Native Rust CLI**: Fast, single binary, no runtime dependencies\n- **C-ABI cdylib** (`libagent_desktop_ffi`): Load once from Python / Swift / Go / Ruby / Node / C instead of forking the CLI per call\n- **54 commands**: Observation, interaction, keyboard, mouse, notifications, clipboard, window management, plus a bundled `skills` doc loader\n- **Progressive skeleton traversal**: 78–96% token reduction on dense apps via shallow overview + targeted drill-down\n- **Snapshot \u0026 refs**: AI-optimized workflow using compact snapshot IDs and deterministic element references (`@e1`, `@e2`)\n- **Headless-by-default interactions**: Ref actions use accessibility APIs and block silent focus, cursor, keyboard, or pasteboard side effects\n- **Structured JSON output**: Machine-readable responses with error codes and recovery hints\n- **Works with any 
app**: Finder, Safari, System Settings, Xcode, Slack — anything with an accessibility tree\n\n## Installation\n\n### npm (recommended)\n\n```bash\nnpm install -g agent-desktop        # downloads prebuilt binary automatically\n```\n\nOr without installing:\n\n```bash\nnpx agent-desktop snapshot --app Finder -i\n```\n\n### From source\n\n```bash\ngit clone https://github.com/lahfir/agent-desktop\ncd agent-desktop\ncargo build --release\ncp target/release/agent-desktop /usr/local/bin/\n```\n\nRequires Rust 1.85+ and macOS 13.0+.\n\n### Permissions\n\nmacOS requires Accessibility permission. Screenshots also require Screen Recording permission. Grant them in **System Settings \u003e Privacy \u0026 Security** by adding the app that launches agent-desktop, or:\n\n```bash\nagent-desktop permissions --request   # trigger platform permission request path\n```\n\nPermission fields are explicit objects, for example:\n\n```json\n{\n  \"accessibility\": { \"state\": \"granted\" },\n  \"screen_recording\": { \"state\": \"denied\", \"suggestion\": \"Grant Screen Recording permission\" },\n  \"automation\": { \"state\": \"not_required\" }\n}\n```\n\n## Language bindings (FFI)\n\nEvery GitHub Release ships a prebuilt C-ABI cdylib (`libagent_desktop_ffi`) for macOS, Linux, and Windows alongside the CLI tarballs. 
`dlopen` it and call the functions declared in `agent_desktop.h` for in-process calls instead of fork-exec per command.\n\n```python\nimport ctypes\nlib = ctypes.CDLL(\"./lib/libagent_desktop_ffi.dylib\")\n# Assumption: the adapter handle is an opaque pointer. ctypes defaults to a C int\n# return type, which would truncate a 64-bit pointer, so declare the types first.\nlib.ad_adapter_create.restype = ctypes.c_void_p\nlib.ad_adapter_destroy.argtypes = [ctypes.c_void_p]\nlib.ad_init(1)  # verify ABI major (AD_ABI_VERSION_MAJOR) before any call\nadapter = lib.ad_adapter_create()\n# observe -\u003e act: ad_snapshot -\u003e parse an @e ref -\u003e ad_execute_by_ref ...\nlib.ad_adapter_destroy(adapter)\n```\n\nFull consumer guide covering entrypoints, ownership, threading, error-handling, build/link, release archives, and verification: **[`skills/agent-desktop-ffi/`](skills/agent-desktop-ffi/)**.\n\n## Core Workflow for AI\n\nFor dense apps (Slack, VS Code, Notion), use **progressive skeleton traversal** to minimize token usage:\n\n```bash\n# 1. Shallow overview: depth-3 map, truncated containers show children_count\nagent-desktop snapshot --skeleton --app Slack -i --compact\n# Keep snapshot_id, for example s8f3k2p9\n\n# 2. Drill into a region of interest (named containers get refs as drill targets)\nagent-desktop snapshot --root @e3 --snapshot s8f3k2p9 -i --compact\n\n# 3. Act on an element found in the drill-down\nagent-desktop click @e12 --snapshot s8f3k2p9\n\n# 4. 
Re-drill the same region to verify the state change\nagent-desktop snapshot --root @e3 --snapshot s8f3k2p9 -i --compact\n```\n\nFor simple apps, a full snapshot is fine:\n\n```bash\nagent-desktop snapshot --app Finder -i   # get interactive elements with refs and snapshot_id\nagent-desktop click @e3 --snapshot s8f3k2p9  # click a button by ref\nagent-desktop type @e5 --snapshot s8f3k2p9 \"quarterly report\"  # insert text into a field\nagent-desktop press cmd+s               # keyboard shortcut\nagent-desktop snapshot -i               # re-observe after UI changes\n```\n\n```\nAgent loop:  snapshot → decide → act → snapshot → decide → act → ...\n```\n\n### Shared sessions for multi-agent workflows\n\nUse the same `--session \u003cid\u003e` when multiple agents coordinate on one desktop task. A session owns a latest-snapshot pointer, not a security boundary. Each snapshot gets its own `snapshot_id`; pass `--snapshot \u003cid\u003e` when an agent must act on a specific observation. Explicit snapshot IDs can be used without repeating `--session`; keep `--session` when you omit `--snapshot` and want that session's latest snapshot.\n\n```mermaid\nflowchart LR\n    S[\"--session release-fix\"] --\u003e A[\"snapshot -\u003e s1\"]\n    S --\u003e B[\"snapshot -\u003e s2\"]\n    A --\u003e C[\"Agent A: click @e4 --snapshot s1\"]\n    B --\u003e D[\"Agent B: wait --element @e9 --predicate actionable\"]\n    S --\u003e E[\"latest_snapshot_id points at newest snapshot\"]\n    C --\u003e F[\"Explicit snapshot id works outside session too\"]\n```\n\n```bash\nagent-desktop --session release-fix snapshot --app Xcode -i --compact\nagent-desktop --session release-fix wait --element @e9 --predicate actionable --timeout 5000\nagent-desktop --session release-fix click @e9\nagent-desktop click @e9 --snapshot s2\n```\n\n## Commands\n\n### Observation\n\n```bash\nagent-desktop snapshot --app Safari -i           # accessibility tree with refs\nagent-desktop snapshot --surface menu          
  # capture open menu\nagent-desktop screenshot --app Finder            # PNG screenshot\nagent-desktop find --role button --app TextEdit  # search by role, name, value, text\nagent-desktop get @e3 --snapshot s8f3k2p9 --property value  # read element property\nagent-desktop is @e7 --snapshot s8f3k2p9 --property checked # check boolean state\nagent-desktop list-surfaces --app Notes          # list menus, sheets, popovers, alerts\n```\n\n`get` and `is` resolve the ref once, prefer live platform reads when available, and fall back only when that live read is unsupported by the adapter.\n\n### Interaction\n\n```bash\nagent-desktop click @e3                  # semantic AX-first click\nagent-desktop double-click @e3           # AXOpen; physical double-click uses --headed mouse-click --count 2\nagent-desktop triple-click @e3           # POLICY_DENIED if physical input is disabled\nagent-desktop right-click @e3            # open verified context menu\nagent-desktop type @e5 \"hello world\"     # insert text into element\nagent-desktop set-value @e5 \"new value\"  # set value directly via AX\nagent-desktop clear @e5                  # clear element value\nagent-desktop focus @e5                  # set keyboard focus\nagent-desktop select @e9 \"Option B\"      # select verified dropdown/list option\nagent-desktop toggle @e12                # flip checkbox or switch\nagent-desktop check @e12                 # idempotent check\nagent-desktop uncheck @e12               # idempotent uncheck\nagent-desktop expand @e15                # expand disclosure/tree item\nagent-desktop collapse @e15              # collapse disclosure/tree item\nagent-desktop scroll @e1 --direction down --amount 3  # scroll (AX-first)\nagent-desktop scroll-to @e20             # scroll element into view\n```\n\n\u003e **(macOS, Phase 1)** Pure cursor gestures have no accessibility equivalent, so `triple-click`, `hover`, and `drag` are always physical; `double-click` is headless via `AXOpen` and only needs 
`--headed` for gesture-only targets. Windows (UIA) and Linux (AT-SPI) adapters may expose different capabilities. See `skills/agent-desktop/references/commands-interaction.md`.\n\n### Keyboard\n\n```bash\nagent-desktop press cmd+s               # key combo\nagent-desktop press cmd+shift+z          # multi-modifier\nagent-desktop press escape               # single key\nagent-desktop key-down shift             # hold key\nagent-desktop key-up shift               # release key\n```\n\n### Mouse\n\n```bash\nagent-desktop --headed hover @e3                  # move cursor to element\nagent-desktop --headed hover --xy 500,300         # move cursor to coordinates\nagent-desktop --headed drag --from @e3 --to @e8   # drag between elements\nagent-desktop --headed drag --from-xy 100,200 --to-xy 400,200  # drag between coordinates\nagent-desktop --headed mouse-click --xy 500,300   # click at coordinates\nagent-desktop --headed mouse-down --xy 500,300    # press at coordinates\nagent-desktop --headed mouse-up --xy 500,300      # release at coordinates\n```\n\n### App \u0026 Window Management\n\n```bash\nagent-desktop launch Safari              # launch app by name\nagent-desktop launch com.apple.Safari    # launch by bundle ID\nagent-desktop close-app Safari           # quit app\nagent-desktop close-app Safari --force   # force quit (SIGTERM, then SIGKILL if needed)\nagent-desktop list-apps                  # list running GUI apps\nagent-desktop list-windows               # list visible windows\nagent-desktop list-windows --app Finder  # windows for specific app\nagent-desktop focus-window w-4521        # bring window to front\nagent-desktop resize-window w-4521 800 600  # resize\nagent-desktop move-window w-4521 100 100    # move\nagent-desktop minimize w-4521            # minimize\nagent-desktop maximize w-4521            # maximize\nagent-desktop restore w-4521             # restore\n```\n\n### Notifications *(macOS only)*\n\n```bash\nagent-desktop list-notifications         
              # list all notifications\nagent-desktop list-notifications --app \"Slack\"         # filter by app\nagent-desktop list-notifications --text \"deploy\" --limit 5  # filter by text\nagent-desktop dismiss-notification 1                   # dismiss by index\nagent-desktop dismiss-all-notifications                # dismiss all\nagent-desktop dismiss-all-notifications --app \"Slack\"  # dismiss all from app\nagent-desktop notification-action 1 --action \"Reply\"   # click action button\n```\n\n### Clipboard\n\n```bash\nagent-desktop clipboard-get              # read clipboard text\nagent-desktop clipboard-set \"copied\"     # write to clipboard\nagent-desktop clipboard-clear            # clear clipboard\n```\n\n### Wait\n\n```bash\nagent-desktop wait 500                                       # sleep 500ms\nagent-desktop wait --element @e3 --timeout 5000              # wait for element\nagent-desktop wait --element @e3 --predicate actionable      # wait until safe to act\nagent-desktop wait --element @e5 --predicate value --value ready\nagent-desktop wait --window \"Save\" --timeout 10000           # wait for window\nagent-desktop wait --text \"Loading complete\" --app Safari    # wait for text\nagent-desktop wait --text \"Done\" --count 1 --app Xcode       # wait for exact match count\nagent-desktop wait --notification --text \"Build Succeeded\"   # wait for new matching notification\nagent-desktop wait --menu --timeout 3000                     # wait for menu\n```\n\n### Batch\n\n```bash\nagent-desktop batch '[\n  {\"command\": \"click\", \"args\": {\"ref_id\": \"@e2\", \"snapshot\": \"\u003csnapshot_id\u003e\"}},\n  {\"command\": \"type\", \"args\": {\"ref_id\": \"@e5\", \"snapshot\": \"\u003csnapshot_id\u003e\", \"text\": \"hello\"}},\n  {\"command\": \"press\", \"args\": {\"combo\": \"return\"}}\n]' --stop-on-error\n\nagent-desktop --session run-a batch '[\n  {\"command\": \"snapshot\", \"args\": {\"app\": \"Finder\", \"interactive_only\": true}},\n  
{\"command\": \"status\", \"session\": \"run-b\", \"args\": {}}\n]'\n```\n\n### System\n\n```bash\nagent-desktop status                     # platform, permission report, latest snapshot\nagent-desktop permissions                # check accessibility/screen-recording/automation\nagent-desktop permissions --request      # invoke platform request path\nagent-desktop version                    # version string\n```\n\n## Snapshot Options\n\n```bash\nagent-desktop snapshot [OPTIONS]\n```\n\n| Flag | Default | Description |\n|------|---------|-------------|\n| `--app \u003cNAME\u003e` | focused app | Filter to a specific application |\n| `--window-id \u003cID\u003e` | - | Filter to a specific window |\n| `-i` / `--interactive-only` | off | Only include interactive elements |\n| `--compact` | off | Omit empty structural nodes |\n| `--include-bounds` | off | Include pixel bounds (x, y, width, height) |\n| `--max-depth \u003cN\u003e` | 10 | Maximum tree depth |\n| `--skeleton` | off | Shallow 3-level overview; truncated containers show `children_count` and get refs as drill targets |\n| `--root \u003cREF\u003e` | - | Start traversal from this ref; merges into existing refmap with scoped invalidation |\n| `--snapshot \u003csnapshot_id\u003e` | latest | Snapshot ID to use when resolving `--root` |\n| `--surface \u003cTYPE\u003e` | window | `window`, `focused`, `menu`, `menubar`, `sheet`, `popover`, `alert` |\n\n## JSON Output\n\nEvery command returns structured JSON:\n\n```json\n{\n  \"version\": \"2.0\",\n  \"ok\": true,\n  \"command\": \"click\",\n  \"data\": { \"action\": \"click\" }\n}\n```\n\nErrors include machine-readable codes and recovery hints:\n\n```json\n{\n  \"version\": \"2.0\",\n  \"ok\": false,\n  \"command\": \"click\",\n  \"error\": {\n    \"code\": \"STALE_REF\",\n    \"message\": \"Element at @e7 no longer matches the last snapshot\",\n    \"suggestion\": \"Run 'snapshot' to refresh refs, then retry\"\n  }\n}\n```\n\n### Error Codes\n\n| Code | Meaning 
|\n|------|---------|\n| `PERM_DENIED` | Accessibility permission not granted |\n| `ELEMENT_NOT_FOUND` | No element matched the ref or query |\n| `APP_NOT_FOUND` | Application not running or no windows |\n| `STALE_REF` | Ref could not be re-identified in the live UI |\n| `AMBIGUOUS_TARGET` | Ref recovery matched multiple plausible targets |\n| `SNAPSHOT_NOT_FOUND` | Snapshot ID is missing or expired |\n| `POLICY_DENIED` | Physical/headed path blocked by policy |\n| `ACTION_FAILED` | The OS rejected the action |\n| `PLATFORM_NOT_SUPPORTED` | Adapter method not implemented on this platform |\n| `TIMEOUT` | Wait condition expired |\n| `INVALID_ARGS` | Invalid argument values |\n\n### Exit Codes\n\n`0` success, `1` structured error (JSON on stdout), `2` argument parse error.\n\n## Ref System\n\n`snapshot` assigns refs to interactive elements in depth-first order: `@e1`, `@e2`, `@e3`, etc. Refs are scoped to a compact `snapshot_id` such as `s8f3k2p9`. Commands can omit `--snapshot` to use the active session's latest snapshot pointer, but passing the ID is more deterministic in multi-step flows and does not require also passing `--session`.\n\nInteractive roles that receive refs: `button`, `textfield`, `checkbox`, `link`, `menuitem`, `tab`, `slider`, `combobox`, `treeitem`, `cell`, `radiobutton`, `incrementor`, `menubutton`, `switch`, `colorwell`, `dockitem`.\n\nStatic elements (labels, groups, containers) appear in the tree for context but have no ref.\n\nReliability contract:\n\n- `--session \u003cid\u003e` scopes the latest snapshot pointer to one caller or agent team; explicit `--snapshot \u003cid\u003e` resolves the saved snapshot directly.\n- Ref actions re-identify targets at action time: a moved unique target can proceed, while missing or changed stable identity returns `STALE_REF`.\n- Mutable value text is not treated as stable identity, so text fields and timers can keep resolving when the saved window, path, role, and bounds evidence still identify the same 
element.\n- Multiple plausible targets return `AMBIGUOUS_TARGET` instead of choosing arbitrarily.\n- Actions run an actionability preflight before dispatch: visibility, stability, enabled state, supported action, policy, and editability.\n- `wait --element @e3 --predicate actionable` polls until the target can be acted on.\n- `--trace \u003cpath\u003e` appends JSONL diagnostics outside stdout; `--trace-strict` fails on trace setup and pre-action trace writes, while post-action success traces are best-effort after the desktop mutation has already happened.\n\nStale ref recovery:\n\n```\nsnapshot → act → STALE_REF or AMBIGUOUS_TARGET? → wait/snapshot again → retry with the new ref\n```\n\n## Platform Support\n\n| | macOS | Windows | Linux |\n|---|:---:|:---:|:---:|\n| Accessibility tree | **Yes** | Planned | Planned |\n| Click / type / keyboard | **Yes** | Planned | Planned |\n| Mouse input | **Yes** | Planned | Planned |\n| Screenshot | **Yes** | Planned | Planned |\n| Clipboard | **Yes** | Planned | Planned |\n| App \u0026 window management | **Yes** | Planned | Planned |\n| Notifications | **Yes** | Planned | Planned |\n\n## Development\n\n```bash\ncargo build                               # debug build\ncargo build --release                     # optimized (\u003c15MB)\ncargo test --lib --workspace              # run tests\ncargo clippy --all-targets -- -D warnings # lint (must pass with zero warnings)\n```\n\n## FAQ\n\n### What is agent-desktop?\n\nagent-desktop is a native desktop automation CLI for AI agents. It lets agents observe and control desktop apps through OS accessibility trees, using structured JSON instead of screenshots, pixel matching, or browser-only automation.\n\n### Does agent-desktop require screenshots or pixel matching?\n\nNo. The core workflow reads native accessibility trees and assigns refs to interactive elements. 
Screenshots are available as a separate command, but agents do not need screenshots or pixel matching to click buttons, type into fields, inspect menus, or navigate app windows.\n\n### How does agent-desktop work?\n\n| Component | Function |\n|-----------|----------|\n| **Native Rust CLI** | Fast, single binary, no runtime dependencies |\n| **C-ABI cdylib** | Load once from Python, Swift, Go, Ruby, Node, or C instead of forking |\n| **54 Commands** | Observation, interaction, keyboard, mouse, notifications, clipboard, window management, and bundled `skills` docs |\n| **Snapshot \u0026 Refs** | Compact snapshot IDs and deterministic element refs like `@e1`, `@e2` |\n| **Structured JSON** | Machine-readable responses with error codes and recovery hints |\n\n### What makes agent-desktop useful for AI agents?\n\n| Feature | Benefit |\n|---------|---------|\n| **Progressive Skeleton Traversal** | 78–96% token reduction on dense apps |\n| **Headless-by-Default Actions** | Ref actions use accessibility APIs and block unintended physical side effects |\n| **Snapshot Refs** | Agents act on stable refs within a snapshot instead of guessing coordinates |\n| **Recovery Hints** | Errors include machine-readable codes and suggestions for the next agent step |\n| **Cross-Language FFI** | Python, Swift, Go, Ruby, Node, C, and C++ hosts can call the native library directly |\n\n### Which platforms are supported?\n\n| Feature | macOS | Windows | Linux |\n|---------|:-----:|:-------:|:-----:|\n| Accessibility tree | **Yes** | Planned | Planned |\n| Click/type/keyboard | **Yes** | Planned | Planned |\n| Mouse input | **Yes** | Planned | Planned |\n| Screenshot | **Yes** | Planned | Planned |\n| Clipboard | **Yes** | Planned | Planned |\n| App/window management | **Yes** | Planned | Planned |\n| Notifications | **Yes** | Planned | Planned |\n\n### How do I install agent-desktop?\n\nInstall the CLI from npm:\n\n```bash\nnpm install -g agent-desktop\nagent-desktop snapshot --app 
Safari\n```\n\nBuild the FFI library from source:\n\n```bash\ncargo build --release\n# Outputs: libagent_desktop_ffi.dylib/.so/.dll\n```\n\n### What is the ref system?\n\n`snapshot` assigns refs to interactive elements in depth-first order: `@e1`, `@e2`, `@e3`, etc. Refs are scoped to a compact `snapshot_id` such as `s8f3k2p9`. Commands can omit `--snapshot` to use the active session's latest snapshot pointer, but explicit snapshot IDs are the deterministic path and do not require also passing `--session`.\n\nInteractive roles that receive refs:\n\n`button`, `textfield`, `checkbox`, `link`, `menuitem`, `tab`, `slider`, `combobox`, `treeitem`, `cell`, `radiobutton`, `incrementor`, `menubutton`, `switch`, `colorwell`, `dockitem`.\n\nStale ref recovery:\n\n```text\nsnapshot -\u003e act -\u003e STALE_REF? -\u003e snapshot again -\u003e retry\n```\n\n### Is agent-desktop free and open source?\n\nYes. agent-desktop is Apache-2.0 licensed for personal and commercial use.\n\n### Where can I get help?\n\n| Resource | Link |\n|----------|------|\n| **Repository** | [github.com/lahfir/agent-desktop](https://github.com/lahfir/agent-desktop) |\n| **ClawHub Skill** | [clawhub.ai/lahfir/agent-desktop](https://clawhub.ai/lahfir/agent-desktop) |\n| **skills.sh Listing** | [skills.sh/lahfir/agent-desktop/agent-desktop](https://skills.sh/lahfir/agent-desktop/agent-desktop) |\n| **npm Package** | [npmjs.com/package/agent-desktop](https://www.npmjs.com/package/agent-desktop) |\n| **CI Status** | [GitHub Actions](https://github.com/lahfir/agent-desktop/actions/workflows/ci.yml?query=branch%3Amain) |\n| **Releases** | [GitHub Releases](https://github.com/lahfir/agent-desktop/releases) |\n| **Issues** | [GitHub Issues](https://github.com/lahfir/agent-desktop/issues) |\n\n## 
License\n\nApache-2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flahfir%2Fagent-desktop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flahfir%2Fagent-desktop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flahfir%2Fagent-desktop/lists"}