{"id":50478316,"url":"https://github.com/etherman-os/agent-desktop-harness","last_synced_at":"2026-06-01T15:03:56.884Z","repository":{"id":360725381,"uuid":"1243353086","full_name":"etherman-os/agent-desktop-harness","owner":"etherman-os","description":"Linux-first GUI QA and visual handoff cockpit for coding agents.","archived":false,"fork":false,"pushed_at":"2026-05-27T16:09:38.000Z","size":477,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-27T17:15:56.566Z","etag":null,"topics":["coding-agents","desktop-automation","electron","gui-automation","linux","mcp","novnc","playwright","qa-automation","tauri","typescript","visual-qa","visual-testing","webdriver","x11","xvfb"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/etherman-os.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/SECURITY.md","support":null,"governance":null,"roadmap":"docs/ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-19T09:06:58.000Z","updated_at":"2026-05-27T16:09:43.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/etherman-os/agent-desktop-harness","commit_stats":null,"previous_names":["etherman-os/agent-desktop-harness"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/etherman-os/agent-desktop-harness","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/etherman-os%2Fagent-desktop-harness","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/etherman-os%2Fagent-desktop-harness/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/etherman-os%2Fagent-desktop-harness/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/etherman-os%2Fagent-desktop-harness/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/etherman-os","download_url":"https://codeload.github.com/etherman-os/agent-desktop-harness/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/etherman-os%2Fagent-desktop-harness/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33780090,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coding-agents","desktop-automation","electron","gui-automation","linux","mcp","novnc","playwright","qa-automation","tauri","typescript","visual-qa","visual-testing","webdriver","x11","xvfb"],"created_at":"2026-06-01T15:03:54.931Z","updated_at":"2026-06-01T15:03:56.878Z","avatar_url":"https://github.com/etherman-os.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Agent Desktop Harness\n\nLinux-first GUI QA and visual handoff cockpit for coding agents.\n\nLet agents see, click, verify, and prove GUI changes.\n\nCoding agents often pass terminal tests while breaking the real UI. Agent Desktop Harness gives them an isolated Linux desktop where they can launch apps, interact through semantic drivers or X11 fallback, capture screenshots, annotate issues, compare visual evidence, and clean up safely.\n\nThis project focuses on the gap between generic computer-use tools and coding-agent GUI QA workflows.\n\n## Who It Is For\n\n- Claude Code, Codex, Cursor, and other MCP clients.\n- Hermes Agent and custom orchestration systems that prefer HTTP.\n- Coding agents that need to verify browser, Electron, Tauri, or Linux GUI apps.\n- Developers who need screenshot evidence and a clear human-to-agent visual bug handoff.\n\n## What It Is\n\n- A Linux-first harness for agent-driven GUI verification.\n- An isolated Xvfb desktop runtime by default.\n- A shared TypeScript core exposed through CLI, HTTP JSON API, and MCP stdio server.\n- An evidence system for screenshots, action logs, session metadata, reports, annotations, crops, and visual handoffs.\n- A Playwright-powered browser semantic driver for web apps running inside the isolated desktop session.\n- A verified Tauri native-window X11 fallback workflow and an experimental Tauri WebDriver semantic spike.\n- An experimental Playwright Electron semantic driver for development-mode Electron apps.\n- A high-level driver router that chooses browser, Tauri, Electron, or X11 interaction paths explicitly.\n- A lightweight Visual QA layer for PNG screenshot diffing, region comparison, and before/after reports.\n- An optional localhost noVNC live observer for watching isolated Xvfb sessions in a browser.\n\n## What It Is Not\n\n- It is not a generic remote-control tool for the user's real desktop.\n- It is not a replacement for Playwright, WebDriver, or accessibility tooling.\n- It is not an auth-protected remote desktop service.\n- It does not implement production-grade Tauri or Electron semantic drivers yet.\n- It does not claim autonomous repair intelligence. The current repair demo proves the handoff and verification workflow.\n\n## Architecture\n\n```text\nClaude Code / Codex / Hermes / Cursor\n        |\n        v\nMCP / HTTP / CLI\nAgent Desktop Harness\n        |\n        v\nDriver Router\n        |-- Browser -\u003e Playwright semantic driver\n        |-- Electron -\u003e Playwright Electron driver\n        |-- Tauri -\u003e tauri-driver / WebDriver experimental driver\n        `-- Unknown/native -\u003e X11 fallback\n        |\n        v\nXvfb isolated Linux desktop\n        |\n        v\nScreenshots + actions.jsonl + visual-handoff.md\n  + visual-diffs/ + visual-assertions.jsonl + baselines\n```\n\nThe core engine owns session lifecycle, policy checks, process cleanup, screenshot capture, input actions, window actions, and evidence. The adapters call the same core instead of duplicating runtime logic.\n\n## Verified in v0.2\n\n- Xvfb isolated desktop sessions.\n- CLI, HTTP, and MCP interfaces.\n- Browser semantic driver.\n- Electron semantic driver smoke with the sample Electron app.\n- Tauri WebDriver experimental path when prerequisites and app configuration are provided.\n- Driver router.\n- Visual Annotation Handoff.\n- Visual diff and baseline assertions.\n- Annotation-region visual assertions.\n- Optional live observer layer.\n\nBrowser semantic support is verified. Electron semantic support is experimental and sample-verified. Tauri WebDriver support is experimental and configured-app verified when local prerequisites are available. The live observer is implemented but optional; its smoke skips honestly unless `x11vnc`, noVNC, and websockify dependencies are installed.\n\n## Main Demo\n\nThe strongest v0.2 proof path is the annotation-driven repair demo:\n\n```sh\npnpm smoke:annotation-repair\n```\n\nIt verifies that the harness can:\n\n- start the sample Vite app locally;\n- launch a graphical browser inside an isolated Xvfb session;\n- capture a screenshot of an intentional UI issue;\n- create a visual annotation handoff with a rectangle and note;\n- save a crop image and `visual-handoff.md`;\n- capture an after-fix comparison screenshot;\n- generate Visual QA diff/assertion evidence for the before/after change;\n- stop the browser, HTTP server, Vite server, window manager, and Xvfb cleanly.\n\nThe evidence layout is:\n\n```text\n.desktop-harness/\n  sessions/\n    \u003csessionId\u003e/\n      screenshots/\n      annotations/\n      visual-diffs/\n      annotations.jsonl\n      visual-assertions.jsonl\n      visual-handoff.md\n      actions.jsonl\n      session.json\n      report.md\n```\n\n## Quick Start\n\nUbuntu is the primary target for v0.2.\n\n```sh\ngit clone https://github.com/etherman-os/agent-desktop-harness.git\ncd agent-desktop-harness\npnpm install\n./scripts/install-ubuntu-deps.sh\npnpm build\npnpm --filter @agent-desktop-harness/cli dev -- doctor\npnpm smoke:annotation-repair\n```\n\nManual dependency install:\n\n```sh\nsudo apt update\nsudo apt install -y xvfb openbox x11-utils scrot xdotool wmctrl xterm\n```\n\nOptional live observer dependencies:\n\n```sh\nsudo apt install -y x11vnc novnc websockify\n```\n\nThe Vite/browser smokes also require a graphical browser such as Chromium, Chrome, or Firefox. You can override browser detection:\n\n```sh\nAGENT_DESKTOP_HARNESS_BROWSER=/usr/bin/firefox pnpm smoke:vite:http\n```\n\nThe Hermes Studio capture contract smoke also requires a graphical browser. It writes a deterministic PNG path that a caller can verify:\n\n```sh\npnpm smoke:studio-capture-contract\n```\n\nThe dependency helper is never run automatically by package scripts or tests.\n\n## CLI\n\nUseful local commands:\n\n```sh\npnpm --filter @agent-desktop-harness/cli dev -- doctor\npnpm smoke:x11\npnpm smoke:http\npnpm smoke:mcp\npnpm smoke:vite\npnpm smoke:annotation-repair\npnpm smoke:browser-semantic\npnpm smoke:tauri-driver\npnpm smoke:electron-driver\npnpm smoke:driver-router\npnpm smoke:visual-qa\npnpm smoke:visual-baseline\npnpm smoke:observer\npnpm smoke:studio-capture-contract\n```\n\nThe smoke commands are manual integration checks. They are not part of `pnpm test` because they require local Linux GUI dependencies and a real Xvfb runtime.\n\n## HTTP JSON API\n\nHermes and custom agents can use the local HTTP API.\n\n```sh\npnpm --filter @agent-desktop-harness/http-server dev\ncurl http://127.0.0.1:7341/health\n```\n\nThe default bind host is `127.0.0.1`. The server rejects non-loopback bind hosts; use only `127.0.0.1`, `localhost`, or `::1`. Do not expose the HTTP server to the public internet.\n\nTypical HTTP workflow:\n\n```text\nPOST /sessions\nPOST /sessions/:sessionId/launch\nPOST /sessions/:sessionId/wait-for-window\nPOST /sessions/:sessionId/wait-for-stable-screen\nPOST /sessions/:sessionId/screenshot\nPOST /sessions/:sessionId/click\nPOST /sessions/:sessionId/type-text\nPOST /sessions/:sessionId/browser/open\nPOST /sessions/:sessionId/browser/fill\nPOST /sessions/:sessionId/browser/click\nPOST /sessions/:sessionId/browser/assert-text\nPOST /sessions/:sessionId/browser/screenshot\nGET  /drivers/status\nPOST /sessions/:sessionId/apps/open\nPOST /sessions/:sessionId/apps/fill\nPOST /sessions/:sessionId/apps/click\nPOST /sessions/:sessionId/apps/assert-text\nPOST /sessions/:sessionId/apps/screenshot\nPOST /sessions/:sessionId/visual/compare\nPOST /sessions/:sessionId/visual/assert-changed\nPOST /sessions/:sessionId/visual/assert-similar\nPOST /sessions/:sessionId/visual/baselines\nGET  /sessions/:sessionId/visual/baselines\nPOST /sessions/:sessionId/visual/compare-baseline\nPOST /sessions/:sessionId/visual/assert-annotation-changed\nPOST /sessions/:sessionId/visual/assert-change-contained\nGET  /sessions/:sessionId/visual/assertions\nGET  /observer/status\nGET  /sessions/:sessionId/observers\nPOST /sessions/:sessionId/observers\nDELETE /sessions/:sessionId/observers/:observerId\nGET  /electron/status\nPOST /sessions/:sessionId/electron/open\nPOST /sessions/:sessionId/electron/fill\nPOST /sessions/:sessionId/electron/click\nPOST /sessions/:sessionId/electron/assert-text\nPOST /sessions/:sessionId/electron/screenshot\nGET  /sessions/:sessionId/visual-handoff\nDELETE /sessions/:sessionId\n```\n\nSee [Hermes Integration](docs/HERMES_INTEGRATION.md) for curl examples and orchestration guidance.\n\nUse the browser semantic routes for web app interactions such as filling inputs, clicking buttons by role/name, and asserting visible text. The existing desktop routes remain the X11 fallback for native windows and visual evidence.\n\nUse the high-level driver-router routes when an agent should choose the best available path from `appKind` and capability status. The router response always reports `selectedDriver`, `semantic`, `fallbackUsed`, and warnings. See [Driver Router](docs/DRIVER_ROUTER.md).\n\nUse Visual QA routes after capturing before/after PNG screenshots when the agent should measure pixel change, generate a diff PNG, assert that a region changed, compare a named baseline, or verify that changes stayed inside expected rectangles. See [Visual QA Assertions](docs/VISUAL_QA_ASSERTIONS.md) and [Visual Baselines](docs/VISUAL_BASELINES.md).\n\nUse live observer routes when a human should watch the isolated session through a local browser. The observer is optional, localhost-only by default, and stopped automatically with the session. See [Live Observer](docs/LIVE_OBSERVER.md).\n\nUse the experimental Electron routes for development-mode Electron apps that can be launched by Playwright's Electron API. If Electron opens in fallback mode or semantic launch fails, continue with the desktop routes.\n\n## MCP Stdio Server\n\nClaude Code, Codex, Cursor, and MCP-compatible clients can use the MCP stdio server.\n\n```sh\npnpm build\nnode packages/mcp-server/dist/index.js\n```\n\nClaude Code-style registration:\n\n```sh\nclaude mcp add --transport stdio desktop-harness -- \\\n  node /absolute/path/to/agent-desktop-harness/packages/mcp-server/dist/index.js\n```\n\nMCP stdio tools are implemented and smoke-tested locally. See [MCP Usage](docs/MCP_USAGE.md) for tool workflows, MCP Inspector notes, and troubleshooting.\n\n## Visual Annotation Handoff\n\nDraw what you mean. Let the agent fix it.\n\nSometimes the human cannot describe the exact broken UI area in words. Visual Annotation Handoff lets the human draw a rectangle on a screenshot and attach a note. The harness saves the annotation, a crop image, and `visual-handoff.md` for the agent.\n\nHuman workflow:\n\n```text\n1. Capture screenshot.\n2. Open /sessions/\u003csessionId\u003e/annotate.\n3. Draw rectangle.\n4. Write note.\n5. Save annotation.\n6. Give visual-handoff.md to the coding agent.\n7. Agent makes a targeted fix and captures after evidence.\n```\n\nPrint an annotation URL for an active HTTP session:\n\n```sh\npnpm --filter @agent-desktop-harness/cli dev -- annotate-url --session \u003csessionId\u003e\n```\n\nThe HTTP server must already be running and must still have that session in memory.\n\nAgent prompt template:\n\n```text\ndocs/prompts/ANNOTATION_REPAIR_AGENT_PROMPT.md\n```\n\nSee [Visual Annotation Handoff](docs/VISUAL_ANNOTATION_HANDOFF.md) for routes, MCP tools, artifact layout, and security notes.\n\n## Visual QA Assertions\n\nVisual QA turns before/after screenshots into measurable evidence. It can compare full PNG screenshots, compare a selected region, create a diff PNG, save/list/compare local baselines, use rectangle annotations as assertion regions, check pixel change containment, and write compact summaries into `visual-assertions.jsonl`, `report.md`, and annotation handoff reports.\n\nRun:\n\n```sh\npnpm smoke:visual-qa\npnpm smoke:visual-baseline\n```\n\nSee [Visual QA Assertions](docs/VISUAL_QA_ASSERTIONS.md) and [Visual Baselines](docs/VISUAL_BASELINES.md) for HTTP and MCP examples.\n\n## Live Observer\n\nThe optional live observer starts `x11vnc` against the isolated Xvfb display and serves a noVNC browser page on `127.0.0.1`. It is useful for demos, long-running GUI debugging, and pairing live observation with screenshot annotation.\n\nRun:\n\n```sh\npnpm --filter @agent-desktop-harness/cli dev -- observer-status\npnpm smoke:observer\n```\n\nInstall optional packages with:\n\n```sh\n./scripts/install-ubuntu-deps.sh --with-observer\n```\n\nThe smoke passes when observer dependencies are available and skips honestly when they are missing. See [Live Observer](docs/LIVE_OBSERVER.md) and [Security](docs/SECURITY.md).\n\n## Current Capabilities\n\n| Capability                |                                 Status | Smoke                   |\n| ------------------------- | -------------------------------------: | ----------------------- |\n| Isolated Xvfb session     |                               Verified | `smoke:x11`             |\n| CLI interface             |                               Verified | `smoke:x11`             |\n| HTTP API                  |                               Verified | `smoke:http`            |\n| MCP stdio server          |                               Verified | `smoke:mcp`             |\n| Browser semantic driver   |                               Verified | `smoke:browser-semantic` |\n| Electron semantic driver  |         Experimental / sample verified | `smoke:electron-driver` |\n| Tauri X11 fallback        |                               Verified | manual workflow         |\n| Tauri WebDriver driver    | Experimental / configured-app verified | `smoke:tauri-driver`    |\n| Driver router             |                               Verified | `smoke:driver-router`   |\n| Visual Annotation Handoff |                           MVP verified | `smoke:annotation-repair` |\n| Visual diff               |                               Verified | `smoke:visual-qa`       |\n| Visual baselines          |                               Verified | `smoke:visual-baseline` |\n| Hermes Studio capture contract |                         Verified | `smoke:studio-capture-contract` |\n| Annotation region assertions |                            Verified | `smoke:annotation-repair` |\n| Change containment        |                               Verified | `smoke:annotation-repair` |\n| noVNC live observer       |            Optional / dependency-gated | `smoke:observer`        |\n| X11 fallback              |                               Verified | `smoke:x11`             |\n| OCR                       |                        Not implemented | N/A                     |\n| Wayland backend           |                                 Future | N/A                     |\n\n## Known Limitations\n\n- Linux/X11/Xvfb-first.\n- Real desktop control is not enabled by default.\n- Tauri WebDriver semantic support is experimental and may fall back to X11.\n- Tauri WebDriver mode usually needs `tauri-driver`, `WebKitWebDriver`, a built app binary, and any `build.devUrl` frontend server started separately or through smoke prelaunch.\n- Electron semantic support is experimental and focused on development-mode Electron apps launched through Playwright's Electron API.\n- Packaged Electron app support may require a different launch or CDP connection path.\n- The driver router reports fallback explicitly; it does not make X11 fallback understand semantic selectors.\n- Visual QA is PNG-only, has no OCR, and does not perform automatic layout or element detection.\n- Visual change containment is pixel-based and checks known rectangles only.\n- noVNC live observer requires optional `x11vnc`, `novnc`, and `websockify` packages and is local-only by default.\n- Non-browser GUI interaction fallback is still coordinate-based.\n- Browser semantic screenshots are page-content screenshots; desktop screenshots remain X11 root-window screenshots.\n- Stable-check screenshots may be moved to `transient/` so reports focus on retained screenshots.\n- Built-in annotation UI supports rectangle drawing only.\n- HTTP server has no authentication and is loopback-only.\n- Sessions are in-memory per server process.\n- Visual annotation repair smoke proves handoff and verification, not autonomous LLM repair.\n- Evidence may contain sensitive screenshots, local paths, and typed text unless redaction is explicitly used.\n\n## Generated Evidence\n\nThe repository intentionally does not commit generated smoke screenshots, visual diffs, baselines, or session evidence by default. Run the smoke commands locally and inspect `.desktop-harness/sessions/\u003csessionId\u003e/` for screenshots, `report.md`, `visual-handoff.md`, `visual-diffs/`, and `visual-assertions.jsonl`. The Hermes Studio contract smoke additionally writes `.desktop-harness/studio-capture-contract/screenshot.png` so external callers can verify a stable local file path.\n\n## Development Checks\n\n```sh\npnpm install\npnpm typecheck\npnpm build\npnpm lint\npnpm test\npnpm --filter @agent-desktop-harness/cli dev -- doctor\n```\n\nRelease-candidate smoke checks:\n\n```sh\npnpm smoke:x11\npnpm smoke:http\npnpm smoke:mcp\npnpm smoke:vite:http\npnpm smoke:vite:mcp\npnpm smoke:annotation-repair\npnpm smoke:browser-semantic\npnpm smoke:electron-driver\npnpm smoke:driver-router\npnpm smoke:visual-qa\npnpm smoke:visual-baseline\n```\n\nOptional or dependency-gated checks:\n\n```sh\npnpm smoke:observer\npnpm smoke:tauri-driver\n```\n\n## Documentation\n\n- [Architecture](docs/ARCHITECTURE.md)\n- [Security](docs/SECURITY.md)\n- [Roadmap](docs/ROADMAP.md)\n- [Linux Troubleshooting](docs/TROUBLESHOOTING_LINUX.md)\n- [MCP Usage](docs/MCP_USAGE.md)\n- [Hermes Integration](docs/HERMES_INTEGRATION.md)\n- [Driver Router](docs/DRIVER_ROUTER.md)\n- [Agent GUI QA Cockpit Workflow](docs/AGENT_GUI_QA_COCKPIT.md)\n- [Visual QA Assertions](docs/VISUAL_QA_ASSERTIONS.md)\n- [Visual Baselines](docs/VISUAL_BASELINES.md)\n- [Live Observer](docs/LIVE_OBSERVER.md)\n- [Browser Semantic Driver](docs/BROWSER_SEMANTIC_DRIVER.md)\n- [Tauri Workflow](docs/TAURI_WORKFLOW.md)\n- [Tauri Driver Spike](docs/TAURI_DRIVER_SPIKE.md)\n- [Electron Driver Spike](docs/ELECTRON_DRIVER_SPIKE.md)\n- [Visual Annotation Handoff](docs/VISUAL_ANNOTATION_HANDOFF.md)\n- [License Decision Notes](docs/LICENSE_DECISION.md)\n- [Release Checklist](docs/RELEASE_CHECKLIST.md)\n- [v0.2.0 Release Notes Draft](docs/releases/v0.2.0.md)\n- [Standalone Repo Setup](docs/STANDALONE_REPO_SETUP.md)\n- [v0.1.0 Release Notes Draft](docs/releases/v0.1.0.md)\n- [Contributing](CONTRIBUTING.md)\n\n## License\n\nApache-2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fetherman-os%2Fagent-desktop-harness","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fetherman-os%2Fagent-desktop-harness","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fetherman-os%2Fagent-desktop-harness/lists"}