{"id":50326560,"url":"https://github.com/frankbria/semantic-gui-control","last_synced_at":"2026-05-29T06:30:58.592Z","repository":{"id":358942923,"uuid":"1243803887","full_name":"frankbria/semantic-gui-control","owner":"frankbria","description":"A cross-platform, text-first control layer that exposes GUIs to agents as structured affordances  instead of pixels. Discovers the interface, normalizes it, executes through a small command vocabulary, and verifies state changes. Vision is the spare tire, not the steering wheel.","archived":false,"fork":false,"pushed_at":"2026-05-19T21:02:23.000Z","size":162,"stargazers_count":0,"open_issues_count":5,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-19T21:06:23.353Z","etag":null,"topics":["accessibility","agent-tools","agentic-ai","ai","ai-agents","automation","computer-use","cross-platform","desktop-automation","gui-automation","llm","llm-agents","llm-tools","mcp","python","rpa","ui-automation","uiautomation","windows"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/frankbria.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/roadmap-blunt-wins.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-19T17:20:17.000Z","updated_at":"2026-05-19T21:02:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/frankbria/semantic-gui-control","commit_stats":null,"previous_names":["frankbria/semantic-gui-control"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/frankbria/semantic-gui-control","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankbria%2Fsemantic-gui-control","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankbria%2Fsemantic-gui-control/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankbria%2Fsemantic-gui-control/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankbria%2Fsemantic-gui-control/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/frankbria","download_url":"https://codeload.github.com/frankbria/semantic-gui-control/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frankbria%2Fsemantic-gui-control/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33640627,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accessibility","agent-tools","agentic-ai","ai","ai-agents","automation","computer-use","cross-platform","desktop-automation","gui-automation","llm","llm-agents","llm-tools","mcp","python","rpa","ui-automation","uiautomation","windows"],"created_at":"2026-05-29T06:30:58.067Z","updated_at":"2026-05-29T06:30:58.582Z","avatar_url":"https://github.com/frankbria.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Semantic GUI Control Layer (SGCL)\n\nA text-first, cross-platform control layer for agentic interaction with graphical user interfaces.\n\n## Thesis\n\nAgents should not primarily operate GUIs through screenshots and coordinate clicks. SGCL should:\n\n1. **Discover** the usable interface layer from the environment (accessibility trees, DOM, OS automation APIs, keyboard traversal, app APIs).\n2. **Normalize** it into structured affordances.\n3. **Expose** a small standard command vocabulary.\n4. **Execute** actions through platform adapters.\n5. **Verify** state changes.\n6. **Fall back** to vision/OCR only when semantic paths are broken or incomplete.\n\n\u003e Vision is the spare tire, not the steering wheel.\n\n## Current status\n\n**Discovery / spike phase.** No production code yet. Planning and architecture only.\n\nThe first executable milestone (Phase 0) targets a Windows UIA observer that can list windows and dump an active window's control tree as JSON. Windows is a convenient first spike; the core model is intentionally cross-platform.\n\n## Blunt-win roadmap\n\nCoarse learning milestones. Each one must produce a working capability, a documented constraint, or a killed assumption. See [`docs/roadmap-blunt-wins.md`](docs/roadmap-blunt-wins.md) for detail.\n\n| # | Win | Question it answers |\n|---|-----|---------------------|\n| 1 | Observe | Can we expose a real desktop GUI as structured text without screenshots? |\n| 2 | Normalize | Can we hide UIA/AX/AT-SPI/DOM differences behind a common schema? |\n| 3 | Find | Can an agent find the thing it means without knowing screen coordinates? |\n| 4 | Read | Can the system read enough state to support agent reasoning and verification? |\n| 5 | Act | Can we perform basic actions through the affordance layer rather than pixels? |\n| 6 | Verify | Can every action return evidence, not just \"I clicked it\"? |\n| 7 | Risk | Can the system avoid becoming a blind automation monkey on committing actions? |\n| 8 | Repair \u0026 Fallback | Can the system recover from broken accessibility trees? |\n| 9 | Cross-Platform Adapter Contract | Did we build a real abstraction, or just rename Windows UIA? |\n| 10 | Agent Loop | Can an LLM use SGCL to complete a tiny task through structured state only? |\n\n## Documentation\n\n| Doc | Purpose |\n|-----|---------|\n| [`docs/project-thesis.md`](docs/project-thesis.md) | Problem, thesis, non-goals, guiding principles |\n| [`docs/roadmap-blunt-wins.md`](docs/roadmap-blunt-wins.md) | The 10 blunt wins, with exit criteria |\n| [`docs/architecture-overview.md`](docs/architecture-overview.md) | Conceptual architecture and adapter model |\n| [`docs/command-vocabulary.md`](docs/command-vocabulary.md) | Standard agent-facing commands |\n| [`docs/affordance-model.md`](docs/affordance-model.md) | Normalized affordance schema |\n| [`docs/risk-model.md`](docs/risk-model.md) | Risk classes and default policy |\n| [`docs/use-cases.md`](docs/use-cases.md) | Initial target use cases |\n| [`docs/phase-0-observe-spike.md`](docs/phase-0-observe-spike.md) | Detailed plan for the first spike |\n| [`docs/phase-1-normalize-spike.md`](docs/phase-1-normalize-spike.md) | Normalize planning |\n| [`docs/phase-2-find-read-spike.md`](docs/phase-2-find-read-spike.md) | Find + Read planning |\n| [`docs/phase-3-act-verify-risk-spike.md`](docs/phase-3-act-verify-risk-spike.md) | Act + Verify + Risk planning |\n| [`docs/open-questions.md`](docs/open-questions.md) | Unresolved questions |\n| [`docs/decisions/`](docs/decisions/) | Architecture Decision Records |\n| [`docs/github-issues-seed.md`](docs/github-issues-seed.md) | Copy-paste GitHub issue bodies for the first 7 wins |\n| [`spikes/`](spikes/) | Results of each exploratory spike |\n\nLegacy reference docs (kept for context, superseded by the above):\n\n- [`docs/level-1-spec.md`](docs/level-1-spec.md) — early system spec\n- [`docs/cross-platform-strategy.md`](docs/cross-platform-strategy.md) — adapter strategy notes\n- [`docs/development-sequence.md`](docs/development-sequence.md) — earlier phase sequence\n\n## Local development\n\nNothing to run yet. The proposed package shape is:\n\n```\nsgcl/\n  core/        # platform-neutral schemas, vocabulary, verifier, risk\n  adapters/    # windows_uia, macos_ax, linux_atspi, browser_dom, vision_ocr\n  cli.py       # `sgcl` entry point\n```\n\nThe first spike (Phase 0) will likely use Python with `pywinauto` or `uiautomation` on Windows. Setup steps will be documented once they exist.\n\n## Recommended invocation on Windows\n\nAlways use `sgcl --output PATH ...` (or pipe to `Out-File -Encoding utf8`) instead of `\u003e file.json` or `| Tee-Object file.json`. Phase 1 confirmed that PowerShell's default `[Console]::OutputEncoding` mangles non-ASCII bytes when sgcl's UTF-8 stdout flows through the pipe; `--output` writes the file directly from Python in UTF-8 and avoids the round-trip. See `docs/windows-claude-setup.md` for the optional one-time PowerShell profile additions that also fix interactive command output.\n\n## Working metaphor\n\nA terminal for the visual operating environment. Not because everything becomes text, but because the GUI becomes inspectable, commandable, and verifiable.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrankbria%2Fsemantic-gui-control","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffrankbria%2Fsemantic-gui-control","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrankbria%2Fsemantic-gui-control/lists"}