{"id":51339623,"url":"https://github.com/semcod/imgl","last_synced_at":"2026-07-02T06:04:37.416Z","repository":{"id":363422296,"uuid":"1263257006","full_name":"semcod/imgl","owner":"semcod","description":"Image to Layout — screenshot OCR and semantic UI reconstruction","archived":false,"fork":false,"pushed_at":"2026-06-18T11:24:32.000Z","size":31808,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-18T12:27:49.675Z","etag":null,"topics":["automation","gui","html","layout","ocr","python","screenshot","semcod","svg","vql"],"latest_commit_sha":null,"homepage":"https://semcod.github.io/imgl/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/semcod.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-08T19:23:19.000Z","updated_at":"2026-06-18T12:11:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/semcod/imgl","commit_stats":null,"previous_names":["semcod/imgl"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/semcod/imgl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semcod%2Fimgl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semcod%2Fimgl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semcod%2Fimgl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/semcod%2Fimgl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/semcod","download_url":"https://codeload.github.com/semcod/imgl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semcod%2Fimgl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35035001,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-02T02:00:06.368Z","response_time":173,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","gui","html","layout","ocr","python","screenshot","semcod","svg","vql"],"created_at":"2026-07-02T06:04:36.524Z","updated_at":"2026-07-02T06:04:37.408Z","avatar_url":"https://github.com/semcod.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![img.png](img.png)\n\n# ImgL - Image to Layout — convert screenshots into semantic UI models with OCR text and element bounding boxes.\n\n\n## AI Cost Tracking\n\n![AI Cost](https://img.shields.io/badge/AI%20Cost-$45.42-orange) ![AI Model](https://img.shields.io/badge/AI%20Model-openrouter%2Fqwen%2Fqwen3-coder-next-lightgrey)\n\nThis project uses AI-generated code. 
Total cost: **$45.4235** with **17** AI commits.\n\nGenerated on 2026-06-29 using [openrouter/qwen/qwen3-coder-next](https://openrouter.ai/models/openrouter/qwen/qwen3-coder-next)\n\n---\n\n## Installation\n\n```bash\npip install -e .              # from repo\npip install -e \".[capture]\"   # mss (X11 fallback)\npip install -e \".[diagnose]\"   # numpy for img2nl (install img2nl locally)\npip install -e \".[full]\"      # capture + diagnose + dev + llm + web\n\n# Local siblings (not on PyPI) — mirror capture on Wayland:\nmake install-dev              # .[dev,llm,capture] + vdisplay when ~/github/wronai/vdisplay exists\nimgl install vdisplay         # pip install -e ~/github/wronai/vdisplay[pillow]\npip install -e ~/github/wronai/vdisplay[pillow]   # same as above\npip install -e ~/github/wronai/img2nl[analyze]\npip install -e ~/github/oqlos/vql\npip install -e ~/github/oqlos/vql/packages/img2vql\n```\n\nFor `uri2vql adopt-imgl`, install imgl in the same venv as uri2vql:\n\n```bash\npip install -e ~/github/semcod/imgl\n# or: pip install -e ~/github/oqlos/vql/packages/uri2vql[imgl]\n```\n\nSystem dependency for OCR:\n\n```bash\n# Debian/Ubuntu\nsudo apt install tesseract-ocr tesseract-ocr-pol\n\n# macOS\nbrew install tesseract tesseract-lang\n```\n\nDevelopment install:\n\n```bash\npip install -e \".[dev]\"\npip install -e \".[llm]\"    # vision LLM catalog (OpenRouter)\n```\n\n### Makefile (quick start)\n\n```bash\nmake help              # list available commands\nmake install-full      # imgl + capture + llm + control + web\nmake capture-interactive  # vdisplay mirror → screen.png (portal fallback on Wayland)\nmake doctor-full FORMAT=markdown\nmake execute-llm PROMPT='wpisz test w Chat input'\nmake demo-key          # dsl2imgl KEY ctrl+Return (dry-run)\nmake demo-chat         # type into Chat input + ctrl+enter (dry-run)\nmake serve-rest        # rest2imgl :8219\nmake serve-web         # imgl serve :8008\nmake test-dsl2imgl     # Phase 4 tests 
(Schema/Protobuf/ES)\n```\n\nIntegration with **Koru**: `cd ~/github/semcod/koru \u0026\u0026 make install-imgl-bridge`\n\n## Documentation\n\n| Topic | Link |\n|-------|------|\n| Index | [docs/README.md](docs/README.md) |\n| Capture (mirror, portal, `--analyze`) | [docs/capture.md](docs/capture.md) |\n| VQL export and vdisplay provenance | [docs/vql-export.md](docs/vql-export.md) |\n| Architecture (imgl / vdisplay / vql) | [docs/architecture.md](docs/architecture.md) |\n| `*2imgl` control layer | [docs/control-layer.md](docs/control-layer.md) |\n| NL from the shell (chat input, Enter/Ctrl+Enter) | [docs/nl-shell-examples.md](docs/nl-shell-examples.md) |\n| Voice + browser | [docs/voice-browser.md](docs/voice-browser.md) |\n| Web UI (port 8008) | [docs/web-ui.md](docs/web-ui.md) |\n| Control packages | [packages/README.md](packages/README.md) |\n\n## Examples\n\nFull documentation with examples for different systems, applications, and configurations:\n\n**[examples/README.md](examples/README.md)**\n\n| Topic | Link |\n|-------|------|\n| GNOME/Wayland | [examples/platforms/gnome-wayland](examples/platforms/gnome-wayland/README.md) |\n| Window selection / crops | [examples/workflows/window-picker](examples/workflows/window-picker/README.md) |\n| GitHub in the browser | [examples/applications/github-browser](examples/applications/github-browser/README.md) |\n| IDE (Windsurf/VS Code) | [examples/applications/ide-editor](examples/applications/ide-editor/README.md) |\n| Per-window LLM | [examples/configurations/per-window-llm](examples/configurations/per-window-llm/README.md) |\n| NL → URI (nlp2uri) | [examples/integrations/nlp2uri](examples/integrations/nlp2uri/README.md) |\n| uri2vql integration | [examples/integrations/uri2vql](examples/integrations/uri2vql/README.md) |\n| Agent loop | [examples/workflows/multi-step-agent](examples/workflows/multi-step-agent/README.md) |\n| Capture → VQL → action | [examples/workflows/capture-to-action](examples/workflows/capture-to-action/README.md) 
|\n| Web UI (port 8008) | [examples/workflows/web-ui](examples/workflows/web-ui/README.md) |\n\nSzybkie demo:\n\n```bash\nexamples/scripts/demo-windows.sh screen.png\nexamples/scripts/demo-nlp2uri.py screen.png region-top\n```\n\n## Usage\n\n### Python API\n\n```python\nfrom imgl import analyze, scene_to_json\n\nscene = analyze(\"screen.png\", lang=\"eng+pol\")\nprint(scene_to_json(scene))\n```\n\n### CLI\n\n```bash\n# Use an existing screenshot (recommended on GNOME/Wayland):\nimgl diagnose /tmp/screen.png\nimgl vql /tmp/screen.png -o layout.vql.json\n\n# Capture (vdisplay mirror wbudowany w imgl[capture] — bez dialogu GNOME):\nmake install-dev                              # vdisplay + mss w extra capture\nmake capture-interactive                      # mirror capture → screen.png\nmake capture-analyze                          # + VQL + .capture.json\nimgl capture -o screen.png --verify           # to samo bez make\nimgl capture -o screen.png --verify --analyze # capture + VQL + provenance w jednym kroku\nimgl capture --portal -o screen.png           # fallback: GNOME region picker\n\nimgl diagnose screen.png            # must show worth_analyzing: true\n\n# analyze / export (aborts on blank unless --allow-blank)\nimgl analyze /tmp/screen.png --json\nimgl analyze screen.png -o screen.imgl.json --lang eng+pol\nimgl html screen.png -o screen.html --embed-image\nimgl svg screen.png --mode overlay -o screen.svg\nimgl svg screen.png --mode wireframe -o screen.svg\nimgl vql screen.png -o layout.vql.json --with-grid\n```\n\n### Web UI (manual + agent, port 8008)\n\n```bash\npip install -e \".[web,llm,capture]\"\nimgl serve --port 8008\n# z wykonaniem na pulpicie i LLM:\nimgl serve --port 8008 --execute --llm --capture-on-start\n```\n\nOtwórz http://127.0.0.1:8008 — podgląd zrzutu z numerami, lista akcji z miniaturkami, NL i pętla agenta (capture → act → capture).\n\nSzczegóły: [docs/web-ui.md](docs/web-ui.md), [docs/voice-browser.md](docs/voice-browser.md).\n\n### 
Control layer (REST / DSL / NL, port 8219)\n\nExternal control (shell, curl, MCP, voice assistant):\n\n```bash\nmake install-control   # imgl install control\nmake capture-analyze                          # recommended: capture + VQL\nmake capture-interactive                      # or: imgl capture -o screen.png --verify\nmake serve-rest        # http://127.0.0.1:8219\n\n# DSL\ndsl2imgl exec 'KEY ctrl+Return EXECUTE 0'\ndsl2imgl exec 'TYPE \"hello\" IN \"Chat input\" IMAGE screen.png WINDOW region-bottom EXECUTE 0'\n\n# NL\nnlp2imgl apply \"wpisz opisz projekt w Chat input\" --image screen.png --window region-bottom\nnlp2imgl apply \"naciśnij ctrl+enter\" --execute\n```\n\nFrom **Koru** (in `koru/.venv`, not `imgl/.venv`):\n\n```bash\ncd ~/github/semcod/koru \u0026\u0026 make install-imgl-bridge\nmake imgl-capture imgl-chat\nkoru imgl execute \"wpisz test w Chat input\" --window region-bottom --dry-run\n```\n\nFull examples: [docs/nl-shell-examples.md](docs/nl-shell-examples.md), [docs/control-layer.md](docs/control-layer.md), [docs/vql-export.md](docs/vql-export.md).\n\n### Window discovery (regions in a screenshot)\n\nOn complex screenshots (browser + IDE), pick a region first:\n\n```bash\nimgl windows screen.png --export-crops --annotate --open\n# → screen.region-top.png, screen.region-bottom.png (+ .numbered.png)\n\nimgl interact screen.png --llm --window region-top    # GitHub\nimgl interact screen.png --llm --window region-bottom # IDE\n```\n\nInteractive window selection (when there is more than one region):\n\n```bash\nimgl interact screen.png --llm\n# → window list → enter a number (1, 2) or \"podglad\"\n```\n\n### Interactive shell (pick action from catalog)\n\n```bash\nimgl interact /tmp/screen.png -o layout.vql.json\n# option number, or NL: \"kliknij Save\", \"mapa\", \"lista\", \"okna\", \"quit\"\n# numbered image:\nimgl annotate screen.png --open\nimgl interact screen.png --annotate --open\n# OCR noise filter (enabled by default):\nimgl interact screen.png\n# vision 
LLM (OPENROUTER_API_KEY + pip install -e \".[llm]\"):\nimgl interact screen.png --llm --window region-top --annotate --open\n# desktop execution (Linux, xdotool/ydotool):\nimgl interact /tmp/screen.png --execute\n```\n\nURI DSL (`vql://window/imgl?action=...`):\n\n| action | description |\n|--------|------|\n| `analyze` | OCR + layout → VQL JSON (default) |\n| `list` | list of interactive elements |\n| `annotate` | PNG of the screenshot + numbered boxes |\n| `click` | `text=`, `element_id=`, `window=` |\n| `type` | `value=`, `label=`, `text=` |\n\nVia `uri2vql` (when installed):\n\n```bash\nuri2vql query 'vql://window/imgl?image=/tmp/screen.png\u0026file=layout.vql.json\u0026lang=eng'\nuri2vql query 'vql://window/imgl?image=/tmp/screen.png\u0026file=layout.vql.json\u0026action=list'\nuri2vql query 'vql://window/imgl?image=/tmp/screen.png\u0026file=layout.vql.json\u0026action=click\u0026text=Save'\n# For Polish+English OCR in URI use encoded plus: lang=eng%2Bpol\n```\n\nNL → URI (`nlp2uri` / `imgl` built-in):\n\n```bash\n# in the imgl interact shell: \"kliknij Save\", \"wpisz test w search\", \"2\", \"lista\"\n```\n\n### HTML / SVG export\n\n```python\nfrom imgl import analyze, scene_to_html, scene_to_svg\n\nscene = analyze(\"screen.png\")\nhtml = scene_to_html(scene, embed_image=True)\nsvg = scene_to_svg(scene, mode=\"overlay\", background=\"screen.png\")\n```\n\nHTML uses absolutely positioned elements with `data-type`, `data-id`, `data-text` attributes\nfor text-based automation (`button[data-text=\"Save\"]`).\n\nSVG supports `wireframe` (flat debug view) and `overlay` (boxes on top of screenshot).\n\n## Output format\n\n`analyze()` returns a `Scene` with:\n\n- `windows` — detected UI windows/panels (local heuristics or optional `img2vql`)\n- `elements` — classified UI elements: `button`, `input`, `label`, `text`, `toolbar`\n- `ocr_boxes` — raw OCR word boxes with confidence scores\n\nExample JSON:\n\n```json\n{\n  \"version\": \"1.0\",\n  \"scene\": {\"width\": 800, 
\"height\": 600, \"source_image\": \"screen.png\"},\n  \"windows\": [{\n    \"id\": \"win-screen\",\n    \"bbox\": {\"x\": 0, \"y\": 0, \"w\": 800, \"h\": 600},\n    \"title\": null,\n    \"z\": 0,\n    \"elements\": [\n      {\"id\": \"text-0\", \"type\": \"text\", \"text\": \"Save\", \"bbox\": {\"x\": 100, \"y\": 50, \"w\": 40, \"h\": 16}}\n    ]\n  }],\n  \"ocr_boxes\": [],\n  \"metadata\": {\"ocr_backend\": \"tesseract\", \"lang\": \"eng+pol\"}\n}\n```\n\n## Configuration\n\n```python\nfrom imgl import ImglConfig, analyze\n\nscene = analyze(\"screen.png\", config=ImglConfig(\n    lang=\"eng+pol\",\n    use_img2vql=True,      # use img2vql when installed, else local detect\n    detect_inputs=True,\n    label_proximity_px=40,\n))\n```\n\n### VQL export\n\n```python\nfrom imgl import analyze, scene_to_vql, write_vql_program\n\nscene = analyze(\"screen.png\")  # metadata.capture + window_os gdy vdisplay + sidecar\nprogram = scene_to_vql(scene, include_grid=True, grid=12)\nwrite_vql_program(scene, \"layout.vql.json\")\n```\n\nLayers: `windows`, `ui_elements` (OCR text + optional `app_label` from vdisplay), `text_regions`, optional `screen_regions`.\n\nSidecar files: `screen.capture.json` (provenance), cache `layout.vql.imgl.json`. 
See [docs/vql-export.md](docs/vql-export.md).\n\n### Text-based actions\n\n```python\nfrom imgl import analyze, actions\n\nscene = analyze(\"screen.png\")\nui = actions(scene)\n\nui.click(\"button\", text=\"Save\")\n# {\"action\": \"click\", \"x\": 310, \"y\": 206, ...}\n\nui.type_into(\"alice\", label=\"Username\")\n# {\"action\": \"type\", \"x\": 245, \"y\": 99, \"text\": \"alice\", ...}\n```\n\nCLI:\n\n```bash\nimgl find screen.png --type button --text Save --click\nimgl find screen.png --label Username --type-into alice\nimgl find screen.png --list\n```\n\n## Roadmap\n\nSee [TODO.md](TODO.md).\n\n- uri2vql: `window_scope` in the `vql://window/imgl` handler\n- `dsl2imgl` Phase 4: JSON Schema + Protobuf + EventStore\n- Web UI: microphone (Web Speech API), KEY action in the panel\n- koru desktop bridge for action execution\n\n## License\n\nLicensed under Apache-2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsemcod%2Fimgl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsemcod%2Fimgl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsemcod%2Fimgl/lists"}