{"id":37080582,"url":"https://github.com/openadaptai/openadapt-ml","last_synced_at":"2026-03-03T04:18:32.957Z","repository":{"id":329030679,"uuid":"1111893790","full_name":"OpenAdaptAI/openadapt-ml","owner":"OpenAdaptAI","description":"OpenAdapt’s open-source ML toolkit for training and evaluating general multimodal GUI-action models.","archived":false,"fork":false,"pushed_at":"2026-02-18T04:55:45.000Z","size":7849,"stargazers_count":2,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-18T09:44:12.298Z","etag":null,"topics":["computer-use","gui-automation","machine-learning","openadapt","python","vlm"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/openadapt-ml/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenAdaptAI.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/roadmap.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-07T20:37:03.000Z","updated_at":"2026-02-18T04:55:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/OpenAdaptAI/openadapt-ml","commit_stats":null,"previous_names":["openadaptai/openadapt-ml"],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/OpenAdaptAI/openadapt-ml","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2Fopenadapt-ml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2Fopenadapt-ml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2Fopenadapt-ml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2Fopenadapt-ml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenAdaptAI","download_url":"https://codeload.github.com/OpenAdaptAI/openadapt-ml/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2Fopenadapt-ml/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29802865,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-24T21:02:39.706Z","status":"ssl_error","status_checked_at":"2026-02-24T21:02:21.834Z","response_time":75,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-use","gui-automation","machine-learning","openadapt","python","vlm"],"created_at":"2026-01-14T09:46:56.190Z","updated_at":"2026-02-24T22:02:06.827Z","avatar_url":"https://github.com/OpenAdaptAI.png","language":"Python","readme":"# OpenAdapt-ML\n\n[![Tests](https://github.com/OpenAdaptAI/openadapt-ml/actions/workflows/test.yml/badge.svg)](https://github.com/OpenAdaptAI/openadapt-ml/actions/workflows/test.yml)\n[![PyPI version](https://img.shields.io/pypi/v/openadapt-ml.svg)](https://pypi.org/project/openadapt-ml/)\n[![Downloads](https://img.shields.io/pypi/dm/openadapt-ml.svg)](https://pypi.org/project/openadapt-ml/)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n**The ML engine for [OpenAdapt](https://github.com/OpenAdaptAI/OpenAdapt) -- open-source desktop automation with demo-conditioned AI agents.**\n\nOpenAdapt-ML provides the GUI-specific ML layer for training and running vision-language model (VLM) agents that automate desktop tasks. It handles everything between raw screen recordings and a production policy API: canonical schemas for GUI trajectories, VLM adapters, supervised fine-tuning with TRL + Unsloth, grounding, and demo-conditioned inference.\n\n## Demos\n\n**Synthetic Login** -- Qwen3-VL-2B fine-tuned on synthetic UI scenarios:\n\n![Login Demo](experiments/qwen_login/login_demo.gif)\n![Registration Demo](experiments/qwen_login/registration_demo.gif)\n\n## Key Features\n\n- **GUI trajectory schemas** -- Pydantic models for Episodes, Steps, Actions, and Observations with JSON Schema export and format converters (WAA, WebArena)\n- **VLM adapters** -- Unified interface for Qwen3-VL, Qwen2.5-VL, Claude, GPT, and Gemini with automatic device selection (CUDA / MPS / CPU)\n- **Supervised fine-tuning** -- TRL SFTTrainer with Unsloth optimizations for 2x faster training and 50% less VRAM via LoRA adapters\n- **Runtime policy API** -- `AgentPolicy` that predicts the next GUI action (`CLICK`, `TYPE`, `DONE`) from a screenshot and goal\n- **Demo-conditioned inference** -- Retrieval-augmented prompting using recorded demonstrations for trajectory-conditioned disambiguation\n- **Grounding module** -- Locate UI elements via Gemini vision API, oracle bounding boxes, or Set-of-Marks (SoM) overlays\n- **Cloud GPU training** -- One-command training pipelines for Lambda Labs and Azure\n- **Synthetic data generation** -- Configurable UI scenarios (login, registration) with layout jitter for rapid iteration\n\n## Installation\n\n```bash\n# Core package\npip install openadapt-ml\n\n# With training dependencies (TRL + datasets)\npip install openadapt-ml[training]\n\n# With API-backed VLMs (Claude, GPT)\npip install openadapt-ml[api]\n\n# Development (from source)\ngit clone https://github.com/OpenAdaptAI/openadapt-ml.git\ncd openadapt-ml\nuv sync\n```\n\n## Quick Start\n\n### Run a smoke test\n\n```bash\n# Model-free policy demo (no GPU required)\nuv run python -m openadapt_ml.scripts.demo_policy --backend dummy\n```\n\n### Train on synthetic data\n\n```bash\n# Fine-tune Qwen3-VL on synthetic login scenario\nuv run python -m openadapt_ml.scripts.train \\\n  --config configs/qwen3vl_synthetic.yaml\n```\n\n### Train on real recordings\n\n```bash\n# Record a workflow with openadapt-capture, then train\nuv run python -m openadapt_ml.scripts.train \\\n  --config configs/qwen3vl_capture.yaml \\\n  --capture ~/captures/my-workflow \\\n  --open  # Opens training dashboard in browser\n```\n\n### End-to-end benchmark (train + eval + plot)\n\n```bash\nuv run python -m openadapt_ml.scripts.run_qwen_login_benchmark \\\n  --config configs/qwen3vl_synthetic_dev.yaml \\\n  --out-dir experiments/qwen_login/2b_dev\n```\n\n### Use the policy API\n\n```python\nfrom openadapt_ml.runtime.policy import AgentPolicy\nfrom openadapt_ml.models.qwen_vl import QwenVLAdapter\n\nadapter = QwenVLAdapter(model_name=\"Qwen/Qwen3-VL-2B-Instruct\")\npolicy = AgentPolicy(adapter)\n\n# Given an SFT-style sample (screenshot + goal + chat history):\noutput = policy.predict(sample)\nprint(output.action)   # Action(type=CLICK, coordinates={\"x\": 0.45, \"y\": 0.71})\nprint(output.thought)  # \"Click the Login button\"\n```\n\n### Use the schema\n\n```python\nfrom openadapt_ml.schema import Episode, Step, Action, Observation, ActionType\n\nepisode = Episode(\n    episode_id=\"demo_001\",\n    instruction=\"Open Notepad and type Hello World\",\n    steps=[\n        Step(\n            step_index=0,\n            observation=Observation(screenshot_path=\"step_0.png\"),\n            action=Action(type=ActionType.CLICK, coordinates={\"x\": 100, \"y\": 200}),\n        ),\n        Step(\n            step_index=1,\n            observation=Observation(screenshot_path=\"step_1.png\"),\n            action=Action(type=ActionType.TYPE, text=\"Hello World\"),\n        ),\n    ],\n    success=True,\n)\n```\n\n## Architecture\n\n```\nopenadapt_ml/\n├── schema/              # Episode, Step, Action, Observation (Pydantic models)\n│   ├── episode.py       #   Core dataclasses + JSON Schema export\n│   └── converters.py    #   WAA/WebArena format converters\n├── models/              # VLM adapters\n│   ├── base_adapter.py  #   BaseVLMAdapter ABC\n│   ├── qwen_vl.py       #   Qwen3-VL, Qwen2.5-VL\n│   ├── api_adapter.py   #   Claude, GPT (inference-only)\n│   └── dummy_adapter.py #   Fake adapter for testing\n├── training/            # Fine-tuning pipeline\n│   ├── trl_trainer.py   #   TRL SFTTrainer + Unsloth\n│   ├── trainer.py       #   Training orchestration\n│   └── viewer.py        #   Training dashboard (HTML)\n├── runtime/             # Inference\n│   ├── policy.py        #   AgentPolicy (screenshot -\u003e action)\n│   └── safety_gate.py   #   Action safety checks\n├── datasets/            # Data loading\n│   └── next_action.py   #   Episodes -\u003e SFT chat samples\n├── ingest/              # Data ingestion\n│   ├── synthetic.py     #   Synthetic UI generation\n│   ├── capture.py       #   openadapt-capture loader\n│   └── loader.py        #   Generic episode loader\n├── grounding/           # UI element localization\n│   ├── base.py          #   OracleGrounder, GroundingModule ABC\n│   └── detector.py      #   GeminiGrounder, SoM overlays\n├── retrieval/           # Demo-conditioned inference\n│   ├── retriever.py     #   Demo retrieval for RAG prompting\n│   └── embeddings.py    #   Screenshot/action embeddings\n├── benchmarks/          # ML-specific benchmark agents\n│   └── agent.py         #   PolicyAgent, APIBenchmarkAgent, UnifiedBaselineAgent\n├── cloud/               # Cloud GPU training\n│   ├── lambda_labs.py   #   Lambda Labs integration\n│   ├── local.py         #   Local training (CUDA/MPS)\n│   └── ssh_tunnel.py    #   SSH tunnel management\n├── segmentation/        # Recording segmentation pipeline\n├── evals/               # Evaluation metrics (grounding, trajectory matching)\n├── config.py            # Settings via pydantic-settings\n└── scripts/             # CLI entry points (train, eval, compare, demo)\n```\n\n## Benchmark Results\n\n### Synthetic Login (Qwen3-VL-2B with Set-of-Marks)\n\n| Metric                | Score    |\n|-----------------------|----------|\n| Action Type Accuracy  | **100%** |\n| Element Accuracy      | **100%** |\n| Episode Success Rate  | **100%** |\n\n### Multi-Model Comparison (Synthetic Login, coordinate mode)\n\n| Model               | Action Accuracy | Coord Error | Click Hit Rate |\n|----------------------|-----------------|-------------|----------------|\n| Qwen3-VL-2B FT      | 0.469           | 0.051       | 0.850          |\n| Qwen3-VL-8B FT      | 0.286           | 0.004       | 1.000          |\n| Claude Sonnet 4.5    | 0.121           | 0.757       | 0.000          |\n| GPT-5.1              | 0.183           | 0.057       | 0.600          |\n\n\u003e These are results on a controlled synthetic benchmark with ~3 UI elements. They validate that the training pipeline works, not real-world performance. Evaluation on standard benchmarks (WAA, WebArena) is ongoing via [openadapt-evals](https://github.com/OpenAdaptAI/openadapt-evals).\n\n## Cloud GPU Training\n\n### Lambda Labs\n\n```bash\nexport LAMBDA_API_KEY=your_key_here\n\n# One-command: launch, train, download, terminate\nuv run python -m openadapt_ml.cloud.lambda_labs train \\\n  --capture ~/captures/my-workflow \\\n  --goal \"Turn off Night Shift in System Settings\"\n```\n\n### Local (CUDA / Apple Silicon)\n\n```bash\nuv run python -m openadapt_ml.cloud.local train \\\n  --capture ~/captures/my-workflow --open\n```\n\n## Ecosystem\n\nOpenAdapt-ML is one component in the OpenAdapt stack:\n\n| Package | Purpose |\n|---------|---------|\n| **[openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml)** | ML engine: schemas, VLM adapters, training, inference, grounding |\n| **[openadapt-evals](https://github.com/OpenAdaptAI/openadapt-evals)** | Evaluation infrastructure: VM management, pool orchestration, benchmark runners, `oa-vm` CLI |\n| **[openadapt-capture](https://github.com/OpenAdaptAI/openadapt-capture)** | Lightweight GUI recording and demo sharing |\n| **[OpenAdapt](https://github.com/OpenAdaptAI/OpenAdapt)** | Desktop automation platform (end-user application) |\n\n\u003e Looking for benchmark evaluation, Azure VM management, or the `oa-vm` CLI? Those live in [openadapt-evals](https://github.com/OpenAdaptAI/openadapt-evals).\n\n## Documentation\n\n- [`docs/design.md`](docs/design.md) -- System design (schemas, adapters, training, runtime)\n- [`docs/cloud_gpu_training.md`](docs/cloud_gpu_training.md) -- Lambda Labs and Azure training guide\n- [`docs/qwen_login_experiment.md`](docs/qwen_login_experiment.md) -- Synthetic benchmark reproduction\n- [`docs/gemini_grounding.md`](docs/gemini_grounding.md) -- Grounding module documentation\n\n## Contributing\n\n```bash\n# Clone and install dev dependencies\ngit clone https://github.com/OpenAdaptAI/openadapt-ml.git\ncd openadapt-ml\nuv sync --extra dev --extra training\n\n# Run tests\nuv run pytest\n\n# Lint\nuv run ruff check .\n```\n\nWe use [Angular-style commits](https://www.conventionalcommits.org/) (`feat:`, `fix:`, `docs:`, etc.) with [Python Semantic Release](https://python-semantic-release.readthedocs.io/) for automated versioning and PyPI publishing.\n\n## License\n\n[MIT](LICENSE)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenadaptai%2Fopenadapt-ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenadaptai%2Fopenadapt-ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenadaptai%2Fopenadapt-ml/lists"}