{"id":22858443,"url":"https://github.com/openadaptai/openadapt","last_synced_at":"2026-03-04T04:03:49.341Z","repository":{"id":152921527,"uuid":"627024850","full_name":"OpenAdaptAI/OpenAdapt","owner":"OpenAdaptAI","description":"Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models","archived":false,"fork":false,"pushed_at":"2026-03-04T02:39:37.000Z","size":30652,"stargazers_count":1503,"open_issues_count":0,"forks_count":221,"subscribers_count":15,"default_branch":"main","last_synced_at":"2026-03-04T03:43:46.045Z","etag":null,"topics":["agents","ai-agents","ai-agents-framework","anthropic","computer-use","generative-process-automation","google-gemini","gpt4o","huggingface","large-action-model","large-language-models","large-multimodal-models","omniparser","openai","process-automation","process-mining","python","segment-anything","transformers","ultralytics"],"latest_commit_sha":null,"homepage":"https://www.OpenAdapt.AI","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenAdaptAI.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":"docs/roadmap-priorities.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["OpenAdaptAI"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":null,"custom":null}},"created_at":"2023-04-12T16:20:23.000Z","updated_at":"2026-03-04T02:22:24.000Z","dependencies_parsed_at":null,"dependency_job_id":"2384de2b-6c7f-4163-b8b2-96a24b8c64bf","html_url":"https://github.com/OpenAdaptAI/OpenAdapt","commit_stats":null,"previous_names":["openadaptai/openadapt","mldsai/openadapt"],"tags_count":112,"template":false,"template_full_name":null,"purl":"pkg:github/OpenAdaptAI/OpenAdapt","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2FOpenAdapt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2FOpenAdapt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2FOpenAdapt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2FOpenAdapt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenAdaptAI","download_url":"https://codeload.github.com/OpenAdaptAI/OpenAdapt/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2FOpenAdapt/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30071670,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T03:25:38.285Z","status":"ssl_error","status_checked_at":"2026-03-04T03:25:05.086Z","response_time":59,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai-agents","ai-agents-framework","anthropic","computer-use","generative-process-automation","google-gemini","gpt4o","huggingface","large-action-model","large-language-models","large-multimodal-models","omniparser","openai","process-automation","process-mining","python","segment-anything","transformers","ultralytics"],"created_at":"2024-12-13T09:01:12.929Z","updated_at":"2026-03-04T04:03:49.335Z","avatar_url":"https://github.com/OpenAdaptAI.png","language":"Python","readme":"# OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)\n\n[![Build Status](https://github.com/OpenAdaptAI/OpenAdapt/actions/workflows/main.yml/badge.svg)](https://github.com/OpenAdaptAI/OpenAdapt/actions/workflows/main.yml)\n[![PyPI version](https://img.shields.io/pypi/v/openadapt.svg)](https://pypi.org/project/openadapt/)\n[![Downloads](https://img.shields.io/pypi/dm/openadapt.svg)](https://pypi.org/project/openadapt/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)\n[![Discord](https://img.shields.io/discord/1084481804896374814?color=7289da\u0026label=Discord\u0026logo=discord\u0026logoColor=white)](https://discord.gg/yF527cQbDG)\n\n**OpenAdapt** is the **open** source software **adapt**er between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.\n\nRecord GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.\n\n[Join us on Discord](https://discord.gg/yF527cQbDG) | [Documentation](https://docs.openadapt.ai) | [OpenAdapt.ai](https://openadapt.ai)\n\n---\n\n## Architecture\n\nOpenAdapt v1.0+ uses a **modular meta-package architecture**. 
---

## Ecosystem

### Core Platform Components

| Package | Description | Repository |
|---------|-------------|------------|
| `openadapt` | Meta-package with unified CLI | This repo |
| `openadapt-capture` | Event recording and storage | [openadapt-capture](https://github.com/OpenAdaptAI/openadapt-capture) |
| `openadapt-ml` | ML engine, training, inference | [openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml) |
| `openadapt-evals` | Benchmark evaluation | [openadapt-evals](https://github.com/OpenAdaptAI/openadapt-evals) |
| `openadapt-viewer` | HTML visualization | [openadapt-viewer](https://github.com/OpenAdaptAI/openadapt-viewer) |
| `openadapt-grounding` | UI element localization | [openadapt-grounding](https://github.com/OpenAdaptAI/openadapt-grounding) |
| `openadapt-retrieval` | Multimodal demo retrieval | [openadapt-retrieval](https://github.com/OpenAdaptAI/openadapt-retrieval) |
| `openadapt-privacy` | PII/PHI scrubbing | [openadapt-privacy](https://github.com/OpenAdaptAI/openadapt-privacy) |

### Applications and Tools

| Package | Description | Repository |
|---------|-------------|------------|
| `openadapt-desktop` | Desktop GUI application | [openadapt-desktop](https://github.com/OpenAdaptAI/openadapt-desktop) |
| `openadapt-tray` | System tray app | [openadapt-tray](https://github.com/OpenAdaptAI/openadapt-tray) |
| `openadapt-agent` | Production execution engine | [openadapt-agent](https://github.com/OpenAdaptAI/openadapt-agent) |
| `openadapt-wright` | Dev automation | [openadapt-wright](https://github.com/OpenAdaptAI/openadapt-wright) |
| `openadapt-herald` | Social media from git history | [openadapt-herald](https://github.com/OpenAdaptAI/openadapt-herald) |
| `openadapt-crier` | Telegram approval bot | [openadapt-crier](https://github.com/OpenAdaptAI/openadapt-crier) |
| `openadapt-consilium` | Multi-model consensus | [openadapt-consilium](https://github.com/OpenAdaptAI/openadapt-consilium) |
| `openadapt-telemetry` | Error tracking | [openadapt-telemetry](https://github.com/OpenAdaptAI/openadapt-telemetry) |

---

## CLI Reference

```
openadapt capture start --name <name>    Start recording
openadapt capture stop                    Stop recording
openadapt capture list                    List captures
openadapt capture view <name>             Open capture viewer

openadapt train start --capture <name>    Train model on capture
openadapt train status                    Check training progress
openadapt train stop                      Stop training

openadapt eval run --checkpoint <path>    Evaluate trained model
openadapt eval run --agent api-claude     Evaluate API agent
openadapt eval mock --tasks 10            Run mock evaluation

openadapt serve --port 8080               Start dashboard server
openadapt version                         Show installed versions
openadapt doctor                          Check system requirements
```

---

## How It Works

See the full [Architecture Evolution](docs/architecture-evolution.md) for detailed documentation.

### Three-Phase Pipeline

OpenAdapt follows a streamlined **Demonstrate → Learn → Execute** pipeline:

**1. DEMONSTRATE (Observation Collection)**
- **Capture**: Record user actions and screenshots with `openadapt-capture`
- **Privacy**: Scrub PII/PHI from recordings with `openadapt-privacy`
- **Store**: Build a searchable demonstration library

**2. LEARN (Policy Acquisition)**
- **Retrieval Path**: Embed demonstrations, index them, and enable semantic search
- **Training Path**: Load demonstrations and fine-tune Vision-Language Models (VLMs)
- **Abstraction**: Progress from literal replay to template-based automation

**3. EXECUTE (Agent Deployment)** (sketched below)
- **Observe**: Take screenshots and gather accessibility information
- **Policy**: Use demonstration context to decide actions via VLMs (Claude, GPT-4o, Qwen3-VL)
- **Ground**: Map intentions to specific UI coordinates with `openadapt-grounding`
- **Act**: Execute validated actions with safety gates
- **Evaluate**: Measure success with `openadapt-evals` and feed results back for improvement
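To make the Execute phase concrete, here is a minimal sketch of one agent step. Every name in it (`Intent`, `Policy`, `Grounder`, `step`) is an illustrative assumption, not the actual `openadapt-agent` API; the point is the separation of concerns: the policy decides *what*, grounding decides *where*, and the safety gate validates before anything runs.

```python
# Illustrative sketch of one Execute step; these names are assumptions,
# not the real openadapt-agent API.
from dataclasses import dataclass

@dataclass
class Intent:
    """WHAT to do, before grounding, e.g. click('the Save button')."""
    kind: str            # "click", "type", "scroll", ...
    target: str          # natural-language description of the UI element
    high_risk: bool = False

class Policy:
    """Maps an observation (plus demonstration context) to an intent."""
    def decide(self, screenshot: bytes, demos: list[str]) -> Intent:
        # In OpenAdapt this is a VLM call (Claude, GPT-4o, Qwen3-VL)
        # conditioned on retrieved demonstrations.
        raise NotImplementedError

class Grounder:
    """Maps an intent to WHERE: concrete screen coordinates."""
    def locate(self, screenshot: bytes, intent: Intent) -> tuple[int, int]:
        # openadapt-grounding localizes the described element in pixels.
        raise NotImplementedError

def step(policy: Policy, grounder: Grounder,
         screenshot: bytes, demos: list[str]) -> None:
    intent = policy.decide(screenshot, demos)    # Policy: what
    x, y = grounder.locate(screenshot, intent)   # Grounding: where
    if intent.high_risk:
        # Safety gate: confirm mode before executing high-risk actions.
        if input(f"Execute {intent.kind} at ({x}, {y})? [y/N] ") != "y":
            return
    # Act: dispatch the validated action to the OS input layer here.
```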
### Core Approach: Trajectory-Conditioned Disambiguation

Zero-shot VLMs fail on GUI tasks not for lack of capability, but because of **ambiguity in UI affordances**. OpenAdapt resolves this by conditioning agents on human demonstrations: "show, don't tell."

| | No Retrieval | With Retrieval |
|---|---|---|
| **No Fine-tuning** | 46.7% (zero-shot baseline) | **100%** first-action (n=45, shared entry point) |
| **Fine-tuning** | Standard SFT (baseline) | **Demo-conditioned FT** (planned) |

The bottom-right cell is OpenAdapt's unique value: training models to **use** demonstrations they haven't seen before, combining retrieval with fine-tuning for maximum accuracy. Phase 2 (retrieval-only prompting) is validated; Phase 3 (demo-conditioned fine-tuning) is in progress.

**Validated result**: On a controlled macOS benchmark (45 System Settings tasks sharing a common navigation entry point), demo-conditioned prompting improved first-action accuracy from 46.7% to 100%. A length-matched control (only +11.1 pp) confirms the benefit is semantic rather than an artifact of prompt length. See the [research thesis](https://github.com/OpenAdaptAI/openadapt-ml/blob/main/docs/research_thesis.md) for methodology and the [publication roadmap](docs/publication-roadmap.md) for limitations.

**Industry validation**: [OpenCUA](https://github.com/xlang-ai/OpenCUA) (NeurIPS 2025 Spotlight, XLANG Lab) [reused OpenAdapt's macOS accessibility capture code](https://arxiv.org/html/2508.09123v3) in their AgentNetTool, but uses demos only for model training, not runtime conditioning. No open-source CUA framework currently does demo-conditioned inference, which remains OpenAdapt's architectural differentiator.
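The validated retrieval path (Phase 2) is simple to picture: embed the demonstration library, pull the trajectories nearest to the current task, and prepend them to the VLM prompt. Below is a minimal sketch; the toy `embed` function stands in for a real multimodal encoder, and none of these function names belong to the `openadapt-retrieval` API.

```python
# Hypothetical sketch of demo-conditioned prompting (the Phase 2 retrieval
# path). The embedding is a toy stand-in for a real multimodal encoder.
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: a hash-seeded random unit vector."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=128)
    return v / np.linalg.norm(v)

def retrieve(task: str, demo_library: list[str], k: int = 1) -> list[str]:
    """Return the k demonstrations most similar to the task description."""
    q = embed(task)
    return sorted(demo_library, key=lambda d: -float(embed(d) @ q))[:k]

def build_prompt(task: str, demos: list[str]) -> str:
    """Condition the model on retrieved trajectories: show, don't tell."""
    shown = "\n\n".join(f"Demonstration:\n{d}" for d in demos)
    return f"{shown}\n\nNow perform this task: {task}\nFirst action:"
```

With a real encoder in place of `embed`, the first-action decision is made by the same zero-shot VLM, just with retrieved trajectories in context.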
### Key Concepts

- **Policy/Grounding Separation**: The Policy decides *what* to do; Grounding determines *where* to do it
- **Safety Gate**: Runtime validation layer before action execution (confirm mode for high-risk actions)
- **Abstraction Ladder**: Progressive generalization from literal replay to goal-level automation
- **Evaluation-Driven Feedback**: Success traces become new training data

---

## Terminology

| Term | Description |
|------|-------------|
| **Observation** | What the agent perceives (screenshot, accessibility tree) |
| **Action** | What the agent does (click, type, scroll, etc.) |
| **Trajectory** | Sequence of observation-action pairs |
| **Demonstration** | Human-provided example trajectory |
| **Policy** | Decision-making component that maps observations to actions |
| **Grounding** | Mapping intent to specific UI elements (coordinates) |
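These terms map naturally onto a handful of small data types. The classes below are an illustration under assumed field names, not the actual `openadapt-capture` schema:

```python
# Assumed, illustrative types only -- not the openadapt-capture schema.
from dataclasses import dataclass, field

@dataclass
class Observation:
    """What the agent perceives at one moment."""
    screenshot_png: bytes
    accessibility_tree: dict

@dataclass
class Action:
    """What the agent does in response."""
    kind: str                   # "click", "type", "scroll", ...
    x: int | None = None        # grounded coordinates, when applicable
    y: int | None = None
    text: str = ""              # payload for "type" actions

@dataclass
class Trajectory:
    """A sequence of observation-action pairs."""
    steps: list[tuple[Observation, Action]] = field(default_factory=list)

@dataclass
class Demonstration(Trajectory):
    """A human-provided example trajectory, tagged with the task it solves."""
    task: str = ""
```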
---

## Demos

**Legacy Version (v0.46.0) Examples:**
- [Twitter Demo](https://twitter.com/abrichr/status/1784307190062342237) - Early OpenAdapt demonstration
- [Loom Video](https://www.loom.com/share/9d77eb7028f34f7f87c6661fb758d1c0) - Process automation walkthrough

*Note: These demos show the legacy monolithic version. For current v1.0+ modular architecture examples, see the [documentation](https://docs.openadapt.ai).*

---

## Permissions

**macOS:** Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See the [permissions guide](./legacy/permissions_in_macOS.md).

**Windows:** Run as Administrator if needed for input capture.

---

## Legacy Version

The monolithic OpenAdapt codebase (v0.46.0) is preserved in the `legacy/` directory.

**To use the legacy version:**
```bash
pip install openadapt==0.46.0
```

See [docs/LEGACY_FREEZE.md](docs/LEGACY_FREEZE.md) for the migration guide and details.

---

## Contributing

1. [Join Discord](https://discord.gg/yF527cQbDG)
2. Pick an issue from the relevant sub-package repository
3. Submit a PR

For sub-package development:
```bash
git clone https://github.com/OpenAdaptAI/openadapt-ml  # or another sub-package
cd openadapt-ml
pip install -e ".[dev]"
```

---

## Related Projects

- [OpenAdaptAI/SoM](https://github.com/OpenAdaptAI/SoM) - Set-of-Mark prompting
- [OpenAdaptAI/pynput](https://github.com/OpenAdaptAI/pynput) - Input monitoring fork
- [OpenAdaptAI/atomacos](https://github.com/OpenAdaptAI/atomacos) - macOS accessibility

---

## Support

- **Discord:** https://discord.gg/yF527cQbDG
- **Issues:** Use the relevant sub-package repository
- **Architecture docs:** [GitHub Wiki](https://github.com/OpenAdaptAI/OpenAdapt/wiki/OpenAdapt-Architecture-(draft))

---

## License

MIT License - see [LICENSE](LICENSE) for details.