https://github.com/openadaptai/openadapt
Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
- Host: GitHub
- URL: https://github.com/openadaptai/openadapt
- Owner: OpenAdaptAI
- License: MIT
- Created: 2023-04-12T16:20:23.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2025-03-16T23:00:22.000Z (11 months ago)
- Last Synced: 2025-07-01T13:56:44.341Z (7 months ago)
- Topics: agents, ai-agents, ai-agents-framework, anthropic, computer-use, generative-process-automation, google-gemini, gpt4o, huggingface, large-action-model, large-language-models, large-multimodal-models, omniparser, openai, process-automation, process-mining, python, segment-anything, transformers, ultralytics
- Language: Python
- Homepage: https://www.OpenAdapt.AI
- Size: 28.9 MB
- Stars: 1,314
- Watchers: 14
- Forks: 190
- Open Issues: 417
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
- awesome-ChatGPT-repositories - OpenAdapt - Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models (NLP)
README
# OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)
[CI](https://github.com/OpenAdaptAI/OpenAdapt/actions/workflows/main.yml) | [PyPI](https://pypi.org/project/openadapt/) | [License: MIT](https://opensource.org/licenses/MIT) | [Python](https://www.python.org/downloads/)
**OpenAdapt** is the **open** source software **adapt**er between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.
Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.
[Join us on Discord](https://discord.gg/yF527cQbDG) | [Documentation](https://docs.openadapt.ai) | [OpenAdapt.ai](https://openadapt.ai)
---
## Architecture
OpenAdapt v1.0+ uses a **modular meta-package architecture**. The main `openadapt` package provides a unified CLI and depends on focused sub-packages via PyPI:
| Package | Description | Repository |
|---------|-------------|------------|
| `openadapt` | Meta-package with unified CLI | This repo |
| `openadapt-capture` | Event recording and storage | [openadapt-capture](https://github.com/OpenAdaptAI/openadapt-capture) |
| `openadapt-ml` | ML engine, training, inference | [openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml) |
| `openadapt-evals` | Benchmark evaluation | [openadapt-evals](https://github.com/OpenAdaptAI/openadapt-evals) |
| `openadapt-viewer` | HTML visualization | [openadapt-viewer](https://github.com/OpenAdaptAI/openadapt-viewer) |
| `openadapt-grounding` | UI element localization | [openadapt-grounding](https://github.com/OpenAdaptAI/openadapt-grounding) |
| `openadapt-retrieval` | Multimodal demo retrieval | [openadapt-retrieval](https://github.com/OpenAdaptAI/openadapt-retrieval) |
| `openadapt-privacy` | PII/PHI scrubbing | [openadapt-privacy](https://github.com/OpenAdaptAI/openadapt-privacy) |
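
As an illustration of how a meta-package can expose a unified CLI over optional sub-packages, here is a minimal sketch (hypothetical code, not the actual `openadapt` source; the import names `openadapt_capture` etc. are assumed from the PyPI package names):

```python
# Hypothetical sketch of optional sub-package detection in a meta-package
# CLI; not the actual openadapt source.
import importlib.util

SUBPACKAGES = [
    "openadapt_capture",   # assumed import names for the PyPI packages
    "openadapt_ml",        # openadapt-capture, openadapt-ml, etc.
    "openadapt_evals",
    "openadapt_privacy",
]

def installed_components() -> dict[str, bool]:
    """Map each optional sub-package to whether it is importable."""
    return {name: importlib.util.find_spec(name) is not None
            for name in SUBPACKAGES}

if __name__ == "__main__":
    for name, present in installed_components().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

Probing with `importlib.util.find_spec` lets a command like `openadapt doctor` report which extras are installed without importing their heavy dependencies.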
---
## Installation
Install what you need:
```bash
pip install openadapt # Minimal CLI only
pip install openadapt[capture] # GUI capture/recording
pip install openadapt[ml] # ML training and inference
pip install openadapt[evals] # Benchmark evaluation
pip install openadapt[privacy] # PII/PHI scrubbing
pip install openadapt[all] # Everything
```
**Requirements:** Python 3.10+
---
## Quick Start
### 1. Record a demonstration
```bash
openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop
```
### 2. Train a model
```bash
openadapt train start --capture my-task --model qwen3vl-2b
```
### 3. Evaluate
```bash
openadapt eval run --checkpoint training_output/model.pt --benchmark waa
```
### 4. View recordings
```bash
openadapt capture view my-task
```
---
## CLI Reference
```
openadapt capture start --name Start recording
openadapt capture stop Stop recording
openadapt capture list List captures
openadapt capture view Open capture viewer
openadapt train start --capture Train model on capture
openadapt train status Check training progress
openadapt train stop Stop training
openadapt eval run --checkpoint Evaluate trained model
openadapt eval run --agent api-claude Evaluate API agent
openadapt eval mock --tasks 10 Run mock evaluation
openadapt serve --port 8080 Start dashboard server
openadapt version Show installed versions
openadapt doctor Check system requirements
```
---
## How It Works
See the full [Architecture Evolution](docs/architecture-evolution.md) for detailed documentation.
### Three-Phase Pipeline
```mermaid
flowchart TB
%% ═══════════════════════════════════════════════════════════════════════
%% DATA SOURCES (Multi-Source Ingestion)
%% ═══════════════════════════════════════════════════════════════════════
subgraph DataSources["Data Sources"]
direction LR
HUMAN["Human Demos"]
SYNTH["Synthetic Data"]:::future
BENCH_DATA["Benchmark Tasks"]
end
%% ═══════════════════════════════════════════════════════════════════════
%% PHASE 1: DEMONSTRATE (Observation Collection)
%% ═══════════════════════════════════════════════════════════════════════
subgraph Demonstrate["1. DEMONSTRATE (Observation Collection)"]
direction TB
CAP["Capture
openadapt-capture"]
PRIV["Privacy
openadapt-privacy"]
STORE[("Demo Library")]
CAP --> PRIV
PRIV --> STORE
end
%% ═══════════════════════════════════════════════════════════════════════
%% PHASE 2: LEARN (Policy Acquisition)
%% ═══════════════════════════════════════════════════════════════════════
subgraph Learn["2. LEARN (Policy Acquisition)"]
direction TB
subgraph RetrievalPath["Retrieval Path"]
EMB["Embed"]
IDX["Index"]
SEARCH["Search"]
EMB --> IDX --> SEARCH
end
subgraph TrainingPath["Training Path"]
LOADER["Load"]
TRAIN["Train"]
CKPT[("Checkpoint")]
LOADER --> TRAIN --> CKPT
end
subgraph ProcessMining["Process Mining"]
ABSTRACT["Abstract"]:::future
PATTERNS["Patterns"]:::future
ABSTRACT --> PATTERNS
end
end
%% ═══════════════════════════════════════════════════════════════════════
%% PHASE 3: EXECUTE (Agent Deployment)
%% ═══════════════════════════════════════════════════════════════════════
subgraph Execute["3. EXECUTE (Agent Deployment)"]
direction TB
subgraph AgentCore["Agent Core"]
OBS["Observe"]
POLICY["Policy
(Demo-Conditioned)"]
GROUND["Grounding
openadapt-grounding"]
ACT["Act"]
OBS --> POLICY
POLICY --> GROUND
GROUND --> ACT
end
subgraph SafetyGate["Safety Gate"]
VALIDATE["Validate"]
CONFIRM["Confirm"]:::future
VALIDATE --> CONFIRM
end
subgraph Evaluation["Evaluation"]
EVALS["Evals
openadapt-evals"]
METRICS["Metrics"]
EVALS --> METRICS
end
ACT --> VALIDATE
VALIDATE --> EVALS
end
%% ═══════════════════════════════════════════════════════════════════════
%% THE ABSTRACTION LADDER (Side Panel)
%% ═══════════════════════════════════════════════════════════════════════
subgraph AbstractionLadder["Abstraction Ladder"]
direction TB
L0["Literal
(Raw Events)"]
L1["Symbolic
(Semantic Actions)"]
L2["Template
(Parameterized)"]
L3["Semantic
(Intent)"]:::future
L4["Goal
(Task Spec)"]:::future
L0 --> L1
L1 --> L2
L2 -.-> L3
L3 -.-> L4
end
%% ═══════════════════════════════════════════════════════════════════════
%% MODEL LAYER
%% ═══════════════════════════════════════════════════════════════════════
subgraph Models["Model Layer (VLMs)"]
direction TB
subgraph APIModels["API Models"]
direction LR
CLAUDE["Claude"]
GPT["GPT-4o"]
GEMINI["Gemini"]
end
subgraph OpenSource["Open Source / Fine-tuned"]
direction LR
QWEN3["Qwen3-VL"]
UITARS["UI-TARS"]
OPENCUA["OpenCUA"]
end
end
%% ═══════════════════════════════════════════════════════════════════════
%% MAIN DATA FLOW
%% ═══════════════════════════════════════════════════════════════════════
%% Data sources feed into phases
HUMAN --> CAP
SYNTH -.-> LOADER
BENCH_DATA --> EVALS
%% Demo library feeds learning
STORE --> EMB
STORE --> LOADER
STORE -.-> ABSTRACT
%% Learning outputs feed execution
SEARCH -->|"demo context"| POLICY
CKPT -->|"trained policy"| POLICY
PATTERNS -.->|"templates"| POLICY
%% Model connections
POLICY --> Models
GROUND --> Models
%% ═══════════════════════════════════════════════════════════════════════
%% FEEDBACK LOOPS (Evaluation-Driven)
%% ═══════════════════════════════════════════════════════════════════════
METRICS -->|"success traces"| STORE
METRICS -.->|"training signal"| TRAIN
%% Retrieval in BOTH training AND evaluation
SEARCH -->|"eval conditioning"| EVALS
%% ═══════════════════════════════════════════════════════════════════════
%% STYLING
%% ═══════════════════════════════════════════════════════════════════════
%% Phase colors
classDef phase1 fill:#3498DB,stroke:#1A5276,color:#fff
classDef phase2 fill:#27AE60,stroke:#1E8449,color:#fff
classDef phase3 fill:#9B59B6,stroke:#6C3483,color:#fff
%% Component states
classDef implemented fill:#2ECC71,stroke:#1E8449,color:#fff
classDef future fill:#95A5A6,stroke:#707B7C,color:#fff,stroke-dasharray: 5 5
classDef futureBlock fill:#f5f5f5,stroke:#95A5A6,stroke-dasharray: 5 5
classDef safetyBlock fill:#E74C3C,stroke:#A93226,color:#fff
%% Model layer
classDef models fill:#F39C12,stroke:#B7950B,color:#fff
%% Apply styles
class CAP,PRIV,STORE phase1
class EMB,IDX,SEARCH,LOADER,TRAIN,CKPT phase2
class OBS,POLICY,GROUND,ACT,VALIDATE,EVALS,METRICS phase3
class CLAUDE,GPT,GEMINI,QWEN3,UITARS,OPENCUA models
class L0,L1,L2 implemented
```
### Core Approach: Demo-Conditioned Prompting
OpenAdapt explores **demonstration-conditioned automation** - "show, don't tell":
| Traditional Agent | OpenAdapt Agent |
|-------------------|-----------------|
| User writes prompts | User records demonstration |
| Ambiguous instructions | Grounded in actual UI |
| Requires prompt engineering | Reduced prompt engineering |
| Context-free | Context from similar demos |
**Retrieval powers BOTH training AND evaluation**: Similar demonstrations are retrieved as context for the VLM. In early experiments on a controlled macOS benchmark, this improved first-action accuracy from 46.7% to 100% - though all 45 tasks in that benchmark share the same navigation entry point. See the [publication roadmap](docs/publication-roadmap.md) for methodology and limitations.
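
As a rough sketch of the idea (illustrative only; the names below are hypothetical, not the `openadapt-retrieval` API), retrieval ranks stored demonstrations by similarity to the current goal and prepends the top matches to the model prompt:

```python
# Illustrative sketch of demo-conditioned prompting; names are hypothetical.
from dataclasses import dataclass

@dataclass
class Demo:
    task: str
    steps: list[str]  # e.g. ["click 'File'", "click 'Export'"]

def build_prompt(goal: str, demos: list[Demo], k: int = 2) -> str:
    """Prepend the k most similar demonstrations to the instruction."""
    # Stand-in for similarity search: rank demos by word overlap with the goal.
    ranked = sorted(
        demos,
        key=lambda d: len(set(d.task.lower().split()) & set(goal.lower().split())),
        reverse=True,
    )
    context = "\n\n".join(
        f"Demonstration: {d.task}\n"
        + "\n".join(f"  {i + 1}. {s}" for i, s in enumerate(d.steps))
        for d in ranked[:k]
    )
    return f"{context}\n\nTask: {goal}\nPredict the next action."
```

A real retriever would embed screenshots and action text with a multimodal encoder; the word-overlap ranking here only stands in for that similarity search.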
### Key Concepts
- **Policy/Grounding Separation**: The Policy decides *what* to do; Grounding determines *where* to do it (see the sketch below)
- **Safety Gate**: Runtime validation layer before action execution (confirm mode for high-risk actions)
- **Abstraction Ladder**: Progressive generalization from literal replay to goal-level automation
- **Evaluation-Driven Feedback**: Success traces become new training data
**Legend:** Solid = Implemented | Dashed = Future
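
A minimal sketch of the policy/grounding split (hypothetical interfaces with stubbed model calls; not the `openadapt-grounding` API):

```python
# Hedged sketch of the policy/grounding separation; interfaces are
# hypothetical, not the openadapt-grounding API.
from dataclasses import dataclass

@dataclass
class Intent:
    verb: str      # e.g. "click", "type"
    target: str    # e.g. "the Save button"
    text: str = ""

@dataclass
class GroundedAction:
    verb: str
    x: int
    y: int
    text: str = ""

def policy(observation: bytes, goal: str) -> Intent:
    """Decide WHAT to do (normally a VLM call; stubbed here)."""
    return Intent(verb="click", target="the Save button")

def ground(intent: Intent, observation: bytes) -> GroundedAction:
    """Decide WHERE to do it (normally UI element localization; stubbed)."""
    return GroundedAction(verb=intent.verb, x=412, y=87, text=intent.text)

def step(observation: bytes, goal: str) -> GroundedAction:
    intent = policy(observation, goal)
    action = ground(intent, observation)
    # A safety gate would validate `action` here before execution.
    return action
```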
---
## Terminology
| Term | Description |
|------|-------------|
| **Observation** | What the agent perceives (screenshot, accessibility tree) |
| **Action** | What the agent does (click, type, scroll, etc.) |
| **Trajectory** | Sequence of observation-action pairs |
| **Demonstration** | Human-provided example trajectory |
| **Policy** | Decision-making component that maps observations to actions |
| **Grounding** | Mapping intent to specific UI elements (coordinates) |
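
To make the terminology concrete, here is one way the terms could map to Python types (an illustrative sketch, not OpenAdapt's actual schema):

```python
# Illustrative types for the terminology above; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Observation:
    screenshot: bytes                       # what the agent perceives
    accessibility_tree: dict | None = None

@dataclass
class Action:
    kind: str                               # "click", "type", "scroll", ...
    x: int | None = None
    y: int | None = None
    text: str | None = None

@dataclass
class Trajectory:
    steps: list[tuple[Observation, Action]] = field(default_factory=list)

# A Demonstration is simply a human-provided Trajectory.
Demonstration = Trajectory
```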
---
## Demos
- https://twitter.com/abrichr/status/1784307190062342237
- https://www.loom.com/share/9d77eb7028f34f7f87c6661fb758d1c0
---
## Permissions
**macOS:** Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See [permissions guide](./legacy/permissions_in_macOS.md).
**Windows:** Run as Administrator if needed for input capture.
---
## Legacy Version
The monolithic OpenAdapt codebase (v0.46.0) is preserved in the `legacy/` directory.
**To use the legacy version:**
```bash
pip install openadapt==0.46.0
```
See [docs/LEGACY_FREEZE.md](docs/LEGACY_FREEZE.md) for the migration guide and details.
---
## Contributing
1. [Join Discord](https://discord.gg/yF527cQbDG)
2. Pick an issue from the relevant sub-package repository
3. Submit a PR
For sub-package development:
```bash
git clone https://github.com/OpenAdaptAI/openadapt-ml # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"
```
---
## Related Projects
- [OpenAdaptAI/SoM](https://github.com/OpenAdaptAI/SoM) - Set-of-Mark prompting
- [OpenAdaptAI/pynput](https://github.com/OpenAdaptAI/pynput) - Input monitoring fork
- [OpenAdaptAI/atomacos](https://github.com/OpenAdaptAI/atomacos) - macOS accessibility
---
## Support
- **Discord:** https://discord.gg/yF527cQbDG
- **Issues:** Use the relevant sub-package repository
- **Architecture docs:** [GitHub Wiki](https://github.com/OpenAdaptAI/OpenAdapt/wiki/OpenAdapt-Architecture-(draft))
---
## License
MIT License - see [LICENSE](LICENSE) for details.