https://github.com/openadaptai/openadapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

Host: GitHub
URL: https://github.com/openadaptai/openadapt
Owner: OpenAdaptAI
License: mit
Created: 2023-04-12T16:20:23.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2025-03-16T23:00:22.000Z (11 months ago)
Last Synced: 2025-07-01T13:56:44.341Z (7 months ago)
Topics: agents, ai-agents, ai-agents-framework, anthropic, computer-use, generative-process-automation, google-gemini, gpt4o, huggingface, large-action-model, large-language-models, large-multimodal-models, omniparser, openai, process-automation, process-mining, python, segment-anything, transformers, ultralytics
Language: Python
Homepage: https://www.OpenAdapt.AI
Size: 28.9 MB
Stars: 1,314
Watchers: 14
Forks: 190
Open Issues: 417
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE

Awesome Lists containing this project

awesome-ChatGPT-repositories - OpenAdapt - Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models (NLP)

README

# OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)

[![Build Status](https://github.com/OpenAdaptAI/OpenAdapt/actions/workflows/main.yml/badge.svg)](https://github.com/OpenAdaptAI/OpenAdapt/actions/workflows/main.yml)

[![PyPI version](https://img.shields.io/pypi/v/openadapt.svg)](https://pypi.org/project/openadapt/)

[![Downloads](https://img.shields.io/pypi/dm/openadapt.svg)](https://pypi.org/project/openadapt/)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)

**OpenAdapt** is the **open** source software **adapt**er between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.

Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.

[Join us on Discord](https://discord.gg/yF527cQbDG) | [Documentation](https://docs.openadapt.ai) | [OpenAdapt.ai](https://openadapt.ai)

---

## Architecture

OpenAdapt v1.0+ uses a **modular meta-package architecture**. The main `openadapt` package provides a unified CLI and depends on focused sub-packages via PyPI:

| Package | Description | Repository |
|---------|-------------|------------|
| `openadapt` | Meta-package with unified CLI | This repo |
| `openadapt-capture` | Event recording and storage | [openadapt-capture](https://github.com/OpenAdaptAI/openadapt-capture) |
| `openadapt-ml` | ML engine, training, inference | [openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml) |
| `openadapt-evals` | Benchmark evaluation | [openadapt-evals](https://github.com/OpenAdaptAI/openadapt-evals) |
| `openadapt-viewer` | HTML visualization | [openadapt-viewer](https://github.com/OpenAdaptAI/openadapt-viewer) |
| `openadapt-grounding` | UI element localization | [openadapt-grounding](https://github.com/OpenAdaptAI/openadapt-grounding) |
| `openadapt-retrieval` | Multimodal demo retrieval | [openadapt-retrieval](https://github.com/OpenAdaptAI/openadapt-retrieval) |
| `openadapt-privacy` | PII/PHI scrubbing | [openadapt-privacy](https://github.com/OpenAdaptAI/openadapt-privacy) |
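
Because the sub-packages are installed independently, it can help to check which ones are present in the current environment. The following is a minimal sketch using only the standard library; it assumes the installed distribution names match the package names in the table, and it is not how `openadapt version` is implemented.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the package table above (assumed to match PyPI).
PACKAGES = [
    "openadapt",
    "openadapt-capture",
    "openadapt-ml",
    "openadapt-evals",
    "openadapt-viewer",
    "openadapt-grounding",
    "openadapt-retrieval",
    "openadapt-privacy",
]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:22} {ver or 'not installed'}")
```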

---

## Installation

Install what you need:

```bash
# Extras are quoted so that shells such as zsh do not expand the brackets
pip install openadapt                # Minimal CLI only
pip install "openadapt[capture]"     # GUI capture/recording
pip install "openadapt[ml]"          # ML training and inference
pip install "openadapt[evals]"       # Benchmark evaluation
pip install "openadapt[privacy]"     # PII/PHI scrubbing
pip install "openadapt[all]"         # Everything
```

**Requirements:** Python 3.10+

---

## Quick Start

### 1. Record a demonstration

```bash
openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop
```

### 2. Train a model

```bash
openadapt train start --capture my-task --model qwen3vl-2b
```

### 3. Evaluate

```bash
openadapt eval run --checkpoint training_output/model.pt --benchmark waa
```

### 4. View recordings

```bash
openadapt capture view my-task
```
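
To script the record, train, and evaluate steps, the commands can be assembled as argument lists. This is a rough sketch that only builds and prints the commands shown in this section; the helper `pipeline_commands` is hypothetical and not part of OpenAdapt.

```python
import shlex


def pipeline_commands(name, model, checkpoint, benchmark):
    """Return the record, train, and evaluate commands as argv lists, in order."""
    return [
        ["openadapt", "capture", "start", "--name", name],
        ["openadapt", "train", "start", "--capture", name, "--model", model],
        ["openadapt", "eval", "run", "--checkpoint", checkpoint, "--benchmark", benchmark],
    ]


# Values taken from the Quick Start examples.
commands = pipeline_commands("my-task", "qwen3vl-2b", "training_output/model.pt", "waa")
for argv in commands:
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)`. Note that the capture step is interactive and runs until it is stopped with Ctrl+C.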

---

## CLI Reference

```
openadapt capture start --name <name>      Start recording
openadapt capture stop                     Stop recording
openadapt capture list                     List captures
openadapt capture view <name>              Open capture viewer
openadapt train start --capture <name>     Train model on capture
openadapt train status                     Check training progress
openadapt train stop                       Stop training
openadapt eval run --checkpoint <path>     Evaluate trained model
openadapt eval run --agent api-claude      Evaluate API agent
openadapt eval mock --tasks 10             Run mock evaluation
openadapt serve --port 8080                Start dashboard server
openadapt version                          Show installed versions
openadapt doctor                           Check system requirements
```

---

## How It Works

See the full [Architecture Evolution](docs/architecture-evolution.md) for detailed documentation.

### Three-Phase Pipeline

```mermaid

flowchart TB

    %% ═══════════════════════════════════════════════════════════════════════

    %% DATA SOURCES (Multi-Source Ingestion)

    %% ═══════════════════════════════════════════════════════════════════════

    subgraph DataSources["Data Sources"]

        direction LR

        HUMAN["Human Demos"]

        SYNTH["Synthetic Data"]:::future

        BENCH_DATA["Benchmark Tasks"]

    end

    %% ═══════════════════════════════════════════════════════════════════════

    %% PHASE 1: DEMONSTRATE (Observation Collection)

    %% ═══════════════════════════════════════════════════════════════════════

    subgraph Demonstrate["1. DEMONSTRATE (Observation Collection)"]

        direction TB

        CAP["Capture<br/>openadapt-capture"]

        PRIV["Privacy<br/>openadapt-privacy"]

        STORE[("Demo Library")]

        CAP --> PRIV

        PRIV --> STORE

    end

    %% ═══════════════════════════════════════════════════════════════════════

    %% PHASE 2: LEARN (Policy Acquisition)

    %% ═══════════════════════════════════════════════════════════════════════

    subgraph Learn["2. LEARN (Policy Acquisition)"]

        direction TB

        subgraph RetrievalPath["Retrieval Path"]

            EMB["Embed"]

            IDX["Index"]

            SEARCH["Search"]

            EMB --> IDX --> SEARCH

        end

        subgraph TrainingPath["Training Path"]

            LOADER["Load"]

            TRAIN["Train"]

            CKPT[("Checkpoint")]

            LOADER --> TRAIN --> CKPT

        end

        subgraph ProcessMining["Process Mining"]

            ABSTRACT["Abstract"]:::future

            PATTERNS["Patterns"]:::future

            ABSTRACT --> PATTERNS

        end

    end

    %% ═══════════════════════════════════════════════════════════════════════

    %% PHASE 3: EXECUTE (Agent Deployment)

    %% ═══════════════════════════════════════════════════════════════════════

    subgraph Execute["3. EXECUTE (Agent Deployment)"]

        direction TB

        subgraph AgentCore["Agent Core"]

            OBS["Observe"]

            POLICY["Policy<br/>(Demo-Conditioned)"]

            GROUND["Grounding<br/>openadapt-grounding"]

            ACT["Act"]

            OBS --> POLICY

            POLICY --> GROUND

            GROUND --> ACT

        end

        subgraph SafetyGate["Safety Gate"]

            VALIDATE["Validate"]

            CONFIRM["Confirm"]:::future

            VALIDATE --> CONFIRM

        end

        subgraph Evaluation["Evaluation"]

            EVALS["Evals<br/>openadapt-evals"]

            METRICS["Metrics"]

            EVALS --> METRICS

        end

        ACT --> VALIDATE

        VALIDATE --> EVALS

    end

    %% ═══════════════════════════════════════════════════════════════════════

    %% THE ABSTRACTION LADDER (Side Panel)

    %% ═══════════════════════════════════════════════════════════════════════

    subgraph AbstractionLadder["Abstraction Ladder"]

        direction TB

        L0["Literal<br/>(Raw Events)"]

        L1["Symbolic<br/>(Semantic Actions)"]

        L2["Template<br/>(Parameterized)"]

        L3["Semantic<br/>(Intent)"]:::future

        L4["Goal<br/>(Task Spec)"]:::future

        L0 --> L1

        L1 --> L2

        L2 -.-> L3

        L3 -.-> L4

    end

    %% ═══════════════════════════════════════════════════════════════════════

    %% MODEL LAYER

    %% ═══════════════════════════════════════════════════════════════════════

    subgraph Models["Model Layer (VLMs)"]

        direction TB

        subgraph APIModels["API Models"]

            direction LR

            CLAUDE["Claude"]

            GPT["GPT-4o"]

            GEMINI["Gemini"]

        end

        subgraph OpenSource["Open Source / Fine-tuned"]

            direction LR

            QWEN3["Qwen3-VL"]

            UITARS["UI-TARS"]

            OPENCUA["OpenCUA"]

        end

    end

    %% ═══════════════════════════════════════════════════════════════════════

    %% MAIN DATA FLOW

    %% ═══════════════════════════════════════════════════════════════════════

    %% Data sources feed into phases

    HUMAN --> CAP

    SYNTH -.-> LOADER

    BENCH_DATA --> EVALS

    %% Demo library feeds learning

    STORE --> EMB

    STORE --> LOADER

    STORE -.-> ABSTRACT

    %% Learning outputs feed execution

    SEARCH -->|"demo context"| POLICY

    CKPT -->|"trained policy"| POLICY

    PATTERNS -.->|"templates"| POLICY

    %% Model connections

    POLICY --> Models

    GROUND --> Models

    %% ═══════════════════════════════════════════════════════════════════════

    %% FEEDBACK LOOPS (Evaluation-Driven)

    %% ═══════════════════════════════════════════════════════════════════════

    METRICS -->|"success traces"| STORE

    METRICS -.->|"training signal"| TRAIN

    %% Retrieval in BOTH training AND evaluation

    SEARCH -->|"eval conditioning"| EVALS

    %% ═══════════════════════════════════════════════════════════════════════

    %% STYLING

    %% ═══════════════════════════════════════════════════════════════════════

    %% Phase colors

    classDef phase1 fill:#3498DB,stroke:#1A5276,color:#fff

    classDef phase2 fill:#27AE60,stroke:#1E8449,color:#fff

    classDef phase3 fill:#9B59B6,stroke:#6C3483,color:#fff

    %% Component states

    classDef implemented fill:#2ECC71,stroke:#1E8449,color:#fff

    classDef future fill:#95A5A6,stroke:#707B7C,color:#fff,stroke-dasharray: 5 5

    classDef futureBlock fill:#f5f5f5,stroke:#95A5A6,stroke-dasharray: 5 5

    classDef safetyBlock fill:#E74C3C,stroke:#A93226,color:#fff

    %% Model layer

    classDef models fill:#F39C12,stroke:#B7950B,color:#fff

    %% Apply styles

    class CAP,PRIV,STORE phase1

    class EMB,IDX,SEARCH,LOADER,TRAIN,CKPT phase2

    class OBS,POLICY,GROUND,ACT,VALIDATE,EVALS,METRICS phase3

    class CLAUDE,GPT,GEMINI,QWEN3 models

    class L0,L1,L2 implemented

```
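
The first three rungs of the Abstraction Ladder in the diagram (Literal, Symbolic, Template) can be illustrated with a toy example. This is only a sketch: the event format and the helpers `to_symbolic`, `to_template`, and `instantiate` are hypothetical and do not reflect the data model of `openadapt-capture`.

```python
def to_symbolic(raw_events):
    """Literal -> Symbolic: merge consecutive key presses into one 'type' action."""
    actions, buffer = [], []
    for event in raw_events:
        if event["type"] == "key":
            buffer.append(event["char"])
            continue
        if buffer:  # flush pending keystrokes before a non-key event
            actions.append({"action": "type", "text": "".join(buffer)})
            buffer = []
        actions.append({"action": event["type"], "x": event["x"], "y": event["y"]})
    if buffer:
        actions.append({"action": "type", "text": "".join(buffer)})
    return actions


def to_template(actions, value, placeholder):
    """Symbolic -> Template: replace a literal typed value with a named slot."""
    slot = "{" + placeholder + "}"
    template = []
    for action in actions:
        if action["action"] == "type":
            action = {**action, "text": action["text"].replace(value, slot)}
        template.append(action)
    return template


def instantiate(template, **params):
    """Fill the slots of a template (assumes typed text has no other braces)."""
    filled = []
    for action in template:
        if action["action"] == "type":
            action = {**action, "text": action["text"].format(**params)}
        filled.append(action)
    return filled


# Hypothetical raw recording: click a field, type "bob", click a button.
RAW = [
    {"type": "click", "x": 320, "y": 200},
    {"type": "key", "char": "b"},
    {"type": "key", "char": "o"},
    {"type": "key", "char": "b"},
    {"type": "click", "x": 640, "y": 480},
]

symbolic = to_symbolic(RAW)
template = to_template(symbolic, "bob", "username")
replayed = instantiate(template, username="alice")
print(replayed)
```

Literal replay would retype "bob" every time, whereas the template rung lets the same recording be replayed with a different value.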

### Core Approach: Demo-Conditioned Prompting

OpenAdapt explores **demonstration-conditioned automation** - "show, don't tell":

| Traditional Agent | OpenAdapt Agent |
|-------------------|-----------------|
| User writes prompts | User records demonstration |
| Ambiguous instructions | Grounded in actual UI |
| Requires prompt engineering | Reduced prompt engineering |
| Context-free | Context from similar demos |

**Retrieval powers BOTH training AND evaluation**: Similar demonstrations are retrieved as context for the VLM. In early experiments on a controlled macOS benchmark, this improved first-action accuracy from 46.7% to 100% - though all 45 tasks in that benchmark share the same navigation entry point. See the [publication roadmap](docs/publication-roadmap.md) for methodology and limitations.
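
As a rough illustration of demo-conditioned prompting, the sketch below ranks stored demonstrations by cosine similarity to a query embedding and prepends the best match to the task instruction. Everything here is an assumption made for illustration: the toy library, the hand-written embeddings (a real system would use an encoder), and the prompt layout. It is not the `openadapt-retrieval` API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, library, k=1):
    """Return the k demos whose embeddings are most similar to the query."""
    ranked = sorted(library, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return ranked[:k]


def build_prompt(task, demos):
    """Prepend the retrieved demonstrations to the task instruction."""
    parts = []
    for demo in demos:
        steps = "\n".join(f"  {i}. {s}" for i, s in enumerate(demo["steps"], 1))
        parts.append(f"Demonstration: {demo['task']}\n{steps}")
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)


# Toy demo library; the embeddings are hand-written stand-ins for a real encoder.
LIBRARY = [
    {"task": "Turn off Night Shift", "embedding": [0.9, 0.1, 0.0],
     "steps": ["Open System Settings", "Click Displays", "Click Night Shift"]},
    {"task": "Rename a file", "embedding": [0.0, 0.2, 0.9],
     "steps": ["Select the file", "Press Return", "Type the new name"]},
]

top = retrieve([0.8, 0.2, 0.1], LIBRARY, k=1)
prompt = build_prompt("Turn on Night Shift", top)
print(prompt)
```

The resulting prompt would then be sent to the VLM together with the current screenshot, so the model sees how a similar task was carried out before choosing its first action.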

### Key Concepts

- **Policy/Grounding Separation**: The Policy decides *what* to do; Grounding determines *where* to do it

- **Safety Gate**: Runtime validation layer before action execution (confirm mode for high-risk actions)

- **Abstraction Ladder**: Progressive generalization from literal replay to goal-level automation

- **Evaluation-Driven Feedback**: Success traces become new training data

**Diagram legend:** Solid = Implemented | Dashed = Future
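
The Policy/Grounding split and the Safety Gate can be sketched in a few lines. This is a hedged illustration rather than OpenAdapt's implementation: the classes `Intent` and `GroundedAction`, the hard-coded policy, the dictionary-based grounding, and the `HIGH_RISK_TARGETS` set are all hypothetical. The diagram marks the confirm step as future work, so it is modelled here as a pluggable callback.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    """Policy output: what to do, with no screen coordinates yet."""
    verb: str          # e.g. "click" or "type"
    target: str        # human-readable element description
    text: str = ""


@dataclass
class GroundedAction:
    """Grounding output: the same intent bound to a screen position."""
    verb: str
    x: int
    y: int
    text: str = ""


def policy(observation):
    """Decide WHAT to do. A real policy would query a VLM; this one is hard-coded."""
    if "Save" in observation["elements"]:
        return Intent(verb="click", target="Save")
    return Intent(verb="type", target="Filename", text="report.txt")


def ground(intent, observation):
    """Decide WHERE to do it by looking the target up in the observed elements."""
    x, y = observation["elements"][intent.target]
    return GroundedAction(intent.verb, x, y, intent.text)


HIGH_RISK_TARGETS = {"Delete", "Send", "Pay"}  # illustrative only


def safety_gate(intent, confirm=lambda intent: False):
    """Validate before acting; high-risk intents need explicit confirmation."""
    if intent.target in HIGH_RISK_TARGETS:
        return confirm(intent)
    return True


# Hypothetical observation: element names mapped to screen coordinates.
obs = {"elements": {"Save": (640, 480), "Filename": (320, 200)}}
intent = policy(obs)
action = ground(intent, obs) if safety_gate(intent) else None
print(action)
```

Keeping the two stages separate means the grounding step can be swapped (for example, a different element detector) without changing how the policy chooses actions.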

---

## Terminology

| Term | Description |
|------|-------------|
| **Observation** | What the agent perceives (screenshot, accessibility tree) |
| **Action** | What the agent does (click, type, scroll, etc.) |
| **Trajectory** | Sequence of observation-action pairs |
| **Demonstration** | Human-provided example trajectory |
| **Policy** | Decision-making component that maps observations to actions |
| **Grounding** | Mapping intent to specific UI elements (coordinates) |
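
One way to picture how these terms relate is as a small set of data types. The sketch below is illustrative only; the class and field names are assumptions and do not mirror the schemas used by the OpenAdapt sub-packages.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Observation:
    """What the agent perceives at one step."""
    screenshot_path: str
    accessibility_tree: Optional[dict] = None


@dataclass
class Action:
    """What the agent does at one step."""
    kind: str                      # "click", "type", "scroll", ...
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None


@dataclass
class Trajectory:
    """Sequence of observation-action pairs for one task."""
    task: str
    steps: List[Tuple[Observation, Action]] = field(default_factory=list)
    source: str = "agent"          # "human" marks a demonstration

    def add(self, observation, action):
        self.steps.append((observation, action))

    @property
    def is_demonstration(self):
        return self.source == "human"


# A human-provided trajectory, i.e. a demonstration (file names are made up).
demo = Trajectory(task="Save the document", source="human")
demo.add(Observation("frame_000.png"), Action(kind="click", x=640, y=480))
demo.add(Observation("frame_001.png"), Action(kind="type", text="report.txt"))
print(len(demo.steps), demo.is_demonstration)
```

In these terms, a policy is any callable that takes an `Observation` and returns an `Action`, and grounding is the step that fills in `x` and `y`.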

---

## Demos

- https://twitter.com/abrichr/status/1784307190062342237

- https://www.loom.com/share/9d77eb7028f34f7f87c6661fb758d1c0

---

## Permissions

**macOS:** Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See [permissions guide](./legacy/permissions_in_macOS.md).

**Windows:** Run as Administrator if needed for input capture.

---

## Legacy Version

The monolithic OpenAdapt codebase (v0.46.0) is preserved in the `legacy/` directory.

**To use the legacy version:**

```bash
pip install openadapt==0.46.0
```

See [docs/LEGACY_FREEZE.md](docs/LEGACY_FREEZE.md) for the migration guide and further details.

---

## Contributing

1. [Join Discord](https://discord.gg/yF527cQbDG)

2. Pick an issue from the relevant sub-package repository

3. Submit a PR

For sub-package development:

```bash
git clone https://github.com/OpenAdaptAI/openadapt-ml  # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"
```

---

## Related Projects

- [OpenAdaptAI/SoM](https://github.com/OpenAdaptAI/SoM) - Set-of-Mark prompting

- [OpenAdaptAI/pynput](https://github.com/OpenAdaptAI/pynput) - Input monitoring fork

- [OpenAdaptAI/atomacos](https://github.com/OpenAdaptAI/atomacos) - macOS accessibility

---

## Support

- **Discord:** https://discord.gg/yF527cQbDG

- **Issues:** Use the relevant sub-package repository

- **Architecture docs:** [GitHub Wiki](https://github.com/OpenAdaptAI/OpenAdapt/wiki/OpenAdapt-Architecture-(draft))

---

## License

MIT License - see [LICENSE](LICENSE) for details.
