{"id":29113427,"url":"https://github.com/openadaptai/omnimcp","last_synced_at":"2026-01-18T01:11:23.192Z","repository":{"id":282811536,"uuid":"949716872","full_name":"OpenAdaptAI/OmniMCP","owner":"OpenAdaptAI","description":"OmniMCP uses Microsoft OmniParser and Model Context Protocol (MCP) to provide AI models with rich UI context and powerful interaction capabilities.","archived":false,"fork":false,"pushed_at":"2025-04-08T01:16:41.000Z","size":25922,"stargazers_count":45,"open_issues_count":15,"forks_count":9,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-17T13:14:17.925Z","etag":null,"topics":["anthropic","aws","computeruse","gemini","generative-ai","model-context-protocol","omniparser","openai"],"latest_commit_sha":null,"homepage":"https://omnimcp.openadapt.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenAdaptAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-17T02:50:11.000Z","updated_at":"2025-06-09T01:36:24.000Z","dependencies_parsed_at":"2025-04-05T02:36:28.938Z","dependency_job_id":null,"html_url":"https://github.com/OpenAdaptAI/OmniMCP","commit_stats":null,"previous_names":["openadaptai/omnimcp"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/OpenAdaptAI/OmniMCP","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2FOmniMCP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2FOmniMCP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenA
daptAI%2FOmniMCP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2FOmniMCP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenAdaptAI","download_url":"https://codeload.github.com/OpenAdaptAI/OmniMCP/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenAdaptAI%2FOmniMCP/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262581351,"owners_count":23331912,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anthropic","aws","computeruse","gemini","generative-ai","model-context-protocol","omniparser","openai"],"created_at":"2025-06-29T11:05:31.855Z","updated_at":"2026-01-18T01:11:23.178Z","avatar_url":"https://github.com/OpenAdaptAI.png","language":"Python","funding_links":[],"categories":["Browser Automation"],"sub_categories":["How to Submit"],"readme":"# OmniMCP\n\n[![CI](https://github.com/OpenAdaptAI/OmniMCP/actions/workflows/ci.yml/badge.svg)](https://github.com/OpenAdaptAI/OmniMCP/actions/workflows/ci.yml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Python Version](https://img.shields.io/badge/python-3.10%20|%203.11%20|%203.12-blue)](https://www.python.org/)\n[![Code style: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n\nOmniMCP provides rich UI context and interaction capabilities to AI models through [Model Context Protocol 
(MCP)](https://github.com/modelcontextprotocol) and [microsoft/OmniParser](https://github.com/microsoft/OmniParser). It focuses on enabling deep understanding of user interfaces through visual analysis, structured planning, and precise interaction execution.\n\n## Core Features\n\n- **Visual Perception:** Understands UI elements using OmniParser.\n- **LLM Planning:** Plans next actions based on goal, history, and visual state.\n- **Agent Executor:** Orchestrates the perceive-plan-act loop (`omnimcp/agent_executor.py`).\n- **Action Execution:** Controls mouse/keyboard via `pynput` (`omnimcp/input.py`).\n- **CLI Interface:** Simple entry point (`cli.py`) for running tasks.\n- **Auto-Deployment:** Optional OmniParser server deployment to AWS EC2 with auto-shutdown.\n- **Debugging:** Generates timestamped visual logs per step.\n\n## Overview\n\n`cli.py` uses `AgentExecutor` to run a perceive-plan-act loop. It captures the screen (`VisualState`), plans using an LLM (`core.plan_action_for_ui`), and executes actions (`InputController`).\n\n### Demos\n\n- **Real Action (Calculator):** `python cli.py` opens Calculator and computes 5*9.\n  ![OmniMCP Real Action Demo GIF](images/omnimcp_demo.gif)\n- **Synthetic UI (Login):** `python demo_synthetic.py` uses generated images (no real I/O). *(Note: Pending refactor to use AgentExecutor).*\n  ![OmniMCP Synthetic Demo GIF](images/omnimcp_demo_synthetic.gif)\n\n## Prerequisites\n\n- Python \u003e=3.10, \u003c3.13\n- `uv` installed (`pip install uv`)\n- **Linux Runtime Requirement:** Requires an active graphical session (X11/Wayland) for `pynput`. May need system libraries (`libx11-dev`, etc.) - see `pynput` docs.\n\n*(macOS display scaling dependencies are handled automatically during installation).*\n\n### For AWS Deployment Features\n\nRequires AWS credentials in `.env` (see `.env.example`). **Warning:** Creates AWS resources (EC2, Lambda, etc.) incurring costs. 
Use `python -m omnimcp.omniparser.server stop` to clean up.\n\n```.env\nAWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY\nAWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY\nANTHROPIC_API_KEY=YOUR_ANTHROPIC_KEY\n# OMNIPARSER_URL=http://... # Optional: Skip auto-deploy\n```\n\n## Installation\n\n```bash\ngit clone https://github.com/OpenAdaptAI/OmniMCP.git\ncd OmniMCP\n./install.sh # Creates .venv, installs deps incl. test extras\ncp .env.example .env\n# Edit .env with your keys\n# Activate: source .venv/bin/activate (Linux/macOS) or relevant Windows command\n```\n\n## Quick Start\n\nEnsure environment is activated and `.env` is configured.\n\n```bash\n# Run default goal (Calculator task)\npython cli.py\n\n# Run custom goal\npython cli.py --goal \"Your goal here\"\n\n# See options\npython cli.py --help\n```\nDebug outputs are saved in `runs/\u003ctimestamp\u003e/`.\n\n**Note on MCP Server:** An experimental MCP server (`OmniMCP` class in `omnimcp/mcp_server.py`) exists but is separate from the primary `cli.py`/`AgentExecutor` workflow.\n\n## Architecture\n\n1.  **CLI** (`cli.py`) - Entry point, setup, starts Executor.\n2.  **Agent Executor** (`omnimcp/agent_executor.py`) - Orchestrates loop, manages state/artifacts.\n3.  **Visual State Manager** (`omnimcp/visual_state.py`) - Perception (screenshot, calls parser).\n4.  **OmniParser Client \u0026 Deploy** (`omnimcp/omniparser/`) - Manages OmniParser server communication/deployment.\n5.  **LLM Planner** (`omnimcp/core.py`) - Generates action plan.\n6.  **Input Controller** (`omnimcp/input.py`) - Executes actions (mouse/keyboard).\n7.  **(Optional) MCP Server** (`omnimcp/mcp_server.py`) - Experimental MCP interface.\n\n## Development\n\n### Environment Setup \u0026 Checks\n```bash\n# Setup (if not done): ./install.sh\n# Activate env: source .venv/bin/activate (or similar)\n# Format/Lint: uv run ruff format . \u0026\u0026 uv run ruff check . 
--fix\n# Run tests: uv run pytest tests/\n```\n\n### Debug Support\nRunning `python cli.py` saves timestamped runs in `runs/`, including:\n* `step_N_state_raw.png`\n* `step_N_state_parsed.png` (with element boxes)\n* `step_N_action_highlight.png` (with action highlight)\n* `final_state.png`\n\nDetailed logs are in `logs/run_YYYY-MM-DD_HH-mm-ss.log` (`LOG_LEVEL=DEBUG` in `.env` recommended).\n\n\u003cdetails\u003e\n\u003csummary\u003eExample Log Snippet (Auto-Deploy + Agent Step)\u003c/summary\u003e\n\n```log\n# --- Initialization \u0026 Auto-Deploy ---\n2025-MM-DD HH:MM:SS | INFO     | omnimcp.omniparser.client:... - No server_url provided, attempting discovery/deployment...\n2025-MM-DD HH:MM:SS | INFO     | omnimcp.omniparser.server:... - Creating new EC2 instance...\n2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.omniparser.server:... - Instance i-... is running. Public IP: ...\n2025-MM-DD HH:MM:SS | INFO     | omnimcp.omniparser.server:... - Setting up auto-shutdown infrastructure...\n2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.omniparser.server:... - Auto-shutdown infrastructure setup completed...\n... (SSH connection, Docker setup) ...\n2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.omniparser.client:... - Auto-deployment successful. Server URL: http://...\n... (Agent Executor Init) ...\n\n# --- Agent Execution Loop Example Step ---\n2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - --- Step N/10 ---\n2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Perceiving current screen state...\n2025-MM-DD HH:MM:SS | INFO     | omnimcp.visual_state:update:... - VisualState update complete. Found X elements. Took Y.YYs.\n2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - Perceived state with X elements.\n... (Save artifacts) ...\n2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Planning next action...\n... (LLM Call) ...\n2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... 
- LLM Plan: Action=..., TargetID=..., GoalComplete=False\n2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Added to history: Step N: Planned action ...\n2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - Executing action: ...\n2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.agent_executor:run:... - Action executed successfully.\n2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Step N duration: Z.ZZs\n... (Loop continues or finishes) ...\n```\n*(Note: Details like timings, counts, IPs, instance IDs, and specific plans will vary)*\n\u003c/details\u003e\n\n## Roadmap \u0026 Limitations\n\nKey limitations \u0026 future work areas:\n\n* **Performance:** Reduce OmniParser latency (explore local models, caching, etc.) and optimize state management (avoid full re-parse).\n* **Robustness:** Improve LLM planning reliability (prompts, techniques like ReAct), add action verification/error recovery, enhance element targeting.\n* **Target API/Architecture:** Evolve towards a higher-level declarative API (e.g., `@omni.publish` style) and potentially integrate loop logic with the experimental MCP Server (`OmniMCP` class).\n* **Consistency:** Refactor `demo_synthetic.py` to use `AgentExecutor`.\n* **Features:** Expand action space (drag/drop, hover).\n* **Testing:** Add E2E tests, broaden cross-platform validation, define evaluation metrics.\n* **Research:** Explore fine-tuning, process graphs (RAG), framework integration.\n\n## Project Status\n\nCore loop via `cli.py`/`AgentExecutor` is functional for basic tasks. Performance and robustness need significant improvement. MCP integration is experimental.\n\n## Contributing\n\n1. Fork repository\n2. Create feature branch\n3. Implement changes \u0026 add tests\n4. Ensure checks pass (`uv run ruff format .`, `uv run ruff check . --fix`, `uv run pytest tests/`)\n5. 
Submit pull request\n\n## License\n\nMIT License\n\n## Contact\n\n- Issues: [GitHub Issues](https://github.com/OpenAdaptAI/OmniMCP/issues)\n- Questions: [Discussions](https://github.com/OpenAdaptAI/OmniMCP/discussions)\n- Security: security@openadapt.ai\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenadaptai%2Fomnimcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenadaptai%2Fomnimcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenadaptai%2Fomnimcp/lists"}