https://github.com/web-infra-dev/midscene
Driving all platforms UI automation with vision-based model
- Host: GitHub
- URL: https://github.com/web-infra-dev/midscene
- Owner: web-infra-dev
- License: mit
- Created: 2024-07-23T04:03:50.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-01-27T00:08:42.000Z (12 days ago)
- Last Synced: 2026-01-27T00:19:07.826Z (12 days ago)
- Topics: ai, ai-test, browser-use, computer-use, gpt-operator, javascript, phone-use, testing
- Language: TypeScript
- Homepage: https://midscenejs.com
- Size: 410 MB
- Stars: 11,426
- Watchers: 66
- Forks: 818
- Open Issues: 71
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-ai - midscene - Let AI be your browser operator (Uncategorized / Uncategorized)
- awesome - web-infra-dev/midscene - Driving all platforms UI automation with vision-based model (TypeScript)
- StarryDivineSky - web-infra-dev/midscene - Developed by the web-infra-dev team and open-sourced on GitHub. (A01_Text Generation_Text Dialogue / Large-language dialogue models and data)
- AiTreasureBox - web-infra-dev/midscene - An AI-powered automation SDK that can control the page, perform assertions, and extract data in JSON format using natural language. (Repos)
- awesome-hacking-lists - web-infra-dev/midscene - Let AI be your browser operator. (HTML)
- awesome - web-infra-dev/midscene - Driving all platforms UI automation with vision-based model (TypeScript)
- my-awesome-list - midscene - Driving all platforms UI automation with vision-based model (web-infra-dev, 11466 stars) (TypeScript)
README
Midscene.js
Driving all platforms UI automation with vision-based model
## 📣 v1.0 Release Notice
> **We have released v1.0.** It is currently published on npm.
> The v1.0 docs and code are on [https://midscenejs.com/](https://midscenejs.com/) and the `main` branch.
> The v0.x docs and code are on [https://v0.midscenejs.com/](https://v0.midscenejs.com/) and the `v0` branch.
> The v1.0 changelog: [https://midscenejs.com/changelog](https://midscenejs.com/changelog)
## Showcases
Autonomously fill in the GitHub registration form in a web browser and pass all field validations.
Plus these real-world showcases:
* [iOS Automation - Meituan coffee order](https://midscenejs.com/showcases#ios)
* [iOS Automation - Auto-like the first @midscene_ai tweet](https://midscenejs.com/showcases#ios)
* [Android Automation - DCar: Xiaomi SU7 specs](https://midscenejs.com/showcases#android)
* [Android Automation - Booking a hotel for Christmas](https://midscenejs.com/showcases#android)
* [MCP Integration - Midscene MCP UI prepatch release](https://midscenejs.com/showcases#mcp)
Explore more real-world showcases: [showcases](https://midscenejs.com/showcases)
Community showcase: [robotic arm + vision + voice for in-vehicle testing](https://midscenejs.com/showcases#community-showcases)
## 💡 Features
### Write Automation with Natural Language
- Describe your goals and steps, and Midscene will plan and operate the user interface for you.
- Use the JavaScript SDK or YAML to write your automation scripts.
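For example, with the Puppeteer integration (linked in the next section), a script looks roughly like the sketch below. The target page, prompts, and extracted data shape are placeholders for illustration, not an official example.

```typescript
// Sketch: drive a page with natural-language steps via the Puppeteer integration.
// Run as an ES module (Node 18+). Prompts and the target site are placeholders.
import puppeteer from "puppeteer";
import { PuppeteerAgent } from "@midscene/web/puppeteer";

const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto("https://www.example.com/search");

const agent = new PuppeteerAgent(page);

// Describe the goal; Midscene plans and performs the UI steps.
await agent.aiAction('type "wireless headphones" in the search box and press Enter');

// Extract structured data by describing the shape you want back.
const items = await agent.aiQuery("{title: string, price: string}[], the visible result items");
console.log(items);

// Assert on the outcome in natural language.
await agent.aiAssert("the result list shows at least one product");

await browser.close();
```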
### Web & Mobile App & Any Interface
- **Web Automation**: Integrate with [Puppeteer](https://midscenejs.com/integrate-with-puppeteer) or [Playwright](https://midscenejs.com/integrate-with-playwright), or use [Bridge Mode](https://midscenejs.com/bridge-mode) to control your desktop browser.
- **Android Automation**: Use the [JavaScript SDK](https://midscenejs.com/android-getting-started) with adb to control your local Android device (a sketch follows this list).
- **iOS Automation**: Use the [JavaScript SDK](https://midscenejs.com/ios-getting-started) with WebDriverAgent to control your local iOS devices and simulators.
- **Any Interface Automation**: Use the [JavaScript SDK](https://midscenejs.com/integrate-with-any-interface) to control your own interface.
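As a rough illustration of the Android path, a script against a device connected over adb could look like the sketch below. The helper names follow the Android getting-started guide linked above, but treat them (and the prompts) as assumptions and check the docs for the exact API.

```typescript
// Sketch: control a local Android device over adb. Assumes adb is installed,
// a device or emulator is connected, and a vision-language model is configured
// via environment variables. Helper names are taken from the getting-started
// guide and should be verified against the current docs.
import { agentFromAdbDevice, getConnectedDevices } from "@midscene/android";

const devices = await getConnectedDevices();              // devices visible to adb
const agent = await agentFromAdbDevice(devices[0].udid);  // wrap the first one in an agent

await agent.launch("https://www.example.com");            // open a URL (or an app)
await agent.aiAction('search for "coffee near me" and open the first result');
await agent.aiAssert("a result page is displayed");
```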
### For Developers
- **Three kinds of APIs** (a combined sketch follows this list):
- [Interaction API](https://midscenejs.com/api#interaction-methods): interact with the user interface.
- [Data Extraction API](https://midscenejs.com/api#data-extraction): extract data from the user interface and DOM.
- [Utility API](https://midscenejs.com/api#more-apis): utility functions like `aiAssert()`, `aiLocate()`, `aiWaitFor()`.
- **MCP**: Midscene provides MCP services that expose atomic Midscene Agent actions as MCP tools so upper-layer agents can inspect and operate UIs with natural language. [Docs](https://midscenejs.com/mcp)
- [**Caching for Efficiency**](https://midscenejs.com/caching): Replay your script with caching to get results faster.
- **Debugging Experience**: Midscene.js offers a visual report file with step-by-step replay, a built-in playground, and a Chrome Extension to simplify debugging. These are the tools most developers truly need.
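Putting the three API families together, a typical script mixes interaction, extraction, and utility calls, as in the sketch below. Method names follow the API reference linked above; the page and prompts are placeholders.

```typescript
// Sketch: combine the Interaction, Data Extraction, and Utility APIs.
// The page and prompts are placeholders; see the API reference for details.
import puppeteer from "puppeteer";
import { PuppeteerAgent } from "@midscene/web/puppeteer";

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://news.example.com");
const agent = new PuppeteerAgent(page);

// Interaction API: act on the UI with natural language.
await agent.aiTap("the 'Top stories' tab");
await agent.aiInput("climate", "the search field in the header");

// Data Extraction API: describe the shape of the data you want back.
const headlines = await agent.aiQuery("string[], the titles of the visible articles");

// Utility API: wait for a condition, then assert the result.
await agent.aiWaitFor("the article list has finished loading");
await agent.aiAssert("at least 3 articles mention climate");

console.log(headlines);
await browser.close();
```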
## 👉 Zero-code Quick Experience
- **[Chrome Extension](https://midscenejs.com/quick-experience)**: Start the in-browser experience immediately through [the Chrome Extension](https://midscenejs.com/quick-experience), without writing any code.
- **[Android Playground](https://midscenejs.com/android-getting-started)**: A built-in Android playground lets you control your local Android device.
- **[iOS Playground](https://midscenejs.com/ios-getting-started)**: A built-in iOS playground lets you control your local iOS device.
## ✨ Driven by Visual Language Model
Midscene.js is all-in on the pure-vision route for UI actions: element localization and interactions are based on screenshots only. It supports visual-language models like `Qwen3-VL`, `Doubao-1.6-vision`, `gemini-3-pro`, and `UI-TARS`. For data extraction and page understanding, you can still opt in to include DOM when needed.
* Pure-vision localization for UI actions; the DOM extraction mode is removed.
* Works across web, mobile, desktop, and other custom interfaces.
* Far fewer tokens by skipping DOM for actions, which cuts cost and speeds up runs.
* DOM can still be included for data extraction and page understanding when needed.
* Strong open-source options for self-hosting.
Read more about [Model Strategy](https://midscenejs.com/model-strategy)
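Model selection is configured through environment variables rather than in code. The sketch below shows a common OpenAI-compatible setup; the variable names follow the Midscene docs, but the endpoint and model name are placeholders, and the [Model Strategy](https://midscenejs.com/model-strategy) page has the current list.

```typescript
// Sketch: point Midscene at an OpenAI-compatible vision-language model.
// These are usually set in your shell or a .env file; setting them in code
// before creating an agent is typically enough for quick experiments.
// The endpoint and model name are placeholders for your own provider or
// self-hosted deployment.
process.env.OPENAI_BASE_URL ??= "https://your-vlm-provider.example.com/v1";
process.env.OPENAI_API_KEY ??= "sk-...";        // key for that provider
process.env.MIDSCENE_MODEL_NAME ??= "qwen3-vl"; // e.g. a Qwen3-VL or UI-TARS variant
```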
## 📄 Resources
* Official Website: [https://midscenejs.com](https://midscenejs.com/)
* Documentation: [https://midscenejs.com](https://midscenejs.com/)
* Sample Projects: [https://github.com/web-infra-dev/midscene-example](https://github.com/web-infra-dev/midscene-example)
* API Reference: [https://midscenejs.com/api](https://midscenejs.com/api)
* GitHub: [https://github.com/web-infra-dev/midscene](https://github.com/web-infra-dev/midscene)
## 🤝 Community
* [Discord](https://discord.gg/2JyBHxszE4)
* [Follow us on X](https://x.com/midscene_ai)
* [Lark Group(飞书交流群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=693v0991-a6bb-4b44-b2e1-365ca0d199ba)
## 🌟 Awesome Midscene
Community projects that extend Midscene.js capabilities:
* [midscene-ios](https://github.com/lhuanyu/midscene-ios) - iOS Mirror automation support for Midscene
* [midscene-pc](https://github.com/Mofangbao/midscene-pc) - PC operation device for Windows, macOS, and Linux
* [midscene-pc-docker](https://github.com/Mofangbao/midscene-pc-docker) - Docker image with Midscene-PC server pre-installed
* [Midscene-Python](https://github.com/Python51888/Midscene-Python) - Python SDK for Midscene automation
* [midscene-java](https://github.com/Master-Frank/midscene-java) by @Master-Frank - Java SDK for Midscene automation
* [midscene-java](https://github.com/alstafeev/midscene-java) by @alstafeev - Java SDK for Midscene automation
## 📝 Credits
We would like to thank the following projects:
- [Rsbuild](https://github.com/web-infra-dev/rsbuild) and [Rslib](https://github.com/web-infra-dev/rslib) for the build tool.
- [UI-TARS](https://github.com/bytedance/ui-tars) for the open-source agent model UI-TARS.
- [Qwen-VL](https://github.com/QwenLM/Qwen-VL) for the open-source VL model Qwen-VL.
- [scrcpy](https://github.com/Genymobile/scrcpy) and [yume-chan](https://github.com/yume-chan) for allowing us to control Android devices from the browser.
- [appium-adb](https://github.com/appium/appium-adb) for the JavaScript bridge to adb.
- [appium-webdriveragent](https://github.com/appium/WebDriverAgent) for the JavaScript bridge to operate XCTest.
- [YADB](https://github.com/ysbing/YADB) for the yadb tool which improves the performance of text input.
- [libnut-core](https://github.com/nut-tree/libnut-core) for the cross-platform native keyboard and mouse control.
- [Puppeteer](https://github.com/puppeteer/puppeteer) for browser automation and control.
- [Playwright](https://github.com/microsoft/playwright) for browser automation and control and testing.
## 📖 Citation
If you use Midscene.js in your research or project, please cite:
```bibtex
@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation & Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}
```
## ✨ Star History
[Star History Chart](https://www.star-history.com/#web-infra-dev/midscene&Date)
## 📝 License
Midscene.js is [MIT licensed](https://github.com/web-infra-dev/midscene/blob/main/LICENSE).
---
If this project helps or inspires you, please give us a star ⭐.