Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
List: awesome-gui-agent
https://github.com/showlab/awesome-gui-agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
- Host: GitHub
- URL: https://github.com/showlab/awesome-gui-agent
- Owner: showlab
- Created: 2024-06-28T16:12:23.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-10-27T08:03:37.000Z (about 2 months ago)
- Last Synced: 2024-10-27T09:19:15.145Z (about 2 months ago)
- Topics: ai-assistant, awesome, graphical-user-interface, gui-agents, llm-agent
- Homepage:
- Size: 577 KB
- Stars: 153
- Watchers: 4
- Forks: 9
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- ultimate-awesome - awesome-gui-agent - 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents. (Other Lists / Monkey C Lists)
README
# Awesome GUI Agent [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
A curated list of papers, projects, and resources for multi-modal Graphical User Interface (GUI) agents.
Build a digital assistant on your screen. (Illustration generated by DALL-E 3.)
**Contributions are welcome!**
🔥 This project is actively maintained, and we welcome your contributions. If you have any suggestions, such as missing papers or information, please feel free to open an issue or submit a pull request.
🤖 Try our [Awesome-Paper-Agent](https://chatgpt.com/g/g-qqs9km6wi-awesome-paper-agent). Just provide an arXiv URL link, and it will automatically return formatted information, like this:
```
User:
https://arxiv.org/abs/2312.13108
GPT:
+ [AssistGUI: Task-Oriented Desktop Graphical User Interface Automation](https://arxiv.org/abs/2312.13108) (Dec. 2023)
[![Star](https://img.shields.io/github/stars/showlab/assistgui.svg?style=social&label=Star)](https://github.com/showlab/assistgui)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.13108)
[![Website](https://img.shields.io/badge/Website-9cf)](https://showlab.github.io/assistgui/)
```
You can then copy this formatted entry directly into your pull request.
⭐ If you find this repository useful, please give it a star.
## Datasets / Benchmarks
+ [World of Bits: An Open-Domain Platform for Web-Based Agents](https://proceedings.mlr.press/v70/shi17a.html) (Aug. 2017, ICML 2017)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://proceedings.mlr.press/v70/shi17a/shi17a.pdf)
+ [A Unified Solution for Structured Web Data Extraction](https://dl.acm.org/doi/10.1145/2009916.2010020) (Jul. 2011, SIGIR 2011)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://dl.acm.org/doi/10.1145/2009916.2010020)
+ [Rico: A Mobile App Dataset for Building Data-Driven Design Applications](https://dl.acm.org/doi/10.1145/3126594.3126651) (Oct. 2017)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://dl.acm.org/doi/10.1145/3126594.3126651)
+ [Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration](https://arxiv.org/abs/1802.08802) (Feb. 2018, ICLR 2018)
[![Star](https://img.shields.io/github/stars/stanfordnlp/wge.svg?style=social&label=Star)](https://github.com/stanfordnlp/wge)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/1802.08802)
+ [Mapping Natural Language Instructions to Mobile UI Action Sequences](https://arxiv.org/abs/2005.03776) (May. 2020, ACL 2020)
[![Star](https://img.shields.io/github/stars/deepneuralmachine/seq2act-tensorflow.svg?style=social&label=Star)](https://github.com/deepneuralmachine/seq2act-tensorflow)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2005.03776)
+ [WebSRC: A Dataset for Web-Based Structural Reading Comprehension](https://arxiv.org/abs/2101.09465) (Jan. 2021, EMNLP 2021)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2101.09465)
[![Website](https://img.shields.io/badge/Website-9cf)](https://x-lance.github.io/WebSRC/)
+ [AndroidEnv: A Reinforcement Learning Platform for Android](https://arxiv.org/abs/2105.13231) (May. 2021)
[![Star](https://img.shields.io/github/stars/deepmind/android_env.svg?style=social&label=Star)](https://github.com/deepmind/android_env)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2105.13231)
[![Website](https://img.shields.io/badge/Website-9cf)](https://github.com/deepmind/android_env)
+ [A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility](https://arxiv.org/abs/2202.02312) (Feb. 2022)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2202.02312)
+ [META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI](https://arxiv.org/abs/2205.11029) (May. 2022)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2205.11029)
[![Website](https://img.shields.io/badge/Website-9cf)](https://x-lance.github.io/META-GUI-Leaderboard/)
+ [WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents](https://arxiv.org/abs/2207.01206) (Jul. 2022)
[![Star](https://img.shields.io/github/stars/princeton-nlp/WebShop.svg?style=social&label=Star)](https://github.com/princeton-nlp/WebShop)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2207.01206)
[![Website](https://img.shields.io/badge/Website-9cf)](https://webshop-pnlp.github.io/)
+ [Language Models can Solve Computer Tasks](https://arxiv.org/abs/2303.17491) (Mar. 2023)
[![Star](https://img.shields.io/github/stars/posgnu/rci-agent.svg?style=social&label=Star)](https://github.com/posgnu/rci-agent)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2303.17491)
[![Website](https://img.shields.io/badge/Website-9cf)](https://posgnu.github.io/rci-web/)
+ [Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction](https://arxiv.org/abs/2305.08144) (May. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2305.08144)
[![GitHub](https://img.shields.io/badge/GitHub-181717.svg?style=social&logo=github)](https://github.com/X-LANCE/Mobile-Env)
+ [Mind2Web: Towards a Generalist Agent for the Web](https://arxiv.org/abs/2306.06070) (Jun. 2023)
[![Star](https://img.shields.io/github/stars/osu-nlp-group/mind2web.svg?style=social&label=Star)](https://github.com/osu-nlp-group/mind2web)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2306.06070)
[![Website](https://img.shields.io/badge/Website-9cf)](https://osu-nlp-group.github.io/Mind2Web/)
+ [Android in the Wild: A Large-Scale Dataset for Android Device Control](https://arxiv.org/abs/2307.10088) (Jul. 2023)
[![Star](https://img.shields.io/github/stars/google-research/google-research.svg?style=social&label=Star)](https://github.com/google-research/google-research/tree/master/android_in_the_wild)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2307.10088)
+ [WebArena: A Realistic Web Environment for Building Autonomous Agents](https://arxiv.org/abs/2307.13854) (Jul. 2023)
[![Star](https://img.shields.io/github/stars/web-arena-x/webarena.svg?style=social&label=Star)](https://github.com/web-arena-x/webarena)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2307.13854)
[![Website](https://img.shields.io/badge/Website-9cf)](https://webarena.dev/)
+ [Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models](https://arxiv.org/abs/2311.09278) (Nov. 2023)
[![Star](https://img.shields.io/github/stars/xufangzhi/ENVISIONS.svg?style=social&label=Star)](https://github.com/xufangzhi/ENVISIONS)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.09278)
[![Website](https://img.shields.io/badge/Website-9cf)](https://xufangzhi.github.io/symbol-llm-page/)
+ [AssistGUI: Task-Oriented Desktop Graphical User Interface Automation](https://arxiv.org/abs/2401.07781) (Dec. 2023, CVPR 2024)
[![Star](https://img.shields.io/github/stars/showlab/assistgui.svg?style=social&label=Star)](https://github.com/showlab/assistgui)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.07781)
[![Website](https://img.shields.io/badge/Website-9cf)](https://showlab.github.io/assistgui/)
+ [VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks](https://arxiv.org/abs/2401.13649) (Jan. 2024, ACL 2024)
[![Star](https://img.shields.io/github/stars/web-arena-x/visualwebarena.svg?style=social&label=Star)](https://github.com/jykoh/visualwebarena)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.13649)
[![Website](https://img.shields.io/badge/Website-9cf)](https://jykoh.com/vwa)
+ [OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web](https://arxiv.org/abs/2402.17553) (Feb. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.17553)
+ [WebLINX: Real-World Website Navigation with Multi-Turn Dialogue](https://arxiv.org/abs/2402.05930) (Feb. 2024)
[![Star](https://img.shields.io/github/stars/mcgill-nlp/weblinx.svg?style=social&label=Star)](https://github.com/mcgill-nlp/weblinx)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.05930)
[![Website](https://img.shields.io/badge/Website-9cf)](https://mcgill-nlp.github.io/weblinx/)
+ [On the Multi-turn Instruction Following for Conversational Web Agents](https://arxiv.org/abs/2402.15057) (Feb. 2024)
[![Star](https://img.shields.io/github/stars/magicgh/self-map.svg?style=social&label=Star)](https://github.com/magicgh/self-map)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.15057)
+ [AgentStudio: A Toolkit for Building General Virtual Agents](https://arxiv.org/abs/2403.17918) (Mar. 2024)
[![Star](https://img.shields.io/github/stars/skyworkai/agent-studio.svg?style=social&label=Star)](https://github.com/skyworkai/agent-studio)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.17918)
[![Website](https://img.shields.io/badge/Website-9cf)](https://skyworkai.github.io/agent-studio/)
+ [OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments](https://arxiv.org/abs/2404.07972) (Apr. 2024)
[![Star](https://img.shields.io/github/stars/xlang-ai/OSWorld.svg?style=social&label=Star)](https://github.com/xlang-ai/OSWorld)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.07972)
[![Website](https://img.shields.io/badge/Website-9cf)](https://os-world.github.io/)
+ [Benchmarking Mobile Device Control Agents across Diverse Configurations](https://arxiv.org/abs/2404.16660) (Apr. 2024, ICLR 2024)
[![Star](https://img.shields.io/github/stars/gimme1dollar/b-moca.svg?style=social&label=Star)](https://github.com/gimme1dollar/b-moca)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.16660)
+ [MMInA: Benchmarking Multihop Multimodal Internet Agents](https://arxiv.org/abs/2404.09992) (Apr. 2024)
[![Star](https://img.shields.io/github/stars/shulin16/MMInA.svg?style=social&label=Star)](https://github.com/shulin16/MMInA)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.09992)
[![Website](https://img.shields.io/badge/Website-9cf)](https://mmina.cliangyu.com)
+ [Autonomous Evaluation and Refinement of Digital Agents](https://arxiv.org/abs/2404.06474) (Apr. 2024)
[![Star](https://img.shields.io/github/stars/Berkeley-NLP/Agent-Eval-Refine.svg?style=social&label=Star)](https://github.com/Berkeley-NLP/Agent-Eval-Refine)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.06474)
+ [LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation](https://arxiv.org/abs/2404.16054) (Apr. 2024)
[![Star](https://img.shields.io/github/stars/LlamaTouch/LlamaTouch.svg?style=social&label=Star)](https://github.com/LlamaTouch/LlamaTouch)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.16054)
+ [VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?](https://arxiv.org/abs/2404.05955) (Apr. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.05955)
+ [GUICourse: From General Vision Language Models to Versatile GUI Agents](https://arxiv.org/abs/2406.11317) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/yiye3/GUICourse.svg?style=social&label=Star)](https://github.com/yiye3/GUICourse)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.11317)
+ [GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents](https://arxiv.org/abs/2406.10819) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/Dongping-Chen/GUI-World.svg?style=social&label=Star)](https://github.com/Dongping-Chen/GUI-World)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.10819)
[![Website](https://img.shields.io/badge/Website-9cf)](https://gui-world.github.io/)
+ [GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices](https://arxiv.org/abs/2406.08451) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/OpenGVLab/GUI-Odyssey.svg?style=social&label=Star)](https://github.com/OpenGVLab/GUI-Odyssey)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.08451)
+ [VideoGUI: A Benchmark for GUI Automation from Instructional Videos](https://arxiv.org/abs/2406.10227) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/showlab/videogui.svg?style=social&label=Star)](https://github.com/showlab/videogui)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.10227)
[![Website](https://img.shields.io/badge/Website-9cf)](https://showlab.github.io/videogui/)
+ [Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding](https://arxiv.org/abs/2406.19263) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/eric-ai-lab/Screen-Point-and-Read.svg?style=social&label=Star)](https://github.com/eric-ai-lab/Screen-Point-and-Read)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.19263)
[![Website](https://img.shields.io/badge/Website-9cf)](https://screen-point-and-read.github.io/)
+ [MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents](https://arxiv.org/abs/2406.08184) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/MobileAgentBench/mobile-agent-bench.svg?style=social&label=Star)](https://github.com/MobileAgentBench/mobile-agent-bench)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.08184)
[![Website](https://img.shields.io/badge/Website-9cf)](https://mobileagentbench.github.io)
+ [AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents](https://arxiv.org/abs/2405.14573) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/google-research/android_world.svg?style=social&label=Star)](https://github.com/google-research/android_world)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.14573)
+ [Practical, Automated Scenario-based Mobile App Testing](https://arxiv.org/abs/2406.08340) (Jun. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.08340)
+ [WebCanvas: Benchmarking Web Agents in Online Environments](https://arxiv.org/abs/2406.12373) (Jun. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.12373)
[![Website](https://img.shields.io/badge/Website-9cf)](https://www.imean.ai/web-canvas)
+ [On the Effects of Data Scale on Computer Control Agents](https://arxiv.org/abs/2406.03679) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/google-research/google-research.svg?style=social&label=Star)](https://github.com/google-research/google-research/tree/master/android_control)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.03679)
+ [CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents](https://arxiv.org/abs/2407.01511) (Jul. 2024)
[![Star](https://img.shields.io/github/stars/camel-ai/crab.svg?style=social&label=Star)](https://github.com/camel-ai/crab)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.01511)
+ [WebVLN: Vision-and-Language Navigation on Websites](https://ojs.aaai.org/index.php/AAAI/article/view/27878) (AAAI 2024)
[![Star](https://img.shields.io/github/stars/WebVLN/WebVLN.svg?style=social&label=Star)](https://github.com/WebVLN/WebVLN)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://ojs.aaai.org/index.php/AAAI/article/view/27878)
+ [Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?](https://arxiv.org/abs/2407.10956) (Jul. 2024)
[![Star](https://img.shields.io/github/stars/xlang-ai/Spider2-V.svg?style=social&label=Star)](https://github.com/xlang-ai/Spider2-V)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.10956)
[![Website](https://img.shields.io/badge/Website-9cf)](https://spider2-v.github.io/)
+ [AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents](https://arxiv.org/abs/2407.17490) (Jul. 2024)
[![Star](https://img.shields.io/github/stars/YuxiangChai/AMEX-codebase.svg?style=social&label=Star)](https://github.com/YuxiangChai/AMEX-codebase)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.17490)
[![Website](https://img.shields.io/badge/Website-9cf)](https://yuxiangchai.github.io/AMEX/)
+ [Windows Agent Arena](https://raw.githubusercontent.com/microsoft/WindowsAgentArena/website/static/files/windows_agent_arena.pdf)
[![Star](https://img.shields.io/github/stars/microsoft/WindowsAgentArena.svg?style=social&label=Star)](https://github.com/microsoft/WindowsAgentArena)
[![Website](https://img.shields.io/badge/Website-9cf)](https://microsoft.github.io/WindowsAgentArena/)
[![PDF](https://img.shields.io/badge/PDF-4285f4.svg)](https://raw.githubusercontent.com/microsoft/WindowsAgentArena/website/static/files/windows_agent_arena.pdf)
+ [Harnessing Webpage UIs for Text-Rich Visual Understanding](https://arxiv.org/abs/2410.13824) (Oct. 2024)
[![Star](https://img.shields.io/github/stars/neulab/multiui.svg?style=social&label=Star)](https://github.com/neulab/multiui)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.13824)
[![Website](https://img.shields.io/badge/Website-9cf)](https://neulab.github.io/MultiUI/)
## Models / Agents
+ [Grounding Open-Domain Instructions to Automate Web Support Tasks](https://arxiv.org/abs/2103.16057) (Mar. 2021)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2103.16057)
+ [Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning](https://arxiv.org/abs/2108.03353) (Aug. 2021)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2108.03353)
+ [A Data-Driven Approach for Learning to Control Computers](https://arxiv.org/abs/2202.08137) (Feb. 2022)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2202.08137)
+ [Augmenting Autotelic Agents with Large Language Models](https://arxiv.org/pdf/2305.12487) (May. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2305.12487)
+ [Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control](https://arxiv.org/abs/2306.07863) (Jun. 2023, ICLR 2024)
[![Star](https://img.shields.io/github/stars/ltzheng/synapse.svg?style=social&label=Star)](https://github.com/ltzheng/synapse)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2306.07863)
+ [A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis](https://arxiv.org/abs/2307.12856) (Jul. 2023, ICLR 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](http://arxiv.org/abs/2307.12856)
+ [LASER: LLM Agent with State-Space Exploration for Web Navigation](https://arxiv.org/abs/2309.08172) (Sep. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2309.08172)
+ [CogAgent: A Visual Language Model for GUI Agents](https://arxiv.org/abs/2312.08914) (Dec. 2023, CVPR 2024)
[![Star](https://img.shields.io/github/stars/THUDM/CogVLM.svg?style=social&label=Star)](https://github.com/THUDM/CogVLM)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.08914)
+ [WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models](https://arxiv.org/abs/2401.13919) (Jan. 2024)
[![Star](https://img.shields.io/github/stars/MinorJerry/WebVoyager.svg?style=social&label=Star)](https://github.com/MinorJerry/WebVoyager)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.13919)
+ [OS-Copilot: Towards Generalist Computer Agents with Self-Improvement](https://arxiv.org/abs/2402.07456) (Feb. 2024)
[![Star](https://img.shields.io/github/stars/OS-Copilot/OS-Copilot.svg?style=social&label=Star)](https://github.com/OS-Copilot/OS-Copilot)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.07456)
[![Website](https://img.shields.io/badge/Website-9cf)](https://os-copilot.github.io/)
+ [UFO: A UI-Focused Agent for Windows OS Interaction](https://arxiv.org/abs/2402.07939) (Feb. 2024)
[![Star](https://img.shields.io/github/stars/microsoft/UFO.svg?style=social&label=Star)](https://github.com/microsoft/UFO)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.07939)
[![Website](https://img.shields.io/badge/Website-9cf)](https://microsoft.github.io/UFO/)
+ [Comprehensive Cognitive LLM Agent for Smartphone GUI Automation](https://arxiv.org/abs/2402.11941) (Feb. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.11941)
+ [Improving Language Understanding from Screenshots](https://arxiv.org/abs/2402.14073) (Feb. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.14073)
+ [AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent](https://arxiv.org/abs/2404.03648) (Apr. 2024, KDD 2024)
[![Star](https://img.shields.io/github/stars/THUDM/AutoWebGLM.svg?style=social&label=Star)](https://github.com/THUDM/AutoWebGLM)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.03648)
+ [SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models](https://arxiv.org/abs/2305.19308) (May. 2023, NeurIPS 2023)
[![Star](https://img.shields.io/github/stars/BraveGroup/SheetCopilot.svg?style=social&label=Star)](https://github.com/BraveGroup/SheetCopilot)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2305.19308)
[![Website](https://img.shields.io/badge/Website-9cf)](https://sheetcopilot.github.io/)
+ [You Only Look at Screens: Multimodal Chain-of-Action Agents](https://arxiv.org/abs/2309.11436) (Sep. 2023)
[![Star](https://img.shields.io/github/stars/cooelf/Auto-UI.svg?style=social&label=Star)](https://github.com/cooelf/Auto-UI)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2309.11436)
+ [Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API](https://arxiv.org/abs/2310.04716) (Oct. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2310.04716)
+ [OpenAgents: An Open Platform for Language Agents in the Wild](https://arxiv.org/pdf/2310.10634) (Oct. 2023)
[![Star](https://img.shields.io/github/stars/xlang-ai/OpenAgents.svg?style=social&label=Star)](https://github.com/xlang-ai/OpenAgents)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2310.10634)
+ [GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation](https://arxiv.org/abs/2311.07562) (Nov. 2023)
[![Star](https://img.shields.io/github/stars/zzxslp/MM-Navigator.svg?style=social&label=Star)](https://github.com/zzxslp/MM-Navigator)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.07562)
+ [AppAgent: Multimodal Agents as Smartphone Users](https://arxiv.org/abs/2312.13771) (Dec. 2023)
[![Star](https://img.shields.io/github/stars/mnotgod96/AppAgent.svg?style=social&label=Star)](https://github.com/mnotgod96/AppAgent)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.13771)
[![Website](https://img.shields.io/badge/Website-9cf)](https://appagent-official.github.io)
+ [SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents](https://arxiv.org/abs/2401.10935) (Jan. 2024, ACL 2024)
[![Star](https://img.shields.io/github/stars/njucckevin/SeeClick.svg?style=social&label=Star)](https://github.com/njucckevin/SeeClick)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.10935)
+ [GPT-4V(ision) is a Generalist Web Agent, if Grounded](https://arxiv.org/abs/2401.01614) (Jan. 2024, ICML 2024)
[![Star](https://img.shields.io/github/stars/OSU-NLP-Group/SeeAct.svg?style=social&label=Star)](https://github.com/OSU-NLP-Group/SeeAct)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.01614)
[![Website](https://img.shields.io/badge/Website-9cf)](https://osu-nlp-group.github.io/SeeAct/)
+ [Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception](https://arxiv.org/abs/2401.16158) (Jan. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.16158)
+ [Dual-View Visual Contextualization for Web Navigation](https://arxiv.org/abs/2402.04476) (Feb. 2024, CVPR 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.04476)
+ [DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning](https://arxiv.org/abs/2406.11896) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/DigiRL-agent/digirl.svg?style=social&label=Star)](https://github.com/DigiRL-agent/digirl)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.11896)
[![Website](https://img.shields.io/badge/Website-9cf)](https://digirl-agent.github.io/)
+ [Visual Grounding for User Interfaces](https://aclanthology.org/2024.naacl-industry.9.pdf) (NAACL 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://aclanthology.org/2024.naacl-industry.9.pdf)
+ [ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model](https://arxiv.org/abs/2402.07945) (Feb. 2024)
[![Star](https://img.shields.io/github/stars/niuzaisheng/ScreenAgent.svg?style=social&label=Star)](https://github.com/niuzaisheng/ScreenAgent)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.07945)
[![Website](https://img.shields.io/badge/Website-9cf)](https://screenagent.pages.dev/)
+ [ScreenAI: A Vision-Language Model for UI and Infographics Understanding](https://arxiv.org/abs/2402.04615) (Feb. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.04615)
+ [Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs](https://arxiv.org/abs/2404.05719) (Apr. 2024)
[![Star](https://img.shields.io/github/stars/apple/ml-ferret.svg?style=social&label=Star)](https://github.com/apple/ml-ferret)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.05719)
+ [Octopus: On-device language model for function calling of software APIs](https://arxiv.org/abs/2404.01549) (Apr. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.01549)
+ [Octopus v2: On-device language model for super agent](https://arxiv.org/abs/2404.01744) (Apr. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.01744)
+ [Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent](https://arxiv.org/abs/2404.11459) (Apr. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.11459)
[![Website](https://img.shields.io/badge/Website-9cf)](https://www.nexa4ai.com/octopus-v3)
+ [Octopus v4: Graph of language models](https://arxiv.org/abs/2404.19296) (Apr. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.19296)
+ [Search Beyond Queries: Training Smaller Language Models for Web Interactions via Reinforcement Learning](https://arxiv.org/abs/2404.10887) (Apr. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.10887)
+ [Enhancing Mobile "How-to" Queries with Automated Search Results Verification and Reranking](https://arxiv.org/pdf/2404.08860v3) (Apr. 2024, SIGIR 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2404.08860v3)
+ [AutoDroid: LLM-powered Task Automation in Android](https://arxiv.org/abs/2308.15272) (Aug. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2308.15272)
+ [Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study](https://arxiv.org/abs/2403.03186) (Mar. 2024)
[![Star](https://img.shields.io/github/stars/BAAI-Agents/Cradle.svg?style=social&label=Star)](https://github.com/BAAI-Agents/Cradle)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.03186)
[![Website](https://img.shields.io/badge/Website-9cf)](https://baai-agents.github.io/Cradle/)
+ [Android in the Zoo: Chain-of-Action-Thought for GUI Agents](https://arxiv.org/abs/2403.02713) (Mar. 2024)
[![Star](https://img.shields.io/github/stars/IMNearth/CoAT.svg?style=social&label=Star)](https://github.com/IMNearth/CoAT)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.02713)
+ [Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning](https://arxiv.org/abs/2405.00516v1) (May. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.00516v1)
+ [GUI Action Narrator: Where and When Did That Action Take Place?](https://arxiv.org/abs/2406.13719) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/showlab/GUI-Action-Narrator.svg?style=social&label=Star)](https://github.com/showlab/GUI-Action-Narrator)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.13719)
[![Website](https://img.shields.io/badge/Website-9cf)](https://showlab.github.io/GUI-Narrator)
+ [Identifying User Goals from UI Trajectories](https://arxiv.org/abs/2406.14314) (Jun. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.14314)
+ [VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning](https://arxiv.org/abs/2406.14056) (Jun. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.14056)
+ [Octo-planner: On-device Language Model for Planner-Action Agents](https://arxiv.org/abs/2406.18082) (Jun. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.18082)
[![Website](https://img.shields.io/badge/Website-9cf)](https://www.nexa4ai.com/octo-planner#video)
+ [E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion](https://arxiv.org/abs/2406.14250) (Jun. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.14250)
+ [Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration](https://arxiv.org/abs/2406.01014) (Jun. 2024)
[![Star](https://img.shields.io/github/stars/X-PLUG/MobileAgent.svg?style=social&label=Star)](https://github.com/X-PLUG/MobileAgent)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.01014)
+ [MobileFlow: A Multimodal LLM For Mobile GUI Agent](https://arxiv.org/abs/2407.04346) (Jul. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.04346)
+ [Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model](https://arxiv.org/abs/2407.03037) (Jul. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.03037)
+ [Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence](https://arxiv.org/abs/2407.07061) (Jul. 2024)
[![Star](https://img.shields.io/github/stars/OpenBMB/IoA.svg?style=social&label=Star)](https://github.com/OpenBMB/IoA)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.07061)
+ [MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices](https://arxiv.org/abs/2407.03913) (Jul. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.03913)
+ [AUITestAgent: Automatic Requirements Oriented GUI Function Testing](https://arxiv.org/abs/2407.09018) (Jul. 2024)
[![Star](https://img.shields.io/github/stars/bz-lab/AUITestAgent.svg?style=social&label=Star)](https://github.com/bz-lab/AUITestAgent)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.09018)
+ [Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems](https://arxiv.org/abs/2407.13032) (Jul. 2024)
[![Star](https://img.shields.io/github/stars/EmergenceAI/Agent-E.svg?style=social&label=Star)](https://github.com/EmergenceAI/Agent-E)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.13032)
+ [OmniParser for Pure Vision Based GUI Agent](https://arxiv.org/pdf/2408.00203) (Aug. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/pdf/2408.00203)
+ [VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents](https://arxiv.org/abs/2408.06327) (Aug. 2024)
[![Star](https://img.shields.io/github/stars/THUDM/VisualAgentBench.svg?style=social&label=Star)](https://github.com/THUDM/VisualAgentBench)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2408.06327)
[![Website](https://img.shields.io/badge/Website-9cf)](https://github.com/THUDM/VisualAgentBench)
+ [Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents](https://arxiv.org/abs/2408.07199) (Aug. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2408.07199)
[![Website](https://img.shields.io/badge/Website-9cf)](https://www.multion.ai/blog/introducing-agent-q-research-breakthrough-for-the-next-generation-of-ai-agents-with-planning-and-self-healing-capabilities)
+ [MindSearch: Mimicking Human Minds Elicits Deep AI Searcher](https://arxiv.org/abs/2407.20183) (Jul. 2024)
[![Star](https://img.shields.io/github/stars/InternLM/MindSearch.svg?style=social&label=Star)](https://github.com/InternLM/MindSearch)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.20183)
[![Website](https://img.shields.io/badge/Website-9cf)](https://mindsearch.netlify.app/)
+ [AppAgent v2: Advanced Agent for Flexible Mobile Interactions](https://arxiv.org/abs/2408.11824) (Aug. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2408.11824)
+ [Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions](https://arxiv.org/abs/2408.02544) (Aug. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2408.02544)
+ [Agent Workflow Memory](https://arxiv.org/abs/2409.07429) (Sep. 2024)
[![Star](https://img.shields.io/github/stars/zorazrw/agent-workflow-memory.svg?style=social&label=Star)](https://github.com/zorazrw/agent-workflow-memory)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.07429)
+ [MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding](https://arxiv.org/abs/2409.14818) (Sep. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2409.14818)
+ [Agent S: An Open Agentic Framework that Uses Computers Like a Human](https://arxiv.org/abs/2410.08164) (Oct. 2024)
[![Star](https://img.shields.io/github/stars/simular-ai/Agent-S.svg?style=social&label=Star)](https://github.com/simular-ai/Agent-S)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.08164)
+ [MobA: A Two-Level Agent System for Efficient Mobile Task Automation](https://arxiv.org/abs/2410.13757) (Oct. 2024)
[![Star](https://img.shields.io/github/stars/OpenDFM/MobA.svg?style=social&label=Star)](https://github.com/OpenDFM/MobA)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2410.13757)
[![Dataset](https://img.shields.io/badge/Hugging%20Face-Dataset-blue)](https://huggingface.co/datasets/OpenDFM/MobA-MobBench)
## Projects
+ [PyAutoGUI](https://pyautogui.readthedocs.io/en/latest/index.html) (a minimal usage sketch follows this list)
[![Star](https://img.shields.io/github/stars/asweigart/pyautogui.svg?style=social&label=Star)](https://github.com/asweigart/pyautogui/tree/master)
[![Website](https://img.shields.io/badge/Website-9cf)](https://pyautogui.readthedocs.io/en/latest/)
+ [GPT-4V-Act: AI agent using GPT-4V(ision) for web UI interaction](https://github.com/ddupont808/GPT-4V-Act)
[![Star](https://img.shields.io/github/stars/ddupont808/GPT-4V-Act.svg?style=social&label=Star)](https://github.com/ddupont808/GPT-4V-Act)
+ [gpt-computer-assistant](https://github.com/onuratakan/gpt-computer-assistant)
[![Star](https://img.shields.io/github/stars/onuratakan/gpt-computer-assistant.svg?style=social&label=Star)](https://github.com/onuratakan/gpt-computer-assistant)
+ [Mobile-Agent: The Powerful Mobile Device Operation Assistant Family](https://github.com/X-PLUG/MobileAgent)
[![Star](https://img.shields.io/github/stars/X-PLUG/MobileAgent.svg?style=social&label=Star)](https://github.com/X-PLUG/MobileAgent)
+ [OpenUI](https://github.com/wandb/openui)
[![Star](https://img.shields.io/github/stars/wandb/openui.svg?style=social&label=Star)](https://github.com/wandb/openui)
[![Website](https://img.shields.io/badge/Website-9cf)](https://openui.fly.dev)
+ [ACT-1](https://www.adept.ai/blog/act-1)
[![Website](https://img.shields.io/badge/Website-9cf)](https://www.adept.ai/blog/act-1)
+ [NatBot](https://github.com/nat/natbot)
[![Star](https://img.shields.io/github/stars/nat/natbot.svg?style=social&label=Star)](https://github.com/nat/natbot)
+ [Multion](https://www.multion.ai)
[![Website](https://img.shields.io/badge/Website-9cf)](https://www.multion.ai/)
+ [Auto-GPT](https://github.com/Significant-Gravitas/Auto-GPT)
[![Star](https://img.shields.io/github/stars/Significant-Gravitas/Auto-GPT.svg?style=social&label=Star)](https://github.com/Significant-Gravitas/Auto-GPT)
+ [WebLlama](https://github.com/McGill-NLP/webllama)
[![Star](https://img.shields.io/github/stars/McGill-NLP/webllama.svg?style=social&label=Star)](https://github.com/McGill-NLP/webllama)
[![Website](https://img.shields.io/badge/Website-9cf)](https://webllama.github.io)
+ [LaVague: Large Action Model Framework to Develop AI Web Agents](https://github.com/lavague-ai/LaVague)
[![Star](https://img.shields.io/github/stars/lavague-ai/LaVague.svg?style=social&label=Star)](https://github.com/lavague-ai/LaVague)
[![Website](https://img.shields.io/badge/Website-9cf)](https://docs.lavague.ai/)
+ [OpenAdapt: AI-First Process Automation with Large Multimodal Models](https://github.com/OpenAdaptAI/OpenAdapt)
[![Star](https://img.shields.io/github/stars/OpenAdaptAI/OpenAdapt.svg?style=social&label=Star)](https://github.com/OpenAdaptAI/OpenAdapt)
+ [Surfkit: A toolkit for building and sharing AI agents that operate on devices](https://github.com/agentsea/surfkit)
[![Star](https://img.shields.io/github/stars/agentsea/surfkit.svg?style=social&label=Star)](https://github.com/agentsea/surfkit)
+ [AGI Computer Control](https://github.com/James4Ever0/agi_computer_control)
+ [Open Interpreter](https://github.com/OpenInterpreter/open-interpreter)
[![Star](https://img.shields.io/github/stars/OpenInterpreter/open-interpreter.svg?style=social&label=Star)](https://github.com/OpenInterpreter/open-interpreter)
[![Website](https://img.shields.io/badge/Website-9cf)](https://openinterpreter.com/)
+ [WebMarker: Mark web pages for use with vision-language models](https://github.com/reidbarber/webmarker)
[![Star](https://img.shields.io/github/stars/reidbarber/webmarker.svg?style=social&label=Star)](https://github.com/reidbarber/webmarker)
[![Website](https://img.shields.io/badge/Website-9cf)](https://www.webmarkerjs.com/)
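For a concrete taste of the scripting layer that several of the projects above build on, here is a minimal sketch of driving the screen with PyAutoGUI (the first entry in this list). It is not taken from any listed repository: only documented PyAutoGUI calls are used, while `search_box.png`, the typed query, and the confidence threshold are hypothetical placeholders (`confidence=` also requires OpenCV to be installed).
```python
# Minimal PyAutoGUI sketch: find a UI element by screenshot template,
# click it, and type a query. Assumes `pip install pyautogui` and that
# "search_box.png" is a screenshot of the target widget you captured yourself.
import pyautogui

pyautogui.FAILSAFE = True  # abort by moving the mouse to a screen corner

# Template-match the widget on the current screen. Depending on the
# PyAutoGUI version, a miss either returns None or raises an exception.
try:
    box = pyautogui.locateOnScreen("search_box.png", confidence=0.9)
except pyautogui.ImageNotFoundException:
    box = None

if box is not None:
    x, y = pyautogui.center(box)
    pyautogui.click(x, y)                          # focus the search box
    pyautogui.write("GUI agents", interval=0.05)   # type with human-like pacing
    pyautogui.press("enter")
else:
    print("search_box.png not found on screen")
```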
## Related Repositories
- [awesome-llm-powered-agent](https://github.com/hyp1231/awesome-llm-powered-agent)
- [Awesome-LLM-based-Web-Agent-and-Tools](https://github.com/albzni/Awesome-LLM-based-Web-Agent-and-Tools)
- [awesome-ui-agents](https://github.com/opendilab/awesome-ui-agents/)
- [computer-control-agent-knowledge-base](https://github.com/James4Ever0/computer_control_agent_knowledge_base)
## Acknowledgements
This template is provided by [Awesome-Video-Diffusion](https://github.com/showlab/Awesome-Video-Diffusion) and [Awesome-MLLM-Hallucination](https://github.com/showlab/Awesome-MLLM-Hallucination).