awesome-gui-agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
https://github.com/showlab/awesome-gui-agent
Datasets / Benchmarks
- Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration
- World of Bits: An Open-Domain Platform for Web-Based Agents
- Mapping Natural Language Instructions to Mobile UI Action Sequences
- WebSRC: A Dataset for Web-Based Structural Reading Comprehension
- AndroidEnv: A Reinforcement Learning Platform for Android
- META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
- A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
- VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
- GUICourse: From General Vision Language Models to Versatile GUI Agents
- GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
- VideoGUI: A Benchmark for GUI Automation from Instructional Videos
- Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
- MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents
- Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models
- AssistGUI: Task-Oriented Desktop Graphical User Interface Automation
- On the Multi-turn Instruction Following for Conversational Web Agents
- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
- Language Models can Solve Computer Tasks
- Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction
- Mind2Web: Towards a Generalist Agent for the Web
- Android in the Wild: A Large-Scale Dataset for Android Device Control
- WebArena: A Realistic Web Environment for Building Autonomous Agents
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
- OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
- WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
- AgentStudio: A Toolkit for Building General Virtual Agents
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- Benchmarking Mobile Device Control Agents across Diverse Configurations
- MMInA: Benchmarking Multihop Multimodal Internet Agents
- Autonomous Evaluation and Refinement of Digital Agents
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation
- AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
- Practical, Automated Scenario-based Mobile App Testing
- WebCanvas: Benchmarking Web Agents in Online Environments
- On the Effects of Data Scale on Computer Control Agents
- Windows Agent Arena
- Harnessing Webpage UIs for Text-Rich Visual Understanding
- CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
- WebVLN: Vision-and-Language Navigation on Websites
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
- AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
- A Unified Solution for Structured Web Data Extraction
Models / Agents
- Grounding Open-Domain Instructions to Automate Web Support Tasks
- Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
- A Data-Driven Approach for Learning to Control Computers
- Augmenting Autotelic Agents with Large Language Models
- UFO: A UI-Focused Agent for Windows OS Interaction
- Comprehensive Cognitive LLM Agent for Smartphone GUI Automation
- Improving Language Understanding from Screenshots
- AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
- Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
- A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
- LASER: LLM Agent with State-Space Exploration for Web Navigation
- CogAgent: A Visual Language Model for GUI Agents
- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
- SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models
- You Only Look at Screens: Multimodal Chain-of-Action Agents
- Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API
- OpenAgents: An Open Platform for Language Agents in the Wild
- GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
- AppAgent: Multimodal Agents as Smartphone Users
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
- GPT-4V(ision) is a Generalist Web Agent, if Grounded