Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-ai-for-gui-agents
Awesome resources about AI for GUI Agents.
https://github.com/maxi-w/awesome-ai-for-gui-agents

Models

Datasets
- META-GUI
- WaveUI-25k
- GroundUI-1k & GroundUI-18k
- GUI-World
- WebSRC

Research Papers
- WebSRC
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
- MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling
- ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
- UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface
- AgentStudio: A Toolkit for Building General Virtual Agents
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
- ScreenAgent: A Vision Language Model-driven Computer Control Agent
- CogAgent: A Visual Language Model for GUI Agents
- From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces
- Android in the Wild: A Large-Scale Dataset for Android Device Control
- Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
- Towards Better Semantic Understanding of Mobile Interfaces
- META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
- VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling
- ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces
- WebSRC: A Dataset for Web-Based Structural Reading Comprehension
- UIBert: Learning Generic Multimodal Representations for UI Understanding
- Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels
- Object Detection for Graphical User Interface: Old Fashioned or Deep Learning or a Combination?