awesome-ui-agents
A curated list of awesome UI agent resources, encompassing Web, App, OS, and beyond (continually updated)
https://github.com/opendilab/awesome-ui-agents
-
Papers
-
Models
- VSC-RL: Advancing Autonomous Vision-Language Agents with Variational Subgoal-Conditioned Reinforcement Learning
- AppVLM: A Lightweight Vision Language Model for Online App Control
- DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
- code
- Apple Intelligence Foundation Language Models
- CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation
- code
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
- code
- Intention-in-Interaction (IN3): Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
- code
- code
- code
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
- ScreenAgent: A Vision Language Model-driven Computer Control Agent
- code
- AppAgent: Multimodal Agents as Smartphone Users
- Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
- code
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
- code
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
- code
- UFO: A UI-Focused Agent for Windows OS Interaction
- code
- Octopus v2: On-device Language Model for Super Agent
- Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study
- code
- CogAgent: A Visual Language Model for GUI Agents
- code
- Octopus: Embodied Vision-Language Programmer from Environmental Feedback
- code
- You Only Look at Screens: Multimodal Chain-of-Action Agents
- code
- code
- Augmenting Autotelic Agents with Large Language Models
- Language Models can Solve Computer Tasks
- code
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
- AutoWebGLM: Bootstrap and Reinforce a Large Language Model-based Web Navigating Agent
- Dual-View Visual Contextualization for Web Navigation
- Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems
- Tree Search for Language Model Agents
- GPT-4V(ision) is a Generalist Web Agent, if Grounded
- Agent S: An Open Agentic Framework That Uses Computers Like a Human
- LASER: LLM Agent with State-Space Exploration for Web Navigation
- code
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
- code
- Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
- code
- On the Effects of Data Scale on UI Control Agents
- Cradle: Empowering Foundation Agents Towards General Computer Control
- Lightweight Neural App Control
- SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded
- MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
- code
- Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery for Foundation Model Internet Agents
- code
- code
- code
- code
- code
- code
- code
- OpenAI Operator
- OpenAI Computer-Using Agent
- Claude computer use
- Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
- Enhancing Software Agents with Monte Carlo Tree Search and Hindsight Feedback
- code
- Programming with Pixels: Towards Generalist Software Engineering Agents
- E-commerce UI/UX Optimization via Generative AI, MDD, and Multi-Agent Reinforcement Learning
- CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
- Enhance Mobile Agents Thinking Process Via Iterative Preference Learning
- UI-Venus Technical Report: Building High-performance UI Agents with RFT
- Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization
- SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
-
Tools
- Opera Browser Operator: AI-based Agentic Browsing
- code
- code
- AndroidEnv: A Reinforcement Learning Platform for Android
- LEGENT: Open Platform for Embodied Agents
- code
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation
- code
- WebArena: A Realistic Web Environment for Building Autonomous Agents
- code
- Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction
- code
- OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent
- code
- Make Websites Accessible for Agents
- code
- ToolGen: Unified Tool Retrieval and Calling via Generation
- code
-
Datasets
- SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
- code
- code
- AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
- code
- code
- MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents
- code
- VillagerBench/VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft
- code
- CToolEval: A Chinese Benchmark for LLM-Powered Agent Evaluation in Real-World API Interactions
- code
- Multi-Turn Mind2Web: On the Multi-turn Instruction Following for Conversational Web Agents
- code
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
- code
- Android in the Zoo: Chain-of-Action-Thought for GUI Agents
- Android in the Wild: A Large-Scale Dataset for Android Device Control
- Mind2Web: Towards a Generalist Agent for the Web
- code
- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
- code
- Rico: A Mobile App Dataset for Building Data-Driven Design Applications
- WebCanvas: Benchmarking Web Agents in Online Environments
- code
Keywords
- llm (11)
- agent (9)
- nlp (4)
- generative-ai (3)
- agents (3)
- ai (3)
- large-language-models (3)
- mllm (2)
- vlm (2)
- grounding (2)
- multimodal (2)
- gpt4v (2)
- decision-making (2)
- llms (2)
- chatgpt (2)
- reinforcement-learning (2)
- android (2)
- gui-agents (2)
- ai-agents (2)
- tool-learning (2)
- automation (2)
- language-model (2)
- copilot (2)
- gui (2)
- llm-powered-agents (1)
- gpt (1)
- app (1)
- executable-langauge-grounding (1)
- code-generation (1)
- assistant-chat-bots (1)
- windows (1)
- harmony (1)
- ios (1)
- mobile (1)
- mobile-agents (1)
- reasoning (1)
- multimodal-agent (1)
- multimodal-large-language-models (1)
- prompting (1)
- cross-modality (1)
- ui (1)
- multi-modal (1)
- pretrained-models (1)
- visual-language-models (1)
- semantic-parsing (1)
- language-model-agent (1)
- awesome-list (1)
- embodied-agent (1)
- embodied-ai (1)
- foundation-model (1)