https://github.com/showlab/videogui
[NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
- Host: GitHub
- URL: https://github.com/showlab/videogui
- Owner: showlab
- Created: 2024-06-16T16:38:39.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-10-22T03:05:19.000Z (2 months ago)
- Last Synced: 2024-10-23T04:24:30.198Z (2 months ago)
- Topics: gui, llm-agent, video-language
- Language: JavaScript
- Homepage: https://showlab.github.io/videogui/
- Size: 32.2 MB
- Stars: 20
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# [VideoGUI: A Benchmark for GUI Automation from Instructional Videos](https://showlab.github.io/videogui/)
[Kevin Qinghong Lin](https://qinghonglin.github.io/), [Linjie Li](https://scholar.google.com/citations?user=WR875gYAAAAJ&hl=en), [Difei Gao](https://scholar.google.com/citations?user=No9OsocAAAAJ&hl=en), Qinchen Wu,
Mingyi Yan, [Zhengyuan Yang](https://zyang-ur.github.io/), [Lijuan Wang](https://www.microsoft.com/en-us/research/people/lijuanw/), [Mike Zheng Shou](https://sites.google.com/view/showlab)

[![Project Website](https://img.shields.io/badge/Project-Website-blue)](https://showlab.github.io/videogui/)
## 📢 News
- [2024.6] We released the arXiv paper.
- [2024.9] Accepted by NeurIPS 2024 D&B.
- [2024.10] We released the data as a [Hugging Face dataset](https://huggingface.co/VideoGUI). Please stay tuned for further updates.

## 📖 Introduction
> **TL;DR:** A Multi-modal Benchmark for Visual-centric GUI Automation from Instructional Videos.

![overview](./assets/teaser.png)
**Visual-centric software and tasks:** VideoGUI focuses on professional and novel software, such as Premiere Pro (PR) and After Effects (AE) for video editing, or Stable Diffusion and Runway for visual creation. In addition, task queries emphasize visual previews rather than textual instructions.
**Instructional videos with human demonstration:** We source novel tasks from high-quality instructional videos; annotators then replicate the demonstrations to reproduce the effects.
**Hierarchical planning and actions:** We provide detailed annotations with planning procedures and recorded actions for hierarchical evaluation.
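
To make the idea of hierarchical annotations concrete, here is a minimal Python sketch of how a task, its planning steps, and its recorded actions might nest. Every class name, field name, and sample value below is a hypothetical illustration; none of it is the actual VideoGUI annotation schema, which should be checked against the released data.

```python
from dataclasses import dataclass, field
from typing import List

# NOTE: all names and sample values here are hypothetical illustrations,
# not the VideoGUI annotation format.

@dataclass
class Action:
    kind: str    # e.g. "click", "drag", "type"
    target: str  # free-text description of the UI element

@dataclass
class Subtask:
    description: str                       # one step of the plan
    actions: List[Action] = field(default_factory=list)

@dataclass
class Task:
    query: str                             # high-level task description
    subtasks: List[Subtask] = field(default_factory=list)

def count_actions(task: Task) -> int:
    """Total number of recorded actions across all subtasks."""
    return sum(len(s.actions) for s in task.subtasks)

# A made-up example with two planning steps and three actions.
example = Task(
    query="Add a fade-in to the first clip",
    subtasks=[
        Subtask("Open the effects panel", [Action("click", "Effects tab")]),
        Subtask(
            "Apply the transition",
            [Action("drag", "Fade-in onto clip"), Action("click", "Apply")],
        ),
    ],
)

print(count_actions(example))
```

A nested layout like this would let an evaluator score each level separately: the plan (the list of subtasks) on its own, and the actions within each subtask on their own.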