Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zzxslp/MM-Navigator
GPT-4V in Wonderland: LMMs as Smartphone Agents
- Host: GitHub
- URL: https://github.com/zzxslp/MM-Navigator
- Owner: zzxslp
- Created: 2023-11-13T18:28:03.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-17T14:48:25.000Z (5 months ago)
- Last Synced: 2024-07-18T05:48:33.596Z (5 months ago)
- Topics: gpt4v, llm-agents, web-navigation
- Language: Python
- Homepage:
- Size: 28.4 MB
- Stars: 117
- Watchers: 15
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Awesome-Segment-Anything
README
# GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
[GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation](https://arxiv.org/pdf/2311.07562.pdf)
Our code and evaluation benchmark will be out soon!
## Demo
A demo figure showing GPT-4V shopping in the Amazon app on an iPhone:
## Citation
If you find our work helpful to your research, please consider citing the paper:
```
@article{yan2023gpt,
title={GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation},
author={Yan, An and Yang, Zhengyuan and Zhu, Wanrong and Lin, Kevin and Li, Linjie and Wang, Jianfeng and Yang, Jianwei and Zhong, Yiwu and McAuley, Julian and Gao, Jianfeng and others},
journal={arXiv preprint arXiv:2311.07562},
year={2023}
}
```