Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/abdur75648/v-zen
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM Resources
agi chatgpt gpt grounding-dino gui gui-automation large-language-models llama llm mistral multimodal-large-language-models superagi vicuna
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/abdur75648/v-zen
- Owner: abdur75648
- License: apache-2.0
- Created: 2024-07-21T07:45:44.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-07-21T07:46:36.000Z (7 months ago)
- Last Synced: 2024-10-10T04:33:48.205Z (4 months ago)
- Topics: agi, chatgpt, gpt, grounding-dino, gui, gui-automation, large-language-models, llama, llm, mistral, multimodal-large-language-models, superagi, vicuna
- Homepage: https://arxiv.org/abs/2405.15341
- Size: 5.86 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM
## Introduction
[![V-Zen](https://img.shields.io/badge/V--Zen-blueviolet?logo=github&style=flat-square)](https://github.com/abdur75648/V-Zen)
[![SuperAGI](https://img.shields.io/badge/SuperAGI-purple?style=flat-square)](https://superagi.com/)
[![arXiv](https://img.shields.io/badge/arXiv-2405.15341-darkred.svg)](https://arxiv.org/abs/2405.15341)
[![Demo](https://img.shields.io/badge/Demo-Online-brightgreen.svg)](https://superagi.com/)

V-Zen is a novel multimodal large language model (LLM) designed for efficient GUI understanding and precise grounding. Our model introduces an innovative architecture that significantly improves the performance of GUI automation tasks.
## Code Availability
Coming Soon...
## Citation
If you find this work useful, please consider citing the following paper:
```
@article{author2024vzen,
title={V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM},
author={Abdur Rahman and Rajat Chawla and Muskaan Kumar and Arkajit Datta and Adarsh Jha and Mukunda NS and Ishaan Bhola},
journal={arXiv preprint arXiv:2405.15341},
year={2024},
eprint={2405.15341},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2405.15341},
}
```