V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM Resources
https://github.com/abdur75648/v-zen

agi chatgpt gpt grounding-dino gui gui-automation large-language-models llama llm mistral multimodal-large-language-models superagi vicuna

# V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM

## Introduction



SuperAGI logo


[![V-Zen](https://img.shields.io/badge/V--Zen-blueviolet?logo=github&style=flat-square)](https://github.com/abdur75648/V-Zen)
[![SuperAGI](https://img.shields.io/badge/SuperAGI-purple?style=flat-square)](https://superagi.com/)
[![arXiv](https://img.shields.io/badge/arXiv-2405.15341-darkred.svg)](https://arxiv.org/abs/2405.15341)
[![Demo](https://img.shields.io/badge/Demo-Online-brightgreen.svg)](https://superagi.com/)

V-Zen is a novel multimodal large language model (LLM) designed for efficient GUI understanding and precise grounding. Our model introduces an innovative architecture that significantly improves the performance of GUI automation tasks.

## Code Availability
Coming Soon...

## Citation
If you find this work useful, please consider citing the following paper:

```
@article{author2024vzen,
title={V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM},
author={Abdur Rahman and Rajat Chawla and Muskaan Kumar and Arkajit Datta and Adarsh Jha and Mukunda NS and Ishaan Bhola},
journal={arXiv preprint arXiv:2405.15341},
year={2024},
eprint={2405.15341},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2405.15341},
}
```