{"id":14964503,"url":"https://github.com/abdur75648/v-zen","last_synced_at":"2025-09-08T12:37:22.997Z","repository":{"id":249483178,"uuid":"831647450","full_name":"abdur75648/V-Zen","owner":"abdur75648","description":"V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM  Resources","archived":false,"fork":false,"pushed_at":"2024-07-21T07:46:36.000Z","size":6,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-01-31T06:41:42.156Z","etag":null,"topics":["agi","chatgpt","gpt","grounding-dino","gui","gui-automation","large-language-models","llama","llm","mistral","multimodal-large-language-models","superagi","vicuna"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2405.15341","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abdur75648.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-21T07:45:44.000Z","updated_at":"2024-10-14T16:25:25.000Z","dependencies_parsed_at":"2024-07-21T08:50:46.616Z","dependency_job_id":"7a2f3459-413b-47c7-b2b3-8ce14dcb58e4","html_url":"https://github.com/abdur75648/V-Zen","commit_stats":null,"previous_names":["abdur75648/v-zen"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdur75648%2FV-Zen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdur75648%2FV-Zen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdur75648%2FV-Z
en/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdur75648%2FV-Zen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abdur75648","download_url":"https://codeload.github.com/abdur75648/V-Zen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238104557,"owners_count":19417156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agi","chatgpt","gpt","grounding-dino","gui","gui-automation","large-language-models","llama","llm","mistral","multimodal-large-language-models","superagi","vicuna"],"created_at":"2024-09-24T13:33:16.608Z","updated_at":"2025-02-10T11:31:49.914Z","avatar_url":"https://github.com/abdur75648.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM\n\n## Introduction\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://superagi.com//#gh-light-mode-only\"\u003e\n    \u003cimg src=\"https://superagi.com/wp-content/uploads/2023/05/Logo-dark.svg\" width=\"318px\" alt=\"SuperAGI logo\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://superagi.com//#gh-dark-mode-only\"\u003e\n    \u003cimg src=\"https://superagi.com/wp-content/uploads/2023/05/Logo-light.svg\" width=\"318px\" alt=\"SuperAGI logo\" /\u003e\n  
\u003c/a\u003e\n\n\u003c/p\u003e\n\n[![V-Zen](https://img.shields.io/badge/V--Zen-blueviolet?logo=github\u0026style=flat-square)](https://github.com/abdur75648/V-Zen)\n[![SuperAGI](https://img.shields.io/badge/SuperAGI-purple?style=flat-square)](https://superagi.com/)\n[![arXiv](https://img.shields.io/badge/arXiv-2405.15341-darkred.svg)](https://arxiv.org/abs/2405.15341)\n[![Demo](https://img.shields.io/badge/Demo-Online-brightgreen.svg)](https://superagi.com/)\n\nV-Zen is a novel multimodal large language model (LLM) designed for efficient GUI understanding and precise grounding. Our model introduces an innovative architecture that significantly improves performance on GUI automation tasks.\n\n## Code Availability\nComing soon.\n\n## Citation\nIf you find this work useful, please consider citing the following paper:\n\n```\n@article{author2024vzen,\n      title={V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM},\n      author={Abdur Rahman and Rajat Chawla and Muskaan Kumar and Arkajit Datta and Adarsh Jha and Mukunda NS and Ishaan Bhola},\n      journal={arXiv preprint arXiv:2405.15341},\n      year={2024},\n      eprint={2405.15341},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https://arxiv.org/abs/2405.15341},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdur75648%2Fv-zen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabdur75648%2Fv-zen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdur75648%2Fv-zen/lists"}