Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
- Host: GitHub
- URL: https://github.com/NVlabs/VILA
- Owner: NVlabs
- License: apache-2.0
- Created: 2024-02-23T09:19:16.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-10-24T13:29:43.000Z (about 2 months ago)
- Last Synced: 2024-10-29T15:35:20.672Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 42.4 MB
- Stars: 1,944
- Watchers: 27
- Forks: 157
- Open Issues: 53
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- ai-game-devtools - VILA - On Pre-training for Visual Language Models. | [arXiv](https://arxiv.org/abs/2312.07533) | Visual | (Visual / Tool (AI LLM))
- StarryDivineSky - NVlabs/VILA - A multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops). VILA is a visual language model (VLM) pretrained on large-scale interleaved image-text data, which enables video understanding and multi-image understanding. VILA can be deployed on the edge via AWQ 4-bit quantization and the TinyChat framework. Key findings: (1) image-text pairs alone are not enough; interleaved image-text data is essential; (2) unfreezing the LLM during interleaved image-text pretraining enables in-context learning; (3) re-mixing text-only instruction data is crucial for improving both VLM and text-only performance; (4) token compression extends the number of video frames. VILA demonstrates appealing capabilities, including video reasoning, in-context learning, visual chain-of-thought, and better world knowledge. (Multimodal large models / Other web services) (a minimal multi-image inference sketch follows this list)
- AiTreasureBox - NVlabs/VILA - ![GitHub stars](https://img.shields.io/github/stars/NVlabs/VILA.svg) VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops) (Repos)
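The list entries above describe multi-image input and edge deployment via AWQ 4-bit quantization and TinyChat. As a rough illustration only, here is a minimal sketch of multi-image inference through the generic Hugging Face transformers interface; the checkpoint name (`Efficient-Large-Model/VILA1.5-3b`), the `AutoModelForVision2Seq` loader, and the `<image>` prompt format are assumptions and may not match the repository's own inference scripts.

```python
# Hedged sketch of multi-image VLM inference via the generic transformers API.
# NOT the VILA repo's own API: the model id and prompt format below are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "Efficient-Large-Model/VILA1.5-3b"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Two images plus an interleaved text prompt, mirroring the interleaved
# image-text setting emphasized in the description above.
images = [Image.open("frame1.jpg"), Image.open("frame2.jpg")]
prompt = "<image>\n<image>\nWhat changed between the two frames?"

inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

For the quantized edge path (AWQ 4-bit with TinyChat) and the supported training/evaluation workflow, the repository's README.md is the authoritative reference.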