Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/VPGTrans/VPGTrans
Code for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
- Host: GitHub
- URL: https://github.com/VPGTrans/VPGTrans
- Owner: VPGTrans
- License: bsd-3-clause
- Created: 2023-04-30T10:27:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-13T02:28:31.000Z (about 1 year ago)
- Last Synced: 2024-06-24T05:55:51.094Z (5 months ago)
- Topics: large-scale-language-modeling, llm, vision-language-model, vl-llm
- Language: Python
- Homepage: https://vpgtrans.github.io/
- Size: 3.97 MB
- Stars: 263
- Watchers: 6
- Forks: 25
- Open Issues: 3
Awesome Lists containing this project
- StarryDivineSky - VPGTrans/VPGTrans - Vision-language LLMs (VL-LLMs) are expensive to build, so existing approaches connect a language model to a visual prompt generator (Visual Prompt Generator, VPG); even then, tuning the VPG still takes thousands of GPU hours and millions of training examples. The proposed VPGTrans method transfers the visual module of an existing multimodal dialogue model to a new LLM quickly (in less than 10% of the training time) while reaching comparable or better performance. Common VL-LLMs share the architecture VPG (e.g. 1.2B) -> Projector (4M) -> LLM (e.g. 11B): on top of a base LLM, a visual soft-prompt generator (VPG) and a linear dimension-mapping layer (projector) are trained. During training, the LLM parameters are usually frozen, or only a very small fraction is updated; the trainable parameters come mainly from the VPG and the projector. The VPGTrans framework has two stages: (1) projector warm-up; (2) joint fine-tuning. In stage one, the new projector is initialized by merging a word-embedding converter with the original projector, then trained for one epoch at 5x the learning rate. In stage two, the VPG and projector are trained together normally. (Multimodal large models / Other web services)
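The two-stage recipe above (frozen LLM, projector warm-up at 5x the learning rate, then joint VPG+projector fine-tuning) can be sketched in PyTorch. This is a minimal illustration with tiny stand-in modules, not the actual VPGTrans code: the module names and sizes are placeholders.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the three components of the pipeline
# VPG -> Projector -> LLM (names and sizes are illustrative only).
vpg = nn.Linear(32, 16)       # visual soft-prompt generator (e.g. 1.2B in the paper)
projector = nn.Linear(16, 8)  # dimension-mapping linear layer (~4M params)
llm = nn.Linear(8, 8)         # base LLM (e.g. 11B), kept frozen throughout

# The LLM stays frozen; trainable parameters come from the VPG and projector.
for p in llm.parameters():
    p.requires_grad = False

base_lr = 1e-4

# Stage 1: projector warm-up -- train only the (newly initialized) projector
# for one epoch at 5x the base learning rate.
stage1_opt = torch.optim.AdamW(projector.parameters(), lr=5 * base_lr)

# Stage 2: joint fine-tuning -- train VPG and projector together at the
# normal learning rate, with the LLM still frozen.
stage2_opt = torch.optim.AdamW(
    list(vpg.parameters()) + list(projector.parameters()), lr=base_lr
)
```

Keeping the two optimizers separate mirrors the description: only the warm-up stage uses the inflated learning rate, and the LLM never appears in either parameter group.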