Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Coobiw/MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-style MLLM on RTX 3090/4090 24GB GPUs.
deepspeed fine-tuning mllm model-parallel multimodal-large-language-models pipeline-parallelism pretraining qwen video-language-model video-large-language-models
- Host: GitHub
- URL: https://github.com/Coobiw/MPP-LLaVA
- Owner: Coobiw
- Created: 2023-10-24T17:27:32.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-07-16T06:41:43.000Z (6 months ago)
- Last Synced: 2024-07-16T09:02:07.626Z (6 months ago)
- Topics: deepspeed, fine-tuning, mllm, model-parallel, multimodal-large-language-models, pipeline-parallelism, pretraining, qwen, video-language-model, video-large-language-models
- Language: Jupyter Notebook
- Homepage:
- Size: 73.1 MB
- Stars: 307
- Watchers: 4
- Forks: 16
- Open Issues: 6
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-LLM-resourses - MPP-LLaVA
README
- [MPP-Qwen-Next: Multimodal Pipeline Parallel based on QwenLM](#mpp-qwen-next-multimodal-pipeline-parallel-based-on-qwenlm)
- [News](#news)
- [Framework](#framework)
- [Features](#features)
- [Image: Single-Turn QA](#image-single-turn-qa)
- [Image: Multi-Turn Dialogue](#image-multi-turn-dialogue)
- [Video Dialogue](#video-dialogue)
- [Multi-Image Dialogue (emerges after video SFT, without multi-image SFT)](#multi-image-dialogue-emerges-after-video-sft-without-multi-image-sft)
- [TODO LIST](#todo-list)
- [Installation](#installation)
- [Weight\&Data Preparation](#weightdata-preparation)
- [Inference](#inference)
- [Pipeline-Parallel Training (PP+DP)](#pipeline-parallel-training-ppdp)
- [Reference Loss Curves for the Two-Stage Training](#reference-loss-curves-for-the-two-stage-training)
- [Custom Data Format (for continued training)](#custom-data-format-for-continued-training)
- [Acknowledgement](#acknowledgement)
- [License](#license)
- [Star History](#star-history)

# MPP-Qwen-Next: Multimodal Pipeline Parallel based on QwenLM
https://github.com/Coobiw/MiniGPT4Qwen/assets/48615375/963416dd-fd97-4680-b7ac-fa4a14beaaae
https://github.com/Coobiw/MiniGPT4Qwen/assets/48615375/0e7c33f6-33d3-478a-ab0e-ecc116aeec78

## News
- [2024/6] 🔥 Released the MPP-Qwen-Next SFT weights (15GB): [ModelScope link](https://www.modelscope.cn/models/Coobiw/MPP-Qwen-Next) | [Baidu Netdisk link](https://pan.baidu.com/s/15rfwuCfM_sdViWQJv1mZmg?pwd=baka)
- [2024/6] 🔥 **MPP-Qwen-Next**: added LLaVA's multi-turn dialogue SFT data and VideoChatGPT's 100k SFT data; **supports multi-turn image dialogue and video dialogue, with multi-image dialogue emerging as a bonus ability** [Zhihu blog](https://zhuanlan.zhihu.com/p/703597348)
- [2024/5] 🔥 The code now supports multi-turn dialogue SFT, video SFT, and multi-image SFT
- [2024/4] 🔥 Added multi-GPU inference and fixed the chat template for better dialogue quality [Zhihu blog](https://zhuanlan.zhihu.com/p/698549757)
- [2024/3] 🔥 **MPPQwen-14B**: extended MiniGPT4Qwen-14B to MPP-Qwen14B (Multimodal Pipeline Parallel). Data and training recipe follow LLaVA (pretrain + SFT), with the LLM unfrozen during instruction tuning. **The entire training process runs on 6× RTX 4090 GPUs** [README & Tutorial](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/MPPQwen14B_README.md); [Zhihu blog](https://zhuanlan.zhihu.com/p/687106694)
- [2024/2] 🔥 **MiniGPT4Qwen-14B**: scaled up MiniGPT4Qwen to 14B. **DeepSpeed pipeline parallelism brings the whole process down to just 2× RTX 4090 GPUs** [README & Tutorial](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/MiniGPT4Qwen_README.md); [Zhihu blog](https://zhuanlan.zhihu.com/p/684462477)
- [2023/10] 🔥 **MiniGPT4Qwen**: instruction-tuned on 18.8k high-quality bilingual examples, yielding a **single-stage-trained personal bilingual MLLM** [README & Tutorial](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/MiniGPT4Qwen_README.md); [Zhihu blog](https://zhuanlan.zhihu.com/p/664612306)

## Framework
![](./assets/MPPQwen/framework.png)
## Features
### Image: Single-Turn QA
![](assets/MPPQwen/pic1.jpg)

### Image: Multi-Turn Dialogue
![](assets/MPPQwen/pic2.jpg)

### Video Dialogue
![](assets/MPPQwen/pic3.jpg)

### Multi-Image Dialogue (emerges after video SFT, without multi-image SFT)
---
Multi-image dialogue with the MPP-14B model without video SFT (it appears to answer, but actually says nothing):
![](assets/MPPQwen/pic4.jpg)

---
The MPPQwen-8B model after video SFT (able to compare different images):
![](assets/MPPQwen/pic5.jpg)

## TODO LIST
- [ ] Add a huggingface-transformers implementation and push it to Hugging Face
- [x] Release the SFT weights (ModelScope & Baidu Netdisk)
- [x] Support single-image, multi-image, and video inference
- [x] Support model-parallel inference (via transformers' `device_map="auto"`)
- [x] Release the pretrain weights
- [x] Release the processed pretrain and SFT dataset JSON files
- [x] Support multi-turn dialogue SFT, multi-image SFT, and video SFT
- [x] Support DeepSpeed pipeline parallelism

## Installation
```bash
conda create -n minigpt4qwen python=3.8 && conda activate minigpt4qwen
pip install -e .
```

## Weight&Data Preparation
Place the weights and data under the `cache` directory, structured as follows:
![](assets/MPPQwen/pic6.jpg)

For model weights, see [WEIGHT.md](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/WEIGHT.md)
For training data, see [DATA.md](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/DATA.md)
## Inference
First set up the base weights following [WEIGHT.md](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/WEIGHT.md), then download the SFT model weights (15GB) from either of the links below (or script the download, as sketched after the list):
- [ModelScope link](https://www.modelscope.cn/models/Coobiw/MPP-Qwen-Next)
- [Baidu Netdisk link](https://pan.baidu.com/s/15rfwuCfM_sdViWQJv1mZmg?pwd=baka)
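If you prefer to script the download, ModelScope's `snapshot_download` can fetch the weights; a minimal sketch, assuming a recent `modelscope` package (the model id comes from the link above):

```python
# Sketch: fetch the SFT weights from ModelScope programmatically.
# Assumes a recent `modelscope` package that exposes snapshot_download.
from modelscope import snapshot_download

local_dir = snapshot_download("Coobiw/MPP-Qwen-Next")
print(local_dir)  # directory holding the downloaded checkpoint files
```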
### Run the CLI demo

**Single-GPU Inference**
```bash
python cli_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth
```

**Multi-GPU (the LLM is loaded with `device_map="auto"`, so the LLM part of the model can be sharded across several GPUs; see the sketch after the command):**
```bash
python cli_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --llm_device_map "auto"
```
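For reference, the following standalone sketch shows what loading an LLM with transformers' `device_map="auto"` looks like; the model id is a placeholder for illustration, not this repo's checkpoint:

```python
# Minimal sketch: shard a causal LM across all visible GPUs with
# device_map="auto" (accelerate decides the layer placement).
# The model id below is a placeholder, not this repo's checkpoint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",       # spread layers over the available devices
    trust_remote_code=True,  # Qwen ships custom modeling code
)
print(model.hf_device_map)   # inspect which layer landed on which device
```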
**CPU (slow):**
```bash
python cli_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --cpu-only # drop --cpu-only if you have enough VRAM (>=20GB)
```

After launching, you will be prompted for image paths; multiple images can be entered. Type `:f` to finish entering image paths and start the conversation.
Common commands (a sketch of how they might be dispatched follows the list):
> :help show help
>
> :clear clear the terminal
>
> :clh clear the dialogue history (image inputs are kept)
>
> :his show the dialogue history
>
> :img show the input image paths
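A minimal sketch of how such a `:`-prefixed command loop can be dispatched; this is illustrative only, not the repo's actual `cli_demo.py`:

```python
# Illustrative command dispatcher for a CLI chat demo (not the repo's
# cli_demo.py). History is a list of (question, answer) pairs.
history = []      # (question, answer) pairs
image_paths = []  # paths entered before the conversation starts

def dispatch(cmd: str) -> bool:
    """Handle a :command; return False to treat the input as a chat turn."""
    if cmd == ":help":
        print(":help :clear :clh :his :img")
    elif cmd == ":clear":
        print("\033c", end="")  # ANSI reset clears the terminal
    elif cmd == ":clh":
        history.clear()         # image_paths are deliberately kept
    elif cmd == ":his":
        for q, a in history:
            print(f"user: {q}\nassistant: {a}")
    elif cmd == ":img":
        print("\n".join(image_paths))
    else:
        return False
    return True
```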
### Run the Gradio WebUI demo

**Single-GPU Inference**
```bash
python webui_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth
```

**Multi-GPU (the LLM is loaded with `device_map="auto"`):**
```bash
python webui_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --llm_device_map "auto"
```

**CPU:**
```bash
python webui_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --cpu-only # drop --cpu-only if you have enough VRAM (>=20GB)
```

## Pipeline-Parallel Training (PP+DP)
The commands below are for a machine with 8× RTX 3090 GPUs:

### Pretrain
> nproc_per_node: 8
> dp: 4
> pp: 2
> nproc_per_node = pp * dp (illustrated after the command below)

```bash
python -m torch.distributed.run --nproc_per_node=8 train_pipeline.py --cfg-path lavis/projects/pp_qwen7b_video/pretrain.yaml --num-stages 2
```
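The constraint `nproc_per_node = pp * dp` means DeepSpeed splits the launched ranks into `pp` pipeline stages and uses the remaining factor as the data-parallel width. A tiny sketch of the arithmetic (variable names are ours, not the repo's):

```python
# Illustrative check of the nproc_per_node = pp * dp constraint.
def dp_width(nproc_per_node: int, num_stages: int) -> int:
    assert nproc_per_node % num_stages == 0, \
        "world size must be divisible by the number of pipeline stages"
    return nproc_per_node // num_stages

print(dp_width(8, 2))  # pretrain above: dp = 4
print(dp_width(8, 8))  # SFT below: dp = 1
```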
### SFT

> nproc_per_node: 8
> dp: 1
> pp: 8
> nproc_per_node = pp * dp

```bash
python -m torch.distributed.run --nproc_per_node=8 train_pipeline.py --cfg-path lavis/projects/pp_qwen7b_video/sft.yaml --num-stages 8
```

### Converting pipeline-parallel checkpoints to a single .pth file
#### Pretraining stage
(only the linear projection layer is converted)
```bash
python pipe_proj2pth.py --ckpt-dir lavis/output/pp_7b_video/pretrain/global_step2181
```

After conversion, the model file is saved under `ckpt_dir` as `model.pth`.
#### SFT stage
(the projection layer and all LLM parameters are converted)
```bash
python pipemodel2pth.py --ckpt-dir lavis/output/pp_7b_video/sft_video/global_step2005
```

After conversion, the model file is saved under `ckpt_dir` as `unfreeze_llm_model.pth`.
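Conceptually, both conversion scripts walk the per-layer state files that DeepSpeed's pipeline engine writes into `global_step*/` and merge them into one state dict. A rough sketch, assuming DeepSpeed's default `layer_XX-model_states.pt` shard naming (the repo's actual `pipe_proj2pth.py`/`pipemodel2pth.py` also remap parameter names):

```python
# Rough sketch of merging DeepSpeed pipeline shards into a single .pth.
# Assumes the default layer_XX-model_states.pt naming; key prefixing is
# an illustrative choice, the real scripts remap names differently.
import glob
import os
import re

import torch

def merge_pipeline_shards(ckpt_dir: str, out_name: str) -> None:
    merged = {}
    for shard in sorted(glob.glob(os.path.join(ckpt_dir, "layer_*-model_states.pt"))):
        idx = int(re.search(r"layer_(\d+)", shard).group(1))
        for key, tensor in torch.load(shard, map_location="cpu").items():
            merged[f"layers.{idx}.{key}"] = tensor  # prefix by pipeline layer index
    torch.save(merged, os.path.join(ckpt_dir, out_name))

merge_pipeline_shards("lavis/output/pp_7b_video/sft_video/global_step2005",
                      "unfreeze_llm_model.pth")
```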
## Reference Loss Curves for the Two-Stage Training
---

pretrain:
![](./assets/MPPQwen/curve1.jpg)

---
sft:
![](./assets/MPPQwen/curve2.jpg)

## Custom Data Format (for continued training)
For the processing functions, see the `analysis.py` scripts in the llava_instuct and videochatgpt directories of [ckpt-and-data.zip](https://github.com/Coobiw/MiniGPT4Qwen/releases/download/MPP-Qwen-Next_ckpt-and-data/ckpt-and-data.zip).

***P.S.: If you keep running into path errors, switch all paths (including those in the dataset configs) to absolute paths.***
### Image Instruction-Tuning Data Format
Single-turn (`instruction` and `output` are `str`):
```json
[
{
"image": "000000215677.jpg",
"instruction": " {question}",
"output": "{answer}"
    }
]
```

Multi-turn (`instruction` and `output` are equal-length `list`s):
```json
[
    {
        "image": "000000479443.jpg",
        "instruction": [
            " {question1}",
            "{question2}",
            "..."
        ],
        "output": [
            "{answer1}",
            "{answer2}",
            "..."
        ]
    }
]
```
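If your raw data is in LLaVA's `conversations` format (alternating `human`/`gpt` turns), a converter along these lines yields the multi-turn schema above; this is an illustrative sketch, not the repo's `analysis.py`:

```python
# Illustrative converter from a LLaVA-style "conversations" record to
# the multi-turn format above (not the repo's analysis.py).
def convert_llava_record(rec: dict) -> dict:
    questions = [t["value"] for t in rec["conversations"] if t["from"] == "human"]
    answers = [t["value"] for t in rec["conversations"] if t["from"] == "gpt"]
    assert len(questions) == len(answers), "expected alternating human/gpt turns"
    return {"image": rec["image"], "instruction": questions, "output": answers}
```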
### Video Instruction-Tuning Data Format

```json
[
{
"video": "v_k_ZXmr8pmrs.mkv",
"instruction": " {question}",
"output": "{answer}"
}
]
```

## Acknowledgement
- [Lavis](https://github.com/salesforce/LAVIS): this repo is built on LAVIS and uses the ViT and Q-Former from its BLIP-2 implementation
- [QwenLM](https://github.com/QwenLM/Qwen): Qwen-7B-Chat serves as this repo's language model
- [DeepSpeed](https://github.com/microsoft/DeepSpeed) 👍
- [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples) 👍👍
- [LLaVA](https://github.com/haotian-liu/LLaVA): we follow its training recipe and use its pretraining and instruction-tuning data
- [VideoChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT): we use its 100k video SFT data
- [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA): provides the Baidu Netdisk download link for the VideoChatGPT video data

## License
- Much of the code in this repo is based on [Lavis](https://github.com/salesforce/LAVIS), which is released under the [BSD 3-Clause License](https://github.com/Vision-CAIR/MiniGPT-4/blob/main/LICENSE_Lavis.md).
- This repo uses Qwen-7B-Chat, which allows commercial as well as research and development use; see its [LICENSE](https://github.com/QwenLM/Qwen/blob/main/LICENSE).

## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=Coobiw/MPP-LLaVA&type=Date)](https://star-history.com/#Coobiw/MPP-LLaVA&Date)