{"id":17377213,"url":"https://github.com/Coobiw/MPP-LLaVA","last_synced_at":"2025-02-27T06:30:37.433Z","repository":{"id":203364274,"uuid":"709422137","full_name":"Coobiw/MPP-LLaVA","owner":"Coobiw","description":"Personal Project: MPP-Qwen14B \u0026 MPP-Qwen-Next(Multimodal Pipeline Parallel based on Qwen-LM). Support [video/image/multi-image] {sft/conversations}. Don't let the poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.","archived":false,"fork":false,"pushed_at":"2024-07-16T06:41:43.000Z","size":76623,"stargazers_count":307,"open_issues_count":6,"forks_count":16,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-07-16T09:02:07.626Z","etag":null,"topics":["deepspeed","fine-tuning","mllm","model-parallel","multimodal-large-language-models","pipeline-parallelism","pretraining","qwen","video-language-model","video-large-language-models"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Coobiw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-24T17:27:32.000Z","updated_at":"2024-07-16T09:02:18.477Z","dependencies_parsed_at":"2023-11-08T11:35:15.824Z","dependency_job_id":"153d511d-b02e-41da-b225-3e3e73aa581d","html_url":"https://github.com/Coobiw/MPP-LLaVA","commit_stats":null,"previous_names":["coobiw/minigpt4qwen","coobiw/mpp-llava"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Coobiw%2FMPP-LLaVA","tags
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Coobiw%2FMPP-LLaVA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Coobiw%2FMPP-LLaVA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Coobiw%2FMPP-LLaVA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Coobiw","download_url":"https://codeload.github.com/Coobiw/MPP-LLaVA/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219842822,"owners_count":16556560,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deepspeed","fine-tuning","mllm","model-parallel","multimodal-large-language-models","pipeline-parallelism","pretraining","qwen","video-language-model","video-large-language-models"],"created_at":"2024-10-16T05:01:03.933Z","updated_at":"2024-10-16T05:02:48.531Z","avatar_url":"https://github.com/Coobiw.png","language":"Jupyter Notebook","funding_links":[],"categories":["技巧 Tips"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"./assets/MPPQwen/logo.webp\" alt=\"MPP-Qwen-Next logo\" width=\"300\" /\u003e\n\u003c/div\u003e\n\n- [MPP-Qwen-Next: Multimodal Pipeline Parallel based on QwenLM](#mpp-qwen-next-multimodal-pipeline-parallel-based-on-qwenlm)\n  - [News](#news)\n  - [Framework](#framework)\n  - [Features](#features)\n    - [Image Single-Turn QA](#image-single-turn-qa)\n    - [Image Multi-Turn Conversation](#image-multi-turn-conversation)\n    - [Video Conversation](#video-conversation)\n    - [Multi-Image Conversation (no multi-image SFT; the ability emerges after video SFT)](#multi-image-conversation-no-multi-image-sft-the-ability-emerges-after-video-sft)\n  - [TODO LIST](#todo-list)\n  - 
[Installation](#installation)\n  - [Weight\\\u0026Data Preparation](#weightdata-preparation)\n  - [Inference](#inference)\n  - [Pipeline Parallel Training (PP+DP)](#pipeline-parallel-training-ppdp)\n  - [Reference Loss Curves for Two-Stage Training](#reference-loss-curves-for-two-stage-training)\n  - [Custom Data Format (if you want to continue training)](#custom-data-format-if-you-want-to-continue-training)\n  - [Acknowledgement](#acknowledgement)\n  - [License](#license)\n  - [Star History](#star-history)\n\n\n\n# MPP-Qwen-Next: Multimodal Pipeline Parallel based on QwenLM\n\nhttps://github.com/Coobiw/MiniGPT4Qwen/assets/48615375/963416dd-fd97-4680-b7ac-fa4a14beaaae\n\n\u003cvideo controls\u003e\n  \u003csource src=\"https://github.com/Coobiw/MiniGPT4Qwen/assets/48615375/963416dd-fd97-4680-b7ac-fa4a14beaaae\" type=\"video/mp4\"\u003e\n  Your browser does not support the video tag.\n\u003c/video\u003e\n\nhttps://github.com/Coobiw/MiniGPT4Qwen/assets/48615375/0e7c33f6-33d3-478a-ab0e-ecc116aeec78\n\n\u003cvideo controls\u003e\n  \u003csource src=\"https://github.com/Coobiw/MiniGPT4Qwen/assets/48615375/0e7c33f6-33d3-478a-ab0e-ecc116aeec78\" type=\"video/mp4\"\u003e\n  Your browser does not support the video tag.\n\u003c/video\u003e\n\n## News\n- [2024/6] 🔥 Released the MPP-Qwen-Next SFT weights (15GB): [ModelScope link](https://www.modelscope.cn/models/Coobiw/MPP-Qwen-Next) [Baidu Netdisk link](https://pan.baidu.com/s/15rfwuCfM_sdViWQJv1mZmg?pwd=baka)\n- [2024/6] 🔥 **MPP-Qwen-Next**: adds LLaVA's multi-turn conversation SFT data and VideoChatGPT's 100k SFT data; **supports multi-turn image conversation and video conversation, with multi-image conversation emerging as well** [Zhihu blog](https://zhuanlan.zhihu.com/p/703597348)\n- [2024/5] 🔥 The code supports multi-turn conversation SFT, video SFT, and multi-image SFT\n- [2024/4] 🔥 Supports multi-GPU inference; the chat template was corrected for better conversation quality [Zhihu blog](https://zhuanlan.zhihu.com/p/698549757)\n- [2024/3] 🔥 **MPPQwen-14B**: Extend MiniGPT4Qwen-14B to MPP-Qwen14B(Multimodal Pipeline Parallel). The data and training paradigm follow LLaVA (pretrain + sft), with the LLM unfrozen during instruction tuning. **The entire training process was completed on 6 RTX 4090 GPUs** [README\u0026Tutorial](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/MPPQwen14B_README.md); [Zhihu blog](https://zhuanlan.zhihu.com/p/687106694)\n- [2024/2] 🔥 **MiniGPT4Qwen-14B**: Scaling Up MiniGPT4Qwen to 14B. 
**With DeepSpeed Pipeline Parallel, the whole process needs only two RTX 4090 GPUs** [README\u0026Tutorial](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/MiniGPT4Qwen_README.md); [Zhihu blog](https://zhuanlan.zhihu.com/p/684462477)\n- [2023/10] 🔥 **MiniGPT4Qwen**: uses 18.8k high-quality bilingual instruction-tuning samples to obtain **a personal bilingual MLLM trained in a single stage** [README\u0026Tutorial](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/MiniGPT4Qwen_README.md); [Zhihu blog](https://zhuanlan.zhihu.com/p/664612306)\n\n## Framework\n\n![](./assets/MPPQwen/framework.png)\n\n## Features\n\n### Image Single-Turn QA\n![](assets/MPPQwen/pic1.jpg)\n\n### Image Multi-Turn Conversation\n![](assets/MPPQwen/pic2.jpg)\n\n### Video Conversation\n![](assets/MPPQwen/pic3.jpg)\n\n### Multi-Image Conversation (no multi-image SFT; the ability emerges after video SFT)\n---\nMulti-image conversation with the MPP-14B model without video SFT (it appears to answer but says nothing of substance):\n![](assets/MPPQwen/pic4.jpg)\n\n---\nThe MPPQwen-8B model after video SFT (able to compare different images):\n![](assets/MPPQwen/pic5.jpg)\n\n\n## TODO LIST\n- [ ] Add a huggingface-transformers implementation and push it to Hugging Face\n- [x] Release the SFT weights (ModelScope \u0026 Baidu Netdisk)\n- [x] Support single-image, multi-image, and video inference\n- [x] Support model-parallel inference (using transformers' `device_map=\"auto\"`)\n- [x] Release the pretrain weights\n- [x] Release the processed pretrain and SFT dataset JSON files\n- [x] Support multi-turn conversation, multi-image SFT, and video SFT\n- [x] Support DeepSpeed pipeline parallelism\n\n## Installation\n\n```bash\nconda create -n minigpt4qwen python=3.8 \u0026\u0026 conda activate minigpt4qwen\npip install -e .\n```\n\n## Weight\u0026Data Preparation\nPlace them in the `cache` directory with the following structure:\n![](assets/MPPQwen/pic6.jpg)\n\nFor the model weights, see: [WEIGHT.md](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/WEIGHT.md)\n\nFor the training data, see: [DATA.md](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/DATA.md)\n\n## Inference\nFirst set up the weights as described in [WEIGHT.md](https://github.com/Coobiw/MiniGPT4Qwen/blob/master/WEIGHT.md)\n\nThen download the SFT model weights (15GB) from either of the following links:\n- [ModelScope link](https://www.modelscope.cn/models/Coobiw/MPP-Qwen-Next)\n- [Baidu Netdisk link](https://pan.baidu.com/s/15rfwuCfM_sdViWQJv1mZmg?pwd=baka)\n### Run the command-line demo\n\n**Single-GPU Inference**\n\n```bash\npython cli_demo.py --model-type qwen7b_chat -c 
lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth\n```\n\n\n\n**MultiGPU (the LLM is loaded with `device_map=\"auto\"`, so the LLM part of the model can be spread across multiple GPUs):**\n\n```bash\npython cli_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --llm_device_map \"auto\"\n```\n\n\n**CPU (slow):**\n\n```bash\npython cli_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --cpu-only # if you have enough GPU memory (\u003e=20GB), you can drop --cpu-only\n```\n\nAfter launch, enter the image path(s); multiple images are allowed. Enter `:f` to finish entering image paths and start the conversation.\n\nCommon commands:\n\n\u003e :help show help\n\u003e\n\u003e :clear clear the current terminal\n\u003e\n\u003e :clh clear the conversation history (the image inputs are kept)\n\u003e\n\u003e :his show the conversation history\n\u003e\n\u003e :img show the input image paths\n\n### Run the Gradio web UI demo\n\n**Single-GPU Inference**\n\n```bash\npython webui_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth\n```\n\n\n\n**MultiGPU (the LLM is loaded with `device_map=\"auto\"`):**\n\n```bash\npython webui_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --llm_device_map \"auto\"\n```\n\n\n\n**CPU:**\n\n```bash\npython webui_demo.py --model-type qwen7b_chat -c lavis/output/pp_7b_video/sft_video/global_step2005/unfreeze_llm_model.pth --cpu-only # if you have enough GPU memory (\u003e=20GB), you can drop --cpu-only\n```\n\n## Pipeline Parallel Training (PP+DP)\nThe commands below are for 8 RTX 3090 GPUs:\n\n### Pretrain\n\u003e nproc_per_node: 8\n\u003e dp: 4\n\u003e pp: 2\n\u003e nproc_per_node = pp * dp\n\n```bash\npython -m torch.distributed.run --nproc_per_node=8 train_pipeline.py --cfg-path lavis/projects/pp_qwen7b_video/pretrain.yaml --num-stages 2\n```\n\n### SFT\n\u003e nproc_per_node: 8\n\u003e dp: 1\n\u003e pp: 8\n\u003e nproc_per_node = pp * dp\n\n```bash\npython -m torch.distributed.run --nproc_per_node=8 train_pipeline.py --cfg-path lavis/projects/pp_qwen7b_video/sft.yaml --num-stages 8\n```\n\n### Converting pipeline-parallel weights to a pth file\n\n#### Pretraining stage:\n\n(Only the linear projection layer is converted)\n\n```bash\npython pipe_proj2pth.py --ckpt-dir 
lavis/output/pp_7b_video/pretrain/global_step2181\n```\n\nAfter conversion, the model file is saved under `ckpt_dir` as `model.pth`\n\n#### SFT stage\n\n(The projection layer and all LLM parameters need to be converted)\n\n```bash\npython pipemodel2pth.py --ckpt-dir lavis/output/pp_7b_video/sft_video/global_step2005\n```\n\nAfter conversion, the model file is saved under `ckpt_dir` as `unfreeze_llm_model.pth`\n\n## Reference Loss Curves for Two-Stage Training\n---\n\npretrain:\n![](./assets/MPPQwen/curve1.jpg)\n\n---\nsft:\n![](./assets/MPPQwen/curve2.jpg)\n\n## Custom Data Format (if you want to continue training)\nFor the processing functions, refer to the `analysis.py` scripts in the llava_instuct and videochatgpt directories of [https://github.com/Coobiw/MiniGPT4Qwen/releases/download/MPP-Qwen-Next_ckpt-and-data/ckpt-and-data.zip](https://github.com/Coobiw/MiniGPT4Qwen/releases/download/MPP-Qwen-Next_ckpt-and-data/ckpt-and-data.zip)\n\n***P.S.: If you keep running into path errors, change all paths to absolute paths (including those in the dataset configs)***\n### Image instruction-tuning data format\nSingle-turn (instruction and output are `str`):\n```json\n[\n    {\n        \"image\": \"000000215677.jpg\",\n        \"instruction\": \"\u003cImg\u003e\u003cImageHere\u003e\u003c/Img\u003e {question}\",\n        \"output\": \"{answer}\"\n    }\n]\n```\n\nMulti-turn (instruction and output are `list`s of equal length):\n```json\n[\n    {\n        \"image\": \"000000479443.jpg\",\n        \"instruction\": [\n            \"\u003cImg\u003e\u003cImageHere\u003e\u003c/Img\u003e {question1}\",\n            \"{question2}\",\n            \"...\"\n        ],\n        \"output\": [\n            \"{answer1}\",\n            \"{answer2}\",\n            \"...\"\n        ]\n    }\n]\n```\n\n### Video instruction-tuning data format\n```json\n[\n    {\n        \"video\": \"v_k_ZXmr8pmrs.mkv\",\n        \"instruction\": \"\u003cImg\u003e\u003cImageHere\u003e\u003c/Img\u003e {question}\",\n        \"output\": \"{answer}\"\n    }\n]\n```\n\n## Acknowledgement\n\n- [Lavis](https://github.com/salesforce/LAVIS) This repository is built on LAVIS and uses its BLIP2 ViT and Q-Former\n- [QwenLM](https://github.com/QwenLM/Qwen) The language model in this repository is Qwen7B-Chat\n- [DeepSpeed](https://github.com/microsoft/DeepSpeed) 👍\n- [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples) 👍👍\n- 
[LLaVA](https://github.com/haotian-liu/LLaVA) Follows its training paradigm and uses its pretraining and instruction-tuning data\n- [VideoChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT) Uses its 100k video SFT data\n- [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA) Provides the Baidu Netdisk download link for the VideoChatGPT video data\n\n## License\n\n- Much of the code in this repository is based on [Lavis](https://github.com/salesforce/LAVIS), which is released under the [BSD 3-Clause License](https://github.com/Vision-CAIR/MiniGPT-4/blob/main/LICENSE_Lavis.md).\n- This repository uses Qwen-7B-Chat, which supports commercial, research, and development use; its license is [LICENSE](https://github.com/QwenLM/Qwen/blob/main/LICENSE)\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=Coobiw/MPP-LLaVA\u0026type=Date)](https://star-history.com/#Coobiw/MPP-LLaVA\u0026Date)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCoobiw%2FMPP-LLaVA","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCoobiw%2FMPP-LLaVA","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCoobiw%2FMPP-LLaVA/lists"}