{"id":13958431,"url":"https://github.com/SHI-Labs/CuMo","last_synced_at":"2025-07-21T00:30:46.135Z","repository":{"id":239081183,"uuid":"797568457","full_name":"SHI-Labs/CuMo","owner":"SHI-Labs","description":"CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts","archived":false,"fork":false,"pushed_at":"2024-06-08T06:04:21.000Z","size":8022,"stargazers_count":149,"open_issues_count":0,"forks_count":8,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-25T17:09:48.055Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SHI-Labs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-08T05:11:08.000Z","updated_at":"2025-06-03T07:55:48.000Z","dependencies_parsed_at":"2024-05-09T22:29:19.214Z","dependency_job_id":"45571c32-1161-409f-9c2e-5537e1a88e94","html_url":"https://github.com/SHI-Labs/CuMo","commit_stats":null,"previous_names":["shi-labs/cumo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SHI-Labs/CuMo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SHI-Labs%2FCuMo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SHI-Labs%2FCuMo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SHI-Labs%2FCuMo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SHI-Labs%2FCuMo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SHI-Labs","download_url":"h
ttps://codeload.github.com/SHI-Labs/CuMo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SHI-Labs%2FCuMo/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266221246,"owners_count":23894964,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-08T13:01:34.229Z","updated_at":"2025-07-21T00:30:45.528Z","avatar_url":"https://github.com/SHI-Labs.png","language":"Python","funding_links":[],"categories":["Multimodal Large Models"],"sub_categories":["Web Services_Other"],"readme":"\n# CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts\n\u003ca href='https://chrisjuniorli.github.io/project/CuMo/'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e\n\u003ca href='https://arxiv.org/abs/2405.05949'\u003e\u003cimg src='https://img.shields.io/badge/Paper-Arxiv-red'\u003e\u003c/a\u003e\n\u003ca href='https://huggingface.co/shi-labs/CuMo-mistral-7b'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue'\u003e\u003c/a\u003e\n\u003ca href='https://huggingface.co/datasets/shi-labs/CuMo_dataset'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Data-green'\u003e\u003c/a\u003e\n\u003ca href='https://huggingface.co/spaces/shi-labs/CuMo-7b-zero'\u003e\u003cimg src='https://img.shields.io/badge/🤗-Open%20In%20Spaces-blue.svg'\u003e\u003c/a\u003e\n\n[Jiachen Li](https://chrisjuniorli.github.io/),\nXinyao Wang,\n[Sijie Zhu](https://jeff-zilence.github.io/),\n[Chia-wen 
Kuo](https://sites.google.com/view/chiawen-kuo/home),\nLu Xu,\nFan Chen,\n[Jitesh Jain](https://praeclarumjj3.github.io/),\n[Humphrey Shi](https://www.humphreyshi.com/home),\n[Longyin Wen](https://scholar.google.com/citations?user=PO9WFl0AAAAJ\u0026hl=en)\n\n## Release\n- [06/07] We released checkpoints of CuMo after the pre-training and pre-finetuning stages at [CuMo-misc](https://huggingface.co/shi-labs/CuMo-misc).\n- [05/10] Check out the [Demo](https://huggingface.co/spaces/shi-labs/CuMo-7b-zero), hosted on a Gradio ZeroGPU Space.\n- [05/09] Check out the [arXiv](https://arxiv.org/abs/2405.05949) version of the paper!\n- [05/08] We released **CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts** with a [project page](https://chrisjuniorli.github.io/project/CuMo/) and [code](https://github.com/SHI-Labs/CuMo).\n\n## Contents\n- [Release](#release)\n- [Contents](#contents)\n- [Overview](#overview)\n- [Installation](#installation)\n- [Model Zoo](#model-zoo)\n- [Demo setup](#demo-setup)\n  - [Gradio Web UI](#gradio-web-ui)\n  - [CLI Inference](#cli-inference)\n- [Getting Started](#getting-started)\n- [Citation](#citation)\n- [Acknowledgement](#acknowledgement)\n- [License](#license)\n\n## Overview\n\n\u003cdiv align=center\u003e\n\u003cimg width=\"50%\" src=\"assets/teaser.png\"/\u003e\n\u003c/div\u003e\n\nIn this project, we study how to use and train Mixture-of-Experts (MoE) in multimodal LLMs. We propose __CuMo__, which incorporates co-upcycled Top-K sparsely-gated MoE blocks into the vision encoder and the MLP connector, thereby enhancing the capabilities of multimodal LLMs. 
We further adopt a three-stage training approach with auxiliary losses to stabilize training and keep the load balanced across experts.\nCuMo is trained exclusively on open-source datasets and achieves performance comparable to other state-of-the-art multimodal LLMs on multiple VQA and visual-instruction-following benchmarks.\n\n\u003cdiv align=center\u003e\n\u003cimg width=\"100%\" src=\"assets/archi.png\"/\u003e\n\u003c/div\u003e\n\n## Installation\n1. Clone this repo.\n```bash\ngit clone https://github.com/SHI-Labs/CuMo.git\ncd CuMo\n```\n\n2. Install dependencies.\n\n*We used a Python 3.9 venv for all experiments; Anaconda with Python 3.9 or 3.10 should also work if you prefer it.*\n\n```bash\n# Option 1: venv\npython -m venv /path/to/new/virtual/cumo\nsource /path/to/new/virtual/cumo/bin/activate\n\n# Option 2: Anaconda\nconda create -n cumo python=3.9 -y\nconda activate cumo\n\n# Then, in the activated environment:\npip install --upgrade pip\npip install -e .\n```\n\n3. Install additional packages for training CuMo.\n```bash\npip install -e \".[train]\"\npip install flash-attn --no-build-isolation\n```\n\n## Model Zoo\nThe CuMo model weights are available on Hugging Face:\n\n| Model | Base LLM | Vision Encoder | MLP Connector | Download |\n|----------|----------|----------|----------|----------------|\n| CuMo-7B | Mistral-7B-Instruct-v0.2 | CLIP-MoE | MLP-MoE | 🤗 [HF ckpt](https://huggingface.co/shi-labs/CuMo-mistral-7b) |\n| CuMo-8x7B | Mixtral-8x7B-Instruct-v0.1 | CLIP-MoE | MLP-MoE | 🤗 [HF ckpt](https://huggingface.co/shi-labs/CuMo-mixtral-8x7b) |\n\nThe intermediate checkpoints after pre-training and pre-finetuning are also available on Hugging Face:\n\n| Model | Base LLM | Stage | Download |\n|----------|----------|----------|--------------|\n| CuMo-7B | Mistral-7B-Instruct-v0.2 | Pre-Training | 🤗 [HF ckpt](https://huggingface.co/shi-labs/CuMo-misc/tree/main/cumo-mistral-7b) |\n| CuMo-8x7B | Mixtral-8x7B-Instruct-v0.1 | Pre-Finetuning | 🤗 [HF 
ckpt](https://huggingface.co/shi-labs/CuMo-misc/tree/main/cumo-mixtral-8x7b) |\n\n## Demo setup\n### Gradio Web UI\nWe provide a Gradio web UI [demo](https://huggingface.co/spaces/shi-labs/CuMo-7b-zero). You can also set up the demo locally with:\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m cumo.serve.app \\\n    --model-path checkpoints/CuMo-mistral-7b\n```\nYou can add `--bits 8` or `--bits 4` to reduce GPU memory usage.\n\n### CLI Inference\nIf you prefer to start a demo without a web UI, you can use the following command to run a demo with CuMo-Mistral-7b in your terminal:\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m cumo.serve.cli \\\n    --model-path checkpoints/CuMo-mistral-7b \\\n    --image-file cumo/serve/examples/waterview.jpg\n```\nYou can add `--load-4bit` or `--load-8bit` to reduce GPU memory usage.\n\n## Getting Started\n\nPlease refer to [Getting Started](docs/getting_started.md) for details on dataset preparation, training, and inference with CuMo.\n\n## Citation\n```bibtex\n@article{li2024cumo,\n  title={CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts},\n  author={Li, Jiachen and Wang, Xinyao and Zhu, Sijie and Kuo, Chia-wen and Xu, Lu and Chen, Fan and Jain, Jitesh and Shi, Humphrey and Wen, Longyin},\n  journal={arXiv:},\n  year={2024}\n}\n```\n\n## Acknowledgement\n\nWe thank the authors of [LLaVA](https://github.com/haotian-liu/LLaVA), [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA), [S^2](https://github.com/bfshi/scaling_on_scales),\n[st-moe-pytorch](https://github.com/lucidrains/st-moe-pytorch), and [mistral-src](https://github.com/mistralai/mistral-src) for releasing their source code.\n\n## License\n[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-yellow.svg)](LICENSE)\n[![Weight License](https://img.shields.io/badge/Weight%20License-CC%20By%20NC%204.0-red)](WEIGHT_LICENSE)\n\nThe checkpoint weights are licensed under CC BY-NC 4.0 for non-commercial use only. The codebase is licensed under Apache 2.0. 
This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses.\nThe content produced by any version of CuMo is influenced by uncontrollable variables such as randomness, and therefore, the accuracy of the output cannot be guaranteed by this project. This project does not accept any legal liability for the content of the model output, nor does it assume responsibility for any losses incurred due to the use of associated resources and output results.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSHI-Labs%2FCuMo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSHI-Labs%2FCuMo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSHI-Labs%2FCuMo/lists"}