https://github.com/thunlp-mt/adamms
Official Repository for "AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization" [CVPR2025]
- Host: GitHub
- URL: https://github.com/thunlp-mt/adamms
- Owner: THUNLP-MT
- Created: 2025-03-14T07:42:46.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-07T12:06:27.000Z (about 1 year ago)
- Last Synced: 2025-06-14T00:06:03.611Z (12 months ago)
- Language: Python
- Homepage:
- Size: 2.78 MB
- Stars: 6
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# AdaMMS: Adaptive Model Merging for Heterogeneous Multimodal LLMs
Accepted to CVPR 2025! [Paper](https://arxiv.org/abs/2503.23733)
---
Language: English | Chinese
## Introduction
Recent advancements in model merging have shown great potential in combining capabilities from multiple large language models (LLMs). However, existing methods primarily focus on merging **homogeneous models** with identical architectures, struggling when applied to **heterogeneous Multimodal Large Language Models (MLLMs)** that differ in both architecture and parameter space.
We propose **AdaMMS**: **Ada**ptive **M**apping, **M**erging, and **S**earching, a novel unsupervised model merging framework tailored for heterogeneous MLLMs. AdaMMS tackles the challenges in three steps:
1. **Mapping**
   Establish a mapping function between different model architectures.
2. **Merging**
   Perform weighted linear interpolation to accommodate asymmetries in parameter space.
3. **Searching**
   Introduce an unsupervised hyperparameter search method to determine optimal merging coefficients.
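
As a rough formalization of the mapping and merging steps (a sketch under stated assumptions, not the paper's exact notation): let $m(\cdot)$ be a hypothetical mapping from a parameter name $k$ in the target architecture to the matching name in the source model, let $\mathcal{K}$ be the set of target parameters that have such a match, and let $\alpha \in [0, 1]$ be the merging coefficient. Weighted linear interpolation can then be written as:

$$
\theta^{\text{merged}}_{k} =
\begin{cases}
\alpha\,\theta^{\text{target}}_{k} + (1-\alpha)\,\theta^{\text{source}}_{m(k)}, & k \in \mathcal{K} \\
\theta^{\text{target}}_{k}, & \text{otherwise}
\end{cases}
$$

Under this reading, $\alpha = 1$ reproduces the target model and smaller values shift weight toward the source model. The $(1-\alpha)$ weight on the source parameters is an assumption made for illustration; the merge scripts are the authoritative definition.
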
Extensive experiments show that AdaMMS consistently outperforms previous model merging methods on various vision-language benchmarks.
Illustration of the three steps in AdaMMS:

Average results of the different merging methods:

Visualization of the model outputs obtained with different alpha values:

---
## Environment Setup
> It is recommended to set up a separate environment **for each model**, then install the `lmms-eval` evaluation framework in it.
### Example: CogVLM
```bash
conda create -n lmms-cogvlm python=3.10
conda activate lmms-cogvlm
# Fetch the raw requirements file (the /blob/ URL would return an HTML page)
wget https://raw.githubusercontent.com/THUDM/CogVLM/main/requirements.txt --no-check-certificate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
git clone https://github.com/EvolvingLMMs-Lab/lmms-eval
cd lmms-eval && pip install -e .
conda install openjdk=8
```
### Example: mPLUG-Owl
```bash
conda create -n lmms-mplug python=3.10
conda activate lmms-mplug
git clone https://github.com/X-PLUG/mPLUG-Owl.git
cd mPLUG-Owl/mPLUG-Owl2
pip install --upgrade pip && pip install -e .
git clone https://github.com/EvolvingLMMs-Lab/lmms-eval
cd lmms-eval && pip install -e .
conda install openjdk=8
pip install deepspeed # Optional for inference acceleration
```
------
## Merge Scripts
> Naming convention: `xxx2yyy.py` indicates merging model `xxx` into architecture `yyy`.
### Linear Interpolation Scripts
| Source Model | Target Model | Script File |
| -------------------- | ------------ | ---------------------- |
| LLaVA | CogVLM | `llava2cogvlm.py` |
| mPLUG-Owl | CogVLM | `mplugowl2cogvlm.py` |
| LLaVA-OneVision-Qwen | QwenVL2 | `llava-qwen2qwenvl.py` |
### Non-Linear Merging (Baseline)
| Source Model | Target Model | Script File |
| -------------------- | ------------ | ----------------------------------- |
| LLaVA | CogVLM | `llava2cogvlm_ties_merging.py` |
| mPLUG-Owl | CogVLM | `mplugowl2cogvlm_ties_merging.py` |
| LLaVA-OneVision-Qwen | QwenVL2 | `llava-qwen2qwenvl_ties_merging.py` |
------
## Merging + Inference
> Refer to `runs/` for example scripts. Logging results helps identify the best alpha. More details on inference are available at https://github.com/EvolvingLMMs-Lab/lmms-eval.
### Run Merge Script
```bash
conda activate lmms-cogvlm
python $MERGE_SCRIPT --output $ckpt_path --alpha $alpha \
--base $BASE_MODEL_PATH --base_llava $LLAVA_PATH \
--interpolation
```
### Batch Evaluation for Multiple Alphas (0.4~1.0)
```bash
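#!/bin/bash
# Assumes MERGE_SCRIPT, ckpt_path, output_path, GPU, COGVLM_PATH and LLAVA_PATH are set.
for alpha in 1.0 0.9 0.8 0.7 0.6 0.5 0.4; do
    echo "===> Alpha: $alpha"
    # Merge (the single-run example above spells the LLaVA flag --base_llava;
    # use whichever name your merge script defines)
    python3 "$MERGE_SCRIPT" --output "$ckpt_path" --alpha "$alpha" --interpolation \
        --base "$COGVLM_PATH" --llava_base "$LLAVA_PATH"
    # Evaluate the merged checkpoint on each benchmark
    for task in "mme" "mmmu_val" "nocaps_val" "vizwiz_vqa_val" "seedbench" "gqa" "ok_vqa" "refcoco_bbox_testA" "refcocog_bbox_test" "refcoco+_bbox_testA" "mmbench" "ocrbench" ; do
        CUDA_VISIBLE_DEVICES=$GPU accelerate launch \
            --num_processes=1 \
            -m lmms_eval \
            --model cogvlm \
            --model_args pretrained=$ckpt_path,... \
            --tasks $task \
            --log_samples \
            --output_path $output_path
    done
    # Remove the merged checkpoint to free disk space before the next alpha
    rm -rf "$ckpt_path"
done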
```
------
## Alpha Selection
After evaluating different alphas, run the following script to auto-select the best one:
```bash
python search/view_log_delta_perdata_search_limit.py
```
This will output the best `alpha` and its performance logs.
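
To give an intuition for what an unsupervised search over alphas can look like, here is a minimal sketch. It assumes a consistency criterion: the candidate whose logged responses differ least from those of its neighboring alphas is selected, with no labels involved. This criterion, the `select_alpha` and `disagreement` helpers, and the toy responses are all assumptions for illustration; the actual logic lives in `search/view_log_delta_perdata_search_limit.py` and may differ.

```python
from typing import Dict, List


def disagreement(a: List[str], b: List[str]) -> int:
    """Count the inputs on which two candidates produced different responses."""
    return sum(x != y for x, y in zip(a, b))


def select_alpha(responses: Dict[float, List[str]]) -> float:
    """Pick the alpha whose responses differ least from its neighbors (hypothetical criterion)."""
    alphas = sorted(responses)
    if len(alphas) < 2:
        raise ValueError("need at least two alpha candidates to compare")
    scores: Dict[float, float] = {}
    for i, alpha in enumerate(alphas):
        neighbors = [alphas[j] for j in (i - 1, i + 1) if 0 <= j < len(alphas)]
        total = sum(disagreement(responses[alpha], responses[n]) for n in neighbors)
        # Average so that endpoint candidates (one neighbor) stay comparable.
        scores[alpha] = total / len(neighbors)
    return min(alphas, key=lambda a: scores[a])


# Toy logged responses per alpha on three unlabeled inputs (made-up data).
responses = {
    0.4: ["dog", "red", "2"],
    0.5: ["cat", "blue", "2"],
    0.6: ["cat", "blue", "3"],
    0.7: ["dog", "green", "5"],
}
best = select_alpha(responses)
print(best)
```

In this toy data the responses change least around the middle of the grid, so a middle candidate is chosen; with real logs the per-sample responses would come from the `--log_samples` output of the evaluation runs.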
------
## Merge Logic (Example: `llava2cogvlm.py`)
### 1. Load Parameters
- Check if parameter should be merged: `need_merge(key)`
- Scale base model:
```python
cogvlm_diff[key] = (cogvlm_chat[key] * alpha)
```
### 2. Merge Parameters
- **Linear**:
```python
cogvlm_diff['lm_head.weight'] += llava['lm_head.weight']
```
- **Non-linear**: Call `do_merging()` or `do_merging_strategy()` from `ties_merging.py`.
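
The load and linear-merge steps above can be sketched in plain Python. This is a toy illustration, not the repository's code: lists of floats stand in for tensors, and the `need_merge` rule, the `key_map` name mapping, the parameter names, and the `(1 - alpha)` weight on the source model are all assumptions.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # toy stand-in for a state dict of tensors


def need_merge(key: str) -> bool:
    """Hypothetical filter: merge language-model weights, skip vision modules."""
    return not key.startswith("vision.")


def merge_linear(base: Params, source: Params,
                 key_map: Dict[str, str], alpha: float) -> Params:
    """Interpolate mapped parameters; copy everything else from the base model."""
    merged: Params = {}
    for key, base_w in base.items():
        src_key = key_map.get(key)
        if not need_merge(key) or src_key is None or src_key not in source:
            merged[key] = list(base_w)  # no counterpart: keep base weights
            continue
        src_w = source[src_key]
        if len(src_w) != len(base_w):
            raise ValueError(f"shape mismatch for {key!r}")
        # Step 1: scale the base weights by alpha.
        scaled = [alpha * b for b in base_w]
        # Step 2: add the source weights, scaled by (1 - alpha) (assumed).
        merged[key] = [x + (1.0 - alpha) * s for x, s in zip(scaled, src_w)]
    return merged


base = {"lm_head.weight": [1.0, 2.0], "vision.proj": [5.0]}
source = {"language_model.lm_head.weight": [3.0, 4.0]}
key_map = {"lm_head.weight": "language_model.lm_head.weight"}

merged = merge_linear(base, source, key_map, alpha=0.5)
print(merged)
```

Keeping the name mapping in a separate dictionary means the interpolation loop stays the same across model pairs; only `key_map` and `need_merge` would change per architecture.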
### 3. Save Parameters
- Compatible with both `torch` and `safetensors`.
- For `safetensors`, metadata is required.
------
## Contributions
We welcome PRs and issues!
AdaMMS aims to improve the efficiency of heterogeneous multimodal model merging and support your research in MLLMs.
------
## Citation
If you find this project helpful, please cite:
```bibtex
@misc{du2025adamms,
title={AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization},
author={Yiyang Du and Xiaochen Wang and Chi Chen and Jiabo Ye and Yiru Wang and Peng Li and Ming Yan and Ji Zhang and Fei Huang and Zhifang Sui and Maosong Sun and Yang Liu},
year={2025},
eprint={2503.23733},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.23733},
}
```
------
## Chinese Documentation
Please follow this link to the [Chinese README](https://github.com/THUNLP-MT/AdaMMS/blob/main/README_CH.md).