# CosMo

[Paper Link](https://arxiv.org/pdf/2401.00849.pdf),
[Model Link](https://huggingface.co/Awiny),
[Dataset Link](https://huggingface.co/datasets/Awiny/Howto-Interlink7M),
[CosMOE Link](https://fingerrec.github.io/cosmoe)

![](figures/motivation.png)

CosMo is a fully open-source, comprehensive interleaved pre-training framework for image and video processing. Its primary focus is in-context learning.

![](figures/main_ppl.png)

## News
- 2/April/2024: The code now supports training on an MoE model (Mixtral 8x7B). See the [CosMOE Website](https://fingerrec.github.io/cosmoe) for details.
- 2/Jan/2024: We provide preprocessing scripts to prepare the pre-training and downstream datasets.
- 2/Jan/2024: We release the Howto-Interlink7M dataset. See the [Hugging Face dataset page](https://huggingface.co/datasets/Awiny/Howto-Interlink7M) for details.

## Functionality of This Code
- Provides an interleaved image/video dataloader (a minimal sketch follows this list).
- Integrates with WebDataset.
- Uses Hugging Face Transformers together with DeepSpeed for training.
- Incorporates a contrastive loss.
- Enables few-shot evaluation.
- Supports instruction tuning.
- Scales from a 2B model up to 56B.
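
A rough illustration of what an interleaved WebDataset pipeline can look like is shown below. This is a hedged sketch only, not the project's actual dataloader; the shard pattern, sample keys, and preprocessing are placeholder assumptions.

```python
# Hedged sketch of an interleaved image-text WebDataset pipeline.
# Shard pattern and sample keys ("jpg", "json", "text") are placeholders;
# see src/ for the dataloader actually used by CosMo.
import webdataset as wds
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def make_interleaved_loader(shards="data/interleaved-{000000..000009}.tar",
                            batch_size=8, num_workers=4):
    dataset = (
        wds.WebDataset(shards)
        .shuffle(1000)                      # shuffle within an in-memory buffer
        .decode("pil")                      # decode images to PIL, .json to dicts
        .to_tuple("jpg", "json")            # (image, interleaved text/metadata)
        .map_tuple(preprocess, lambda meta: meta.get("text", ""))
        .batched(batch_size)                # collate into batches inside the pipeline
    )
    return wds.WebLoader(dataset, batch_size=None, num_workers=num_workers)
```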

## Model Card

![](figures/table_figure.png)

See [MODEL_CARD.md](MODEL_CARD.md) for more details.

## HowToInterlink7M Dataset
See [HowToInterlink7M.md](HowToInterlink7M.md).

## Install
See [INSTALL.md](INSTALL.md).

## Dataset Preparation
See [DATASET.md](DATASET.md).

## Pre-training
See [PRETRAIN.md](PRETRAIN.md).

## Pre-training Cos-MOE (56B)

### Prepare LLM
The MoE checkpoint is larger than 200 GB and is time-consuming to download, so it is best to pre-cache it in a local directory with a tool such as AZCopy.

```shell
# Pre-cache the Mixtral-8x7B checkpoint locally (git-lfs is needed to fetch the weight files)
mkdir -p /tmp/pretrained_models
cd /tmp/pretrained_models
git clone https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
```

### Train
To reduce GPU memory usage during MoE training, we employ 4-bit training. Ensure `bitsandbytes` is installed beforehand.

```shell
pip install -r requirements_mistral.txt
```

```shell
python src/main_pretrain.py \
  --base_config src/config/train_server/base_180m.yaml \
  --variant_config src/config/train_server/mixtral/base.yaml \
  --deepspeed src/config/deepspeed/deepspeed_config_mixtral_8x7b.json
```
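
For reference, 4-bit loading with `bitsandbytes` through Hugging Face Transformers usually looks like the sketch below. This is a hedged illustration, not the exact configuration used by this repo; the quantization settings and model path (pointing at the pre-cached checkpoint above) are assumptions, and the real settings live in the training configs.

```python
# Hedged sketch: loading the pre-cached Mixtral-8x7B checkpoint in 4-bit with bitsandbytes.
# The exact quantization settings used by CosMOE may differ; see src/config for the real ones.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "/tmp/pretrained_models/Mixtral-8x7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
```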

## Few-shot Evaluation without Tuning

This code supports 44 downstream datasets, including but not limited to COCO, Flickr30K, OK-VQA, TextVQA, VizWiz, VQAv2, Hateful Memes, VATEX, TGIF, MSVD, and MSRVTT.

See [EVALUATION.md](EVALUATION.md).
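
For intuition, few-shot (in-context) evaluation boils down to prepending a handful of interleaved image-text support examples to the query. The sketch below is a generic, hedged illustration of that idea; the actual prompt template and special tokens used by CosMo are defined by the evaluation code, not here.

```python
# Hedged sketch: assembling a few-shot (in-context) VQA prompt from support examples.
# The "<image>" token and the Question/Answer template are illustrative placeholders.
def build_few_shot_prompt(support_examples, query_question, image_token="<image>"):
    """support_examples: list of (question, answer) pairs, each paired with an image."""
    segments = []
    for question, answer in support_examples:
        segments.append(f"{image_token}Question: {question} Answer: {answer}")
    # The query uses the same format but leaves the answer for the model to complete.
    segments.append(f"{image_token}Question: {query_question} Answer:")
    return "\n".join(segments)

prompt = build_few_shot_prompt(
    [("What is the man holding?", "a surfboard"),
     ("What color is the bus?", "red")],
    "How many dogs are in the picture?",
)
print(prompt)
```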

## Instruction Tuning
See [TUNING.md](TUNING.md).

## Citation

If you find our work helpful, please consider citing:

```
@article{wang2024cosmo,
title={COSMO: Contrastive Streamlined Multimodal Model with Interleaved Pre-Training},
author={Wang, Alex Jinpeng and Li, Linjie and Lin, Kevin Qinghong and Wang, Jianfeng and Lin, Kevin and Yang, Zhengyuan and Wang, Lijuan and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2401.00849},
year={2024}
}
```

## Contact
Email: awinyimgprocess at gmail dot com

## Acknowledgement
Our work is mainly based on the following projects:

[MMC4](https://github.com/allenai/mmc4), [OpenFlamingo](https://github.com/mlfoundations/open_flamingo), [OpenCLIP](https://github.com/mlfoundations/open_clip), [Hugging Face Transformers](https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py), and [WebDataset](https://github.com/webdataset/webdataset).