https://github.com/DAMO-NLP-SG/Inf-CLIP

πŸ’£πŸ’£ The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficient CLIP training scheme.

clip contrastive-learning flash-attention infinite-batch-size memory-efficient ring-attention

# Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss


If our project helps you, please give us a star ⭐ on GitHub to support us. πŸ™πŸ™

[![arXiv](https://img.shields.io/badge/Arxiv-2410.17243-AD1C18.svg?logo=arXiv)](https://arxiv.org/abs/2410.17243)
[![hf_paper](https://img.shields.io/badge/πŸ€—-Paper%20In%20HF-red.svg)](https://huggingface.co/papers/2410.17243)
[![PyPI](https://img.shields.io/badge/PyPI-Inf--CL-9C276A.svg)](https://pypi.org/project/inf-cl)

[![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](https://github.com/DAMO-NLP-SG/Inf-CLIP/blob/main/LICENSE)
[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FDAMO-NLP-SG%2FInf-CLIP&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false)](https://hits.seeyoufarm.com)
[![GitHub issues](https://img.shields.io/github/issues/DAMO-NLP-SG/Inf-CLIP?color=critical&label=Issues)](https://github.com/DAMO-NLP-SG/Inf-CLIP/issues?q=is%3Aopen+is%3Aissue)
[![GitHub closed issues](https://img.shields.io/github/issues-closed/DAMO-NLP-SG/Inf-CLIP?color=success&label=Issues)](https://github.com/DAMO-NLP-SG/Inf-CLIP/issues?q=is%3Aissue+is%3Aclosed)

[![zhihu](https://img.shields.io/badge/-ηŸ₯乎-000000?logo=zhihu&logoColor=0084FF)](https://zhuanlan.zhihu.com/p/1681887214)
[![Twitter](https://img.shields.io/badge/-Twitter-black?logo=twitter&logoColor=1D9BF0)](https://x.com/lixin4ever/status/1849669129613226457)

πŸ’‘ Some other multimodal foundation model projects from our team may interest you ✨.

> [**VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding**](https://arxiv.org/abs/2311.16922)

> Sicong Leng, Hang Zhang, Guanzheng Chen, Xin Li, Shijian Lu, Chunyan Miao, Lidong Bing

[![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/DAMO-NLP-SG/VCD) [![github](https://img.shields.io/github/stars/DAMO-NLP-SG/VCD.svg?style=social)](https://github.com/DAMO-NLP-SG/VCD) [![arXiv](https://img.shields.io/badge/Arxiv-2311.16922-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2311.16922)

> [**VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs**](https://github.com/DAMO-NLP-SG/VideoLLaMA2)

> Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing

[![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/DAMO-NLP-SG/VideoLLaMA2) [![github](https://img.shields.io/github/stars/DAMO-NLP-SG/VideoLLaMA2.svg?style=social)](https://github.com/DAMO-NLP-SG/VideoLLaMA2) [![arXiv](https://img.shields.io/badge/Arxiv-2406.07476-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2406.07476)

> [**The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio**](https://arxiv.org/abs/2410.12787)

> Sicong Leng, Yun Xing, Zesen Cheng, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing

[![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/DAMO-NLP-SG/CMM) [![github](https://img.shields.io/github/stars/DAMO-NLP-SG/CMM.svg?style=social)](https://github.com/DAMO-NLP-SG/CMM) [![arXiv](https://img.shields.io/badge/Arxiv-2410.12787-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2410.12787)

## πŸ“° News
* **[2024.10.18]** Released the training and evaluation code of Inf-CLIP.

## πŸ› οΈ Requirements and Installation

Basic Dependencies:
* Python >= 3.8
* PyTorch >= 2.0.0
* CUDA Version >= 11.8
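
If helpful, a minimal Python check (an illustrative sketch, not part of the codebase) can confirm these basics before installing:
```python
# Illustrative sketch: verify the basic dependencies listed above.
import sys
import torch

assert sys.version_info >= (3, 8), "Python >= 3.8 is required"
# Note: comparing version strings is only a rough check; use packaging.version for rigor.
assert torch.__version__.split("+")[0] >= "2.0.0", "PyTorch >= 2.0.0 is required"
assert torch.cuda.is_available(), "A CUDA-enabled build of PyTorch (CUDA >= 11.8) is required"
print("Python:", sys.version.split()[0], "| PyTorch:", torch.__version__, "| CUDA:", torch.version.cuda)
```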

[Remote] Install Inf-CL:
```bash
# remote installing
pip install inf_cl -i https://pypi.org/simple
```

[Local] Install Inf-CL:
```bash
# local installing (clone the repo first)
git clone https://github.com/DAMO-NLP-SG/Inf-CLIP
cd Inf-CLIP
pip install -e .
```

Install required packages:
```bash
pip install -r requirements.txt
```
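
A quick way to confirm the installation (a hedged sketch; it only checks that the package and the loss entry point import cleanly):
```python
# Sanity check: the package and the Inf-CL loss entry point should import without errors.
import inf_cl
from inf_cl import cal_inf_loss

print("inf_cl loaded from:", inf_cl.__file__)
```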

## ⭐ Features

`inf_cl` is the Triton implementation of the Inf-CL loss:
* [x] [Ring-CL (inf_cl/ring.py#L238)](https://github.com/DAMO-NLP-SG/Inf-CLIP/blob/main/inf_clip/models/ops/ring.py#L238)
* [x] [Inf-CL (inf_cl/ring.py#L251)](https://github.com/DAMO-NLP-SG/Inf-CLIP/blob/main/inf_clip/models/ops/ring.py#L251)

`inf_clip` is the CLIP training codebase with Inf-CL loss and other training features:
- [x] [Gradient Accumulation (inf_clip/train/train.py#L180)](https://github.com/DAMO-NLP-SG/Inf-CLIP/blob/main/inf_clip_train/train.py#L180)
- [x] [Gradient Cache (inf_clip/train/train.py#L292)](https://github.com/DAMO-NLP-SG/Inf-CLIP/blob/main/inf_clip_train/train.py#L292)
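
For orientation, the gradient-accumulation feature follows the usual pattern of splitting a large batch into micro-batches and summing gradients before a single optimizer step. The sketch below illustrates only that generic pattern (it is not the repo's `train.py`; `model`, `loss_fn`, and `batch` are placeholder names), and the closing comment notes why plain accumulation is not enough for contrastive losses, which is what Gradient Cache and Inf-CL address.
```python
# Generic gradient accumulation (illustrative only, not inf_clip's implementation).
import torch

def accumulation_step(model, loss_fn, optimizer, batch, accum_steps=4):
    optimizer.zero_grad(set_to_none=True)
    total_loss = 0.0
    for micro in torch.chunk(batch, accum_steps, dim=0):
        loss = loss_fn(model(micro)) / accum_steps  # scale so gradients average over micro-batches
        loss.backward()                             # gradients accumulate in parameter .grad buffers
        total_loss += loss.item()
    optimizer.step()
    # Caveat: for contrastive losses, each micro-batch only sees its own negatives, so plain
    # accumulation shrinks the effective contrastive batch; Gradient Cache and Inf-CL keep
    # the full batch of negatives while bounding memory.
    return total_loss
```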

## πŸ”‘ Usage

A simple example of how to adopt our Inf-CL loss for contrastive learning. Run it with:
```bash
torchrun --nproc_per_node 2 tests/example.py
```

```python
import torch
import torch.nn.functional as F
import torch.distributed as dist
import numpy as np

from inf_cl import cal_inf_loss


def create_cl_tensors(rank, world_size):
    # Parameters
    dtype = torch.float32
    num_heads = 3          # Number of attention heads
    seq_length_q = 32768   # Sequence length
    seq_length_k = 32768
    d_model = 256          # Dimension of each head (must be 16, 32, 64, or 128)

    # Randomly initialize inputs
    q = torch.rand((seq_length_q // world_size, num_heads * d_model), dtype=dtype, device=f"cuda:{rank}")
    k = torch.rand((seq_length_k // world_size, num_heads * d_model), dtype=dtype, device=f"cuda:{rank}")
    l = torch.ones([], dtype=dtype, device=f"cuda:{rank}") * np.log(1 / 0.07)

    q = F.normalize(q, p=2, dim=-1).requires_grad_()  # Query
    k = F.normalize(k, p=2, dim=-1).requires_grad_()  # Key
    l = l.requires_grad_()                            # Logit scale

    return q, k, l


if __name__ == "__main__":
    # Assume that the distributed environment has been initialized
    dist.init_process_group("nccl")

    rank = dist.get_rank()
    world_size = dist.get_world_size()

    torch.cuda.set_device(rank)

    # Taking image-text contrastive learning as an example: q holds this rank's shard
    # of the global image features, k the text features, and l the logit scale.
    q, k, l = create_cl_tensors(rank, world_size)

    # Labels are the diagonal elements by default.
    # labels = torch.arange(q.shape[0])
    loss = cal_inf_loss(q, k, scale=l.exp())

    print(loss)
```
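
For intuition about what Inf-CL saves, here is a naive, single-GPU contrastive (CLIP-style) loss that materializes the full batch-by-batch similarity matrix; its activation memory grows quadratically with batch size, which is the cost Inf-CL's tiled, ring-based computation avoids. This is an illustrative reference only, not part of the `inf_cl` API.
```python
# Naive symmetric CLIP loss for comparison (illustrative; O(batch^2) memory).
import torch
import torch.nn.functional as F

def naive_clip_loss(q, k, scale):
    # q, k: L2-normalized features of shape (batch, dim); scale: scalar logit scale.
    logits = scale * q @ k.t()          # full (batch x batch) similarity matrix
    labels = torch.arange(q.shape[0], device=q.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```
The point of comparison is the memory footprint: the `logits` tensor above alone takes batch Γ— batch Γ— 4 bytes in fp32, i.e. roughly 4 GiB at a batch size of 32k, before even accounting for gradients.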

## πŸš€ Main Results

### Memory Cost

\* denotes adopting the "data offload" strategy.

### Max Supported Batch Size

### Speed

### Batch Size Scaling

Training on a larger data scale requires a larger batch size.

## πŸ—οΈ Training & Evaluation

### Quick Start

To facilitate further development on top of our codebase, we provide a quick-start guide on how to use Inf-CLIP to train a custom CLIP model and evaluate it on mainstream CLIP benchmarks.

1. Training Data Structure:
```bash
Inf-CLIP
β”œβ”€β”€ datasets
β”‚   β”œβ”€β”€ cc3m/ # https://github.com/rom1504/img2dataset/blob/main/dataset_examples/cc3m.md
β”‚   β”‚   β”œβ”€β”€ 0000.tar
β”‚   β”‚   β”œβ”€β”€ 0001.tar
β”‚   β”‚   β”œβ”€β”€ ...
β”‚   β”‚   └── 0301.tar
β”‚   β”œβ”€β”€ cc12m/ # https://github.com/rom1504/img2dataset/blob/main/dataset_examples/cc12m.md
β”‚   β”‚   β”œβ”€β”€ 0000.tar
β”‚   β”‚   β”œβ”€β”€ 0001.tar
β”‚   β”‚   β”œβ”€β”€ ...
β”‚   β”‚   └── 1044.tar
β”‚   β”œβ”€β”€ laion400m/ # https://github.com/rom1504/img2dataset/blob/main/dataset_examples/laion400m.md
β”‚   β”‚   β”œβ”€β”€ 00000.tar
β”‚   β”‚   β”œβ”€β”€ 00001.tar
β”‚   β”‚   β”œβ”€β”€ ...
β”‚   β”‚   └── 41407.tar
```
2. Command:
```bash
bash scripts/cc3m/lit_vit-b-32_bs16k.sh
bash scripts/cc12m/lit_vit-b-32_bs32k.sh
bash scripts/laion400m/lit_vit-b-32_bs256k.sh
```
3. Evaluation Data Structure:
```bash
Inf-CLIP
β”œβ”€β”€ datasets
β”‚   β”œβ”€β”€ imagenet-1k/ # download val_images.tar.gz of imagenet
β”‚   β”‚   └── val/
β”‚   β”‚       β”œβ”€β”€ n01440764
β”‚   β”‚       β”œβ”€β”€ n01443537
β”‚   β”‚       β”œβ”€β”€ ...
β”‚   β”‚       └── n15075141
β”‚   β”œβ”€β”€ clip-benchmark/ # bash datasets/benchmarks_download.sh
β”‚   β”‚   β”œβ”€β”€ wds_mscoco_captions
β”‚   β”‚   β”œβ”€β”€ wds_flickr8k
β”‚   β”‚   β”œβ”€β”€ wds_flickr30k
β”‚   β”‚   β”œβ”€β”€ wds_imagenet1k
β”‚   β”‚   β”œβ”€β”€ wds_imagenetv2
β”‚   β”‚   β”œβ”€β”€ wds_imagenet_sketch
β”‚   β”‚   β”œβ”€β”€ wds_imagenet-a
β”‚   β”‚   β”œβ”€β”€ wds_imagenet-r
β”‚   β”‚   β”œβ”€β”€ wds_imagenet-o
β”‚   β”‚   └── wds_objectnet
```
4. Command:
```bash
# imagenet evaluation
bash scripts/imagenet_eval.sh
# overall evaluation
bash scripts/benchmarks_eval.sh
```

## πŸ“‘ Citation

If you find Inf-CLIP useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{damovl2024infcl,
  title={Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss},
  author={Zesen Cheng and Hang Zhang and Kehan Li and Sicong Leng and Zhiqiang Hu and Fei Wu and Deli Zhao and Xin Li and Lidong Bing},
  journal={arXiv preprint arXiv:2410.17243},
  year={2024},
  url={https://arxiv.org/abs/2410.17243}
}
```

## πŸ‘ Acknowledgement
The codebase of Inf-CLIP is adapted from [**OpenCLIP**](https://github.com/mlfoundations/open_clip). We are also grateful to the following projects, from which our Inf-CL arose:
* [**OpenAI CLIP**](https://openai.com/index/clip/), [**img2dataset**](https://github.com/rom1504/img2dataset), [**CLIP-Benchmark**](https://github.com/LAION-AI/CLIP_benchmark).
* [**FlashAttention**](https://github.com/Dao-AILab/flash-attention), [**RingAttention**](https://github.com/haoliuhl/ringattention), [**RingFlashAttention**](https://github.com/zhuzilin/ring-flash-attention).

## πŸ”’ License

This project is released under the Apache 2.0 license as found in the LICENSE file.
The service is a research preview intended for **non-commercial use ONLY**, subject to the model license of CLIP, the Terms of Use of the data generated by OpenAI, and the license of LAION. Please get in touch with us if you find any potential violations.