{"id":18061931,"url":"https://github.com/DAMO-NLP-SG/Inf-CLIP","last_synced_at":"2025-03-28T07:32:09.333Z","repository":{"id":258759125,"uuid":"873582274","full_name":"DAMO-NLP-SG/Inf-CLIP","owner":"DAMO-NLP-SG","description":"💣💣 The official CLIP training codebase of Inf-CL: \"Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss\". A super memory-efficiency CLIP training scheme.","archived":false,"fork":false,"pushed_at":"2024-10-23T06:46:41.000Z","size":3946,"stargazers_count":47,"open_issues_count":1,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-10-24T14:29:22.337Z","etag":null,"topics":["clip","contrastive-learning","flash-attention","infinite-batch-size","memory-efficient","ring-attention"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DAMO-NLP-SG.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-16T12:11:45.000Z","updated_at":"2024-10-24T10:41:59.000Z","dependencies_parsed_at":"2024-10-27T21:21:45.871Z","dependency_job_id":"da00a5e3-0cff-4bfc-bec4-a28bdd250f53","html_url":"https://github.com/DAMO-NLP-SG/Inf-CLIP","commit_stats":null,"previous_names":["damo-nlp-sg/inf-cl","damo-nlp-sg/inf-clip"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DAMO-NLP-SG%2FInf-CLIP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DAMO-NLP-SG%2FInf-CLIP/tags","releases_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/DAMO-NLP-SG%2FInf-CLIP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DAMO-NLP-SG%2FInf-CLIP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DAMO-NLP-SG","download_url":"https://codeload.github.com/DAMO-NLP-SG/Inf-CLIP/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245989710,"owners_count":20705889,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clip","contrastive-learning","flash-attention","infinite-batch-size","memory-efficient","ring-attention"],"created_at":"2024-10-31T05:01:21.137Z","updated_at":"2025-03-28T07:32:09.319Z","avatar_url":"https://github.com/DAMO-NLP-SG.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://github.com/user-attachments/assets/53a09bd1-c8ac-43c0-80ae-03ba284c94ad\" width=\"150\" style=\"margin-bottom: 0.2;\"/\u003e\n\u003cp\u003e\n\n\u003ch3 align=\"center\"\u003e\u003ca href=\"https://arxiv.org/abs/2410.17243\"\u003e\nBreaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss\u003c/a\u003e\u003c/h3\u003e\n\u003ch5 align=\"center\"\u003e If our project helps you, please give us a star ⭐ on GitHub to support us. 
🙏🙏 \u003c/h5\u003e\n\n\u003ch5 align=\"center\"\u003e\n\n[![arXiv](https://img.shields.io/badge/Arxiv-2410.17243-AD1C18.svg?logo=arXiv)](https://arxiv.org/abs/2410.17243)\n[![hf_paper](https://img.shields.io/badge/🤗-Paper%20In%20HF-red.svg)](https://huggingface.co/papers/2410.17243)\n[![PyPI](https://img.shields.io/badge/PyPI-Inf--CL-9C276A.svg)](https://pypi.org/project/inf-cl) \u003cbr\u003e\n[![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](https://github.com/DAMO-NLP-SG/Inf-CLIP/blob/main/LICENSE)\n[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FDAMO-NLP-SG%2FInf-CLIP\u0026count_bg=%2379C83D\u0026title_bg=%23555555\u0026icon=\u0026icon_color=%23E7E7E7\u0026title=hits\u0026edge_flat=false)](https://hits.seeyoufarm.com)\n[![GitHub issues](https://img.shields.io/github/issues/DAMO-NLP-SG/Inf-CLIP?color=critical\u0026label=Issues)](https://github.com/DAMO-NLP-SG/Inf-CLIP/issues?q=is%3Aopen+is%3Aissue)\n[![GitHub closed issues](https://img.shields.io/github/issues-closed/DAMO-NLP-SG/Inf-CLIP?color=success\u0026label=Issues)](https://github.com/DAMO-NLP-SG/Inf-CLIP/issues?q=is%3Aissue+is%3Aclosed)  \u003cbr\u003e\n[![zhihu](https://img.shields.io/badge/-知乎-000000?logo=zhihu\u0026logoColor=0084FF)](https://zhuanlan.zhihu.com/p/1681887214)\n[![Twitter](https://img.shields.io/badge/-Twitter-black?logo=twitter\u0026logoColor=1D9BF0)](https://x.com/lixin4ever/status/1849669129613226457) \u003cbr\u003e\n\n\u003c/h5\u003e\n\n\u003cdiv align=\"center\"\u003e\u003cimg src=\"https://github.com/user-attachments/assets/2c19838b-43d8-4145-b28c-903f3d76f8ab\" width=\"800\" /\u003e\u003c/div\u003e\n\n\u003cdetails open\u003e\u003csummary\u003e💡 Some other multimodal foundation model projects from our team may interest you ✨. 
\u003c/summary\u003e\u003cp\u003e\n\u003c!--  may --\u003e\n\n\u003e [**VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding**](https://arxiv.org/abs/2311.16922) \u003cbr\u003e\n\u003e Sicong Leng, Hang Zhang, Guanzheng Chen, Xin Li, Shijian Lu, Chunyan Miao, Lidong Bing \u003cbr\u003e\n[![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/DAMO-NLP-SG/VCD)  [![github](https://img.shields.io/github/stars/DAMO-NLP-SG/VCD.svg?style=social)](https://github.com/DAMO-NLP-SG/VCD)  [![arXiv](https://img.shields.io/badge/Arxiv-2311.16922-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2311.16922) \u003cbr\u003e\n\n\u003e [**VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs**](https://github.com/DAMO-NLP-SG/VideoLLaMA2) \u003cbr\u003e\n\u003e Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing \u003cbr\u003e\n[![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/DAMO-NLP-SG/VideoLLaMA2)  [![github](https://img.shields.io/github/stars/DAMO-NLP-SG/VideoLLaMA2.svg?style=social)](https://github.com/DAMO-NLP-SG/VideoLLaMA2) [![arXiv](https://img.shields.io/badge/Arxiv-2406.07476-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2406.07476) \u003cbr\u003e\n\n\u003e [**The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio**](https://arxiv.org/abs/2410.12787) \u003cbr\u003e\n\u003e Sicong Leng, Yun Xing, Zesen Cheng, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing \u003cbr\u003e\n[![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/DAMO-NLP-SG/CMM)  [![github](https://img.shields.io/github/stars/DAMO-NLP-SG/CMM.svg?style=social)](https://github.com/DAMO-NLP-SG/CMM)  
[![arXiv](https://img.shields.io/badge/Arxiv-2410.12787-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2410.12787) \u003cbr\u003e\n\n\u003c/p\u003e\u003c/details\u003e\n\n## 📰 News\n* **[2024.10.18]**  Released the training and evaluation code of Inf-CLIP.\n\n\u003cdiv align=\"center\"\u003e\u003cimg src=\"https://github.com/user-attachments/assets/11c5cc32-aac2-497d-bbc1-33e065a71be0\" width=\"800\" /\u003e\u003c/div\u003e\n\n## 🛠️ Requirements and Installation\n\nBasic Dependencies:\n* Python \u003e= 3.8\n* PyTorch \u003e= 2.0.0\n* CUDA Version \u003e= 11.8\n\n[Remote] Install Inf-CL:\n```bash\n# install from PyPI\npip install inf_cl -i https://pypi.org/simple\n```\n\n[Local] Install Inf-CL:\n```bash\npip install -e .\n```\n\nInstall required packages:\n```bash\ngit clone https://github.com/DAMO-NLP-SG/Inf-CLIP\ncd Inf-CLIP\npip install -r requirements.txt\n```\n\n## ⭐ Features\n\n`inf_cl` is the Triton implementation of the Inf-CL loss:\n* [x] [Ring-CL (inf_cl/ring.py#L238)](https://github.com/DAMO-NLP-SG/Inf-CLIP/blob/main/inf_clip/models/ops/ring.py#L238)\n* [x] [Inf-CL  (inf_cl/ring.py#L251)](https://github.com/DAMO-NLP-SG/Inf-CLIP/blob/main/inf_clip/models/ops/ring.py#L251)\n\n`inf_clip` is the CLIP training codebase with the Inf-CL loss and other training features:\n- [x] [Gradient Accumulation (inf_clip/train/train.py#L180)](https://github.com/DAMO-NLP-SG/Inf-CLIP/inf_clip_train/train.py#L180)\n- [x] [Gradient Cache (inf_clip/train/train.py#L292)](https://github.com/DAMO-NLP-SG/Inf-CLIP/blob/main/inf_clip_train/train.py#L292)\n\n\n## 🔑 Usage\n\nA simple example of how to use the Inf-CL loss for contrastive learning. 
Run it with the following command:\n```\ntorchrun --nproc_per_node 2 tests/example.py\n```\n\n```python\nimport torch\nimport torch.nn.functional as F\nimport torch.distributed as dist\nimport numpy as np\n\nfrom inf_cl import cal_inf_loss\n\n\ndef create_cl_tensors(rank, world_size):\n    # Parameters\n    dtype = torch.float32\n    num_heads = 3        # Number of attention heads\n    seq_length_q = 32768 # Sequence length\n    seq_length_k = 32768\n    d_model = 256        # Dimension of each head (must be 16, 32, 64, or 128)\n\n    # Randomly initialize inputs\n    q = torch.rand((seq_length_q // world_size, num_heads * d_model), dtype=dtype, device=f\"cuda:{rank}\")\n    k = torch.rand((seq_length_k // world_size, num_heads * d_model), dtype=dtype, device=f\"cuda:{rank}\")\n    l = torch.ones([], dtype=dtype, device=f\"cuda:{rank}\") * np.log(1 / 0.07)\n\n    q = F.normalize(q, p=2, dim=-1).requires_grad_() # Query\n    k = F.normalize(k, p=2, dim=-1).requires_grad_() # Key\n    l = l.requires_grad_() # Logit scale\n\n    return q, k, l\n\n\nif __name__ == \"__main__\":\n    # Initialize the distributed environment (torchrun provides the required env vars)\n    dist.init_process_group(\"nccl\")\n\n    rank = dist.get_rank()\n    world_size = dist.get_world_size()\n\n    torch.cuda.set_device(rank)\n\n    # Taking image-text contrastive learning as an example: q is the global image features,\n    # k is the text features, and l is the logit scale.\n    q, k, l = create_cl_tensors(rank, world_size)\n\n    # By default, the labels are the diagonal elements.\n    # labels = torch.arange(q.shape[0])\n    loss = cal_inf_loss(q, k, scale=l.exp())\n\n    print(loss)\n\n```\n\n## 🚀 Main Results\n\n### Memory Cost\n\u003cp\u003e\u003cimg src=\"https://github.com/user-attachments/assets/05dd3fea-0a93-4716-b321-0a94965e1fbe\" width=\"800\" \"/\u003e\u003c/p\u003e\n\n\\* denotes adopting the \"data offload\" strategy. 
\n\n### Max Supported Batch Size\n\u003cp\u003e\u003cimg src=\"https://github.com/user-attachments/assets/eb38fb90-3b7e-4696-b078-b7766893f758\" width=\"800\" \"/\u003e\u003c/p\u003e\n\n### Speed\n\u003cp\u003e\u003cimg src=\"https://github.com/user-attachments/assets/da72e99b-508b-450a-b12e-401d4991291a\" width=\"800\" \"/\u003e\u003c/p\u003e\n\n### Batch Size Scaling\n\u003cp\u003e\u003cimg src=\"https://github.com/user-attachments/assets/5b55fa98-6558-4509-9b66-e290ecf77b41\" width=\"800\" \"/\u003e\u003c/p\u003e\n\nTraining with larger data scale needs larger batch size.\n\n## 🗝️ Training \u0026 Evaluation\n\n### Quick Start\n\nTo facilitate further development on top of our codebase, we provide a quick-start guide on how to use Inf-CLIP to train a customized CLIP and evaluate the trained model on the mainstream clip benchmarks.\n\n1. Training Data Structure:\n```bash\nInf-CLIP\n├── datasets\n│   ├── cc3m/ # https://github.com/rom1504/img2dataset/blob/main/dataset_examples/cc3m.md\n|   |   ├── 0000.tar\n|   |   ├── 0001.tar\n|   |   ├── ...\n|   |   └── 0301.tar\n│   ├── cc12m/ # https://github.com/rom1504/img2dataset/blob/main/dataset_examples/cc12m.md\n|   |   ├── 0000.tar\n|   |   ├── 0001.tar\n|   |   ├── ...\n|   |   └── 1044.tar\n│   ├── laion400m/ # https://github.com/rom1504/img2dataset/blob/main/dataset_examples/laion400m.md\n|   |   ├── 00000.tar\n|   |   ├── 00001.tar\n|   |   ├── ...\n|   |   └── 41407.tar\n```\n2. Command:\n```bash\nbash scripts/cc3m/lit_vit-b-32_bs16k.sh\nbash scripts/cc12m/lit_vit-b-32_bs32k.sh\nbash scripts/laion400m/lit_vit-b-32_bs256k.sh\n```\n3. 
Evaluation Data Structure:\n```bash\nInf-CLIP\n├── datasets\n│   ├── imagenet-1k/ # download val_images.tar.gz of imagenet from https://huggingface.co/datasets/ILSVRC/imagenet-1k/tree/main/data\n|   |   └── val/ # python datasets/reformat_imagenet.py\n|   |   |   ├── n01440764\n|   |   |   ├── n01443537\n|   |   |   ├── ...\n|   |   |   └── n15075141\n│   ├── clip-benchmark/ # bash datasets/benchmarks_download.sh\n|   |   ├── wds_mscoco_captions\n|   |   ├── wds_flickr8k\n|   |   ├── wds_flickr30k\n|   |   ├── wds_imagenet1k\n|   |   ├── wds_imagenetv2\n|   |   ├── wds_imagenet_sketch\n|   |   ├── wds_imagenet-a\n|   |   ├── wds_imagenet-r\n|   |   ├── wds_imagenet-o\n|   |   └── wds_objectnet\n```\n4. Command:\n```bash\n# imagenet evaluation\nbash scripts/imagenet_eval.sh\n# overall evaluation\nbash scripts/benchmarks_eval.sh\n```\n\n## 📑 Citation\n\nIf you find Inf-CLIP useful for your research and applications, please cite using this BibTeX:\n```bibtex\n@article{damovl2024infcl,\n  title={Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss},\n  author={Zesen Cheng and Hang Zhang and Kehan Li and Sicong Leng and Zhiqiang Hu and Fei Wu and Deli Zhao and Xin Li and Lidong Bing},\n  journal={arXiv preprint arXiv:2410.17243},\n  year={2024},\n  url={https://arxiv.org/abs/2410.17243}\n}\n```\n\n## 👍 Acknowledgement\nThe codebase of Inf-CLIP is adapted from [**OpenCLIP**](https://github.com/mlfoundations/open_clip). We are also grateful to the following projects, from which Inf-CL arose:\n* [**OpenAI CLIP**](https://openai.com/index/clip/), [**img2dataset**](https://github.com/rom1504/img2dataset), [**CLIP-Benchmark**](https://github.com/LAION-AI/CLIP_benchmark).\n* [**FlashAttention**](https://github.com/Dao-AILab/flash-attention), [**RingAttention**](https://github.com/haoliuhl/ringattention), [**RingFlashAttention**](https://github.com/zhuzilin/ring-flash-attention). 
\n\n\n## 🔒 License\n\nThis project is released under the Apache 2.0 license as found in the LICENSE file.\nThe service is a research preview intended for **non-commercial use ONLY**, subject to the model Licenses of CLIP, Terms of Use of the data generated by OpenAI, and Laion. Please get in touch with us if you find any potential violations.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDAMO-NLP-SG%2FInf-CLIP","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDAMO-NLP-SG%2FInf-CLIP","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDAMO-NLP-SG%2FInf-CLIP/lists"}