https://github.com/alipay/ant-multi-modal-framework

Research Code for Multimodal-Cognition Team in Ant Group
image-text-retrieval multimodal-learning multimodal-llm video-editing video-text-retrieval

Host: GitHub
URL: https://github.com/alipay/ant-multi-modal-framework
Owner: alipay
License: cc-by-4.0
Created: 2023-08-21T05:11:23.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2024-07-11T07:09:40.000Z (almost 2 years ago)
Last Synced: 2025-03-27T08:45:36.444Z (about 1 year ago)
Topics: image-text-retrieval, multimodal-learning, multimodal-llm, video-editing, video-text-retrieval
Language: Python
Homepage:
Size: 17 MB
Stars: 138
Watchers: 4
Forks: 5
Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

README

# Ant Multi-Modal Framework

Read this in [English](https://github.com/alipay/Ant-Multi-Modal-Framework/blob/main/README_EN.md).

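# Introduction

This repository contains the multimodal research code that the Ant Multimodal Cognition team has integrated into AntMMF. The AntMMF multimodal framework packages the standard multimodal building blocks, including dataset management, data processing, training workflows, models, and modules, and it supports custom extensions of each of these components.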

## News

- 2025.05: Open-sourced [M2-omni](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_omni); paper: [M2-omni](https://www.arxiv.org/abs/2502.18778).

- 2024.05: [SyCoca](https://arxiv.org/abs/2401.02137), the core network architecture of [M2-Encoder](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_Encoder), was accepted by ICML 2024.

- 2024.04: [Pink](https://arxiv.org/abs/2310.00582), a multimodal large language model with enhanced referential comprehension, was accepted by CVPR 2024. The code for the paper is open-sourced: [Pink](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/Pink).

- 2024.03: [M2-RAAP](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_RAAP) was accepted by SIGIR 2024. It presents a method for efficiently extending CLIP into a state-of-the-art video-text retrieval model.

- 2024.02: Open-sourced [M2-Encoder](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_Encoder), a bilingual Chinese-English multimodal CLIP model trained on large-scale Chinese and English data (~6 billion image-text pairs).

- 2023.12: Open-sourced the code for the following papers: [SNP-S3](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/snps3_vtp), [DMAE](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/dmae_vtp), and [CNVid-3.5M](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/cnvid_vtp).

- 2023.06: [SNP-S3](https://ieeexplore.ieee.org/document/10214396) was accepted by IEEE T-CSVT (Transactions on Circuits and Systems for Video Technology) 2023.

- 2023.05: [DMAE](https://arxiv.org/pdf/2309.11082.pdf) was accepted by ACM Multimedia 2023.

- 2023.03: [CNVid-3.5M](https://openaccess.thecvf.com/content/CVPR2023/papers/Gan_CNVid-3.5M_Build_Filter_and_Pre-Train_the_Large-Scale_Public_Chinese_Video-Text_CVPR_2023_paper.pdf) was accepted by CVPR 2023.

 

## Research Areas

### Video-Text Pre-training

- Datasets

  - [CNVid-3.5M](https://openaccess.thecvf.com/content/CVPR2023/papers/Gan_CNVid-3.5M_Build_Filter_and_Pre-Train_the_Large-Scale_Public_Chinese_Video-Text_CVPR_2023_paper.pdf) (CVPR 2023): a Chinese video-text pre-training dataset.

- Pre-training methods and models

  - [SNP-S3](https://ieeexplore.ieee.org/document/10214396) (IEEE T-CSVT 2023): semantically enhanced video pre-training.

### Video-Text Retrieval

- [DMAE](https://arxiv.org/pdf/2309.11082.pdf) (ACM MM 2023): video-text retrieval with dual-modal attention enhancement and partial-order contrastive learning.
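
For readers new to the task, the sketch below illustrates how video-text retrieval is commonly evaluated with a dual-encoder setup: each text and each video is mapped to an embedding, videos are ranked by cosine similarity to the query text, and Recall@K reports how often the matching video lands in the top K. This is a generic illustration and not the DMAE method or any AntMMF API; the function names `cosine`, `rank_videos`, and `recall_at_k` are hypothetical, and the embeddings are assumed to be plain lists of floats produced by some encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_videos(text_emb, video_embs):
    """Return video indices ordered from most to least similar to the text embedding."""
    scores = [cosine(text_emb, v) for v in video_embs]
    return sorted(range(len(video_embs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(text_embs, video_embs, k):
    """Fraction of texts whose paired video (assumed to share the same index) is in the top k."""
    hits = sum(
        i in rank_videos(t, video_embs)[:k] for i, t in enumerate(text_embs)
    )
    return hits / len(text_embs)
```

Pairing texts and videos by index is an assumption made to keep the sketch short; real benchmarks may associate several captions with one video.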

### Video Editing

- [EVE](https://arxiv.org/abs/2308.10648): an efficient zero-shot video editing method.

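## Environment Setup

- Follow the steps below to initialize the AntMMF runtime environment.

```
# Create a new environment
conda create -n antmmf python=3.8
conda activate antmmf

# Clone the project to your local machine
git clone https://github.com/alipay/Ant-Multi-Modal-Framework

# Install the project dependencies
# (git clone creates the Ant-Multi-Modal-Framework directory, so enter it first;
# requirements.txt is assumed to be at the repository root)
cd Ant-Multi-Modal-Framework
pip install -r requirements.txt
```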
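
After running the setup steps, a quick sanity check can confirm that the intended environment is active. The snippet below is a minimal sketch and not part of AntMMF: `check_environment` is a hypothetical helper that compares the `CONDA_DEFAULT_ENV` variable (set by conda when an environment is activated) and the interpreter version against the `antmmf` name and Python 3.8 used above.

```python
import os
import sys

EXPECTED_ENV = "antmmf"    # environment name from the setup steps
EXPECTED_PYTHON = (3, 8)   # Python version requested in `conda create`


def check_environment(env=None, version=None):
    """Return a list of problems found; an empty list means the setup looks right."""
    env = os.environ if env is None else env
    version = sys.version_info if version is None else version
    problems = []
    active = env.get("CONDA_DEFAULT_ENV")
    if active != EXPECTED_ENV:
        problems.append(
            f"active conda environment is {active!r}, expected {EXPECTED_ENV!r}"
        )
    if tuple(version[:2]) != EXPECTED_PYTHON:
        problems.append(f"Python is {version[0]}.{version[1]}, expected 3.8")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment looks OK" if not issues else "\n".join(issues))
```

The `env` and `version` parameters exist only so the check can be exercised without changing the real environment; calling `check_environment()` with no arguments inspects the current process.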

## Citations

If you find AntMMF helpful for your work, please consider citing:

```
@misc{qp2023AntMMF,
  author =       {Qingpei, Guo and Xingning, Dong and Xiaopei, Wan and Xuzheng, Yu and Chen, Jiang and Xiangyuan, Ren and Kiasheng, Yao and Shiyu, Xuan},
  title =        {AntMMF: Ant Multi-Modal Framework},
  howpublished = {\url{https://github.com/alipay/Ant-Multi-Modal-Framework}},
  year =         {2023}
}
```

## License

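This project is licensed under [Apache 2.0](https://github.com/apache/.github/blob/main/LICENSE), which permits unrestricted use, distribution, and reproduction in any medium, provided the source is properly cited.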

## Acknowledgements

Our code is based on [FAIR mmf](https://github.com/facebookresearch/mmf). We thank its authors for their important open-source contributions.

## Contact Us

:raising_hand: For help or for problems related to this codebase, please open an issue.

:star: We are hiring. If you are interested in our work, please contact us at `qingpei.gqp@antgroup.com`.
