{"id":19456929,"url":"https://github.com/alipay/ant-multi-modal-framework","last_synced_at":"2025-09-10T23:39:14.951Z","repository":{"id":200744076,"uuid":"681021458","full_name":"alipay/Ant-Multi-Modal-Framework","owner":"alipay","description":" Research Code for  Multimodal-Cognition Team in Ant Group","archived":false,"fork":false,"pushed_at":"2024-07-11T07:09:40.000Z","size":17859,"stargazers_count":138,"open_issues_count":12,"forks_count":5,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-27T08:45:36.444Z","etag":null,"topics":["image-text-retrieval","multimodal-learning","multimodal-llm","video-editing","video-text-retrieval"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alipay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-21T05:11:23.000Z","updated_at":"2025-03-26T01:09:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"d843829f-daf6-4f13-9650-b38fca21bfd5","html_url":"https://github.com/alipay/Ant-Multi-Modal-Framework","commit_stats":null,"previous_names":["alipay/ant-multi-modal-framework"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alipay%2FAnt-Multi-Modal-Framework","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alipay%2FAnt-Multi-Modal-Framework/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alipay%2FAnt-Multi-Modal-Framework/releases","manifests_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alipay%2FAnt-Multi-Modal-Framework/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alipay","download_url":"https://codeload.github.com/alipay/Ant-Multi-Modal-Framework/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250760708,"owners_count":21482852,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image-text-retrieval","multimodal-learning","multimodal-llm","video-editing","video-text-retrieval"],"created_at":"2024-11-10T17:19:03.226Z","updated_at":"2025-09-10T23:39:14.925Z","avatar_url":"https://github.com/alipay.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ant Multi-Modal Framework\nRead this in [English](https://github.com/alipay/Ant-Multi-Modal-Framework/blob/main/README_EN.md).\n\n## Introduction\nThis repository contains the multimodal research code that Ant Group's Multimodal Cognition team has integrated into AntMMF. The AntMMF framework encapsulates standard multimodal functionality, including dataset management, data processing, training pipelines, models, and modules, and supports custom extensions of each of these components.\n\n\n## News\n- 2025.05: [M2-omni](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_omni) is open-sourced; paper: [M2-omni](https://www.arxiv.org/abs/2502.18778)\n- 2024.05: [SyCoCa](https://arxiv.org/abs/2401.02137), the core network architecture of [M2-Encoder](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_Encoder), was accepted at ICML 2024\n- 2024.04: [Pink](https://arxiv.org/abs/2310.00582), a multimodal large language model with enhanced referential comprehension, was accepted at CVPR 2024; the code for the paper is open-sourced: [Pink](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/Pink).\n- 2024.03: [M2-RAAP](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_RAAP) was accepted at SIGIR 2024; it presents a method for efficiently extending a CLIP model into a state-of-the-art video-text retrieval model\n- 2024.02: Open-sourced [M2-Encoder](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_Encoder), a bilingual Chinese-English multimodal CLIP model trained on large-scale Chinese and English data (~6 billion image-text pairs)\n- 2023.12: Open-sourced the code for the following papers: [SNP-S3](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/snps3_vtp), [DMAE](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/dmae_vtp), and [CNVid-3.5M](https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/cnvid_vtp).\n- 2023.06: [SNP-S3](https://ieeexplore.ieee.org/document/10214396) was accepted by IEEE T-CSVT (Transactions on Circuits and Systems for Video Technology) 2023.\n- 2023.05: [DMAE](https://arxiv.org/pdf/2309.11082.pdf) was accepted at ACM Multimedia 2023.\n- 2023.03: [CNVid-3.5M](https://openaccess.thecvf.com/content/CVPR2023/papers/Gan_CNVid-3.5M_Build_Filter_and_Pre-Train_the_Large-Scale_Public_Chinese_Video-Text_CVPR_2023_paper.pdf) was accepted at CVPR 2023.\n\n## Research Directions\n\n### Video-Text Pre-training\n- Datasets\n  - [CNVid-3.5M](https://openaccess.thecvf.com/content/CVPR2023/papers/Gan_CNVid-3.5M_Build_Filter_and_Pre-Train_the_Large-Scale_Public_Chinese_Video-Text_CVPR_2023_paper.pdf) (CVPR-2023): a Chinese video-text pre-training dataset.\n- Pre-training methods and models\n  - [SNP-S3](https://ieeexplore.ieee.org/document/10214396) (IEEE T-CSVT 2023): semantically enhanced video pre-training.\n\n### Video-Text Retrieval\n- [DMAE](https://arxiv.org/pdf/2309.11082.pdf) (ACM MM-2023): video-text retrieval with dual-modal attention enhancement and partial-order contrastive learning.\n\n### Video Editing\n- [EVE](https://arxiv.org/abs/2308.10648): an efficient zero-shot video editing method.\n\n\n## Environment Setup\n\n- Follow the steps below to set up the AntMMF runtime environment.\n```\n# Create a new environment\nconda create -n antmmf python=3.8\nconda activate antmmf\n\n# Clone the repository into a local directory named antmmf\ngit clone https://github.com/alipay/Ant-Multi-Modal-Framework antmmf\n\n# Install the project dependencies\ncd antmmf\npip install -r requirements.txt\n```\n\n## Citations\nIf you find AntMMF helpful in your work, please consider citing:\n```\n@misc{qp2023AntMMF,\n  author =       {Qingpei, Guo and Xingning, Dong and Xiaopei, Wan and Xuzheng, Yu and Chen, Jiang and Xiangyuan, Ren and Kiasheng, Yao and Shiyu, Xuan},\n  title =        {AntMMF: Ant Multi-Modal Framework},\n  howpublished = 
{\\url{https://github.com/alipay/Ant-Multi-Modal-Framework}},\n  year =         {2023}\n}\n```\n\n## License\n\nThis project is licensed under [Apache 2.0](https://github.com/apache/.github/blob/main/LICENSE) and may be used, distributed, and reproduced without restriction in any medium, provided the original source is properly cited.\n\n## Acknowledgements\nOur code is built on [FAIR mmf](https://github.com/facebookresearch/mmf). We thank its authors for their important open-source contribution.\n\n## Contact Us\n\n:raising_hand: If you need help or run into a problem related to this codebase, please open an issue.\n\n:star: We are hiring! If you are interested in our work, please contact us at `qingpei.gqp@antgroup.com`.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falipay%2Fant-multi-modal-framework","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falipay%2Fant-multi-modal-framework","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falipay%2Fant-multi-modal-framework/lists"}