{"id":19279990,"url":"https://github.com/showlab/all-in-one","last_synced_at":"2025-04-09T16:20:40.380Z","repository":{"id":49366752,"uuid":"469748031","full_name":"showlab/all-in-one","owner":"showlab","description":"[CVPR2023] All in One: Exploring Unified Video-Language Pre-training","archived":false,"fork":false,"pushed_at":"2023-03-25T11:46:31.000Z","size":1608,"stargazers_count":281,"open_issues_count":4,"forks_count":17,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-02T10:49:43.085Z","etag":null,"topics":["codebase","pre-training","pytorch","video-language"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2203.07303","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/showlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-14T13:35:03.000Z","updated_at":"2025-03-14T10:30:04.000Z","dependencies_parsed_at":"2025-01-10T12:39:48.982Z","dependency_job_id":"59175ed0-3f40-496d-91e5-7107fa9bbc4b","html_url":"https://github.com/showlab/all-in-one","commit_stats":{"total_commits":29,"total_committers":2,"mean_commits":14.5,"dds":0.03448275862068961,"last_synced_commit":"ae0e288a8b2c505d48057dd0f8f4c215706611df"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2Fall-in-one","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2Fall-in-one/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2Fall-in-one/relea
ses","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/showlab%2Fall-in-one/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/showlab","download_url":"https://codeload.github.com/showlab/all-in-one/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248065285,"owners_count":21041872,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codebase","pre-training","pytorch","video-language"],"created_at":"2024-11-09T21:16:17.646Z","updated_at":"2025-04-09T16:20:40.354Z","avatar_url":"https://github.com/showlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/all-in-one-exploring-unified-video-language/visual-question-answering-on-msrvtt-qa-1)](\nhttps://paperswithcode.com/sota/visual-question-answering-on-msrvtt-qa-1?p=all-in-one-exploring-unified-video-language)\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/all-in-one-exploring-unified-video-language/visual-question-answering-on-msvd-qa-1)](\nhttps://paperswithcode.com/sota/visual-question-answering-on-msvd-qa-1?p=all-in-one-exploring-unified-video-language)\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/all-in-one-exploring-unified-video-language/tgif-frame-on-tgif-qa)](\nhttps://paperswithcode.com/sota/tgif-frame-on-tgif-qa?p=all-in-one-exploring-unified-video-language)\n\n[comment]: \u003c\u003e 
([![PWC]\u0026#40;https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/all-in-one-exploring-unified-video-language/video-retrieval-on-msr-vtt\u0026#41;]\u0026#40;)\n\n[comment]: \u003c\u003e (https://paperswithcode.com/sota/video-retrieval-on-msr-vtt?p=all-in-one-exploring-unified-video-language\u0026#41;)\n\n\n# All-in-one\n\nCode for the paper: All in One: Exploring Unified Video-Language Pre-training [arXiv](https://arxiv.org/abs/2203.07303)\n---\n\n![ppl](figures/ppl.jpg)\n\n\n## News\n- 2022.03.25 Updated the README.\n- 2022.06.07 Released the AllInOne+ model pre-trained on eight datasets (YTT+WebVid+HowTo+CC3+CC12+COCO+VG+SBU).\n- 2022.05.07 AllInOne+ is released. The main difference from AllInOne is image and video co-training.\n- 2022.03.14 The first version of AllInOne is released.\n\n## Install\n\n### 1. PyTorch Lightning\nIn this work, we use PyTorch Lightning for distributed training with mixed precision.\nInstall PyTorch and PyTorch Lightning first.\n\n```bash\nconda create -n allinone python=3.7\nsource activate allinone\ncd [Path_To_This_Code]\npip install -r requirements.txt\n```\n\nIf all packages, including ffmpeg, are already installed, skip step 2.\n\n### 2. On-the-fly decoding (optional)\nTo speed up pre-training, we decode videos on the fly for fast IO.\nInstall ffmpeg as below.\n\n#### 1. ffmpeg\n```bash\nsudo conda install -y ffmpeg\n```\n\nPlease install any required packages that are not listed in requirements.txt.\n\nIf your server cannot connect over HTTP or installs ffmpeg slowly, download a static binary from [FFmpeg Static Builds](https://johnvansickle.com/ffmpeg/) and add it to your PATH as follows:\n\n```bash\nexport PATH=[PATH_TO_Dir/]ffmpeg-git-20220108-amd64-static:$PATH\n```\n\n#### 2. 
PyTorchVideo\nInstall PyTorchVideo (used for data augmentation) as below:\n\n```bash\npip install ffmpeg-python\npip install pytorchvideo\n```\n\n## Download Pretrained Weights\nWe provide the following pretrained weights on Google Drive.\n\n| Model | PT Data | Parameters | Pretrained Weight | Training Log | Hparams |\n| ---- | ---- | ---- | ---- | ---- | ---- |\n| All-in-one-Ti | WebVid+HowTo | 12M | [Google Drive](https://drive.google.com/file/d/1-mS9U1xRnvumaftjhxJsr_t4WjJ-gp7t/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/1j27-i7WsNDtj9k0CSnDC9sThMMjMRF-U/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/1DmZ5apWqIuUMRg7igdN2sHM2INrT_UZo/view?usp=sharing) |\n| All-in-one-S | WebVid+HowTo | 33M | [Google Drive](https://drive.google.com/file/d/1ntyEsFWLG8XQZ9oliYsrRZmhp_OMbQJ-/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/10uJZUMH10D1QD_o2g0WmXfv47xTAV5hJ/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/12levE9kXQbWykJHUKqXNQZz32vtOPRLt/view?usp=sharing) |\n| All-in-one-B | WebVid+HowTo | 110M | [Google Drive](https://drive.google.com/file/d/1z3g891ND6CGCUkVzCXr2647wVG-15uUS/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/1FBs6HOeXr3Bo_UZLDq13qscLTMqITGWC/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/1D7OiF9HpIIsFk20LkCUWYThpXo_NPzT0/view?usp=sharing) |\n| All-in-one-B+ | WebVid+HowTo+\u003cbr\u003eCC3 | 110M | [Google Drive](https://drive.google.com/file/d/1t-yWNjXJxGslBkKujlyYh-HUIdCc_gF7/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/1EN1D0KjqOze9tDW15raC2AULIEqfd2DQ/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/1uxtfWhVmi1BAhHzOzJMXjmwE6H3go2L9/view?usp=sharing) |\n| All-in-one-B+ | WebVid+YTT+HowTo+\u003cbr\u003eCC3+CC12+COCO+VG+SBU | 110M | [Google Drive](https://drive.google.com/file/d/1Yd2lKppaduqG_RO1gCA6OpAfB0_IXDoX/view?usp=sharing) | [Google 
Drive](https://drive.google.com/file/d/1azTwITjlo7YA1pLP42mlJ45K9IV4JSxR/view?usp=sharing) | [Google Drive](https://drive.google.com/file/d/1ddz8wtd0VSnqhu3Dd0MKiHqcNklG3NSv/view?usp=sharing) |\n\n\nAfter downloading the pretrained weights, move them into the `pretrained` directory.\n```bash\nmkdir pretrained\ncp *.ckpt pretrained/\n```\n\n### Comparison with the State of the Art\n\n\n|Model|Param|Data|Frames|TGIF-Action|TGIF-Frame|MSR R@5|MSR R@10|\n|---|---|---|---|---|---|---|---|\n|ClipBERT|137M|I:COCO+VG|8 x 2|82.9|59.4|49.2|63.5|\n|VIOLET|198M|V:WebVid+\u003cbr\u003eI:CC3|16|87.1|-|63.0|73.4|\n|All-in-one-S|33M|V:WebVid+HowTo|3|91.2|64.0|61.5|70.9|\n|All-in-one-B|110M|V:WebVid+HowTo|3|**92.9**|**64.2**|**67.0**|**77.1**|\n|All-in-one-B+|110M|V:WebVid+\u003cbr\u003eI:CC3|3|**95.4**|**67.2**|**68.1**|**77.3**|\n|All-in-one-B+|110M|V:WebVid+YTT+HowTo+\u003cbr\u003eI:CC3+CC12+COCO+VG+SBU|3|**96.3**|**68.5**|**70.3**|**79.2**|\n\n\nIn this table, I denotes image data and V denotes video data.\n\n## Dataset Preparation\nSee [`DATA.md`](DATA.md)\n\n## Pre-training\n### Full Video Pre-training\nSee [`TRAIN.md`](TRAIN.md)\n### Co-training with Image Dataset (All-in-one+)\nSee [`COTRAIN.md`](COTRAIN.md)\n\n## Evaluation on Downstream Tasks\nSee [`EVAL.md`](EVAL.md)\n\nThanks to its unified design and sparse sampling, AllInOne requires far fewer FLOPs.\n\n![](figures/introduction.jpg)\n\n\n## Citation\nIf you find our work helpful, please cite our paper.\n\n```bibtex\n@article{wang2022allinone,\n  title={All in One: Exploring Unified Video-Language Pre-training},\n  author={Wang, Alex Jinpeng and Ge, Yixiao and Yan, Rui and Ge, Yuying and Lin, Xudong and Cai, Guanyu and Wu, Jianping and Shan, Ying and Qie, Xiaohu and Shou, Mike Zheng},\n  journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  year={2023}\n}\n```\n\n## Contact\n\nEmail: _awinyimgprocess at gmail dot com_\n\nIf you have any problems or difficulty reproducing the results reported in this code, 
you can email me or open an issue.\nWe are also happy to merge your code if you adapt All-in-one to other tasks or datasets.\n\n\n## Acknowledgement\nThis work is mainly based on [ViLT](https://github.com/dandelin/ViLT), [Frozen](https://github.com/m-bain/frozen-in-time) and [Merlot](https://github.com/rowanz/merlot).\n\n## License\nMIT","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshowlab%2Fall-in-one","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshowlab%2Fall-in-one","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshowlab%2Fall-in-one/lists"}