{"id":13442842,"url":"https://github.com/OpenGVLab/UniFormerV2","last_synced_at":"2025-03-20T15:31:12.591Z","repository":{"id":63454413,"uuid":"567111150","full_name":"OpenGVLab/UniFormerV2","owner":"OpenGVLab","description":"[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer","archived":false,"fork":false,"pushed_at":"2024-04-02T17:00:02.000Z","size":1864,"stargazers_count":277,"open_issues_count":13,"forks_count":15,"subscribers_count":7,"default_branch":"main","last_synced_at":"2024-08-01T03:42:11.122Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2211.09552","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenGVLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-11-17T04:53:37.000Z","updated_at":"2024-07-17T03:06:01.000Z","dependencies_parsed_at":"2023-11-14T02:38:38.425Z","dependency_job_id":null,"html_url":"https://github.com/OpenGVLab/UniFormerV2","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FUniFormerV2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FUniFormerV2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FUniFormerV2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGVLab%2FUniFormerV2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenGVLab","download_url":"https://codeload.github.com/OpenGVLab/UniF
ormerV2/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221772572,"owners_count":16878131,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:51.993Z","updated_at":"2024-10-28T03:31:11.567Z","avatar_url":"https://github.com/OpenGVLab.png","language":"Python","funding_links":[],"categories":["Python","Pretrained Models"],"sub_categories":["Motion Generation and Estimation"],"readme":"# UniFormerV2\n\nThis repo is the official implementation of [\"UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer\"](https://arxiv.org/abs/2211.09552).\nBy [Kunchang Li](https://scholar.google.com/citations?user=D4tLSbsAAAAJ), [Yali Wang](https://scholar.google.com/citations?user=hD948dkAAAAJ), [Yinan He](https://dblp.org/pid/93/7763.html), [Yizhuo Li](https://scholar.google.com/citations?user=pyBSGjgAAAAJ), [Yi Wang](https://scholar.google.com.hk/citations?hl=zh-CN\u0026user=Xm2M8UwAAAAJ), [Limin Wang](https://scholar.google.com/citations?user=HEuN8PcAAAAJ) and [Yu Qiao](https://scholar.google.com/citations?user=gFtI-8QAAAAJ\u0026hl).\n\n## Update\n\n***11/14/2023***\n\nThanks to [@innat](https://github.com/innat) for the help. Our models now also support [Keras](https://github.com/innat/UniFormerV2)! 😄\n\n***07/14/2023***\n\nUniFormerV2 has been accepted by ICCV2023! 🎉\n\n***02/13/2023***\n\nUniFormerV2 has been integrated into [MMAction2](https://github.com/open-mmlab/mmaction2/tree/dev-1.x/configs/recognition/uniformerv2). Training code will be provided soon! 
😄\n\n***11/20/2022***\n\nWe provide a video demo on [Hugging Face](https://huggingface.co/spaces/Andy1621/uniformerv2_demo). Give it a try! 😄\n\n***11/19/2022***\n\nWe published a blog post in Chinese on [Zhihu](https://zhuanlan.zhihu.com/p/584669411).\n\n***11/18/2022***\n\nAll the code, models and configs are provided. Don't hesitate to open an issue if you have any problems! 🙋🏻 \n\n## Introduction\n\nIn UniFormerV2, we propose a generic paradigm to build a powerful family of video networks by arming pre-trained [ViTs](https://github.com/rwightman/pytorch-image-models/blob/main/timm/models/vision_transformer.py) with efficient [UniFormer](https://github.com/Sense-X/UniFormer) designs. It inherits the concise style of the UniFormer block, but it contains brand-new local and global relation aggregators, which allow for a preferable accuracy-computation balance by seamlessly integrating the advantages of both ViTs and UniFormer.\n![teaser](img/framework.png)\nIt achieves state-of-the-art recognition performance on 8 popular video benchmarks, including scene-related Kinetics-400/600/700 and Moments in Time, temporal-related Something-Something V1/V2, untrimmed ActivityNet and HACS. 
In particular, **it is the first model to achieve 90% top-1 accuracy on Kinetics-400**.\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/uniformerv2-spatiotemporal-learning-by-arming/action-classification-on-kinetics-400)](https://paperswithcode.com/sota/action-classification-on-kinetics-400?p=uniformerv2-spatiotemporal-learning-by-arming)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/uniformerv2-spatiotemporal-learning-by-arming/action-classification-on-kinetics-600)](https://paperswithcode.com/sota/action-classification-on-kinetics-600?p=uniformerv2-spatiotemporal-learning-by-arming)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/uniformerv2-spatiotemporal-learning-by-arming/action-classification-on-kinetics-700)](https://paperswithcode.com/sota/action-classification-on-kinetics-700?p=uniformerv2-spatiotemporal-learning-by-arming)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/uniformerv2-spatiotemporal-learning-by-arming/action-classification-on-moments-in-time)](https://paperswithcode.com/sota/action-classification-on-moments-in-time?p=uniformerv2-spatiotemporal-learning-by-arming)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/uniformerv2-spatiotemporal-learning-by-arming/action-classification-on-activitynet)](https://paperswithcode.com/sota/action-classification-on-activitynet?p=uniformerv2-spatiotemporal-learning-by-arming)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/uniformerv2-spatiotemporal-learning-by-arming/action-recognition-on-hacs)](https://paperswithcode.com/sota/action-recognition-on-hacs?p=uniformerv2-spatiotemporal-learning-by-arming)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/uniformerv2-spatiotemporal-learning-by-arming/action-recognition-in-videos-on-something-1)](https://paperswithcode.com/sota/
action-recognition-in-videos-on-something-1?p=uniformerv2-spatiotemporal-learning-by-arming)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/uniformerv2-spatiotemporal-learning-by-arming/action-recognition-in-videos-on-something)](https://paperswithcode.com/sota/action-recognition-in-videos-on-something?p=uniformerv2-spatiotemporal-learning-by-arming)\n\n## Model Zoo\n\nAll the models can be found in [MODEL_ZOO](MODEL_ZOO.md).\n\n## Instructions\n\nSee [INSTRUCTIONS](INSTRUCTIONS.md) for more details about:\n- Environment installation\n- Dataset preparation\n- Training and validation\n\n\n## Cite Uniformer\n\nIf you find this repository useful, please use the following BibTeX entry for citation.\n\n```latex\n@misc{li2022uniformerv2,\n      title={UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer}, \n      author={Kunchang Li and Yali Wang and Yinan He and Yizhuo Li and Yi Wang and Limin Wang and Yu Qiao},\n      year={2022},\n      eprint={2211.09552},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n\n## License\n\nThis project is released under the MIT license. Please see the [LICENSE](LICENSE) file for more information.\n\n## Acknowledgement\n\nThis repository is built on the [UniFormer](https://github.com/Sense-X/UniFormer) and [SlowFast](https://github.com/facebookresearch/SlowFast) repositories.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenGVLab%2FUniFormerV2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOpenGVLab%2FUniFormerV2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenGVLab%2FUniFormerV2/lists"}