{"id":25843915,"url":"https://github.com/foundationvision/vnext","last_synced_at":"2025-04-04T14:05:34.479Z","repository":{"id":47388353,"uuid":"515486565","full_name":"FoundationVision/VNext","owner":"FoundationVision","description":"Next-generation Video instance recognition framework on top of Detectron2 which supports InstMove (CVPR 2023), SeqFormer(ECCV Oral),  and IDOL(ECCV Oral))","archived":false,"fork":false,"pushed_at":"2024-02-21T18:33:40.000Z","size":56263,"stargazers_count":613,"open_issues_count":42,"forks_count":55,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-04-04T13:59:29.173Z","etag":null,"topics":["instance-segmentation","motion","object-detection","tracking","transformer","video-instance-segmentation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FoundationVision.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-07-19T07:47:24.000Z","updated_at":"2025-04-03T08:11:20.000Z","dependencies_parsed_at":"2024-04-18T19:00:29.046Z","dependency_job_id":null,"html_url":"https://github.com/FoundationVision/VNext","commit_stats":null,"previous_names":["foundationvision/vnext","wjf5203/vnext"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FVNext","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FVNext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FVNext/releases","man
ifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FVNext/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FoundationVision","download_url":"https://codeload.github.com/FoundationVision/VNext/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247190248,"owners_count":20898702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["instance-segmentation","motion","object-detection","tracking","transformer","video-instance-segmentation"],"created_at":"2025-03-01T07:12:34.923Z","updated_at":"2025-04-04T14:05:34.459Z","avatar_url":"https://github.com/FoundationVision.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VNext\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/VNext.png\" width=\"300\"/\u003e\u003c/p\u003e\n\n- VNext is a **Next**-generation **V**ideo instance recognition framework built on top of [Detectron2](https://github.com/facebookresearch/detectron2).\n- It currently provides advanced online and offline video instance segmentation algorithms, and a motion model for object-centric video segmentation tasks.
\n- We will continue to update and improve it to provide a unified and efficient framework for video instance recognition.\n\nTo date, VNext contains the official implementations of the following algorithms:\n\n**InstMove**: Instance Motion for Object-centric Video Segmentation (CVPR 2023)\n\n**IDOL**: In Defense of Online Models for Video Instance Segmentation (ECCV 2022 Oral)\n\n**SeqFormer**: Sequential Transformer for Video Instance Segmentation (ECCV 2022 Oral)\n\n## NEWS!!:\n\n- InstMove is accepted to CVPR 2023; the code and models can be found [here](./projects/InstMove/InstMove.md)!\n- IDOL is accepted to ECCV 2022 as an **oral presentation**!\n- SeqFormer is accepted to ECCV 2022 as an **oral presentation**!\n- IDOL won **first place** in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022).\n\n## Getting started\n\n1. For installation and data preparation, please refer to [INSTALL.md](./INSTALL.md).\n2. For InstMove training, evaluation, plugin, and model zoo, please refer to [InstMove.md](./projects/InstMove/InstMove.md).\n3. For IDOL training, evaluation, and model zoo, please refer to [IDOL.md](./projects/IDOL/IDOL.md).\n4. 
For SeqFormer training, evaluation, and model zoo, please refer to [SeqFormer.md](./projects/SeqFormer/SeqFormer.md).\n\n## IDOL\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/in-defense-of-online-models-for-video/video-instance-segmentation-on-youtube-vis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-1?p=in-defense-of-online-models-for-video)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/in-defense-of-online-models-for-video/video-instance-segmentation-on-youtube-vis-2)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-2?p=in-defense-of-online-models-for-video)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/in-defense-of-online-models-for-video/video-instance-segmentation-on-ovis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-ovis-1?p=in-defense-of-online-models-for-video)\n\n[In Defense of Online Models for Video Instance Segmentation](https://arxiv.org/abs/2207.10661)\n\nJunfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai\n\n### Introduction\n\n- In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models usually trail contemporaneous offline models by over 10 AP, which is a significant drawback.\n\n- By dissecting current online and offline models, we demonstrate that the main cause of the performance gap is error-prone association between frames, and we propose IDOL, which outperforms all online and offline methods on three benchmarks.\n\n- IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022).
\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/IDOL/arch.png\" width=\"1000\"/\u003e\u003c/p\u003e\n\n### Visualization results on OVIS validation set\n\n\u003cimg src=\"assets/IDOL/vid_2.gif\" width=\"400\"/\u003e\u003cimg src=\"assets/IDOL/vid_61.gif\" width=\"400\"/\u003e\n\u003cimg src=\"assets/IDOL/vid_96.gif\" width=\"400\"/\u003e\u003cimg src=\"assets/IDOL/vid_116.gif\" width=\"400\"/\u003e\n\n### Quantitative results\n\n#### YouTube-VIS 2019\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/IDOL/ytvis2019_results.png\" width=\"1000\"/\u003e\u003c/p\u003e\n\n#### OVIS 2021\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/IDOL/ovis_results.png\" width=\"1000\"/\u003e\u003c/p\u003e\n\n## SeqFormer\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seqformer-a-frustratingly-simple-model-for/video-instance-segmentation-on-youtube-vis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-1?p=seqformer-a-frustratingly-simple-model-for)\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/SeqFormer/SeqFormer_sota.png\" width=\"500\"/\u003e\u003c/p\u003e\n\n[SeqFormer: Sequential Transformer for Video Instance Segmentation](https://arxiv.org/abs/2112.08275)\n\nJunfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai\n\n### Introduction\n\n- SeqFormer locates an instance in each frame and aggregates temporal information to learn a powerful representation of a video-level instance, which is used to dynamically predict the mask sequence on each frame.\n\n- SeqFormer is a robust, accurate, and neat offline model, and instance tracking is achieved naturally without tracking branches or post-processing.
\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/SeqFormer/SeqFormer_arch.png\" width=\"1000\"/\u003e\u003c/p\u003e\n\n### Visualization results on YouTube-VIS 2019 validation set\n\n\u003cimg src=\"assets/SeqFormer/vid_15.gif\" width=\"400\"/\u003e\u003cimg src=\"assets/SeqFormer/vid_78.gif\" width=\"400\"/\u003e\n\u003cimg src=\"assets/SeqFormer/vid_133.gif\" width=\"400\"/\u003e\u003cimg src=\"assets/SeqFormer/vid_210.gif\" width=\"400\"/\u003e\n\n### Quantitative results\n\n#### YouTube-VIS 2019\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/SeqFormer/ytvis2019_results.png\" width=\"1000\"/\u003e\u003c/p\u003e\n\n#### YouTube-VIS 2021\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"assets/SeqFormer/ytvis2021_results.png\" width=\"1000\"/\u003e\u003c/p\u003e\n\n## Citation\n\n```\n@inproceedings{seqformer,\n  title={SeqFormer: Sequential Transformer for Video Instance Segmentation},\n  author={Wu, Junfeng and Jiang, Yi and Bai, Song and Zhang, Wenqing and Bai, Xiang},\n  booktitle={ECCV},\n  year={2022},\n}\n\n@inproceedings{IDOL,\n  title={In Defense of Online Models for Video Instance Segmentation},\n  author={Wu, Junfeng and Liu, Qihao and Jiang, Yi and Bai, Song and Yuille, Alan and Bai, Xiang},\n  booktitle={ECCV},\n  year={2022},\n}\n```\n\n## Acknowledgement\n\nThis repo is based on [detectron2](https://github.com/facebookresearch/detectron2), [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR), [VisTR](https://github.com/Epiphqny/VisTR), and [IFC](https://github.com/sukjunhwang/IFC). Thanks to the authors for their wonderful work.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoundationvision%2Fvnext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffoundationvision%2Fvnext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoundationvision%2Fvnext/lists"}