https://github.com/wjf5203/vnext
Next-generation video instance recognition framework on top of Detectron2 which supports InstMove (CVPR 2023), SeqFormer (ECCV Oral), and IDOL (ECCV Oral)
- Host: GitHub
- URL: https://github.com/wjf5203/vnext
- Owner: wjf5203
- License: apache-2.0
- Created: 2022-07-19T07:47:24.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-02-21T18:33:40.000Z (about 1 year ago)
- Last Synced: 2025-02-22T06:09:55.402Z (2 months ago)
- Topics: instance-segmentation, motion, object-detection, tracking, transformer, video-instance-segmentation
- Language: Python
- Homepage:
- Size: 53.7 MB
- Stars: 609
- Watchers: 16
- Forks: 54
- Open Issues: 42
Metadata Files:
- Readme: README.md
- License: LICENSE
# VNext:
- VNext is a **Next**-generation **V**ideo instance recognition framework on top of [Detectron2](https://github.com/facebookresearch/detectron2).
- Currently it provides advanced online and offline video instance segmentation algorithms, and a motion model for object-centric video segmentation task.
- We will continue to update and improve it, aiming to provide a unified and efficient framework for the field of video instance recognition. To date, VNext contains the official implementation of the following algorithms:
- **InstMove**: Instance Motion for Object-centric Video Segmentation (CVPR 2023)
- **IDOL**: In Defense of Online Models for Video Instance Segmentation (ECCV 2022 Oral)
- **SeqFormer**: Sequential Transformer for Video Instance Segmentation (ECCV 2022 Oral)
## NEWS!!:
- InstMove is accepted to CVPR 2023, the code and models can be found [here](./projects/InstMove/InstMove.md)!
- IDOL is accepted to ECCV 2022 as an **oral presentation**!
- SeqFormer is accepted to ECCV 2022 as an **oral presentation**!
- IDOL won **first place** in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022).

## Getting started
1. For installation and data preparation, please refer to [INSTALL.md](./INSTALL.md) for more details.
2. For InstMove training, evaluation, plugin, and model zoo, please refer to [InstMove.md](./projects/InstMove/InstMove.md).
3. For IDOL training, evaluation, and model zoo, please refer to [IDOL.md](./projects/IDOL/IDOL.md).
4. For SeqFormer training, evaluation, and model zoo, please refer to [SeqFormer.md](./projects/SeqFormer/SeqFormer.md).

## IDOL
[In Defense of Online Models for Video Instance Segmentation](https://arxiv.org/abs/2207.10661)
Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai
### Introduction
- In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models typically trail contemporaneous offline models by over 10 AP, a substantial gap.
- By dissecting current online and offline models, we show that the main cause of the performance gap is error-prone association, and we propose IDOL, which outperforms all online and offline methods on three benchmarks.
- IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR2022).
### Visualization results on OVIS valid set
### Quantitative results
#### YouTube-VIS 2019
#### OVIS 2021
## SeqFormer
[SeqFormer: Sequential Transformer for Video Instance Segmentation](https://arxiv.org/abs/2112.08275)
Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai
### Introduction
- SeqFormer locates an instance in each frame and aggregates temporal information to learn a powerful representation of a video-level instance, which is used to predict the mask sequences on each frame dynamically.
- SeqFormer is a robust, accurate, and neat offline model; instance tracking is achieved naturally, without tracking branches or post-processing.
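The core idea of the description above can be sketched in a few lines: aggregate per-frame instance features into a single video-level representation, then apply that shared representation to every frame's pixel features to produce the mask sequence. This is an illustrative simplification with made-up function names, not the repository's implementation (which uses deformable attention and learned mask kernels).

```python
# Illustrative sketch of SeqFormer's video-level aggregation idea.
# Not VNext code; function names and shapes are assumptions.
import numpy as np

def aggregate_video_query(frame_feats: np.ndarray,
                          frame_weights: np.ndarray) -> np.ndarray:
    """Softmax-weighted average of per-frame instance features (T, D)
    using per-frame scores (T,) -> one video-level query (D,)."""
    w = np.exp(frame_weights - frame_weights.max())
    w = w / w.sum()                                # softmax over frames
    return (w[:, None] * frame_feats).sum(axis=0)  # (D,)

def predict_masks(video_query: np.ndarray,
                  frame_pixel_feats: np.ndarray) -> np.ndarray:
    """Dot product of the shared video-level query with each frame's pixel
    features (T, H, W, D) yields a soft mask per frame (T, H, W)."""
    logits = frame_pixel_feats @ video_query
    return 1.0 / (1.0 + np.exp(-logits))           # sigmoid
```

Because one query generates the masks for all frames, the masks of a given instance are linked across time by construction, which is why no separate tracking branch or post-processing is needed.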
### Visualization results on YouTube-VIS 2019 valid set
### Quantitative results
#### YouTube-VIS 2019
#### YouTube-VIS 2021
## Citation
```
@inproceedings{seqformer,
  title={SeqFormer: Sequential Transformer for Video Instance Segmentation},
  author={Wu, Junfeng and Jiang, Yi and Bai, Song and Zhang, Wenqing and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

@inproceedings{IDOL,
  title={In Defense of Online Models for Video Instance Segmentation},
  author={Wu, Junfeng and Liu, Qihao and Jiang, Yi and Bai, Song and Yuille, Alan and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}
```

## Acknowledgement
This repo is based on [detectron2](https://github.com/facebookresearch/detectron2), [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR), [VisTR](https://github.com/Epiphqny/VisTR), and [IFC](https://github.com/sukjunhwang/IFC). Thanks for their wonderful work.