https://github.com/wjf5203/vnext

Next-generation video instance recognition framework on top of Detectron2, supporting InstMove (CVPR 2023), SeqFormer (ECCV 2022 Oral), and IDOL (ECCV 2022 Oral).

instance-segmentation motion object-detection tracking transformer video-instance-segmentation

# VNext:

- VNext is a **Next**-generation **V**ideo instance recognition framework on top of [Detectron2](https://github.com/facebookresearch/detectron2).
- Currently, it provides advanced online and offline video instance segmentation algorithms, and a motion model for object-centric video segmentation tasks.
- We will continue to update and improve it, with the goal of providing a unified and efficient framework for the field of video instance recognition.

To date, VNext contains the official implementation of the following algorithms:

**InstMove**: Instance Motion for Object-centric Video Segmentation (CVPR 2023)

**IDOL**: In Defense of Online Models for Video Instance Segmentation (ECCV 2022 Oral)

**SeqFormer**: Sequential Transformer for Video Instance Segmentation (ECCV 2022 Oral)

## NEWS!!:

- InstMove has been accepted to CVPR 2023. The code and models can be found [here](./projects/InstMove/InstMove.md)!
- IDOL has been accepted to ECCV 2022 as an **oral presentation**!
- SeqFormer has been accepted to ECCV 2022 as an **oral presentation**!
- IDOL won **first place** in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022).

## Getting started

1. For installation and data preparation, please refer to [INSTALL.md](./INSTALL.md).
2. For InstMove training, evaluation, plugin, and model zoo, please refer to [InstMove.md](./projects/InstMove/InstMove.md).
3. For IDOL training, evaluation, and model zoo, please refer to [IDOL.md](./projects/IDOL/IDOL.md).
4. For SeqFormer training, evaluation, and model zoo, please refer to [SeqFormer.md](./projects/SeqFormer/SeqFormer.md).

## IDOL

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/in-defense-of-online-models-for-video/video-instance-segmentation-on-youtube-vis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-1?p=in-defense-of-online-models-for-video)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/in-defense-of-online-models-for-video/video-instance-segmentation-on-youtube-vis-2)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-2?p=in-defense-of-online-models-for-video)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/in-defense-of-online-models-for-video/video-instance-segmentation-on-ovis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-ovis-1?p=in-defense-of-online-models-for-video)

[In Defense of Online Models for Video Instance Segmentation](https://arxiv.org/abs/2207.10661)

Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

### Introduction

- In recent years, video instance segmentation (VIS) has been advanced largely by offline models, while online models usually trail contemporaneous offline models by over 10 AP, a substantial gap.

- By dissecting current online and offline models, we demonstrate that the main cause of the performance gap is error-prone association, and we propose IDOL, which outperforms all online and offline methods on three benchmarks.

- IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022).
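
To make the term "association" concrete: an online model processes one frame at a time and must decide which current-frame detection continues which existing track. The sketch below is a generic greedy matcher over instance embeddings using cosine similarity. It is not IDOL's association procedure; the function names, the `threshold` parameter, and the choice of cosine similarity are all assumptions made for illustration. A wrong match at this step is the kind of association error the bullets above refer to.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def associate(tracks, detections, threshold=0.5):
    """Greedily match current-frame detections to existing tracks.

    tracks:     dict mapping integer track id -> embedding from earlier frames
    detections: list of embeddings for the current frame
    Returns one track id per detection and updates `tracks` in place.
    (Hypothetical helper; the threshold value is an arbitrary assumption.)
    """
    # Score every (detection, track) pair and visit the best pairs first.
    pairs = sorted(
        ((cosine(emb, t_emb), d_idx, t_id)
         for d_idx, emb in enumerate(detections)
         for t_id, t_emb in tracks.items()),
        reverse=True,
    )
    assigned = [None] * len(detections)
    used_tracks = set()
    for score, d_idx, t_id in pairs:
        if score < threshold:
            break  # remaining pairs are too dissimilar to match
        if assigned[d_idx] is None and t_id not in used_tracks:
            assigned[d_idx] = t_id
            used_tracks.add(t_id)

    # Detections without a match start new tracks.
    next_id = max(tracks, default=-1) + 1
    for d_idx, t_id in enumerate(assigned):
        if t_id is None:
            assigned[d_idx] = next_id
            next_id += 1

    # Store the latest embedding for each matched or newly created track.
    for d_idx, t_id in enumerate(assigned):
        tracks[t_id] = detections[d_idx]
    return assigned
```

Calling `associate` once per frame with the same `tracks` dictionary carries identities through the video. For the actual IDOL training and evaluation steps, see [IDOL.md](./projects/IDOL/IDOL.md).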

### Visualization results on OVIS valid set


### Quantitative results

#### YouTube-VIS 2019

#### OVIS 2021

## SeqFormer

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/seqformer-a-frustratingly-simple-model-for/video-instance-segmentation-on-youtube-vis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-1?p=seqformer-a-frustratingly-simple-model-for)

[SeqFormer: Sequential Transformer for Video Instance Segmentation](https://arxiv.org/abs/2112.08275)

Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai

### Introduction

- SeqFormer locates an instance in each frame and aggregates temporal information to learn a powerful representation of a video-level instance, which is used to predict the mask sequences on each frame dynamically.

- SeqFormer is a robust, accurate, and neat offline model. Instance tracking is achieved naturally, without tracking branches or post-processing.
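
The two steps described above (aggregate per-frame information into one video-level instance representation, then use it to predict a mask on every frame) can be sketched in plain Python. This is a simplified illustration, not the SeqFormer implementation: it assumes a softmax-weighted sum for the temporal aggregation and a dot product between the video-level embedding and per-pixel features as the dynamic mask predictor. All function names and inputs are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(frame_embeddings, frame_scores):
    """Combine per-frame instance embeddings into one video-level embedding.

    frame_embeddings: list (one per frame) of equal-length vectors
    frame_scores:     one scalar per frame; higher means the frame contributes more
    (Assumption: a softmax-weighted sum stands in for the aggregation step.)
    """
    weights = softmax(frame_scores)
    dim = len(frame_embeddings[0])
    return [
        sum(w * emb[i] for w, emb in zip(weights, frame_embeddings))
        for i in range(dim)
    ]

def predict_masks(video_embedding, frame_features):
    """Apply the same video-level embedding to every frame's pixel features.

    frame_features: list (one per frame) of per-pixel feature vectors
    Returns, for each frame, a list of booleans marking pixels whose
    dot-product logit is positive.
    """
    masks = []
    for pixels in frame_features:
        logits = [sum(w * f for w, f in zip(video_embedding, px)) for px in pixels]
        masks.append([logit > 0 for logit in logits])
    return masks
```

Because one embedding produces the masks for all frames, the masks belong to the same instance by construction, which is why no separate tracking step appears in the sketch.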

### Visualization results on YouTube-VIS 2019 valid set


### Quantitative results

#### YouTube-VIS 2019

#### YouTube-VIS 2021

## Citation

```
@inproceedings{seqformer,
title={SeqFormer: Sequential Transformer for Video Instance Segmentation},
author={Wu, Junfeng and Jiang, Yi and Bai, Song and Zhang, Wenqing and Bai, Xiang},
booktitle={ECCV},
year={2022},
}

@inproceedings{IDOL,
title={In Defense of Online Models for Video Instance Segmentation},
author={Wu, Junfeng and Liu, Qihao and Jiang, Yi and Bai, Song and Yuille, Alan and Bai, Xiang},
booktitle={ECCV},
year={2022},
}
```

## Acknowledgement

This repo is based on [detectron2](https://github.com/facebookresearch/detectron2), [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR), [VisTR](https://github.com/Epiphqny/VisTR), and [IFC](https://github.com/sukjunhwang/IFC). Thanks for their wonderful work.