https://github.com/rkyuca/awesome-video-segmentation

Awesome Video Segmentation: Recent progress in video segmentation
https://github.com/rkyuca/awesome-video-segmentation
Last synced: 6 months ago
JSON representation
Awesome Video Segmentation: Recent progress in video segmentation
Host: GitHub
URL: https://github.com/rkyuca/awesome-video-segmentation
Owner: rkyuca
License: apache-2.0
Created: 2023-07-24T22:56:41.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-09-08T23:17:53.000Z (almost 2 years ago)
Last Synced: 2024-04-21T09:09:38.143Z (about 1 year ago)
Size: 25.4 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

ultimate-awesome - awesome-video-segmentation - Awesome Video Segmentation: Recent progress in video segmentation. (Other Lists / Julia Lists)
README

        # Awesome Video Segmentation

A list of some recent papers on different types of video segmentation tasks.  

### Updates

- [x] Tasks and datasets

- [x] Recent papers list

  - [x] List of papers

  - [x] Links to papers and codes

- [ ] SOTA result tables on various datasets

- [ ] Discussion on recent trends

- [ ] Potential future directions

## Contents

> 0\. [Tasks and Datasets](#Tasks-and-Datasets)

>

> 1\. [Automatic Video Object Segmentation](#Automatic-Video-Object-Segmentation)

>

> 2\. [Semi Automatic Video Object Segmentation](#Semi-Automatic-Video-Object-Segmentation)

>

> 3\. [Interactive Video Object Segmentation](#Interactive-Video-Object-Segmentation)

>

> 4\. [Video Instance Segmentation](#Video-Instance-Segmentation)

>

> 5\. [Actor Action Segmentation](#[Actor-Action-Segmentation)

>

> 6\. [Video Semantic Segmentation](#Video-Semantic-Segmentation)

>

> 7\. [Video Panoptic Segmentation](#Video-Panoptic-Segmentation)

>

> 8\. [Depth Aware Video Panoptic Segmentation](#Depth-Aware-Video-Panoptic-Segmentation)

>

> 9\. [Panoramic Video Panoptic Segmentation](#Panoramic-Video-Panoptic-Segmentation)

>

> 10\. [Text Referring Video Object Segmentation](#Text-Referring-Video-Object-Segmentation)

>

> 11\. [Audio Referring Video Object Segmentation](#Audio-Referring-Video-Object-Segmentation)

## Tasks and Datasets

| Task Category               | Task                                                        | Target                                | Instances          | Tracking           | Datasets                                       |

|-----------------------------|-------------------------------------------------------------|---------------------------------------|--------------------|--------------------|------------------------------------------------|

|           Objects           | Automatic Video Object Segmentation (AVOS)                  | Primary moving object                 | -                  | -                  | DAVIS 2016, MoCA, YouTube-VOS, YouTube-Objects |

|                             | Semi-automatic VOS (SVOS)                                   | Mask-guided object                    | -                  | -                  | DAVIS'2017                                     |

|                             | Interactive VOS (IVOS)                                      | Scribble-guided object                | -                  | -                  | DAVIS'2017                                     |

|                             | Video Instance Segmentation (VIS)                           | All Objects                           | :heavy_check_mark: | :heavy_check_mark: | YouTube-VIS, OVIS                              |

|         Actor-action        | Actor-action segmentation                                   | Primary Object related to actions     | -                  | -                  | A2D                                            |

|             Scene           | Video Semantic Segmentation/ Video Scene Parsing (VSS/ VSP) | All thing and stuff classes           | -                  | -                  | VIPER, VSPW                                    |

|                             | Video Panoptic Segmentation (VPS)                           | All thing and stuff classes           | :heavy_check_mark: | :heavy_check_mark: | Cityscapes-VPS, VIPER, VIPSeg                  |

|                             | Depth-aware Video Panoptic Segmentation (DVPS)              | All thing and stuff classes and depth | :heavy_check_mark: | :heavy_check_mark: | Cityscapes-DVPS, SemanticKITTI-DVPS            |

|                             | Panoramic Video Panoptic Segmentation (PVPS)                | All thing and stuff classes           | :heavy_check_mark: | :heavy_check_mark: | WOD:PVPS                                       |

|            Multimodal       | Text Referring VOS/Referring-VOS (RVOS)                     | Text reference guided object          | -                  | -                  | A2D-Sentence, RE-DAVIS, RVOS                   |

|                             | Audio Referring VOS (ARVOS)                                 | Audio reference guided object         | -                  | -                  | AVOS                                           |

## Automatic Video Object Segmentation

- **[MED-VT]** MED-VT: Multiscale encoder-decoder video transformer with application to object segmentation. *CVPR 2023*, [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Karim_MED-VT_Multiscale_Encoder-Decoder_Video_Transformer_With_Application_To_Object_Segmentation_CVPR_2023_paper.pdf) [Code](https://rkyuca.github.io/medvt/)

- **[PMN]** Unsupervised video object segmentation via prototype memory network. *WACV 2023*, [Paper](https://openaccess.thecvf.com/content/WACV2023/papers/Lee_Unsupervised_Video_Object_Segmentation_via_Prototype_Memory_Network_WACV_2023_paper.pdf)

- **[TMO]** Treating motion as option to reduce motion dependency in unsupervised video object segmentation. *WACV 2023*, [Paper](https://openaccess.thecvf.com/content/WACV2023/papers/Cho_Treating_Motion_as_Option_To_Reduce_Motion_Dependency_in_Unsupervised_WACV_2023_paper.pdf)

- **[HFAN]** Hierarchical feature alignment network for unsupervised video object segmentation. *ECCV 2022*, [Paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136940584.pdf), [Code](https://github.com/NUST-Machine-Intelligence-Laboratory/HFAN)

- **[IMP]** Iteratively selecting an easy reference frame makes unsupervised video object segmentation easier. *AAAI 2022*, [Paper](https://arxiv.org/abs/2112.12402)

- **[RTNet]** Reciprocal transformations for unsupervised video object segmentation. *CVPR 2021*, [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Ren_Reciprocal_Transformations_for_Unsupervised_Video_Object_Segmentation_CVPR_2021_paper.pdf), [Code](https://github.com/OliverRensu/RTNet)

- **[MATNet]** Motionattentive transition for zero-shot video object segmentation. *AAAI 2020*, [Paper](https://arxiv.org/pdf/2003.04253.pdf), [Code](https://github.com/tfzhou/MATNet)

- **[COSNet]** See more,know more: Unsupervised video object segmentation with co-attention Siamese networks. *CVPR 2019*, [Paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Lu_See_More_Know_More_Unsupervised_Video_Object_Segmentation_With_Co-Attention_CVPR_2019_paper.pdf), [Code](https://github.com/carrierlxk/COSNet)

## Semi Automatic Video Object Segmentation

- **[PCVOS]**  Per-clip video object segmentation. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Park_Per-Clip_Video_Object_Segmentation_CVPR_2022_paper.pdf), [Code](https://github.com/pkyong95/PCVOS) 

- **[AOT]** Associating objects with transformers for video object segmentation. *NeurIPS 2021*, [Paper](https://openreview.net/pdf?id=hl3v8io3ZYt)

- **[CFBI+]** Collaborative video object segmentation by multiscale foreground-background integration. *PAMI 2021*, [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9435058), [Code](https://github.com/z-x-yang/CFBI)

- **[SST]**  SSTVOS: Sparse spatiotemporal transformers for video object segmentation. *CVPR 2021*, [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Duke_SSTVOS_Sparse_Spatiotemporal_Transformers_for_Video_Object_Segmentation_CVPR_2021_paper.pdf), [Code](https://github.com/dukebw/SSTVOS)

- **[STCN]** Rethinking space-time networks with improved memory coverage for efficient video object segmentation. *NeurIPS 2021*, [Paper](https://openreview.net/pdf?id=vllRjSTWcLs), [Code](https://github.com/hkchengrex/STCN)

- **[HMMN]** Hierarchical memory matching network for video object segmentation. *ICCV 2021*, [Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Seong_Hierarchical_Memory_Matching_Network_for_Video_Object_Segmentation_ICCV_2021_paper.pdf), [Code](https://github.com/Hongje/HMMN)

- **[KMN]** Kernelized memory network for video object segmentation. *ECCV 2020*, [Paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123670630.pdf)

## Interactive Video Object Segmentation  

- **[MiVOS]** Modular interactive video object segmentation: Interaction-to-mask, propagation and difference-aware fusion. *CVPR 2021*, [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Cheng_Modular_Interactive_Video_Object_Segmentation_Interaction-to-Mask_Propagation_and_Difference-Aware_Fusion_CVPR_2021_paper.pdf), [Code](https://github.com/hkchengrex/MiVOS) 

- **[GIS]**  Guided interactive video object segmentation using reliability-based attention maps. *CVPR 2021*, [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Heo_Guided_Interactive_Video_Object_Segmentation_Using_Reliability-Based_Attention_Maps_CVPR_2021_paper.pdf), [Code](https://github.com/yuk6heo/GIS-RAmap)

- **[ATNet]** Interactive video object segmentation using global and local transfer modules. *ECCV 2020*, [Paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123620290.pdf), [Code](https://github.com/yuk6heo/IVOS-ATNet)

- **[MANet]** Memory aggregation networks for efficient interactive video object segmentation. *CVPR 2020*, [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Miao_Memory_Aggregation_Networks_for_Efficient_Interactive_Video_Object_Segmentation_CVPR_2020_paper.pdf), [Code](https://github.com/lightas/CVPR2020_MANet)

- **[IPNet]** Fast user-guided video object segmentation by interaction-and-propagation networks. *CVPR 2019*, [Paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Oh_Fast_User-Guided_Video_Object_Segmentation_by_Interaction-And-Propagation_Networks_CVPR_2019_paper.pdf), [Code](https://github.com/seoungwugoh/ivs-demo)

## Video Instance Segmentation

| Methods             | Transformer-based  | YouTube-VIS-2019   | YouTube-VIS-2021 | OVIS     | UVO      |

|---------------------|--------------------|--------------------|------------------|----------|----------|

| CrossVIS            |       \-           |   36.6             |  34.2            | 14.9     | \-       |

| VisTR               |                    |   40.1             | \-               | \-       | \-       |

| IFC                 |                    |   42.6             | 35.2             | \-       | \-       |

| Seq Mask R-CNN      |              \-    |   47.6             | \-               | \-       | \-       |

| EfficientVIS        |                    |   39.8             | \-               | \-       | \-       |

| TeViT               |                    |   46.6             | 37.9             | 17.4     | \-       |

| SeqFormer           |                    |   59.3             | \-               | \-       | \-       |

| TubeFormer-DeepLab  |                    |   47.5             | 41.2             | \-       | \-       |

| Video K-Net         |                    |   51.4             | \-               | \-       | \-       |

| FreeSOLO            |          \-        |     \-             | \-               | \-       | 4.8      |

| IDOL                |                    |    62.2            | 56.1             | 42.6     | \-       |

| VMT                 |                    |    59.7            | \-               | 19.8     | \-       |

| MS-STS VIS          |                    |    61.0            | \-               | \-       | \-       |

| InstMove            |                    |     \-             | \-               | 30.7     | \-       |

| GenVIS              |                    |   **64.0**         | **59.6**         | **45.4** | \-       |

| CAROQ               |                    |     61.4           | 54.5             | 38.2     | \-       |

| CutLER              |                    |       \-           | \-               | \-       | **10.1** |

- **[CutLER]** Cut and learn for unsupervised object detection and instance segmentation. *CVPR 2023*, [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Cut_and_Learn_for_Unsupervised_Object_Detection_and_Instance_Segmentation_CVPR_2023_paper.pdf), [Code](https://github.com/facebookresearch/CutLER)

- **[CAROQ]** Context-aware relative object queries to unify video instance and panoptic segmentation. *CVPR 2023*, [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Choudhuri_Context-Aware_Relative_Object_Queries_To_Unify_Video_Instance_and_Panoptic_CVPR_2023_paper.pdf), [Code](https://github.com/AnwesaChoudhuri/CAROQ)

- **[GenVIS]** A generalized framework for video instance segmentation. *CVPR 2023*, [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Heo_A_Generalized_Framework_for_Video_Instance_Segmentation_CVPR_2023_paper.pdf), [Code](https://github.com/miranheo/GenVIS)

- **[InstMove]** InstMove: Instance motion for object-centric video segmentation. *CVPR 2023*, [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_InstMove_Instance_Motion_for_Object-Centric_Video_Segmentation_CVPR_2023_paper.pdf), [Code](https://github.com/wjf5203/VNext)

- **[MS-STS VIS]** Video instance segmentation via multi-scale spatio-temporal split attention transformer. *ECCV 2022*, [Paper](https://arxiv.org/pdf/2203.13253.pdf), [Code](https://github.com/OmkarThawakar/MSSTS-VIS)

- **[VMT]** Video mask transfiner for high-quality video instance segmentation. *ECCV 2022*, [Paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880721.pdf), [Code](https://github.com/SysCV/vmt)

- **[IDOL]** In defense of online models for video instance segmentation. *ECCV 2022*, [Paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880582.pdf), [Code](https://github.com/wjf5203/VNext)

- **[FreeSOLO ]** FreeSOLO: Learning to segment objects without annotations. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_FreeSOLO_Learning_To_Segment_Objects_Without_Annotations_CVPR_2022_paper.pdf), [Code](https://github.com/NVlabs/FreeSOLO)

- **[Video K-Net]** Video K-Net: A simple, strong, and unified baseline for video segmentation. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Li_Video_K-Net_A_Simple_Strong_and_Unified_Baseline_for_Video_CVPR_2022_paper.pdf), [Code](https://github.com/lxtGH/Video-K-Net)

- **[TubeFormer-DeepLab]** TubeFormer-DeepLab: Video mask transformer. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Kim_TubeFormer-DeepLab_Video_Mask_Transformer_CVPR_2022_paper.pdf), [Code]() 

- **[SeqFormer]** SeqFormer: Sequential transformer for video instance segmentation. *ECCV 2022*, [Paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880547.pdf), [Code](https://github.com/wjf5203/SeqFormer) 

- **[TeViT]** Temporally efficient vision transformer for video instance segmentation. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Yang_Temporally_Efficient_Vision_Transformer_for_Video_Instance_Segmentation_CVPR_2022_paper.pdf), [Code](https://github.com/hustvl/TeViT)

- **[EfficientVIS]** Efficient video instance segmentation via tracklet query and proposal. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Wu_Efficient_Video_Instance_Segmentation_via_Tracklet_Query_and_Proposal_CVPR_2022_paper.pdf), [Code]()

- **[Seq Mask R-CNN]** Video instance segmentation with a propose-reduce paradigm. *ICCV 2021*, [Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Lin_Video_Instance_Segmentation_With_a_Propose-Reduce_Paradigm_ICCV_2021_paper.pdf), [Code](https://github.com/dvlab-research/ProposeReduce)

- **[IFC]** Video instance segmentation using inter-frame communication transformers. *NeurIPS 2021*, [Paper](https://proceedings.neurips.cc/paper/2021/file/6f2688a5fce7d48c8d19762b88c32c3b-Paper.pdf), [Code](https://github.com/sukjunhwang/IFC)

- **[VisTR]**  End-to-end video instance segmentation with transformers. *CVPR 2021*, [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_End-to-End_Video_Instance_Segmentation_With_Transformers_CVPR_2021_paper.pdf), [Code](https://github.com/Epiphqny/VisTR)

- **[CrossVIS]** Crossover learning for fast online video instance segmentation. *ICCV 2021*, [Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_Crossover_Learning_for_Fast_Online_Video_Instance_Segmentation_ICCV_2021_paper.pdf), [Code](https://github.com/hustvl/CrossVIS) 

## Actor Action Segmentation

 | Methods     | Transformer-based | A2D      |

 |-------------|-------------------|----------|

 | Ji et al.   | -                 | 36.9     |   

 | Dang et al. | -                 | 38.6     |  

 | SSA2D       | -                 | 39.5     | 

 | MED-VT      |                   | **52.6** |

- **[MED-VT]** MED-VT: Multiscale encoder-decoder video transformer with application to object segmentation. *CVPR 2023*, [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Karim_MED-VT_Multiscale_Encoder-Decoder_Video_Transformer_With_Application_To_Object_Segmentation_CVPR_2023_paper.pdf) [Code](https://rkyuca.github.io/medvt/)

- **[SSA2D]** We don’t need thousand proposals: Single shot actor action detection in videos. *WACV 2021 *, [Paper](https://openaccess.thecvf.com/content/WACV2021/papers/Rana_We_Dont_Need_Thousand_Proposals_Single_Shot_Actor-Action_Detection_in_WACV_2021_paper.pdf), [Code](https://github.com/aayushjr/ssa2d)

- **[Dang et al.]** Actor-action semantic segmentation with region masks. *BMVC 2018*, [Paper](https://arxiv.org/pdf/1807.08430.pdf), [Code]()

- **[Ji et al.]** End-to-end joint semantic segmentation of actors and actions in video. *ECCV 2018 *, [Paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Jingwei_Ji_End-to-End_Joint_Semantic_ECCV_2018_paper.pdf), [Code](https://github.com/JingweiJ/JointActorActionSeg)

## Video Semantic Segmentation

- **[TubeFormer-DeepLab]** TubeFormer-DeepLab: Video mask transformer. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Kim_TubeFormer-DeepLab_Video_Mask_Transformer_CVPR_2022_paper.pdf), [Code]() 

- **[CFFM]** Coarse-to-fine feature mining for video semantic segmentation. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Sun_Coarse-To-Fine_Feature_Mining_for_Video_Semantic_Segmentation_CVPR_2022_paper.pdf), [Code](https://github.com/GuoleiSun/VSS-CFFM)

- **[Video K-Net]** Video K-Net: A simple, strong, and unified baseline for video segmentation. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Li_Video_K-Net_A_Simple_Strong_and_Unified_Baseline_for_Video_CVPR_2022_paper.pdf), [Code](https://github.com/lxtGH/Video-K-Net)

- **[SegFormer]** SegFormer: Simple and efficient design for semantic segmentation with transformers. *NeurIPS 2021*, [Paper](https://proceedings.neurips.cc/paper/2021/file/64f1f27bf1b4ec22924fd0acb550c235-Paper.pdf), [Code](https://github.com/NVlabs/SegFormer)

- **[TCB]** VSPW: A large-scale dataset for video scene parsing in the wild. *CVPR 2021*, [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Miao_VSPW_A_Large-scale_Dataset_for_Video_Scene_Parsing_in_the_CVPR_2021_paper.pdf), [Code](https://github.com/VSPW-dataset/VSPW_code)

- **[STT]** Video semantic segmentation via sparse temporal transformer. *MM 2021*, [Paper](https://dl.acm.org/doi/abs/10.1145/3474085.3475409), [Code]()

- **[TMANet]** Temporal memory attention for video semantic segmentation. * *, [Paper](https://arxiv.org/abs/2102.08643), [Code](https://github.com/wanghao9610/TMANet)

## Video Panoptic Segmentation

- **[CAROQ]** Context-aware relative object queries to unify video instance and panoptic segmentation. *CVPR 2023*, [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Choudhuri_Context-Aware_Relative_Object_Queries_To_Unify_Video_Instance_and_Panoptic_CVPR_2023_paper.pdf), [Code](https://github.com/AnwesaChoudhuri/CAROQ)

- **[Slot-VPS]** Slot-VPS: Object-centric representation learning for video panoptic segmentation. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_Slot-VPS_Object-Centric_Representation_Learning_for_Video_Panoptic_Segmentation_CVPR_2022_paper.pdf), [Code](https://github.com/SAITPublic/SlotVPS)

- **[TubeFormer-DeepLab]** TubeFormer-DeepLab: Video mask transformer. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Kim_TubeFormer-DeepLab_Video_Mask_Transformer_CVPR_2022_paper.pdf), [Code]() 

- **[ViP-Deeplab]** ViP-DeepLab: Learning visual perception with depth-aware video panoptic segmentation. *CVPR 2021*, [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Qiao_VIP-DeepLab_Learning_Visual_Perception_With_Depth-Aware_Video_Panoptic_Segmentation_CVPR_2021_paper.pdf), [Code](https://github.com/joe-siyuan-qiao/ViP-DeepLab)

- **[Video K-Net]** Video K-Net: A simple, strong, and unified baseline for video segmentation. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Li_Video_K-Net_A_Simple_Strong_and_Unified_Baseline_for_Video_CVPR_2022_paper.pdf), [Code](https://github.com/lxtGH/Video-K-Net)

- **[SiamTrack]** Learning to associate every segment for video panoptic segmentation. *CVPR 2021*, [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Woo_Learning_To_Associate_Every_Segment_for_Video_Panoptic_Segmentation_CVPR_2021_paper.pdf), [Code]()

- **[VPSNet]** Video panoptic segmentation. *CVPR 2020*, [Paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Kim_Video_Panoptic_Segmentation_CVPR_2020_paper.pdf), [Code](https://github.com/mcahny/vps)

## Depth Aware Video Panoptic Segmentation

- **[PolyphonicFormer]** PolyphonicFormer: Unified query learning for depth-aware video panoptic segmentation. *ECCV 2022*, [Paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136870574.pdf), [Code](https://github.com/HarborYuan/PolyphonicFormer)

- **[TubeFormer-DeepLab]** TubeFormer-DeepLab: Video mask transformer. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Kim_TubeFormer-DeepLab_Video_Mask_Transformer_CVPR_2022_paper.pdf), [Code]() 

- **[ViP-Deeplab]** ViP-DeepLab: Learning visual perception with depth-aware video panoptic segmentation. *CVPR 2021*, [Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Qiao_VIP-DeepLab_Learning_Visual_Perception_With_Depth-Aware_Video_Panoptic_Segmentation_CVPR_2021_paper.pdf), [Code](https://github.com/joe-siyuan-qiao/ViP-DeepLab)

## Panoramic Video Panoptic Segmentation

- **[ViP-DeepLab+]** Waymo open dataset: Panoramic video panoptic segmentation. *ECCV 2022*, [Paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136890052.pdf), [Code](https://github.com/waymo-research/waymo-open-dataset/tree/master)

## Text Referring Video Object Segmentation

- **[ReferFormer]** Language as queries for referring video object segmentation. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Wu_Language_As_Queries_for_Referring_Video_Object_Segmentation_CVPR_2022_paper.pdf), [Code](https://github.com/wjn922/ReferFormer)

- **[MTTR]** End-to-end referring video object segmentation with multimodal transformers. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Botach_End-to-End_Referring_Video_Object_Segmentation_With_Multimodal_Transformers_CVPR_2022_paper.pdf), [Code](https://github.com/mttr2021/MTTR)

- **[YOFO]** You only infer once: Cross-modal meta-transfer for referring video object segmentation. *AAAI 2022*, [Paper](https://ojs.aaai.org/index.php/AAAI/article/view/20017), [Code]()

- **[URVOS]** URVOS: Unified referring video object segmentation network with a large-scale benchmark. *ECCV 2020*, [Paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf), [Code](https://github.com/skynbe/Refer-Youtube-VOS)

## Audio Referring Video Object Segmentation

- **[Wnet]** Wnet: Audio-guided video object segmentation via wavelet-based cross-modal denoising networks. *CVPR 2022*, [Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Pan_Wnet_Audio-Guided_Video_Object_Segmentation_via_Wavelet-Based_Cross-Modal_Denoising_Networks_CVPR_2022_paper.pdf), [Code](https://github.com/asudahkzj/Wnet)
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/rkyuca/awesome-video-segmentation

Awesome Lists containing this project

README