https://github.com/suhwan-cho/awesome-video-object-segmentation

A list of video object segmentation (VOS) papers
Host: GitHub
URL: https://github.com/suhwan-cho/awesome-video-object-segmentation
Owner: suhwan-cho
License: mit
Created: 2021-07-07T08:29:13.000Z
Default Branch: main
Last Pushed: 2024-04-18T06:29:39.000Z
Last Synced: 2024-05-22T23:02:51.420Z
Size: 10.7 KB
Stars: 218
Watchers: 11
Forks: 21
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Awesome Video Object Segmentation [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

A list of video object segmentation (VOS) papers.

Any suggestions and requests are always welcome :)

## Contents

> 1\. [Semi-Supervised VOS Papers](#semi-supervised-vos-papers)
>
> 2\. [Unsupervised VOS Papers](#unsupervised-vos-papers)
>
> 3\. [Referring VOS Papers](#referring-vos-papers)
>
> 4\. [Other Related Papers](#other-related-papers)

## Semi-Supervised VOS Papers

### 2024

- **[STMA]** Spatial-Temporal Multi-level Association for Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/08391.pdf) [[arXiv]](https://arxiv.org/abs/2404.06265) [[Code]](https://github.com/yahooo-m/VOS-Solution)

- **[OneVOS]** OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07505.pdf) [[arXiv]](https://arxiv.org/abs/2403.08682) [[Code]](https://github.com/L599wy/OneVOS)

- **[RMem]** RMem: Restricted Memory Banks Improve Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_RMem_Restricted_Memory_Banks_Improve_Video_Object_Segmentation_CVPR_2024_paper.pdf) [[arXiv]](https://arxiv.org/abs/2406.08476) [[Page]](https://restricted-memory.github.io/)

- **[Point-VOS]** Point-VOS: Pointing Up Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Mahadevan_Point-VOS_Pointing_Up_Video_Object_Segmentation_CVPR_2024_paper.pdf) [[arXiv]](https://arxiv.org/abs/2402.05917) [[Page]](https://pointvos.github.io/)

- **[Cutie]** Putting the Object Back into Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Cheng_Putting_the_Object_Back_into_Video_Object_Segmentation_CVPR_2024_paper.pdf) [[arXiv]](https://arxiv.org/abs/2310.12982) [[Code]](https://github.com/hkchengrex/Cutie)

- **[DeVOS]** DeVOS: Flow-Guided Deformable Transformer for Video Object Segmentation, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2024/papers/Fedynyak_DeVos_Flow-Guided_Deformable_Transformer_for_Video_Object_Segmentation_WACV_2024_paper.pdf)

### 2023

- **[TTT]** Test-time Training for Matching-based Video Object Segmentation, *NeurIPS* [[Paper]](https://openreview.net/pdf?id=9QsdPQlWiE) [[Code]](https://github.com/ttt-matching-based-vos/ttt_matching_vos)

- **[READMem]** READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation, *BMVC* [[Paper]](https://papers.bmvc2023.org/0603.pdf) [[arXiv]](https://arxiv.org/abs/2305.12823) [[Code]](https://github.com/Vujas-Eteph/READMem)

- **[XMem++]** XMem++: Production-level Video Segmentation From Few Annotated Frames, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Bekuzarov_XMem_Production-level_Video_Segmentation_From_Few_Annotated_Frames_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2307.15958) [[Code]](https://github.com/max810/XMem2)

- **[SimVOS]** Scalable Video Object Segmentation with Simplified Framework, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_Scalable_Video_Object_Segmentation_with_Simplified_Framework_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2308.09903) [[Code]](https://github.com/jimmy-dq/SimVOS)

- **[TMRN]** Alignment Before Aggregation: Trajectory Memory Retrieval Network for Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Sun_Alignment_Before_Aggregation_Trajectory_Memory_Retrieval_Network_for_Video_Object_ICCV_2023_paper.pdf)

- **[ISVOS]** Look Before You Match: Instance Understanding Matters in Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Look_Before_You_Match_Instance_Understanding_Matters_in_Video_Object_CVPR_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2212.06826)

- **[CorrLearn]** Boosting Video Object Segmentation via Space-time Correspondence Learning, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Zhang_Boosting_Video_Object_Segmentation_via_Space-Time_Correspondence_Learning_CVPR_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2304.06211) [[Code]](https://github.com/wenguanwang/VOS_Correspondence)

- **[MobileVOS]** MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Miles_MobileVOS_Real-Time_Video_Object_Segmentation_Contrastive_Learning_Meets_Knowledge_Distillation_CVPR_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2303.07815)

- **[TSVOS]** Two-shot Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Yan_Two-Shot_Video_Object_Segmentation_CVPR_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2303.12078) [[Code]](https://github.com/yk-pku/Two-shot-Video-Object-Segmentation)

- **[LLB]** Learning to Learn Better for Video Object Segmentation, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/25203/24975) [[arXiv]](https://arxiv.org/abs/2212.02112) [[Code]](https://github.com/vitae-transformer/vos-llb)

### 2022

- **[DeAOT]** Decoupling Features in Hierarchical Propagation for Video Object Segmentation, *NeurIPS* [[Paper]](https://openreview.net/pdf?id=DgM7-7eMkq0) [[arXiv]](https://arxiv.org/abs/2210.09782) [[Code]](https://github.com/z-x-yang/AOT)

- **[AOC]** Towards Robust Video Object Segmentation with Adaptive Object Calibration, *ACMMM* [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3503161.3547824) [[arXiv]](https://arxiv.org/abs/2207.00887) [[Code]](https://github.com/JerryX1110/Robust-Video-Object-Segmentation)

- **[BATMAN]** BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136890603.pdf) [[arXiv]](https://arxiv.org/abs/2208.01159)

- **[XMem]** XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880633.pdf) [[arXiv]](https://arxiv.org/abs/2207.07115) [[Code]](https://github.com/hkchengrex/XMem)

- **[QDMN]** Learning Quality-aware Dynamic Memory for Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136890462.pdf) [[arXiv]](https://arxiv.org/abs/2207.07922) [[Code]](https://github.com/workforai/qdmn)

- **[TBD]** Tackling Background Distraction in Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136820434.pdf) [[arXiv]](https://arxiv.org/abs/2207.06953) [[Code]](https://github.com/suhwan-cho/TBD)

- **[GSFM]** Global Spectral Filter Memory Network for Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136890639.pdf) [[arXiv]](https://arxiv.org/abs/2210.05567) [[Code]](https://github.com/workforai/GSFM)

- **[RDE-VOS]** Recurrent Dynamic Embedding for Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Li_Recurrent_Dynamic_Embedding_for_Video_Object_Segmentation_CVPR_2022_paper.pdf) [[arXiv]](https://arxiv.org/abs/2205.03761) [[Code]](https://github.com/limingxing00/rde-vos-cvpr2022)

- **[PCVOS]** Per-Clip Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Park_Per-Clip_Video_Object_Segmentation_CVPR_2022_paper.pdf) [[arXiv]](https://arxiv.org/abs/2208.01924) [[Code]](https://github.com/pkyong95/PCVOS)

- **[CoVOS]** Accelerating Video Object Segmentation with Compressed Video, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Xu_Accelerating_Video_Object_Segmentation_With_Compressed_Video_CVPR_2022_paper.pdf) [[arXiv]](https://arxiv.org/abs/2107.12192) [[Code]](https://github.com/kai422/CoVOS)

- **[SWEM]** SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Lin_SWEM_Towards_Real-Time_Video_Object_Segmentation_With_Sequential_Weighted_Expectation-Maximization_CVPR_2022_paper.pdf) [[arXiv]](https://arxiv.org/abs/2208.10128) [[Code]](https://github.com/lmm077/SWEM)

- **[RPCMVOS]** Reliable Propagation-Correction Modulation for Video Object Segmentation, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/20200/19959) [[arXiv]](https://arxiv.org/abs/2112.02853) [[Code]](https://github.com/jerryx1110/rpcmvos)

- **[SITVOS]** Siamese Network with Interactive Transformer for Video Object Segmentation, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/download/20009/version/18306/19768) [[arXiv]](https://arxiv.org/abs/2112.13983)

- **[BMVOS]** Pixel-Level Bijective Matching for Video Object Segmentation, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2022/papers/Cho_Pixel-Level_Bijective_Matching_for_Video_Object_Segmentation_WACV_2022_paper.pdf) [[arXiv]](https://arxiv.org/abs/2110.01644) [[Code]](https://github.com/suhwan-cho/BMVOS)

### 2021

- **[AOT]** Associating Objects with Transformers for Video Object Segmentation, *NeurIPS* [[Paper]](https://proceedings.neurips.cc//paper/2021/file/147702db07145348245dc5a2f2fe5683-Paper.pdf) [[arXiv]](https://arxiv.org/abs/2106.02638) [[Code]](https://github.com/z-x-yang/AOT)

- **[STCN]** Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation, *NeurIPS* [[Paper]](https://proceedings.neurips.cc//paper/2021/file/61b4a64be663682e8cb037d9719ad8cd-Paper.pdf) [[arXiv]](https://arxiv.org/abs/2106.05210) [[Code]](https://github.com/hkchengrex/STCN)

- **[JOINT]** Joint Inductive and Transductive Learning for Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Mao_Joint_Inductive_and_Transductive_Learning_for_Video_Object_Segmentation_ICCV_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2108.03679) [[Code]](https://github.com/maoyunyao/joint)

- **[HMMN]** Hierarchical Memory Matching Network for Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Seong_Hierarchical_Memory_Matching_Network_for_Video_Object_Segmentation_ICCV_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2109.11404) [[Code]](https://github.com/hongje/hmmn)

- **[DMN-AOA]** Video Object Segmentation with Dynamic Memory Networks and Adaptive Object Alignment, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Liang_Video_Object_Segmentation_With_Dynamic_Memory_Networks_and_Adaptive_Object_ICCV_2021_paper.pdf) [[Code]](https://github.com/liang4sx/dmn-aoa)

- **[RMNet]** Efficient Regional Memory Network for Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Xie_Efficient_Regional_Memory_Network_for_Video_Object_Segmentation_CVPR_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2103.12934) [[Code]](https://github.com/hzxie/RMNet)

- **[LCM]** Learning Position and Target Consistency for Memory-Based Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Hu_Learning_Position_and_Target_Consistency_for_Memory-Based_Video_Object_Segmentation_CVPR_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2104.04329) 

- **[GIEL]** Video Object Segmentation Using Global and Instance Embedding Learning, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Ge_Video_Object_Segmentation_Using_Global_and_Instance_Embedding_Learning_CVPR_2021_paper.pdf) 

- **[SwiftNet]** SwiftNet: Real-time Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_SwiftNet_Real-Time_Video_Object_Segmentation_CVPR_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2102.04604) [[Code]](https://github.com/haochenheheda/SwiftNet)

- **[SSTVOS]** SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Duke_SSTVOS_Sparse_Spatiotemporal_Transformers_for_Video_Object_Segmentation_CVPR_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2101.08833) [[Code]](https://github.com/dukebw/SSTVOS)

- **[Reuse-VOS]** Learning Dynamic Network Using a Reuse Gate Function in Semi-Supervised Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Park_Learning_Dynamic_Network_Using_a_Reuse_Gate_Function_in_Semi-Supervised_CVPR_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2012.11655) [[Code]](https://github.com/HYOJINPARK/Reuse_VOS)

- **[STG-Net]** Spatiotemporal Graph Neural Network Based Mask Reconstruction for Video Object Segmentation, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/16307/16114) [[arXiv]](https://arxiv.org/abs/2012.05499)

- **[QMRA]** Query-Memory Re-Aggregation for Weakly-Supervised Video Object Segmentation, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/16300/16107)

### 2020

- **[STM-cycle]** Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation, *NeurIPS* [[Paper]](https://papers.nips.cc/paper/2020/file/0d5bd023a3ee11c7abca5b42a93c4866-Paper.pdf) [[arXiv]](https://arxiv.org/abs/2010.12176) [[Code]](https://github.com/lyxok1/STM-Training)

- **[AFB-URR]** Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement, *NeurIPS* [[Paper]](https://papers.nips.cc/paper/2020/file/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf) [[arXiv]](https://arxiv.org/abs/2010.07958) [[Code]](https://github.com/xmlyqing00/AFB-URR)

- **[e-OSVOS]** Make One-Shot Video Object Segmentation Efficient Again, *NeurIPS* [[Paper]](https://papers.nips.cc/paper/2020/file/781397bc0630d47ab531ea850bddcf63-Paper.pdf) [[arXiv]](https://arxiv.org/abs/2012.01866) [[Code]](https://github.com/dvl-tum/e-osvos)

- **[LWL]** Learning What to Learn for Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123470766.pdf) [[arXiv]](https://arxiv.org/abs/2003.11540) [[Code]](https://github.com/visionml/pytracking)

- **[EGMN]** Video Object Segmentation with Episodic Graph Memory Networks, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123480664.pdf) [[arXiv]](https://arxiv.org/abs/2007.07020) [[Code]](https://github.com/carrierlxk/GraphMemVOS)

- **[CFBI]** Collaborative Video Object Segmentation by Foreground-Background Integration, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123500324.pdf) [[arXiv]](https://arxiv.org/abs/2003.08333) [[Code]](https://github.com/z-x-yang/CFBI)

- **[GC]** Fast Video Object Segmentation using the Global Context Module, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123550732.pdf) [[arXiv]](https://arxiv.org/abs/2001.11243)

- **[KMN]** Kernelized Memory Network for Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123670630.pdf) [[arXiv]](https://arxiv.org/abs/2007.08270)

- **[SAT]** State-Aware Tracker for Real-Time Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_State-Aware_Tracker_for_Real-Time_Video_Object_Segmentation_CVPR_2020_paper.pdf) [[arXiv]](https://arxiv.org/abs/2003.00482) [[Code]](https://github.com/MegviiDetection/video_analyst)

- **[FRTM]** Learning Fast and Robust Target Models for Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Robinson_Learning_Fast_and_Robust_Target_Models_for_Video_Object_Segmentation_CVPR_2020_paper.pdf) [[arXiv]](https://arxiv.org/abs/2003.00908) [[Code]](https://github.com/andr345/frtm-vos)

- **[TVOS]** A Transductive Approach for Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_A_Transductive_Approach_for_Video_Object_Segmentation_CVPR_2020_paper.pdf) [[arXiv]](https://arxiv.org/abs/2004.07193) [[Code]](https://github.com/microsoft/transductive-vos.pytorch)

- **[TAN-DTTM]** Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Huang_Fast_Video_Object_Segmentation_With_Temporal_Aggregation_Network_and_Dynamic_CVPR_2020_paper.pdf) [[arXiv]](https://arxiv.org/abs/2007.05687) 

- **[FTMU]** Fast Template Matching and Update for Video Object Tracking and Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Sun_Fast_Template_Matching_and_Update_for_Video_Object_Tracking_and_CVPR_2020_paper.pdf) [[arXiv]](https://arxiv.org/abs/2004.07538) [[Code]](https://github.com/insomnia94/FTMU)

- **[DIPNet]** DIPNet: Dynamic Identity Propagation Network for Video Object Segmentation, *WACV* [[Paper]](https://openaccess.thecvf.com/content_WACV_2020/papers/Hu_DIPNet_Dynamic_Identity_Propagation_Network_for_Video_Object_Segmentation_WACV_2020_paper.pdf)

### 2019

- **[DMM-Net]** DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Zeng_DMM-Net_Differentiable_Mask-Matching_Network_for_Video_Object_Segmentation_ICCV_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1909.12471) [[Code]](https://github.com/ZENGXH/DMM_Net)

- **[AGSS-VOS]** AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Lin_AGSS-VOS_Attention_Guided_Single-Shot_Video_Object_Segmentation_ICCV_2019_paper.pdf) [[Code]](https://github.com/dvlab-research/AGSS-VOS)

- **[RANet]** RANet: Ranking Attention Network for Fast Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_RANet_Ranking_Attention_Network_for_Fast_Video_Object_Segmentation_ICCV_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1908.06647) [[Code]](https://github.com/Storife/RANet)

- **[DTN]** Fast Video Object Segmentation via Dynamic Targeting Network, *ICCV* [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Zhang_Fast_Video_Object_Segmentation_via_Dynamic_Targeting_Network_ICCV_2019_paper.pdf) 

- **[CapsuleVOS]** CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing, *ICCV* [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Duarte_CapsuleVOS_Semi-Supervised_Video_Object_Segmentation_Using_Capsule_Routing_ICCV_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1910.00132) [[Code]](https://github.com/KevinDuarte/CapsuleVOS)

- **[STM]** Video Object Segmentation Using Space-Time Memory Networks, *ICCV* [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Oh_Video_Object_Segmentation_Using_Space-Time_Memory_Networks_ICCV_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1904.00607) [[Code]](https://github.com/seoungwugoh/STM)

- **[MHP-VOS]** MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Xu_MHP-VOS_Multiple_Hypotheses_Propagation_for_Video_Object_Segmentation_CVPR_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1904.08141) [[Code]](https://github.com/shuangjiexu/MHP-VOS)

- **[STCNN]** Spatiotemporal CNN for Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Xu_Spatiotemporal_CNN_for_Video_Object_Segmentation_CVPR_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1904.02363) [[Code]](https://github.com/longyin880815/STCNN)

- **[RVOS]** RVOS: End-To-End Recurrent Network for Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Ventura_RVOS_End-To-End_Recurrent_Network_for_Video_Object_Segmentation_CVPR_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1903.05612) [[Code]](https://github.com/imatge-upc/rvos)

- **[A-GAME]** A Generative Appearance Model for End-To-End Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Johnander_A_Generative_Appearance_Model_for_End-To-End_Video_Object_Segmentation_CVPR_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1811.11611) [[Code]](https://github.com/joakimjohnander/agame-vos)

- **[FEELVOS]** FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Voigtlaender_FEELVOS_Fast_End-To-End_Embedding_Learning_for_Video_Object_Segmentation_CVPR_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1902.09513) [[Code]](https://github.com/kim-younghan/FEELVOS)

- **[SiamMask]** Fast Online Object Tracking and Segmentation: A Unifying Approach, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Fast_Online_Object_Tracking_and_Segmentation_A_Unifying_Approach_CVPR_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1812.05050) [[Code]](https://github.com/foolwood/SiamMask)

- **[TIS]** Tukey-Inspired Video Object Segmentation, *WACV* [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8658909) [[arXiv]](https://arxiv.org/abs/1811.07958) [[Code]](https://github.com/griffbr/TIS)

### 2018

- **[S2S]** YouTube-VOS: Sequence-to-Sequence Video Object Segmentation, *ECCV* [[Paper]](https://openaccess.thecvf.com/content_ECCV_2018/papers/Ning_Xu_YouTube-VOS_Sequence-to-Sequence_Video_ECCV_2018_paper.pdf) [[arXiv]](https://arxiv.org/abs/1809.00461) [[Code]](https://github.com/BehradToghi/ConvLSTM_VOS)

- **[PReMVOS]** PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation, *ACCV* [[arXiv]](https://arxiv.org/abs/1807.09190) [[Code]](https://github.com/JonathonLuiten/PReMVOS)

- **[OSMN]** Efficient Video Object Segmentation via Network Modulation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_cvpr_2018/papers/Yang_Efficient_Video_Object_CVPR_2018_paper.pdf) [[arXiv]](https://arxiv.org/abs/1802.01218) [[Code]](https://github.com/linjieyangsc/video_seg)

- **[RGMP]** Fast Video Object Segmentation by Reference-Guided Mask Propagation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_cvpr_2018/papers/Oh_Fast_Video_Object_CVPR_2018_paper.pdf) [[Code]](https://github.com/seoungwugoh/RGMP)

- **[FAVOS]** Fast and Accurate Online Video Object Segmentation via Tracking Parts, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_cvpr_2018/papers/Cheng_Fast_and_Accurate_CVPR_2018_paper.pdf) [[arXiv]](https://arxiv.org/abs/1806.02323) [[Code]](https://github.com/JingchunCheng/FAVOS)

### 2017

- **[SegFlow]** SegFlow: Joint Learning for Video Object Segmentation and Optical Flow, *ICCV* [[Paper]](https://openaccess.thecvf.com/content_ICCV_2017/papers/Cheng_SegFlow_Joint_Learning_ICCV_2017_paper.pdf) [[arXiv]](https://arxiv.org/abs/1709.06750) [[Code]](https://github.com/JingchunCheng/SegFlow)

- **[OSVOS]** One-Shot Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_cvpr_2017/papers/Caelles_One-Shot_Video_Object_CVPR_2017_paper.pdf) [[arXiv]](https://arxiv.org/abs/1611.05198) [[Code]](https://github.com/kmaninis/OSVOS-PyTorch)

- **[MaskTrack]** Learning Video Object Segmentation from Static Images, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_cvpr_2017/papers/Perazzi_Learning_Video_Object_CVPR_2017_paper.pdf) [[arXiv]](https://arxiv.org/abs/1612.02646) [[Code]](https://github.com/omkar13/MaskTrack)

## Unsupervised VOS Papers

### 2024

- **[DPA]** Dual Prototype Attention for Unsupervised Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Cho_Dual_Prototype_Attention_for_Unsupervised_Video_Object_Segmentation_CVPR_2024_paper.pdf) [[arXiv]](https://arxiv.org/abs/2211.12036) [[Code]](https://github.com/Hydragon516/DPA)

- **[GSA-Net]** Guided Slot Attention for Unsupervised Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Lee_Guided_Slot_Attention_for_Unsupervised_Video_Object_Segmentation_CVPR_2024_paper.pdf) [[arXiv]](https://arxiv.org/abs/2303.08314) [[Code]](https://github.com/Hydragon516/GSANet)

- **[DATTT]** Depth-aware Test-Time Training for Zero-shot Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Depth-aware_Test-Time_Training_for_Zero-shot_Video_Object_Segmentation_CVPR_2024_paper.pdf) [[arXiv]](https://arxiv.org/abs/2403.04258) [[Code]](https://github.com/NiFangBaAGe/DATTT)

- **[GFA]** Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28295/28581)

### 2023

- **[SimulFlow]** SimulFlow: Simultaneously Extracting Feature and Identifying Target for Unsupervised Video Object Segmentation, *ACMMM* [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3581783.3611804) [[arXiv]](https://arxiv.org/abs/2311.18286)

- **[TGFormer]** Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation, *ACMMM* [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3581783.3612017)

- **[Isomer]** Isomer: Isomerous Transformer for Zero-Shot Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Yuan_Isomer_Isomerous_Transformer_for_Zero-shot_Video_Object_Segmentation_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2308.06693) [[Code]](https://github.com/DLUT-yyc/Isomer)

- **[OAST]** Unsupervised Video Object Segmentation with Online Adversarial Self-Tuning, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Su_Unsupervised_Video_Object_Segmentation_with_Online_Adversarial_Self-Tuning_ICCV_2023_paper.pdf)

- **[PMN]** Unsupervised Video Object Segmentation via Prototype Memory Network, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2023/papers/Lee_Unsupervised_Video_Object_Segmentation_via_Prototype_Memory_Network_WACV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2209.03712) [[Code]](https://github.com/Hydragon516/PMN)

- **[TMO]** Treating Motion as Option to Reduce Motion Dependency in Unsupervised Video Object Segmentation, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2023/papers/Cho_Treating_Motion_as_Option_To_Reduce_Motion_Dependency_in_Unsupervised_WACV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2209.03138) [[Code]](https://github.com/suhwan-cho/TMO)

### 2022

- **[HFAN]** Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136940584.pdf) [[arXiv]](https://arxiv.org/abs/2207.08485) [[Code]](https://github.com/NUST-Machine-Intelligence-Laboratory/HFAN)

- **[IMP]** Iteratively Selecting an Easy Reference Frame Makes Unsupervised Video Object Segmentation Easier, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/download/20011/version/18308/19770) [[arXiv]](https://arxiv.org/abs/2112.12402)

- **[D2Conv3D]** D2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2022/papers/Schmidt_D2Conv3D_Dynamic_Dilated_Convolutions_for_Object_Segmentation_in_Videos_WACV_2022_paper.pdf) [[arXiv]](https://arxiv.org/abs/2111.07774) [[Code]](https://github.com/schmiddo/d2conv3d)

- **[CFAM]** Video Salient Object Detection via Contrastive Features and Attention Modules, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2022/papers/Chen_Video_Salient_Object_Detection_via_Contrastive_Features_and_Attention_Modules_WACV_2022_paper.pdf) [[arXiv]](https://arxiv.org/abs/2111.02368)

### 2021

- **[FSNet]** Full-Duplex Strategy for Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Ji_Full-Duplex_Strategy_for_Video_Object_Segmentation_ICCV_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2108.03151) [[Code]](https://github.com/GewelsJI/FSNet)

- **[TransportNet]** Deep Transport Network for Unsupervised Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Zhang_Deep_Transport_Network_for_Unsupervised_Video_Object_Segmentation_ICCV_2021_paper.pdf) 

- **[AMC-Net]** Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_Learning_Motion-Appearance_Co-Attention_for_Zero-Shot_Video_Object_Segmentation_ICCV_2021_paper.pdf) [[Code]](https://github.com/isyangshu/amc-net)

- **[RTNet]** Reciprocal Transformations for Unsupervised Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Ren_Reciprocal_Transformations_for_Unsupervised_Video_Object_Segmentation_CVPR_2021_paper.pdf) [[Code]](https://github.com/OliverRensu/RTNet)

- **[F2Net]** F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/16308/16115) [[arXiv]](https://arxiv.org/abs/2012.02534) [[Code]](https://github.com/liudaizong/F2Net)

- **[FrameSelect]** Mask Selection and Propagation for Unsupervised Video Object Segmentation, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2021/papers/Garg_Mask_Selection_and_Propagation_for_Unsupervised_Video_Object_Segmentation_WACV_2021_paper.pdf) [[Code]](https://github.com/vidit98/FrameSelect)

### 2020

- **[3DC-Seg]** Making a Case for 3D Convolutions for Object Segmentation in Videos, *BMVC* [[Paper]](https://www.bmvc2020-conference.com/assets/papers/0233.pdf) [[arXiv]](https://arxiv.org/abs/2008.11516) [[Code]](https://github.com/sabarim/3DC-Seg)

- **[WCS-Net]** Unsupervised Video Object Segmentation with Joint Hotspot Tracking, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590477.pdf) [[Code]](https://github.com/luzhangada/code-for-WCS-Net)

- **[DFNet]** Learning Discriminative Feature with CRF for Unsupervised Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123720443.pdf) [[arXiv]](https://arxiv.org/abs/2008.01270)

- **[MATNet]** Motion-Attentive Transition for Zero-Shot Video Object Segmentation, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/7008/6862) [[arXiv]](https://arxiv.org/abs/2003.04253) [[Code]](https://github.com/tfzhou/MATNet)

- **[UnOVOST]** UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking, *WACV* [[Paper]](https://openaccess.thecvf.com/content_WACV_2020/papers/Luiten_UnOVOST_Unsupervised_Offline_Video_Object_Segmentation_and_Tracking_WACV_2020_paper.pdf) [[arXiv]](https://arxiv.org/abs/2001.05425) [[Code]](https://github.com/idilesenzulfikar/UNOVOST)

- **[EpO-Net]** EpO-Net: Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency, *WACV* [[Paper]](https://openaccess.thecvf.com/content_WACV_2020/papers/Akhter_EpO-Net_Exploiting_Geometric_Constraints_on_Dense_Trajectories_for_Motion_Saliency_WACV_2020_paper.pdf) [[arXiv]](https://arxiv.org/abs/1909.13258) [[Code]](https://github.com/mfaisal59/EpONet)

### 2019

- **[AD-Net]** Anchor Diffusion for Unsupervised Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Yang_Anchor_Diffusion_for_Unsupervised_Video_Object_Segmentation_ICCV_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1910.10895) [[Code]](https://github.com/yz93/anchor-diff-VOS)

- **[AGNN]** Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks, *ICCV* [[Paper]](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Zero-Shot_Video_Object_Segmentation_via_Attentive_Graph_Neural_Networks_ICCV_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/2001.06807) [[Code]](https://github.com/carrierlxk/AGNN)

- **[AGS]** Learning Unsupervised Video Object Segmentation Through Visual Attention, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Learning_Unsupervised_Video_Object_Segmentation_Through_Visual_Attention_CVPR_2019_paper.pdf) [[Code]](https://github.com/wenguanwang/AGS)

- **[COSNet]** See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Lu_See_More_Know_More_Unsupervised_Video_Object_Segmentation_With_Co-Attention_CVPR_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/2001.06810) [[Code]](https://github.com/carrierlxk/COSNet)

- **[SSAV]** Shifting More Attention to Video Salient Object Detection, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Fan_Shifting_More_Attention_to_Video_Salient_Object_Detection_CVPR_2019_paper.pdf) [[Code]](https://github.com/DengPingFan/DAVSOD)

- **[MOTAdapt]** Video Object Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting, *ICRA* [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8794254) [[arXiv]](https://arxiv.org/abs/1810.07733) [[Code]](https://github.com/MSiam/motion_adaptation)

### 2018

- **[PDB]** Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection, *ECCV* [[Paper]](https://openaccess.thecvf.com/content_ECCV_2018/papers/Hongmei_Song_Pseudo_Pyramid_Deeper_ECCV_2018_paper.pdf) [[Code]](https://github.com/shenjianbing/PDB-ConvLSTM)

## Referring VOS Papers

### 2024

- **[VISA]** VISA: Reasoning Video Object Segmentation via Large Language Models, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/02307.pdf) [[arXiv]](https://arxiv.org/abs/2407.11325) [[Code]](https://github.com/cilinyan/VISA)

- **[VD-IT]** Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01985.pdf) [[arXiv]](https://arxiv.org/abs/2403.12042) [[Code]](https://github.com/buxiangzhiren/VD-IT)

- **[ActionVOS]** ActionVOS: Actions as Prompts for Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/01553.pdf) [[arXiv]](https://arxiv.org/abs/2407.07402) [[Code]](https://github.com/ut-vision/ActionVOS)

- **[LoSh]** LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Yuan_LoSh_Long-Short_Text_Joint_Prediction_Network_for_Referring_Video_Object_CVPR_2024_paper.pdf) [[arXiv]](https://arxiv.org/abs/2306.08736) [[Code]](https://github.com/LinfengYuan1997/LoSh)

- **[MUTR]** Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/28465/28905) [[arXiv]](https://arxiv.org/abs/2305.16318) [[Code]](https://github.com/OpenGVLab/MUTR)

- **[TCE-RVOS]** Temporal Context Enhanced Referring Video Object Segmentation, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2024/papers/Hu_Temporal_Context_Enhanced_Referring_Video_Object_Segmentation_WACV_2024_paper.pdf) [[Code]](https://github.com/haliphinx/TCE-RVOS)

### 2023

- **[SOC]** SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation, *NeurIPS* [[Paper]](https://openreview.net/pdf?id=KQyXyIAfK8) [[arXiv]](https://arxiv.org/abs/2305.17011) [[Code]](https://github.com/RobertLuo1/NeurIPS2023_SOC)

- **[HTML]** HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Han_HTML_Hybrid_Temporal-scale_Multimodal_Learning_Framework_for_Referring_Video_Object_ICCV_2023_paper.pdf) [[Page]](https://mingfei.info/HTML)

- **[OnlineRefer]** OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_OnlineRefer_A_Simple_Online_Baseline_for_Referring_Video_Object_Segmentation_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2307.09356) [[Code]](https://github.com/wudongming97/OnlineRefer)

- **[CMA]** Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Learning_Cross-Modal_Affinity_for_Referring_Video_Object_Segmentation_Targeting_Limited_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2309.02041) [[Code]](https://github.com/hengliusky/few_shot_rvos)

- **[R2VOS]** Robust Referring Video Object Segmentation with Cyclic Structural Consensus, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Robust_Referring_Video_Object_Segmentation_with_Cyclic_Structural_Consensus_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2207.01203) [[Code]](https://github.com/lxa9867/R2VOS)

- **[SgMg]** Spectrum-guided Multi-granularity Referring Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Miao_Spectrum-guided_Multi-granularity_Referring_Video_Object_Segmentation_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2307.13537) [[Code]](https://github.com/bo-miao/SgMg)

- **[TempCD]** Temporal Collection and Distribution for Referring Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Tang_Temporal_Collection_and_Distribution_for_Referring_Video_Object_Segmentation_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2309.03473) [[Code]](https://github.com/Toneyaya/TempCD)

### 2022

- **[MANet]** Multi-Attention Network for Compressed Video Referring Object Segmentation, *ACMMM* [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3503161.3547761) [[arXiv]](https://arxiv.org/abs/2207.12622) [[Code]](https://github.com/dexianghong/manet)

- **[MTTR]** End-to-End Referring Video Object Segmentation with Multimodal Transformers, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Botach_End-to-End_Referring_Video_Object_Segmentation_With_Multimodal_Transformers_CVPR_2022_paper.pdf) [[arXiv]](https://arxiv.org/abs/2111.14821) [[Code]](https://github.com/mttr2021/MTTR)

- **[ReferFormer]** Language as Queries for Referring Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Wu_Language_As_Queries_for_Referring_Video_Object_Segmentation_CVPR_2022_paper.pdf) [[arXiv]](https://arxiv.org/abs/2201.00487) [[Code]](https://github.com/wjn922/referformer)

- **[LBDT]** Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Ding_Language-Bridged_Spatial-Temporal_Interaction_for_Referring_Video_Object_Segmentation_CVPR_2022_paper.pdf) [[arXiv]](https://arxiv.org/abs/2206.03789) [[Code]](https://github.com/dzh19990407/lbdt)

- **[MLRL]** Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Wu_Multi-Level_Representation_Learning_With_Semantic_Alignment_for_Referring_Video_Object_CVPR_2022_paper.pdf)

- **[YOFO]** You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/20017/19776)

### 2020

- **[URVOS]** URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf) [[Code]](https://github.com/skynbe/Refer-Youtube-VOS)

## Other Related Papers

### 2024

- **[BA]** Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06142.pdf) [[arXiv]](https://arxiv.org/abs/2311.17893) [[Code]](https://github.com/shvdiwnkozbw/SSL-UVOS)

- **[LLE-VOS]** Event-assisted Low-Light Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Event-assisted_Low-Light_Video_Object_Segmentation_CVPR_2024_paper.pdf) [[arXiv]](https://arxiv.org/abs/2404.01945) [[Code]](https://github.com/HebeiFast/EventLowLightVOS)

- **[EVA-VOS]** Learning the What and How of Annotation in Video Object Segmentation, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2024/papers/Delatolas_Learning_the_What_and_How_of_Annotation_in_Video_Object_WACV_2024_paper.pdf) [[arXiv]](https://arxiv.org/abs/2311.04414) [[Code]](https://github.com/thanosDelatolas/eva-vos)

### 2023

- **[Training-Free-VOS]** From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models, *NeurIPS* [[Paper]](https://openreview.net/pdf?id=jfsjKBDB1z) [[Code]](https://github.com/BGU-CS-VIL/Training-Free-VOS)

- **[DVSOD]** DVSOD: RGB-D Video Salient Object Detection, *NeurIPS* [[Paper]](https://openreview.net/pdf?id=Hm1Ih3uLII) [[arXiv]](https://arxiv.org/abs/2308.11796) [[Page]](https://dvsod.github.io/)

- **[VOSPGD]** Exploring the Adversarial Robustness of Video Object Segmentation via One-shot Adversarial Attacks, *ACMMM* [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3581783.3611827)

- **[DEVA]** Tracking Anything with Decoupled Video Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Cheng_Tracking_Anything_with_Decoupled_Video_Segmentation_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2309.03903) [[Code]](https://github.com/hkchengrex/Tracking-Anything-with-DEVA)

- **[Timetuning]** Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Salehi_Time_Does_Tell_Self-Supervised_Time-Tuning_of_Dense_Image_Representations_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2308.11796) [[Code]](https://github.com/SMSD75/Timetuning)

- **[VOS-VFI]** Video Object Segmentation-aware Video Frame Interpolation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Yoo_Video_Object_Segmentation-aware_Video_Frame_Interpolation_ICCV_2023_paper.pdf) [[Code]](https://github.com/junsang7777/VOS-VFI)

- **[LVOS]** LVOS: A Benchmark for Long-term Video Object Segmentation, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Hong_LVOS_A_Benchmark_for_Long-term_Video_Object_Segmentation_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2211.10181) [[Page]](https://lingyihongfd.github.io/lvos.github.io/)

- **[MOSE]** MOSE: A New Dataset for Video Object Segmentation in Complex Scenes, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Ding_MOSE_A_New_Dataset_for_Video_Object_Segmentation_in_Complex_ICCV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2302.01872) [[Page]](https://henghuiding.github.io/MOSE/)

- **[RCF]** Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Lian_Bootstrapping_Objectness_From_Videos_by_Relaxed_Common_Fate_and_Visual_CVPR_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2304.08025) [[Code]](https://github.com/TonyLianLong/RCF-UnsupVideoSeg)

- **[VOST]** Breaking the “Object” in Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Tokmakov_Breaking_the_Object_in_Video_Object_Segmentation_CVPR_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2212.06200) [[Page]](https://www.vostdataset.org/)

- **[InstMove]** InstMove: Instance Motion for Object-centric Video Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_InstMove_Instance_Motion_for_Object-Centric_Video_Segmentation_CVPR_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2303.08132) [[Code]](https://github.com/wjf5203/vnext)

- **[SSL-VOS]** A Simple and Powerful Global Optimization for Unsupervised Video Object Segmentation, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2023/papers/Ponimatkin_A_Simple_and_Powerful_Global_Optimization_for_Unsupervised_Video_Object_WACV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2209.09341) [[Code]](https://github.com/ponimatkin/ssl-vos)

- **[BURST]** BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2023/papers/Athar_BURST_A_Benchmark_for_Unifying_Object_Recognition_Segmentation_and_Tracking_WACV_2023_paper.pdf) [[arXiv]](https://arxiv.org/abs/2209.12118) [[Code]](https://github.com/ali2500/burst-benchmark)

### 2022

- **[EPIC-KITCHENS]** EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations, *NeurIPS* [[Paper]](https://openreview.net/pdf?id=djnKHOjpb7I) [[arXiv]](https://arxiv.org/abs/2209.13064) [[Page]](https://epic-kitchens.github.io/VISOR)

- **[SaVos]** Self-supervised Amodal Video Object Segmentation, *NeurIPS* [[Paper]](https://openreview.net/pdf?id=wlqb_RfSrKh) [[arXiv]](https://arxiv.org/abs/2210.12733)

- **[YouMVOS]** YouMVOS: An Actor-centric Multi-shot Video Object Segmentation Dataset, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Wei_YouMVOS_An_Actor-Centric_Multi-Shot_Video_Object_Segmentation_Dataset_CVPR_2022_paper.pdf) [[Page]](https://donglaiw.github.io/proj/youMVOS/)

- **[Wnet]** Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Pan_Wnet_Audio-Guided_Video_Object_Segmentation_via_Wavelet-Based_Cross-Modal_Denoising_Networks_CVPR_2022_paper.pdf) [[Code]](https://github.com/asudahkzj/Wnet)

### 2021

- **[DUL]** Dense Unsupervised Learning for Video Segmentation, *NeurIPS* [[Paper]](https://proceedings.neurips.cc/paper/2021/file/d516b13671a4179d9b7b458a6ebdeb92-Paper.pdf) [[arXiv]](https://arxiv.org/abs/2111.06265) [[Code]](https://github.com/visinf/dense-ulearn-vos)

- **[AMD]** The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos, *NeurIPS* [[Paper]](https://proceedings.neurips.cc/paper/2021/file/6d9cb7de5e8ac30bd5e8734bc96a35c1-Paper.pdf) [[arXiv]](https://arxiv.org/abs/2111.06394) [[Code]](https://github.com/rt219/the-emergence-of-objectness)

- **[MotionGroup]** Self-supervised Video Object Segmentation by Motion Grouping, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_Self-Supervised_Video_Object_Segmentation_by_Motion_Grouping_ICCV_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2104.07658) [[Code]](https://github.com/charigyang/motiongrouping)

- **[GMB]** Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos, *ICCV* [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Zhao_Generating_Masks_From_Boxes_by_Mining_Spatio-Temporal_Consistencies_in_Videos_ICCV_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2101.02196v1) [[Code]](https://github.com/visionml/pytracking)

- **[DANet]** Delving Deep Into Many-to-Many Attention for Few-Shot Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Delving_Deep_Into_Many-to-Many_Attention_for_Few-Shot_Video_Object_Segmentation_CVPR_2021_paper.pdf) [[Code]](https://github.com/scutpaul/DANet)

- **[IVOS-W]** Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Yin_Learning_To_Recommend_Frame_for_Interactive_Video_Object_Segmentation_in_CVPR_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2103.10391) [[Code]](https://github.com/svip-lab/IVOS-W)

- **[GIS]** Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Heo_Guided_Interactive_Video_Object_Segmentation_Using_Reliability-Based_Attention_Maps_CVPR_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2104.10386) [[Code]](https://github.com/yuk6heo/GIS-RAmap)

- **[MiVOS]** Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion, *CVPR* [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Cheng_Modular_Interactive_Video_Object_Segmentation_Interaction-to-Mask_Propagation_and_Difference-Aware_Fusion_CVPR_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2103.07941) [[Code]](https://github.com/hkchengrex/MiVOS)

- **[ContrastCorr]** Contrastive Transformation for Self-supervised Correspondence Learning, *AAAI* [[Paper]](https://ojs.aaai.org/index.php/AAAI/article/view/17220/17027) [[arXiv]](https://arxiv.org/abs/2012.05057) [[Code]](https://github.com/594422814/ContrastCorr)

- **[TAO-VOS]** Reducing the Annotation Effort for Video Object Segmentation Datasets, *WACV* [[Paper]](https://openaccess.thecvf.com/content/WACV2021/papers/Voigtlaender_Reducing_the_Annotation_Effort_for_Video_Object_Segmentation_Datasets_WACV_2021_paper.pdf) [[arXiv]](https://arxiv.org/abs/2011.01142) [[Page]](https://www.vision.rwth-aachen.de/page/taovos)

### 2020

- **[CRW]** Space-Time Correspondence as a Contrastive Random Walk, *NeurIPS* [[Paper]](https://proceedings.neurips.cc/paper/2020/file/e2ef524fbf3d9fe611d5a8e90fefdc9c-Paper.pdf) [[arXiv]](https://arxiv.org/abs/2006.14613) [[Code]](https://github.com/ajabri/videowalk)

- **[ODMS]** Learning Object Depth from Camera Motion and Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123520290.pdf) [[arXiv]](https://arxiv.org/abs/2007.05676) [[Code]](https://github.com/griffbr/ODMS)

- **[ScribbleBox]** ScribbleBox: Interactive Annotation Framework for Video Object Segmentation, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123580290.pdf) [[arXiv]](https://arxiv.org/abs/2008.09721) [[Page]](http://www.cs.toronto.edu/~linghuan/scribblebox/)

- **[ATNet]** Interactive Video Object Segmentation Using Global and Local Transfer Modules, *ECCV* [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123620290.pdf) [[arXiv]](https://arxiv.org/abs/2007.08139) [[Code]](https://github.com/yuk6heo/IVOS-ATNet)

- **[MAST]** MAST: A Memory-Augmented Self-Supervised Tracker, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Lai_MAST_A_Memory-Augmented_Self-Supervised_Tracker_CVPR_2020_paper.pdf) [[arXiv]](https://arxiv.org/abs/2002.07793) [[Code]](https://github.com/zlai0/MAST)

- **[MuG]** Learning Video Object Segmentation From Unlabeled Videos, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Learning_Video_Object_Segmentation_From_Unlabeled_Videos_CVPR_2020_paper.pdf) [[arXiv]](https://arxiv.org/abs/2003.05020) [[Code]](https://github.com/carrierlxk/MuG)

- **[MA-Net]** Memory Aggregation Networks for Efficient Interactive Video Object Segmentation, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Miao_Memory_Aggregation_Networks_for_Efficient_Interactive_Video_Object_Segmentation_CVPR_2020_paper.pdf) [[arXiv]](https://arxiv.org/abs/2003.13246) [[Code]](https://github.com/lightas/CVPR2020_MANet)

### 2019

- **[TimeCycle]** Learning Correspondence from the Cycle-Consistency of Time, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Learning_Correspondence_From_the_Cycle-Consistency_of_Time_CVPR_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1903.07593) [[Code]](https://github.com/xiaolonw/TimeCycle)

- **[BubbleNets]** BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Griffin_BubbleNets_Learning_to_Select_the_Guidance_Frame_in_Video_Object_CVPR_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1903.11779) [[Code]](https://github.com/griffbr/BubbleNets)

- **[IPNet]** Fast User-Guided Video Object Segmentation by Interaction-And-Propagation Networks, *CVPR* [[Paper]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Oh_Fast_User-Guided_Video_Object_Segmentation_by_Interaction-And-Propagation_Networks_CVPR_2019_paper.pdf) [[arXiv]](https://arxiv.org/abs/1904.09791) [[Code]](https://github.com/seoungwugoh/ivs-demo)

### 2018

- **[YouTube-VOS]** YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark, *preprint* [[arXiv]](https://arxiv.org/abs/1809.03327) [[Page]](https://youtube-vos.org/)

### 2016

- **[DAVIS]** A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation, *CVPR* [[Paper]](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Perazzi_A_Benchmark_Dataset_CVPR_2016_paper.pdf) [[Page]](https://davischallenge.org/)

### 2014

- **[FBMS]** Segmentation of Moving Objects by Long Term Video Analysis, *TPAMI* [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6682905) [[Page]](https://lmb.informatik.uni-freiburg.de/resources/datasets/)

### 2012

- **[YouTube-Objects]** Learning Object Class Detectors from Weakly Annotated Video, *CVPR* [[Paper]](https://inria.hal.science/hal-00695940/file/VO.pdf) [[Page]](https://data.vision.ee.ethz.ch/cvl/youtube-objects/)