Awesome-Referring-Image-Segmentation

:books: A collection of papers about Referring Image Segmentation.
https://github.com/MarkMoHR/Awesome-Referring-Image-Segmentation


5. Referring Video Object Segmentation
4. Referring Video Object Segmentation
1. Datasets
3. Traditional Referring Image Segmentation
- LISA: Reasoning Segmentation via Large Language Model
- Text Augmented Spatial-aware Zero-shot Referring Image Segmentation
- Bilateral Knowledge Interaction Network for Referring Image Segmentation
- Advancing Referring Expression Segmentation Beyond Single Image
- PolyFormer: Referring Image Segmentation as Sequential Polygon Generation [[project]](https://polyformer.github.io/)
- Contrastive Grouping with Transformer for Referring Image Segmentation
- Towards Robust Referring Image Segmentation [[project]](https://lxtgh.github.io/project/robust_ref_seg/)
- Segmentation from natural language expressions
- Unsupervised Domain Adaptation for Referring Semantic Segmentation
- Adaptive Selection based Referring Image Segmentation
- Dual Convolutional LSTM Network for Referring Image Segmentation
- Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding
- Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
- IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
- A Simple Baseline with Single-encoder for Referring Image Segmentation
- Mask Grounding for Referring Image Segmentation
- Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
- CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation
- Segment Everything Everywhere All at Once
- Cascade Grouped Attention Network for Referring Expression Segmentation
- Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
- Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation
- GSVA: Generalized Segmentation via Multimodal Large Language Models
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
- Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation [[webpage]](https://rubics-xuan.github.io/MRES/)
- Referring Image Segmentation via Joint Mask Contextual Embedding Learning and Progressive Alignment Network
- Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency
- Shatter and Gather: Learning Referring Image Segmentation with Text Supervision
- Referring Image Segmentation Using Text Supervision
- Beyond One-to-One: Rethinking the Referring Image Segmentation
- SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation
- WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation
- Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation
- X-Decoder: Generalized Decoding for Pixel, Image and Language [[project]](https://x-decoder-vl.github.io/)
- Learning to Segment Every Referring Object Point by Point
- Meta Compositional Referring Expression Segmentation
- Zero-shot Referring Image Segmentation with Global-Local Context Features
- Learning From Box Annotations for Referring Image Segmentation
- Instance-Specific Feature Propagation for Referring Segmentation
- LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
- CRIS: CLIP-Driven Referring Image Segmentation
- ReSTR: Convolution-free Referring Image Segmentation Using Transformers
- Vision-Language Transformer and Query Generation for Referring Segmentation
- MDETR - Modulated Detection for End-to-End Multi-Modal Understanding
- Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation
- Bottom-Up Shift and Reasoning for Referring Image Segmentation
- Locate then Segment: A Strong Pipeline for Referring Image Segmentation
- Linguistic Structure Guided Context Modeling for Referring Image Segmentation
- Referring Image Segmentation via Cross-Modal Progressive Comprehension
- Bi-directional Relationship Inferring Network for Referring Image Segmentation
- PhraseCut: Language-based Image Segmentation in the Wild
- Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
- See-Through-Text Grouping for Referring Image Segmentation
- Referring Expression Object Segmentation with Caption-Aware Consistency
- Cross-Modal Self-Attention Network for Referring Image Segmentation
- Key-Word-Aware Network for Referring Expression Image Segmentation
- Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries
- Referring Image Segmentation via Recurrent Refinement Networks
- MAttNet: Modular Attention Network for Referring Expression Comprehension
- Recurrent Multimodal Interaction for Referring Image Segmentation
- Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
- DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy
- WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation
- Prompt-Driven Referring Image Segmentation with Instance Contrasting
- LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation
- GTMS: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method
- SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
- ReMamber: Referring Image Segmentation with Mamba Twister
- SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
- GRES: Generalized Referring Expression Segmentation
- Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
- CARIS: Context-Aware Referring Image Segmentation
- Two-stage Visual Cues Enhancement Network for Referring Image Segmentation
- Latent Expression Generation for Referring Image Segmentation and Grounding
- Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
2. Traditional Referring Image Segmentation
- Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
- Dual Convolutional LSTM Network for Referring Image Segmentation
3. Interactive Referring Image Segmentation
- PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click
5. Referring 3D Instance Segmentation
- Text-Guided Graph Neural Networks for Referring 3D Instance Segmentation
6. Referring 3D Instance Segmentation
- InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
4. Interactive Referring Image Segmentation
- UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning [[webpage]](https://polyu-chenlab.github.io/unipixel/)
2. Challenges
- RVOS Challenge | [Large-scale Video Object Segmentation Challenge](https://lsvos.github.io/) | Aug 2024 | [[CodaLab]](https://codalab.lisn.upsaclay.fr/competitions/19583)
- 1st MeViS Challenge | [Pixel-level Video Understanding in the Wild](https://www.vspwdataset.com/Workshop2024.html) | May 2024 | [[CodaLab]](https://codalab.lisn.upsaclay.fr/competitions/15094)

Categories

- 3. Traditional Referring Image Segmentation (81)
- 5. Referring Video Object Segmentation (35)
- 4. Referring Video Object Segmentation (21)
- 1. Datasets (9)
- 2. Traditional Referring Image Segmentation (2)
- 2. Challenges (2)
- 4. Interactive Referring Image Segmentation (1)
- 6. Referring 3D Instance Segmentation (1)
- 5. Referring 3D Instance Segmentation (1)
- 3. Interactive Referring Image Segmentation (1)

Keywords

- video-understanding (1)
- referring-video-object-segmentation (1)
- referring-expression-segmentation (1)
- referring-expression-comprehension (1)
- multimodal-learning (1)
- mose-dataset (1)
- mevis-dataset (1)