Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-Segmentation-With-Transformer
[Arxiv-04-2023] Transformer-Based Visual Segmentation: A Survey
https://github.com/lxtGH/Awesome-Segmentation-With-Transformer
Last synced: 2 days ago
Methods: A Survey
Strong Representation
- SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation
- SparK: the first successful BERT/MAE-style pretraining on any convolutional networks
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
- Multiscale vision transformers
- MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
- XCiT: Cross-covariance image transformers
- Pyramid vision transformer: A versatile backbone for dense prediction without convolutions
- CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
- Co-Scale Conv-Attentional Image Transformers
- MPViT: Multi-Path Vision Transformer for Dense Prediction
- SegViT: Semantic Segmentation with Plain Vision Transformers
- Representation Separation for Semantic Segmentation with Vision Transformers
- Swin transformer: Hierarchical vision transformer using shifted windows
- Swin Transformer V2: Scaling Up Capacity and Resolution
- SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
- CMT: Convolutional Neural Networks Meet Vision Transformers
- Twins: Revisiting the Design of Spatial Attention in Vision Transformers
- CvT: Introducing Convolutions to Vision Transformers
- ViTAE: Vision transformer advanced by exploring intrinsic inductive bias
- A ConvNet for the 2020s
- PoolFormer: MetaFormer Is Actually What You Need for Vision
- Demystify Transformers & Convolutions in Modern Image Deep Networks
- An Empirical Study of Training Self-Supervised Vision Transformers
- BEiT: BERT pre-training of image transformers
- Masked Feature Prediction for Self-Supervised Visual Pre-Training
- Masked Autoencoders Are Scalable Vision Learners
- MCMAE: Masked Convolution Meets Masked Autoencoders
- Scaling Language-Image Pre-training via Masking
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
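The backbones above differ in how they build dense features, but they all feed a segmentation head a token sequence that is reshaped back into a 2-D map. A minimal sketch of that pipeline, assuming a plain ViT-style encoder with a simple bilinear decode head; the patch size, widths, and class count are illustrative only, not any listed paper's configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlainViTSegmenter(nn.Module):
    """Toy SETR/SegViT-flavored pipeline: patchify -> transformer encoder ->
    reshape tokens to a 2-D map -> upsample to per-pixel class logits."""

    def __init__(self, num_classes=21, img_size=224, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)   # patch embedding
        self.pos = nn.Parameter(torch.zeros(1, (img_size // patch) ** 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)            # linear decode head

    def forward(self, x):
        b, _, h, w = x.shape
        tokens = self.embed(x).flatten(2).transpose(1, 2)                 # (B, N, C)
        tokens = self.encoder(tokens + self.pos)
        gh, gw = h // self.patch, w // self.patch
        fmap = tokens.transpose(1, 2).reshape(b, -1, gh, gw)              # tokens -> 2-D map
        return F.interpolate(self.head(fmap), size=(h, w), mode="bilinear", align_corners=False)

logits = PlainViTSegmenter()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 21, 224, 224])
```

Hierarchical backbones (Swin, PVT, MViT, SegFormer and the like) replace the single-stride encoder with staged downsampling so the head receives a multi-scale pyramid instead of one flat map.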
Meta-Architecture
- End-to-End Object Detection with Transformers
- Deformable DETR: Deformable Transformers for End-to-End Object Detection
- MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation
- Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR
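What these papers share is the query-based meta-architecture the survey keeps returning to: a backbone produces pixel features, a transformer decoder turns a fixed set of learnable queries into per-object embeddings, and each embedding yields a class prediction plus (for segmentation) a mask obtained by a dot product with the pixel features. A minimal MaskFormer-flavored sketch; the stand-in backbone, module sizes, and query count are arbitrary illustrations:

```python
import torch
import torch.nn as nn

class QueryBasedSegmenter(nn.Module):
    """Toy DETR/MaskFormer-style meta-architecture: learnable queries attend to
    backbone features; each query predicts a class and a mask embedding that is
    dotted with per-pixel features to give a mask logit map."""

    def __init__(self, num_queries=100, num_classes=80, dim=256):
        super().__init__()
        self.backbone = nn.Sequential(                      # stand-in backbone, stride 16
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=4, padding=1),
        )
        self.queries = nn.Embedding(num_queries, dim)       # learnable object queries
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.cls_head = nn.Linear(dim, num_classes + 1)     # +1 for "no object"
        self.mask_embed = nn.Linear(dim, dim)

    def forward(self, images):
        feats = self.backbone(images)                       # (B, C, H/16, W/16)
        b, c, h, w = feats.shape
        memory = feats.flatten(2).transpose(1, 2)           # (B, HW, C)
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        q = self.decoder(q, memory)                         # (B, Q, C)
        cls_logits = self.cls_head(q)                       # (B, Q, K+1)
        masks = torch.einsum("bqc,bchw->bqhw", self.mask_embed(q), feats)
        return cls_logits, masks                            # per-query class + mask logits

cls_logits, masks = QueryBasedSegmenter()(torch.randn(2, 3, 256, 256))
print(cls_logits.shape, masks.shape)  # (2, 100, 81) (2, 100, 16, 16)
```

Set-based training with bipartite (Hungarian) matching between queries and ground-truth objects, as in DETR, is what removes the need for NMS in this family.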
Interaction Design in Decoder
- Accelerating DETR Convergence via Semantic-Aligned Matching
- Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
- AdaMixer: A Fast-Converging Query-Based Object Detector
- Masked-attention Mask Transformer for Universal Image Segmentation
- k-means Mask Transformer
- Instances as queries
- ISTR: End-to-End Instance Segmentation via Transformers
- SOLQ: Segmenting objects by learning queries
- Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
- CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation
- Sparse Instance Activation for Real-Time Instance Segmentation
- Fast Convergence of DETR with Spatially Modulated Co-Attention
- End-to-End Object Detection with Adaptive Clustering Transformer
- Dynamic DETR: End-to-End Object Detection with Dynamic Attention
- Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity
- FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
- VisTR: End-to-End Video Instance Segmentation with Transformers
- Video instance segmentation using inter-frame communication transformers
- Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation
- TubeFormer-DeepLab: Video Mask Transformer
- Temporally efficient vision transformer for video instance segmentation
- SeqFormer: Sequential Transformer for Video Instance Segmentation
- Mask2Former for Video Instance Segmentation
- TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers
- VITA: Video Instance Segmentation via Object Token Association
- K-Net: Towards Unified Image Segmentation
- MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
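A recurring decoder-side idea in this group (Mask2Former, kMaX-DeepLab, and related work) is to restrict each query's cross-attention to the region its own previous mask prediction already covers, so queries refine local evidence instead of attending over the whole image. A rough sketch of one such masked cross-attention layer, assuming mask logits from the preceding layer are available; the 0.5 threshold and tensor sizes are illustrative:

```python
import torch
import torch.nn as nn

class MaskedCrossAttentionLayer(nn.Module):
    """Toy Mask2Former-style layer: cross-attention from queries to pixel features,
    with attention disallowed at locations the previous mask prediction marks as
    background for that query."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.heads = heads
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries, pixel_feats, prev_mask_logits):
        # queries: (B, Q, C); pixel_feats: (B, HW, C); prev_mask_logits: (B, Q, HW)
        attn_mask = prev_mask_logits.sigmoid() < 0.5          # True = do NOT attend here
        # If a query masks out everything, let it attend everywhere to avoid NaNs.
        attn_mask[attn_mask.all(dim=-1)] = False
        # MultiheadAttention expects a (B*heads, Q, HW) boolean mask.
        attn_mask = attn_mask.repeat_interleave(self.heads, dim=0)
        out, _ = self.cross_attn(queries, pixel_feats, pixel_feats, attn_mask=attn_mask)
        return self.norm(queries + out)

layer = MaskedCrossAttentionLayer()
q = torch.randn(2, 100, 256)
feats = torch.randn(2, 64 * 64, 256)
prev = torch.randn(2, 100, 64 * 64)
print(layer(q, feats, prev).shape)  # torch.Size([2, 100, 256])
```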
Optimizing Object Query
- Conditional DETR for Fast Training Convergence
- Conditional DETR v2: Efficient detection transformer with box queries
- Anchor DETR: Query design for transformer-based detector
- DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
- Towards Data-Efficient Detection Transformers
- DN-DETR: Accelerate DETR training by introducing query denoising
- DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
- MP-Former: Mask-piloted transformer for image segmentation
- Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
- Learning equivariant segmentation with instance-unique querying
- DETRs with Hybrid Matching
- Group DETR: Fast DETR training with group-wise one-to-many assignment
- DETRs with collaborative hybrid assignments training
- Efficient DETR: Improving end-to-end object detector with dense prior
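Much of this line of work replaces free-form learnable query vectors with queries tied to explicit anchor boxes that are refined layer by layer (Anchor DETR, DAB-DETR, DINO). A condensed sketch of the idea, assuming normalized (cx, cy, w, h) anchors encoded with the usual sinusoidal embedding; the single refinement step shown stands in for a full multi-layer decoder, and all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

def sine_embed(coords, num_feats=64, temperature=10000):
    """Sinusoidal embedding of normalized box coordinates.
    coords: (B, Q, 4) in [0, 1] -> (B, Q, 4 * num_feats)."""
    dim_t = temperature ** (2 * (torch.arange(num_feats) // 2) / num_feats)
    pos = coords.unsqueeze(-1) * 2 * torch.pi / dim_t           # (B, Q, 4, num_feats)
    pos = torch.stack((pos[..., 0::2].sin(), pos[..., 1::2].cos()), dim=-1).flatten(-2)
    return pos.flatten(-2)                                      # (B, Q, 4 * num_feats)

class AnchorQueryRefiner(nn.Module):
    """One DAB-DETR-flavored step: anchors -> positional queries -> cross-attention
    to image features -> predict box deltas -> refined anchors for the next layer."""

    def __init__(self, dim=256, heads=8, num_queries=100):
        super().__init__()
        self.content = nn.Embedding(num_queries, dim)            # learnable content queries
        self.pos_proj = nn.Linear(4 * 64, dim)                   # sine feats -> query dim
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.delta_head = nn.Linear(dim, 4)                      # predicts box offsets

    def forward(self, anchors, memory):
        # anchors: (B, Q, 4) normalized cxcywh; memory: (B, HW, C) encoder features
        b = anchors.size(0)
        q = self.content.weight.unsqueeze(0).expand(b, -1, -1) + self.pos_proj(sine_embed(anchors))
        q, _ = self.attn(q, memory, memory)
        refined = (torch.special.logit(anchors, eps=1e-5) + self.delta_head(q)).sigmoid()
        return refined                                           # next layer's anchors

anchors = torch.rand(2, 100, 4)
memory = torch.randn(2, 32 * 32, 256)
print(AnchorQueryRefiner()(anchors, memory).shape)  # torch.Size([2, 100, 4])
```

The denoising papers (DN-DETR, DINO, MP-Former) add noised ground-truth boxes or masks as extra queries during training to stabilize the matching, on top of this anchor formulation.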
Using Query For Association
- TrackFormer: Multi-Object Tracking with Transformer
- TransTrack: Multiple Object Tracking with Transformer
- MOTR: End-to-End Multiple-Object Tracking with TRansformer
- MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
- In defense of online models for video instance segmentation
- A Generalized Framework for Video Instance Segmentation
- Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation
- Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation
- PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation
- Panopticdepth: A unified framework for depth-aware panoptic segmentation
- Fashionformer: A simple, effective and unified baseline for human fashion segmentation and recognition
- InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
- Universal Instance Perception as Object Discovery and Retrieval
- GLEE: General Object Foundation Model for Images and Videos at Scale
- OMG-Seg: Is One Model Good Enough For All Segmentation?
- UniVS: Unified and Universal Video Segmentation with Prompts as Queries
- CTVIS: Consistent Training for Online Video Instance Segmentation
- Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
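The unifying trick in this group (MOTR, MinVIS, Video K-Net, and others) is to reuse the decoder's object queries as the association signal: an instance keeps the same query, or is matched to the most similar query, from frame to frame, so no separate tracking head is needed. A small sketch of MinVIS-style post-hoc matching by bipartite assignment on query embeddings; the decoders that would produce these embeddings are omitted, and the toy data below is purely illustrative:

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def associate_queries(prev_queries, curr_queries):
    """Match per-frame object query embeddings by cosine similarity.
    prev_queries, curr_queries: (Q, C) decoder outputs for consecutive frames.
    Returns, for each current query, the index of its matched previous query."""
    sim = F.normalize(prev_queries, dim=-1) @ F.normalize(curr_queries, dim=-1).T  # (Q, Q)
    prev_idx, curr_idx = linear_sum_assignment(-sim.numpy())    # maximize total similarity
    order = torch.empty(curr_queries.size(0), dtype=torch.long)
    order[torch.as_tensor(curr_idx)] = torch.as_tensor(prev_idx)
    return order

# Toy example: frame t+1 queries are a shuffled, slightly noisy copy of frame t queries.
q_t = torch.randn(10, 256)
perm = torch.randperm(10)
q_t1 = q_t[perm] + 0.01 * torch.randn(10, 256)
match = associate_queries(q_t, q_t1)
print(torch.equal(match, perm))  # expected True: each current query recovers its track identity
```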
Conditional Query Generation
- Vision-Language Transformer and Query Generation for Referring Segmentation
- LAVT: Language-aware vision transformer for referring image segmentation
- ReSTR: Convolution-free referring image segmentation using transformers
- CRIS: CLIP-driven referring image segmentation
- End-to-End Referring Video Object Segmentation with Multimodal Transformers
- Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
- Language as queries for referring video object segmentation
- Few-Shot Segmentation via Cycle-Consistent Transformer
- MatteFormer: Transformer-Based Image Matting via Prior-Tokens
- A Transformer-based Decoder for Semantic Segmentation with Multi-level Context Mining
- StructToken: Rethinking Semantic Segmentation with Structural Prior
- Mask Matching Transformer for Few-Shot Segmentation
- Adaptive Agent Transformer for Few-shot Segmentation
- Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation
- Mask Grounding for Referring Image Segmentation
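For referring segmentation, these designs generate or condition the decoder queries on the language (or support-image) input instead of learning them freely, so the single "object" the decoder looks for is the one the prompt describes. A bare-bones sketch of that conditioning, with a placeholder embedding-plus-LSTM text encoder standing in for whatever language model a given paper uses; every module name and size here is an illustrative assumption:

```python
import torch
import torch.nn as nn

class LanguageConditionedDecoder(nn.Module):
    """Toy referring-segmentation decoder: a sentence embedding is projected into a
    single object query, which cross-attends to visual features and is dotted with
    them to produce the referred object's mask logits."""

    def __init__(self, dim=256, vocab=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)                    # placeholder text encoder
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.to_query = nn.Linear(dim, dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)

    def forward(self, pixel_feats, token_ids):
        # pixel_feats: (B, C, H, W); token_ids: (B, T) referring expression
        _, (h_n, _) = self.lstm(self.embed(token_ids))           # sentence state (1, B, C)
        query = self.to_query(h_n.transpose(0, 1))               # (B, 1, C): one language query
        b, c, h, w = pixel_feats.shape
        memory = pixel_feats.flatten(2).transpose(1, 2)          # (B, HW, C)
        query = self.decoder(query, memory)                      # (B, 1, C)
        mask_logits = torch.einsum("bqc,bchw->bqhw", query, pixel_feats)
        return mask_logits.squeeze(1)                            # (B, H, W)

model = LanguageConditionedDecoder()
masks = model(torch.randn(2, 256, 20, 20), torch.randint(0, 1000, (2, 12)))
print(masks.shape)  # torch.Size([2, 20, 20])
```

Few-shot variants in this group condition the queries on support-image prototypes rather than text, but the query-generation pattern is the same.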
Tuning Foundation Models
- Conditional Prompt Learning for Vision-Language Models
- Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification
- Frozen CLIP Models are Efficient Video Learners
- Vision Transformer Adapter for Dense Predictions
- DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
- Image Segmentation Using Text and Image Prompts
- OneFormer: One Transformer to Rule Universal Image Segmentation
- Open-Vocabulary Object Detection Using Captions
- Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
- Detecting Twenty-thousand Classes using Image-level Supervision
- Open-Vocabulary DETR with Conditional Matching
- F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
- Class-agnostic Object Detection with Multi-modal Transformer
- Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
- Language-driven Semantic Segmentation
- A Simple Baseline for Open Vocabulary Semantic Segmentation with Pre-trained Vision-language Model
- Extract Free Dense Labels from CLIP
- Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
- Betrayed-by-Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
- Open-World Entity Segmentation
- OW-DETR: Open-world Detection Transformer
- PROB: Probabilistic Objectness for Open World Object Detection
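Most of these methods keep a pretrained vision-language model frozen (or lightly adapted) and classify regions or masks by comparing their embeddings to text embeddings of arbitrary class names, which is what makes the vocabulary open. A schematic sketch of that matching step; the region projection and logit scale are placeholders, and in practice a frozen CLIP-style model with prompt templates would supply both sides of the comparison:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OpenVocabClassifier(nn.Module):
    """Toy open-vocabulary head: score each region/mask embedding against text
    embeddings of the candidate class names via scaled cosine similarity."""

    def __init__(self, dim=512):
        super().__init__()
        # Placeholders: in real systems these embeddings come from a frozen VLM such as CLIP.
        self.region_proj = nn.Linear(256, dim)
        self.logit_scale = nn.Parameter(torch.tensor(100.0).log())

    def forward(self, region_embeds, text_embeds):
        # region_embeds: (B, Q, 256) from a mask decoder; text_embeds: (K, dim) for K class names
        r = F.normalize(self.region_proj(region_embeds), dim=-1)
        t = F.normalize(text_embeds, dim=-1)
        return self.logit_scale.exp() * r @ t.T            # (B, Q, K) class logits

# The class-name list (and hence K) can change at inference time without retraining.
head = OpenVocabClassifier()
region_embeds = torch.randn(2, 100, 256)
text_embeds = torch.randn(7, 512)                          # e.g. 7 novel category prompts
print(head(region_embeds, text_embeds).shape)              # torch.Size([2, 100, 7])
```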
Related Domains and Beyond
- Point Transformer
- PCT: Point cloud transformer
- Stratified Transformer for 3D Point Cloud Segmentation
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
- Masked Autoencoders for Point Cloud Self-supervised Learning
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
- Mask3D for 3D Semantic Instance Segmentation
- Superpoint Transformer for 3D Scene Instance Segmentation
- PUPS: Point Cloud Unified Panoptic Segmentation
- DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
- HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation
- MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
- Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers
- DA-DETR: Domain Adaptive Detection Transformer with Information Fusion
- MTTrans: Cross-Domain Object Detection with Mean-Teacher Transformer
- The devil is in the labels: Semantic segmentation from sentences
- LMSeg: Language-guided Multi-dataset Segmentation
- Simple multi-dataset detection
- Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
- Unifying Panoptic Segmentation for Autonomous Driving
- TarViS: A Unified Approach for Target-based Video Segmentation
- Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
- Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation
- Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation
- Emerging Properties in Self-Supervised Vision Transformers
- Localizing Objects with Self-Supervised Transformers and no Labels
- Unsupervised Semantic Segmentation by Distilling Feature Correspondences
- ReCo: Retrieve and Co-segment for Zero-shot Transfer
- Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation
- FreeSOLO: Learning to Segment Objects without Annotations
- Cut and Learn for Unsupervised Object Detection and Instance Segmentation
- Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut
- MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
- Rethinking Mobile Block for Efficient Neural Models
- TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
- SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation
- Mask Transfiner for High-Quality Instance Segmentation
- Video Mask Transfiner for High-Quality Video Instance Segmentation
- SimpleClick: Interactive Image Segmentation with Simple Vision Transformers
- PatchDCT: Patch Refinement for High Quality Instance Segmentation
- Video Object Segmentation using Space-Time Memory Networks
- Associating Objects with Transformers for Video Object Segmentation
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
- XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
- Per-Clip Video Object Segmentation
- Look Before You Match: Instance Understanding Matters in Video Object Segmentation
- Attention-Based Transformers for Instance Segmentation of Cells in Microstructures
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation
- TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation
- UNETR: Transformers for 3D Medical Image Segmentation
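Several of the video object segmentation entries above (STM, AOT, XMem, and follow-ups) share one mechanism: the current frame queries a memory of past frames with an attention-style read, so past masks guide the present prediction. A stripped-down sketch of that memory read under the usual key/value factorization; the shapes and the softmax-over-memory choice follow the common formulation rather than any single listed paper:

```python
import torch
import torch.nn.functional as F

def memory_read(query_key, memory_key, memory_value):
    """STM-style space-time memory read.
    query_key:    (B, Ck, H*W)    keys of the current frame
    memory_key:   (B, Ck, T*H*W)  keys of stored past frames
    memory_value: (B, Cv, T*H*W)  values (mask-aware features) of past frames
    Returns value features aggregated for each current-frame location: (B, Cv, H*W)."""
    affinity = torch.einsum("bck,bcm->bkm", query_key, memory_key)   # (B, HW, THW)
    affinity = F.softmax(affinity / query_key.size(1) ** 0.5, dim=-1)
    return torch.einsum("bkm,bcm->bck", affinity, memory_value)      # weighted read

B, Ck, Cv, HW, T = 2, 64, 512, 24 * 24, 4
read = memory_read(torch.randn(B, Ck, HW), torch.randn(B, Ck, T * HW), torch.randn(B, Cv, T * HW))
print(read.shape)  # torch.Size([2, 512, 576])
```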
Related Repo For Segmentation and Detection
Keywords
deep-learning (3), computer-vision (2), sparse-convolution (1), self-supervised-learning (1), pytorch (1), pretraining (1), pretrain (1), pre-trained-model (1), object-detection (1), masked-image-modeling (1), masked-autoencoder (1), mask-rcnn (1), mae (1), instance-segmentation (1), iclr2023 (1), iclr (1), convolutional-neural-networks (1), convnet (1), cnn (1), bert (1), tpami-2024 (1), open-vocabulary (1), vit (1), visual-transformer (1), vision-transformer (1), transformers (1), transformer-with-cv (1), transformer-models (1), transformer-cv (1), transformer-awesome (1), transformer-architecture (1), transformer (1), self-attention (1), papers (1), detr (1), awesome-list (1), attention-mechanisms (1), attention-mechanism (1), ssl (1)