Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-Transformer-Attention
A comprehensive paper list covering Vision Transformers and attention mechanisms, with links to papers, code, and related websites.
https://github.com/cmhungsteve/Awesome-Transformer-Attention
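For orientation before the category lists below, here is a minimal sketch of the scaled dot-product self-attention that the indexed papers build on. This is a generic PyTorch illustration, not the implementation of any particular entry; the class name and hyperparameters are our own assumptions.

```python
# Minimal multi-head scaled dot-product self-attention over a (batch, tokens, dim) sequence.
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)   # joint projection to queries/keys/values
        self.proj = nn.Linear(dim, dim)      # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                 # each: (b, heads, n, head_dim)
        attn = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        attn = attn.softmax(dim=-1)                          # attention weights over tokens
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)

# usage: SelfAttention(768)(torch.randn(2, 197, 768)).shape -> (2, 197, 768)
```

Most of the backbones, detectors, and segmenters indexed below differ mainly in how they restrict, factorize, or replace this operation.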
Video (High-level)
Other Video Tasks
- (entries truncated in this index; see the source repository for the full list)
Action Recognition
- (entries truncated in this index; see the source repository for the full list)
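Many of the video backbones in this category factorize attention across space and time to keep cost manageable. The block below is a hedged sketch of that divided space-time pattern, in the spirit of the TimeSformer/ViViT-style models that appear in this list; shapes, layer names, and hyperparameters are illustrative assumptions, not code from any specific paper.

```python
# Factorized space-time attention: attend over time for each spatial location,
# then over space within each frame. Input is a (batch, frames, patches, dim) token grid.
import torch
import torch.nn as nn

class DividedSpaceTimeBlock(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                                              # x: (b, t, n, d)
        b, t, n, d = x.shape
        xt = self.norm1(x).permute(0, 2, 1, 3).reshape(b * n, t, d)    # time axis as the sequence
        xt = self.temporal(xt, xt, xt)[0].reshape(b, n, t, d).permute(0, 2, 1, 3)
        x = x + xt                                                     # temporal residual
        xs = self.norm2(x).reshape(b * t, n, d)                        # space axis as the sequence
        return x + self.spatial(xs, xs, xs)[0].reshape(b, t, n, d)     # spatial residual

# usage: DividedSpaceTimeBlock()(torch.randn(2, 8, 196, 768)).shape -> (2, 8, 196, 768)
```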
Action Detection/Localization
- (entries truncated in this index; see the source repository for the full list)
Video Instance Segmentation
- (entries truncated in this index; see the source repository for the full list)
Action Prediction/Anticipation
- (entries truncated in this index; see the source repository for the full list)
Video Object Segmentation
- (entries truncated in this index; see the source repository for the full list)
Image Classification / Backbone
Vision Transformer
- (entries truncated in this index; see the source repository for the full list)
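Most backbones in this category start from the same tokenization step: split the image into fixed-size patches, embed them linearly, and prepend a class token plus position embeddings before the transformer encoder. A minimal sketch of that step follows, with illustrative sizes (224x224 input, 16x16 patches, 768-dim tokens); it is not tied to any single listed model.

```python
# ViT-style tokenization: non-overlapping patch projection, class token, position embeddings.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size: int = 224, patch: int = 16, in_ch: int = 3, dim: int = 768):
        super().__init__()
        self.num_patches = (img_size // patch) ** 2
        # a strided conv is the standard way to implement non-overlapping patch projection
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))

    def forward(self, x):
        b = x.shape[0]
        x = self.proj(x).flatten(2).transpose(1, 2)          # (b, num_patches, dim)
        cls = self.cls_token.expand(b, -1, -1)               # (b, 1, dim)
        return torch.cat([cls, x], dim=1) + self.pos_embed   # (b, 1 + num_patches, dim)

# usage: PatchEmbed()(torch.randn(2, 3, 224, 224)).shape -> (2, 197, 768)
```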
Replace Conv w/ Attention
- (entries truncated in this index; see the source repository for the full list)
Attention-Free
- (entries truncated in this index; see the source repository for the full list)
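The entries in this category replace self-attention with cheaper token mixers such as MLPs, pooling, convolution, or state-space layers. As one representative, a minimal MLP-Mixer-style block is sketched below; the hidden sizes are illustrative assumptions rather than values from any specific paper.

```python
# MLP-Mixer-style block: token mixing (MLP across the token axis), then channel mixing
# (MLP across the feature axis), with no attention anywhere.
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, num_tokens: int, dim: int, token_hidden: int = 256, channel_hidden: int = 2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, token_hidden), nn.GELU(), nn.Linear(token_hidden, num_tokens))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden), nn.GELU(), nn.Linear(channel_hidden, dim))

    def forward(self, x):                       # x: (batch, tokens, dim)
        y = self.norm1(x).transpose(1, 2)       # (batch, dim, tokens): mix across tokens
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))

# usage: MixerBlock(num_tokens=196, dim=768)(torch.randn(2, 196, 768)).shape -> (2, 196, 768)
```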
Analysis for Transformer
- (entries truncated in this index; see the source repository for the full list)
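A recurring diagnostic in this analysis and explainability literature is attention rollout, which composes per-layer attention maps (with an identity term for the residual connection) to estimate how information flows from input tokens to the output. A bare-bones sketch, assuming the per-layer, per-head maps have already been extracted from a model:

```python
# Attention rollout: average heads, add identity for the residual path, re-normalize,
# and matrix-multiply the per-layer maps from first to last layer.
import torch

def attention_rollout(attentions):
    """attentions: list of (batch, heads, tokens, tokens) attention maps, one per layer."""
    rollout = None
    for attn in attentions:
        a = attn.mean(dim=1)                               # average over heads: (b, n, n)
        a = a + torch.eye(a.shape[-1])                     # account for residual connections
        a = a / a.sum(dim=-1, keepdim=True)                # keep rows normalized
        rollout = a if rollout is None else a @ rollout    # compose with earlier layers
    return rollout                                         # (b, n, n) token-to-token influence

# usage: maps = [torch.rand(1, 12, 197, 197).softmax(-1) for _ in range(12)]
#        attention_rollout(maps)[0, 0]  # influence of each input token on the class token
```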
Detection
3D Object Detection
- (entries truncated in this index; see the source repository for the full list)
Object Detection
- (entries truncated in this index; see the source repository for the full list)
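Most entries in this category are DETR variants, which cast detection as set prediction: a fixed set of learned object queries cross-attends to encoded image features, and each query is decoded into a class and a box. The sketch below shows only that decoding skeleton (using PyTorch's stock transformer decoder) and omits the Hungarian matching loss and positional encodings that real implementations need; all names and sizes are illustrative.

```python
# DETR-style query decoding: learned queries attend to image tokens; each query predicts one box.
import torch
import torch.nn as nn

class QueryDecoder(nn.Module):
    def __init__(self, dim=256, num_queries=100, num_classes=80, num_layers=6):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.class_head = nn.Linear(dim, num_classes + 1)    # +1 for the "no object" class
        self.box_head = nn.Linear(dim, 4)                    # (cx, cy, w, h), normalized

    def forward(self, image_tokens):                         # (batch, hw, dim) from a backbone/encoder
        b = image_tokens.shape[0]
        q = self.queries.unsqueeze(0).expand(b, -1, -1)      # (batch, num_queries, dim)
        hs = self.decoder(tgt=q, memory=image_tokens)        # cross-attention to image tokens
        return self.class_head(hs), self.box_head(hs).sigmoid()

# usage: logits, boxes = QueryDecoder()(torch.randn(2, 1900, 256))
#        logits: (2, 100, 81), boxes: (2, 100, 4)
```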
Multi-Modal Detection
- (entries truncated in this index; see the source repository for the full list)
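A common recipe behind the open-vocabulary detectors in this category is to score each candidate region by the cosine similarity between its visual embedding and text embeddings of class names, so new categories only require new prompts. A minimal sketch of that scoring step, assuming the embeddings come from a pretrained vision-language model such as CLIP; the function name and temperature value are illustrative.

```python
# Open-vocabulary region classification via image-text embedding similarity.
import torch
import torch.nn.functional as F

def open_vocab_scores(region_feats: torch.Tensor, text_feats: torch.Tensor, temperature: float = 0.01):
    """region_feats: (num_regions, dim); text_feats: (num_classes, dim)."""
    r = F.normalize(region_feats, dim=-1)                # unit-norm visual embeddings
    t = F.normalize(text_feats, dim=-1)                  # unit-norm class-name embeddings
    return ((r @ t.T) / temperature).softmax(dim=-1)     # (num_regions, num_classes)

# usage: open_vocab_scores(torch.randn(300, 512), torch.randn(20, 512)).shape -> (300, 20)
```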
HOI Detection
- (entries truncated in this index; see the source repository for the full list)
Salient Object Detection
- (entries truncated in this index; see the source repository for the full list)
Other Detection Tasks
- (entries truncated in this index; see the source repository for the full list)
Segmentation
Other Segmentation Tasks
- (entries truncated in this index; see the source repository for the full list)
Semantic Segmentation
- (entries truncated in this index; see the source repository for the full list)
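Several entries in this category (the SETR/Segmenter line in particular) decode dense predictions by mapping patch tokens back onto the spatial grid, classifying each token, and upsampling to full resolution. A minimal sketch of that linear decoding head, with illustrative sizes:

```python
# Per-pixel decoding from transformer patch tokens: tokens -> spatial grid -> class logits -> upsample.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSegHead(nn.Module):
    def __init__(self, dim: int = 768, num_classes: int = 150, patch: int = 16):
        super().__init__()
        self.patch = patch
        self.classify = nn.Linear(dim, num_classes)

    def forward(self, tokens, img_hw):                  # tokens: (b, n, dim), no class token
        h, w = img_hw
        gh, gw = h // self.patch, w // self.patch       # token grid size
        logits = self.classify(tokens)                  # (b, n, num_classes)
        b, n, c = logits.shape
        logits = logits.transpose(1, 2).reshape(b, c, gh, gw)
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)

# usage: LinearSegHead()(torch.randn(2, 196, 768), (224, 224)).shape -> (2, 150, 224, 224)
```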
Depth Estimation
- (entries truncated in this index; see the source repository for the full list)
Object Segmentation
- (entries truncated in this index; see the source repository for the full list)
Survey
- (entries truncated in this index; see the source repository for the full list)
References
- Papers with Code
- Transformer tutorial (Lucas Beyer)
- CS25: Transformers United (Course @ Stanford)
- The Annotated Transformer (Blog)
- 3D Vision with Transformers (GitHub)
- Networks Beyond Attention (GitHub)
- Practical Introduction to Transformers (GitHub)
- Awesome Transformer Architecture Search (GitHub)
- Transformer-in-Vision (GitHub)
- Awesome Visual-Transformer (GitHub)
- Awesome Transformer for Vision Resources List (GitHub)
- Transformer-in-Computer-Vision (GitHub)
- Transformer Tutorial (ICASSP 2022)
Sub Categories (entry counts)
- Vision Transformer: 602
- Other Segmentation Tasks: 376
- Other Video Tasks: 215
- Other Detection Tasks: 183
- Action Recognition: 110
- Object Detection: 91
- 3D Object Detection: 67
- Analysis for Transformer: 59
- Attention-Free: 55
- Semantic Segmentation: 49
- Action Detection/Localization: 45
- HOI Detection: 31
- Video Object Segmentation: 31
- Video Instance Segmentation: 29
- Depth Estimation: 28
- Multi-Modal Detection: 22
- Action Prediction/Anticipation: 21
- Salient Object Detection: 16
- Replace Conv w/ Attention: 16
- Object Segmentation: 7