https://github.com/MenghaoGuo/Awesome-Vision-Attentions

Summary of related papers on visual attention. Related code will be released based on Jittor gradually.

List: Awesome-Vision-Attentions

Host: GitHub
URL: https://github.com/MenghaoGuo/Awesome-Vision-Attentions
Owner: MenghaoGuo
Created: 2021-09-01T08:26:38.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2024-10-20T07:49:48.000Z (about 1 year ago)
Last Synced: 2025-04-19T07:16:11.038Z (7 months ago)
Language: Python
Homepage:
Size: 2.24 MB
Stars: 2,797
Watchers: 31
Forks: 408
Open Issues: 5
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

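awesome-yolo-object-detection - MenghaoGuo/Awesome-Vision-Attentions: Summary of related papers on visual attention. Related code will be released based on Jittor gradually. "Attention Mechanisms in Computer Vision: A Survey". (**[arXiv 2021](https://arxiv.org/abs/2111.07624)**) (Object Detection Applications)
ultimate-awesome - Awesome-Vision-Attentions - Summary of related papers on visual attention. Related code will be released based on Jittor gradually. (Other Lists / TeX Lists)
StarryDivineSky - MenghaoGuo/Awesome-Vision-Attentions - Vision-Attentions collects papers on visual attention mechanisms. Its distinguishing feature is that it organizes a large body of research on visual attention. The project plans to gradually release related code based on the Jittor framework. It aims to give researchers a comprehensive resource library on visual attention mechanisms. It covers various types of visual attention models and may include brief overviews of papers and their key techniques. With it, users can quickly catch up on recent progress in visual attention. The project will be updated continuously and will provide more useful information and resources. The Jittor-based code implementations will help researchers reproduce and improve existing attention mechanisms. The project is a valuable resource for learning about and exploring visual attention mechanisms. (Multimodal Large Models / Resource Transfer and Download)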

README

# This repo is built for the paper: Attention Mechanisms in Computer Vision: A Survey [paper](https://arxiv.org/abs/2111.07624)

## A Chinese-language blog post introducing the paper: [link](https://mp.weixin.qq.com/s/0iOZ45NTK9qSWJQlcI3_kQ)

## Citation

If it is helpful for your work, please cite this paper:

```
@article{guo2022attention,
  title={Attention mechanisms in computer vision: A survey},
  author={Guo, Meng-Hao and Xu, Tian-Xing and Liu, Jiang-Jiang and Liu, Zheng-Ning and Jiang, Peng-Tao and Mu, Tai-Jiang and Zhang, Song-Hai and Martin, Ralph R and Cheng, Ming-Ming and Hu, Shi-Min},
  journal={Computational Visual Media},
  pages={1--38},
  year={2022},
  publisher={Springer}
}
```

![image](https://github.com/MenghaoGuo/Awesome-Vision-Attentions/blob/main/imgs/fuse.png)

- [Vision-Attention-Papers](#vision-attention-papers)

  * [Channel attention](#channel-attention)

  * [Spatial attention](#spatial-attention)

  * [Temporal attention](#temporal-attention)

  * [Branch attention](#branch-attention)

  * [Channel \& Spatial attention](#channelspatial-attention)

  * [Spatial \& Temporal attention](#spatialtemporal-attention)

* Code for the different attention mechanisms, based on [Jittor](https://github.com/Jittor/jittor), has now been released.

* TODO: collect more related papers. Contributions are welcome.

🔥 marks papers with more than 200 citations.

## Channel attention

* Squeeze-and-Excitation Networks (CVPR 2018) [pdf](https://arxiv.org/pdf/1709.01507), (TPAMI 2019 version) [pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8701503) 🔥

* Image super-resolution using very deep residual channel attention networks (ECCV 2018) [pdf](https://arxiv.org/pdf/1807.02758) 🔥

* Context encoding for semantic segmentation (CVPR 2018) [pdf](https://arxiv.org/pdf/1803.08904)   🔥 

* Spatio-temporal channel correlation networks for action classification (ECCV 2018)  [pdf](https://arxiv.org/pdf/1806.07754)

* Global second-order pooling convolutional networks (CVPR 2019) [pdf](https://arxiv.org/pdf/1811.12006)

* SRM: A style-based recalibration module for convolutional neural networks (ICCV 2019) [pdf](https://arxiv.org/pdf/1903.10829)

* You look twice: GaterNet for dynamic filter selection in CNNs (CVPR 2019) [pdf](https://arxiv.org/pdf/1811.11205)

* Second-order attention network for single image super-resolution (CVPR 2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Dai_Second-Order_Attention_Network_for_Single_Image_Super-Resolution_CVPR_2019_paper.pdf) 🔥

* DIANet: Dense-and-Implicit Attention Network (AAAI 2020) [pdf](https://arxiv.org/pdf/1905.10671.pdf)

* SpSequenceNet: Semantic segmentation network on 4D point clouds (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/html/Shi_SpSequenceNet_Semantic_Segmentation_Network_on_4D_Point_Clouds_CVPR_2020_paper.html)

* ECA-Net: Efficient channel attention for deep convolutional neural networks (CVPR 2020) [pdf](https://arxiv.org/pdf/1910.03151) 🔥

* Gated channel transformation for visual recognition (CVPR 2020) [pdf](https://arxiv.org/pdf/1909.11519)

* FcaNet: Frequency channel attention networks (ICCV 2021) [pdf](https://arxiv.org/pdf/2012.11879)

## Spatial attention

- Recurrent models of visual attention (NeurIPS 2014), [pdf](https://arxiv.org/pdf/1406.6247)   🔥 

- Show, attend and tell: Neural image caption generation with visual attention (PMLR 2015) [pdf](https://arxiv.org/pdf/1502.03044)   🔥 

- Draw: A recurrent neural network for image generation (ICML 2015) [pdf](https://arxiv.org/pdf/1502.04623)   🔥 

- Spatial transformer networks (NeurIPS 2015) [pdf](https://arxiv.org/pdf/1506.02025)   🔥 

- Multiple object recognition with visual attention (ICLR 2015) [pdf](https://arxiv.org/pdf/1412.7755)   🔥 

- Action recognition using visual attention (arXiv 2015) [pdf](https://arxiv.org/pdf/1511.04119)   🔥 

- VideoLSTM convolves, attends and flows for action recognition (arXiv 2016) [pdf](https://arxiv.org/pdf/1607.01794) 🔥

- Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition (CVPR 2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Fu_Look_Closer_to_CVPR_2017_paper.pdf)   🔥 

- Learning multi-attention convolutional neural network for fine-grained image recognition (ICCV 2017) [pdf](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zheng_Learning_Multi-Attention_Convolutional_ICCV_2017_paper.pdf)   🔥 

- Diversified visual attention networks for fine-grained object classification (TMM 2017) [pdf](https://arxiv.org/pdf/1606.08572)   🔥 

- High-Order Attention Models for Visual Question Answering (NeurIPS 2017) [pdf](https://arxiv.org/pdf/1711.04323)

- Attentional pooling for action recognition (NeurIPS 2017) [pdf](https://arxiv.org/pdf/1711.01467)   🔥 

- Non-local neural networks (CVPR 2018) [pdf](https://arxiv.org/pdf/1711.07971)   🔥 

- Attentional ShapeContextNet for point cloud recognition (CVPR 2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/papers/Xie_Attentional_ShapeContextNet_for_CVPR_2018_paper.pdf)

- Relation networks for object detection (CVPR 2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Relation_Networks_for_CVPR_2018_paper.pdf) 🔥

- A2-Nets: Double attention networks (NeurIPS 2018) [pdf](https://arxiv.org/pdf/1810.11579) 🔥

- Attention-aware compositional network for person re-identification (CVPR 2018) [pdf](https://arxiv.org/pdf/1805.03344)   🔥 

- Tell me where to look: Guided attention inference network (CVPR 2018) [pdf](https://arxiv.org/pdf/1802.10171)   🔥 

- Pedestrian alignment network for large-scale person re-identification (TCSVT 2018) [pdf](https://arxiv.org/pdf/1707.00408)   🔥 

- Learn to pay attention (ICLR 2018) [pdf](https://arxiv.org/pdf/1804.02391.pdf)   🔥

- Attention U-Net: Learning Where to Look for the Pancreas (MIDL 2018) [pdf](https://arxiv.org/pdf/1804.03999.pdf)   🔥

- PSANet: Point-wise spatial attention network for scene parsing (ECCV 2018) [pdf](https://openaccess.thecvf.com/content_ECCV_2018/html/Hengshuang_Zhao_PSANet_Point-wise_Spatial_ECCV_2018_paper.html) 🔥

- Self attention generative adversarial networks (ICML 2019) [pdf](https://arxiv.org/pdf/1805.08318)   🔥 

- Attentional pointnet for 3d-object detection in point clouds (CVPRW 2019) [pdf](https://openaccess.thecvf.com/content_CVPRW_2019/papers/WAD/Paigwar_Attentional_PointNet_for_3D-Object_Detection_in_Point_Clouds_CVPRW_2019_paper.pdf)

- Co-occurrent features in semantic segmentation (CVPR 2019) [pdf](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_Co-Occurrent_Features_in_Semantic_Segmentation_CVPR_2019_paper.pdf)

- Factor Graph Attention (CVPR 2019) [pdf](https://arxiv.org/pdf/1904.05880)

- Attention augmented convolutional networks (ICCV 2019) [pdf](https://arxiv.org/pdf/1904.09925)   🔥 

- Local relation networks for image recognition (ICCV 2019) [pdf](https://arxiv.org/pdf/1904.11491)

- LatentGNN: Learning efficient nonlocal relations for visual recognition (ICML 2019) [pdf](https://arxiv.org/pdf/1905.11634)

- Graph-based global reasoning networks (CVPR 2019) [pdf](https://arxiv.org/pdf/1811.12814) 🔥

- GCNet: Non-local networks meet squeeze-excitation networks and beyond (ICCVW 2019) [pdf](https://arxiv.org/pdf/1904.11492) 🔥

- Asymmetric non-local neural networks for semantic segmentation (ICCV 2019) [pdf](https://arxiv.org/pdf/1908.07678)   🔥 

- Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition (CVPR 2019) [pdf](https://arxiv.org/pdf/1903.06150) 

- Second-order non-local attention networks for person re-identification (ICCV 2019) [pdf](https://arxiv.org/pdf/1909.00295)   🔥 

- End-to-end comparative attention networks for person re-identification (ICCV 2019) [pdf](https://arxiv.org/pdf/1606.04404)   🔥 

- Modeling point clouds with self-attention and gumbel subset sampling (CVPR 2019) [pdf](https://arxiv.org/pdf/1904.03375)

- Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification (arXiv 2019) [pdf](https://arxiv.org/pdf/1801.09927)

- L2G auto-encoder: Understanding point clouds by local-to-global reconstruction with hierarchical self-attention (arXiv 2019) [pdf](https://arxiv.org/pdf/1908.00720)

- Generative pretraining from pixels (PMLR 2020) [pdf](https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf)

- Exploring self-attention for image recognition (CVPR 2020) [pdf](https://arxiv.org/pdf/2004.13621)

- CF-SIS: Semantic-instance segmentation of 3D point clouds by context fusion with self attention (ACM MM 2020) [pdf](https://dl.acm.org/doi/pdf/10.1145/3394171.3413829)

- Disentangled non-local neural networks (ECCV 2020) [pdf](https://arxiv.org/pdf/2006.06668) 

- Relation-aware global attention for person re-identification (CVPR 2020) [pdf](https://arxiv.org/pdf/1904.02998)

- Segmentation transformer: Object-contextual representations for semantic segmentation (ECCV 2020) [pdf](https://arxiv.org/pdf/1909.11065)   🔥 

- Spatial pyramid based graph reasoning for semantic segmentation (CVPR 2020) [pdf](https://arxiv.org/pdf/2003.10211)

- Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation (CVPR 2020) [pdf](https://arxiv.org/pdf/2004.04581.pdf)

- End-to-end object detection with transformers (ECCV 2020) [pdf](https://arxiv.org/pdf/2005.12872)   🔥 

- PointASNL: Robust point clouds processing using nonlocal neural networks with adaptive sampling (CVPR 2020) [pdf](https://arxiv.org/pdf/2003.00492)

- Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers (CVPR 2021) [pdf](https://arxiv.org/pdf/2012.15840)

- An image is worth 16x16 words: Transformers for image recognition at scale (ICLR 2021) [pdf](https://arxiv.org/pdf/2010.11929)   🔥 

- Is Attention Better Than Matrix Decomposition? (ICLR 2021) [pdf](https://arxiv.org/abs/2109.04553) 

- OCNet: Object context network for scene parsing (IJCV 2021) [pdf](https://arxiv.org/pdf/1809.00916) 🔥

- Point transformer (ICCV 2021) [pdf](https://arxiv.org/pdf/2012.09164)

- PCT: Point Cloud Transformer (CVMJ 2021) [pdf](https://arxiv.org/pdf/2012.09688.pdf)

- Pre-trained image processing transformer (CVPR 2021) [pdf](https://arxiv.org/pdf/2012.00364)

- An empirical study of training self-supervised vision transformers (ICCV 2021) [pdf](https://arxiv.org/pdf/2104.02057)

- SegFormer: Simple and efficient design for semantic segmentation with transformers (arXiv 2021) [pdf](https://arxiv.org/pdf/2105.15203)

- BEiT: BERT pre-training of image transformers (arXiv 2021) [pdf](https://arxiv.org/pdf/2106.08254)

- Beyond Self-attention: External attention using two linear layers for visual tasks (arXiv 2021) [pdf](https://arxiv.org/pdf/2105.02358)

- Query2Label: A simple transformer way to multi-label classification (arXiv 2021) [pdf](https://arxiv.org/pdf/2107.10834)

- Transformer in transformer (arXiv 2021) [pdf](https://arxiv.org/pdf/2103.00112)

## Temporal attention

- Jointly attentive spatial-temporal pooling networks for video-based person re-identification (ICCV 2017) [pdf](https://arxiv.org/pdf/1708.02286.pdf) 🔥

- Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding (CVPR 2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/CameraReady/1036.pdf)

- SCAN: Self-and-collaborative attention network for video person re-identification (TIP 2019) [pdf](https://arxiv.org/pdf/1807.05688.pdf)

## Branch attention

- Training very deep networks (NeurIPS 2015) [pdf](https://arxiv.org/pdf/1507.06228.pdf) 🔥

- Selective kernel networks (CVPR 2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Selective_Kernel_Networks_CVPR_2019_paper.pdf) 🔥

- CondConv: Conditionally Parameterized Convolutions for Efficient Inference (NeurIPS 2019) [pdf](https://arxiv.org/pdf/1904.04971.pdf)

- Dynamic convolution: Attention over convolution kernels (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Dynamic_Convolution_Attention_Over_Convolution_Kernels_CVPR_2020_paper.pdf)

- ResNeSt: Split-attention networks (arXiv 2020) [pdf](https://arxiv.org/pdf/2004.08955.pdf) 🔥

## Channel+Spatial attention

- Residual attention network for image classification (CVPR 2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Wang_Residual_Attention_Network_CVPR_2017_paper.pdf) 🔥

- SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning (CVPR 2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Chen_SCA-CNN_Spatial_and_CVPR_2017_paper.pdf) 🔥

- CBAM: convolutional block attention module (ECCV 2018) [pdf](https://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.pdf)  🔥

- Harmonious attention network for person re-identification (CVPR 2018) [pdf](https://arxiv.org/pdf/1802.08122.pdf) 🔥

- Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks (TMI 2018) [pdf](https://arxiv.org/pdf/1808.08127.pdf)

- Mancs: A multi-task attentional network with curriculum sampling for person re-identification (ECCV 2018) [pdf](https://www.ecva.net/papers/eccv_2018/papers_ECCV/papers/Cheng_Wang_Mancs_A_Multi-task_ECCV_2018_paper.pdf) 🔥

- BAM: Bottleneck attention module (BMVC 2018) [pdf](http://bmvc2018.org/contents/papers/0092.pdf) 🔥

- PVNet: A joint convolutional network of point cloud and multi-view for 3D shape recognition (ACM MM 2018) [pdf](https://arxiv.org/pdf/1808.07659.pdf)

- Learning what and where to attend (ICLR 2019) [pdf](https://openreview.net/pdf?id=BJgLg3R9KQ)

- Dual attention network for scene segmentation (CVPR 2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Fu_Dual_Attention_Network_for_Scene_Segmentation_CVPR_2019_paper.pdf) 🔥

- ABD-Net: Attentive but diverse person re-identification (ICCV 2019) [pdf](https://openaccess.thecvf.com/content_ICCV_2019/papers/Chen_ABD-Net_Attentive_but_Diverse_Person_Re-Identification_ICCV_2019_paper.pdf)

- Mixed high-order attention network for person re-identification (ICCV 2019) [pdf](https://arxiv.org/pdf/1908.05819.pdf)

- MLCVNet: Multi-level context VoteNet for 3D object detection (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Xie_MLCVNet_Multi-Level_Context_VoteNet_for_3D_Object_Detection_CVPR_2020_paper.pdf)

- Improving convolutional networks with self-calibrated convolutions (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Liu_Improving_Convolutional_Networks_With_Self-Calibrated_Convolutions_CVPR_2020_paper.pdf)

- Relation-aware global attention for person re-identification (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Relation-Aware_Global_Attention_for_Person_Re-Identification_CVPR_2020_paper.pdf)

- Strip Pooling: Rethinking spatial pooling for scene parsing (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Hou_Strip_Pooling_Rethinking_Spatial_Pooling_for_Scene_Parsing_CVPR_2020_paper.pdf)

- Rotate to attend: Convolutional triplet attention module (WACV 2021) [pdf](https://arxiv.org/pdf/2010.03045.pdf)

- Coordinate attention for efficient mobile network design (CVPR 2021) [pdf](https://openaccess.thecvf.com/content/CVPR2021/papers/Hou_Coordinate_Attention_for_Efficient_Mobile_Network_Design_CVPR_2021_paper.pdf)

- SimAM: A simple, parameter-free attention module for convolutional neural networks (ICML 2021) [pdf](http://proceedings.mlr.press/v139/yang21o/yang21o.pdf)

## Spatial+Temporal attention

- An end-to-end spatio-temporal attention model for human action recognition from skeleton data (AAAI 2017) [pdf](https://arxiv.org/pdf/1611.06067.pdf) 🔥

- Diversity regularized spatiotemporal attention for video-based person re-identification (arXiv 2018) 🔥

- Interpretable spatio-temporal attention for video action recognition (ICCVW 2019) [pdf](https://openaccess.thecvf.com/content_ICCVW_2019/papers/HVU/Meng_Interpretable_Spatio-Temporal_Attention_for_Video_Action_Recognition_ICCVW_2019_paper.pdf)

- A Simple Baseline for Audio-Visual Scene-Aware Dialog (CVPR 2019) [pdf](https://arxiv.org/pdf/1904.05876v1.pdf)

- Hierarchical LSTMs with adaptive attention for visual captioning (TPAMI 2020) [pdf](https://arxiv.org/pdf/1812.11004.pdf)

- STAT: Spatial-temporal attention mechanism for video captioning (TMM 2020) [pdf](https://ieeexplore.ieee.org/abstract/document/8744407)

- GTA: Global temporal attention for video action understanding (arXiv 2020) [pdf](https://arxiv.org/pdf/2012.08510.pdf)

- Multi-granularity reference-aided attentive feature aggregation for video-based person re-identification (CVPR 2020) [pdf](https://arxiv.org/pdf/2003.12224.pdf)

- READ: Reciprocal attention discriminator for image-to-video re-identification (ECCV 2020) [pdf](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590324.pdf)

- Decoupled spatial-temporal transformer for video inpainting (arXiv 2021) [pdf](https://arxiv.org/pdf/2104.06637.pdf)

- Towards Coherent Visual Storytelling with Ordered Image Attention (arXiv 2021) [pdf](https://arxiv.org/pdf/2108.02180)
