https://github.com/MenghaoGuo/Awesome-Vision-Attentions

A summary of papers on visual attention. Related code, based on Jittor, will be released gradually.

# This repo accompanies the paper: Attention Mechanisms in Computer Vision: A Survey [paper](https://arxiv.org/abs/2111.07624)

## Chinese-language blog post introducing the paper [link](https://mp.weixin.qq.com/s/0iOZ45NTK9qSWJQlcI3_kQ)

## Citation

If it is helpful for your work, please cite this paper:

```
@article{guo2022attention,
title={Attention mechanisms in computer vision: A survey},
author={Guo, Meng-Hao and Xu, Tian-Xing and Liu, Jiang-Jiang and Liu, Zheng-Ning and Jiang, Peng-Tao and Mu, Tai-Jiang and Zhang, Song-Hai and Martin, Ralph R and Cheng, Ming-Ming and Hu, Shi-Min},
journal={Computational Visual Media},
pages={1--38},
year={2022},
publisher={Springer}
}
```

![image](https://github.com/MenghaoGuo/Awesome-Vision-Attentions/blob/main/imgs/fuse.png)

- [Vision-Attention-Papers](#vision-attention-papers)
  * [Channel attention](#channel-attention)
  * [Spatial attention](#spatial-attention)
  * [Temporal attention](#temporal-attention)
  * [Branch attention](#branch-attention)
  * [Channel \& Spatial attention](#channelspatial-attention)
  * [Spatial \& Temporal attention](#spatialtemporal-attention)

* Code for the different attention mechanisms, based on [Jittor](https://github.com/Jittor/jittor), has now been released.
* TODO: collect more related papers. Contributions are welcome.

🔥 marks papers with more than 200 citations.

## Channel attention

* Squeeze-and-Excitation Networks (CVPR 2018) [pdf](https://arxiv.org/pdf/1709.01507), (PAMI 2019 version) [pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8701503) 🔥
* Image super-resolution using very deep residual channel attention networks (ECCV 2018) [pdf](https://arxiv.org/pdf/1807.02758) 🔥
* Context encoding for semantic segmentation (CVPR 2018) [pdf](https://arxiv.org/pdf/1803.08904) 🔥
* Spatio-temporal channel correlation networks for action classification (ECCV 2018) [pdf](https://arxiv.org/pdf/1806.07754)
* Global second-order pooling convolutional networks (CVPR 2019) [pdf](https://arxiv.org/pdf/1811.12006)
* SRM: A style-based recalibration module for convolutional neural networks (ICCV 2019) [pdf](https://arxiv.org/pdf/1903.10829)
* You look twice: GaterNet for dynamic filter selection in CNNs (CVPR 2019) [pdf](https://arxiv.org/pdf/1811.11205)
* Second-order attention network for single image super-resolution (CVPR 2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Dai_Second-Order_Attention_Network_for_Single_Image_Super-Resolution_CVPR_2019_paper.pdf) 🔥
* DIANet: Dense-and-Implicit Attention Network (AAAI 2020) [pdf](https://arxiv.org/pdf/1905.10671.pdf)
* SpSequenceNet: Semantic segmentation network on 4D point clouds (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/html/Shi_SpSequenceNet_Semantic_Segmentation_Network_on_4D_Point_Clouds_CVPR_2020_paper.html)
* ECA-Net: Efficient channel attention for deep convolutional neural networks (CVPR 2020) [pdf](https://arxiv.org/pdf/1910.03151) 🔥
* Gated channel transformation for visual recognition (CVPR 2020) [pdf](https://arxiv.org/pdf/1909.11519)
* FcaNet: Frequency channel attention networks (ICCV 2021) [pdf](https://arxiv.org/pdf/2012.11879)
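
As a rough illustration of the idea shared by many of the channel attention papers above, here is a minimal sketch of a squeeze-and-excitation style gate in plain Python. It is not the Jittor code from this repo or the implementation from any listed paper. The function name `se_channel_attention`, the nested-list layout, the weight shapes, and the omission of biases are all assumptions made for brevity.

```python
import math


def se_channel_attention(x, w1, w2):
    """Reweight the channels of x, a C x H x W nested list.

    w1 has shape (C_reduced x C) and w2 has shape (C x C_reduced).
    Both are hypothetical weights; biases are omitted in this sketch.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in x
    ]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1
    ]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(x, gates)
    ]
    return scaled, gates
```

Each gate lies in (0, 1), so the output keeps the input's shape and only changes the relative strength of the channels.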

## Spatial attention

- Recurrent models of visual attention (NeurIPS 2014) [pdf](https://arxiv.org/pdf/1406.6247) 🔥
- Show, attend and tell: Neural image caption generation with visual attention (ICML 2015) [pdf](https://arxiv.org/pdf/1502.03044) 🔥
- DRAW: A recurrent neural network for image generation (ICML 2015) [pdf](https://arxiv.org/pdf/1502.04623) 🔥
- Spatial transformer networks (NeurIPS 2015) [pdf](https://arxiv.org/pdf/1506.02025) 🔥
- Multiple object recognition with visual attention (ICLR 2015) [pdf](https://arxiv.org/pdf/1412.7755) 🔥
- Action recognition using visual attention (arXiv 2015) [pdf](https://arxiv.org/pdf/1511.04119) 🔥
- VideoLSTM convolves, attends and flows for action recognition (arXiv 2016) [pdf](https://arxiv.org/pdf/1607.01794) 🔥
- Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition (CVPR 2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Fu_Look_Closer_to_CVPR_2017_paper.pdf) 🔥
- Learning multi-attention convolutional neural network for fine-grained image recognition (ICCV 2017) [pdf](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zheng_Learning_Multi-Attention_Convolutional_ICCV_2017_paper.pdf) 🔥
- Diversified visual attention networks for fine-grained object classification (TMM 2017) [pdf](https://arxiv.org/pdf/1606.08572) 🔥
- High-Order Attention Models for Visual Question Answering (NeurIPS 2017) [pdf](https://arxiv.org/pdf/1711.04323)
- Attentional pooling for action recognition (NeurIPS 2017) [pdf](https://arxiv.org/pdf/1711.01467) 🔥
- Non-local neural networks (CVPR 2018) [pdf](https://arxiv.org/pdf/1711.07971) 🔥
- Attentional ShapeContextNet for point cloud recognition (CVPR 2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/papers/Xie_Attentional_ShapeContextNet_for_CVPR_2018_paper.pdf)
- Relation networks for object detection (CVPR 2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Relation_Networks_for_CVPR_2018_paper.pdf) 🔥
- A2-Nets: Double attention networks (NeurIPS 2018) [pdf](https://arxiv.org/pdf/1810.11579) 🔥
- Attention-aware compositional network for person re-identification (CVPR 2018) [pdf](https://arxiv.org/pdf/1805.03344) 🔥
- Tell me where to look: Guided attention inference network (CVPR 2018) [pdf](https://arxiv.org/pdf/1802.10171) 🔥
- Pedestrian alignment network for large-scale person re-identification (TCSVT 2018) [pdf](https://arxiv.org/pdf/1707.00408) 🔥
- Learn to pay attention (ICLR 2018) [pdf](https://arxiv.org/pdf/1804.02391.pdf) 🔥
- Attention U-Net: Learning Where to Look for the Pancreas (MIDL 2018) [pdf](https://arxiv.org/pdf/1804.03999.pdf) 🔥
- PSANet: Point-wise spatial attention network for scene parsing (ECCV 2018) [pdf](https://openaccess.thecvf.com/content_ECCV_2018/html/Hengshuang_Zhao_PSANet_Point-wise_Spatial_ECCV_2018_paper.html) 🔥
- Self attention generative adversarial networks (ICML 2019) [pdf](https://arxiv.org/pdf/1805.08318) 🔥
- Attentional PointNet for 3D-object detection in point clouds (CVPRW 2019) [pdf](https://openaccess.thecvf.com/content_CVPRW_2019/papers/WAD/Paigwar_Attentional_PointNet_for_3D-Object_Detection_in_Point_Clouds_CVPRW_2019_paper.pdf)
- Co-occurrent features in semantic segmentation (CVPR 2019) [pdf](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_Co-Occurrent_Features_in_Semantic_Segmentation_CVPR_2019_paper.pdf)
- Factor Graph Attention (CVPR 2019) [pdf](https://arxiv.org/pdf/1904.05880)
- Attention augmented convolutional networks (ICCV 2019) [pdf](https://arxiv.org/pdf/1904.09925) 🔥
- Local relation networks for image recognition (ICCV 2019) [pdf](https://arxiv.org/pdf/1904.11491)
- LatentGNN: Learning efficient non-local relations for visual recognition (ICML 2019) [pdf](https://arxiv.org/pdf/1905.11634)
- Graph-based global reasoning networks (CVPR 2019) [pdf](https://arxiv.org/pdf/1811.12814) 🔥
- GCNet: Non-local networks meet squeeze-excitation networks and beyond (ICCVW 2019) [pdf](https://arxiv.org/pdf/1904.11492) 🔥
- Asymmetric non-local neural networks for semantic segmentation (ICCV 2019) [pdf](https://arxiv.org/pdf/1908.07678) 🔥
- Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition (CVPR 2019) [pdf](https://arxiv.org/pdf/1903.06150)
- Second-order non-local attention networks for person re-identification (ICCV 2019) [pdf](https://arxiv.org/pdf/1909.00295) 🔥
- End-to-end comparative attention networks for person re-identification (TIP 2017) [pdf](https://arxiv.org/pdf/1606.04404) 🔥
- Modeling point clouds with self-attention and Gumbel subset sampling (CVPR 2019) [pdf](https://arxiv.org/pdf/1904.03375)
- Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification (arXiv 2019) [pdf](https://arxiv.org/pdf/1801.09927)
- L2G autoencoder: Understanding point clouds by local-to-global reconstruction with hierarchical self-attention (arXiv 2019) [pdf](https://arxiv.org/pdf/1908.00720)
- Generative pretraining from pixels (ICML 2020) [pdf](https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf)
- Exploring self-attention for image recognition (CVPR 2020) [pdf](https://arxiv.org/pdf/2004.13621)
- CF-SIS: Semantic-instance segmentation of 3D point clouds by context fusion with self-attention (ACM MM 2020) [pdf](https://dl.acm.org/doi/pdf/10.1145/3394171.3413829)
- Disentangled non-local neural networks (ECCV 2020) [pdf](https://arxiv.org/pdf/2006.06668)
- Relation-aware global attention for person re-identification (CVPR 2020) [pdf](https://arxiv.org/pdf/1904.02998)
- Segmentation transformer: Object-contextual representations for semantic segmentation (ECCV 2020) [pdf](https://arxiv.org/pdf/1909.11065) 🔥
- Spatial pyramid based graph reasoning for semantic segmentation (CVPR 2020) [pdf](https://arxiv.org/pdf/2003.10211)
- Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation (CVPR 2020) [pdf](https://arxiv.org/pdf/2004.04581.pdf)
- End-to-end object detection with transformers (ECCV 2020) [pdf](https://arxiv.org/pdf/2005.12872) 🔥
- PointASNL: Robust point clouds processing using non-local neural networks with adaptive sampling (CVPR 2020) [pdf](https://arxiv.org/pdf/2003.00492)
- Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers (CVPR 2021) [pdf](https://arxiv.org/pdf/2012.15840)
- An image is worth 16x16 words: Transformers for image recognition at scale (ICLR 2021) [pdf](https://arxiv.org/pdf/2010.11929) 🔥
- Is Attention Better Than Matrix Decomposition? (ICLR 2021) [pdf](https://arxiv.org/abs/2109.04553)
- OCNet: Object context network for scene parsing (IJCV 2021) [pdf](https://arxiv.org/pdf/1809.00916) 🔥
- Point transformer (ICCV 2021) [pdf](https://arxiv.org/pdf/2012.09164)
- PCT: Point Cloud Transformer (CVMJ 2021) [pdf](https://arxiv.org/pdf/2012.09688.pdf)
- Pre-trained image processing transformer (CVPR 2021) [pdf](https://arxiv.org/pdf/2012.00364)
- An empirical study of training self-supervised vision transformers (ICCV 2021) [pdf](https://arxiv.org/pdf/2104.02057)
- SegFormer: Simple and efficient design for semantic segmentation with transformers (arXiv 2021) [pdf](https://arxiv.org/pdf/2105.15203)
- BEiT: BERT pre-training of image transformers (arXiv 2021) [pdf](https://arxiv.org/pdf/2106.08254)
- Beyond Self-attention: External attention using two linear layers for visual tasks (arXiv 2021) [pdf](https://arxiv.org/pdf/2105.02358)
- Query2Label: A simple transformer way to multi-label classification (arXiv 2021) [pdf](https://arxiv.org/pdf/2107.10834)
- Transformer in transformer (arXiv 2021) [pdf](https://arxiv.org/pdf/2103.00112)
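
Many of the entries above (the non-local and transformer-based ones in particular) build on dot-product self-attention over spatial positions. The following is a minimal sketch of that operation in plain Python, not the method of any single paper and not the code shipped with this repo. The names `softmax` and `spatial_self_attention` are hypothetical, and the sketch assumes queries, keys, and values are the raw features, whereas real models typically use learned projections for each.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_self_attention(features):
    """Apply scaled dot-product self-attention over N positions.

    features is an N x D nested list: one D-dimensional vector per
    spatial position (an H x W map flattened to N = H * W rows).
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    output, attention = [], []
    for query in features:
        # Similarity of this position to every position, itself included.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in features
        ]
        weights = softmax(scores)
        attention.append(weights)
        # Each output vector is a weighted average of all position vectors.
        output.append([
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(dim)
        ])
    return output, attention
```

The attention matrix is N x N, so cost grows quadratically with the number of positions, which is the bottleneck that several of the listed papers aim to reduce.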

## Temporal attention

- Jointly attentive spatial-temporal pooling networks for video-based person re-identification (ICCV 2017) [pdf](https://arxiv.org/pdf/1708.02286.pdf) 🔥
- Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding (CVPR 2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/CameraReady/1036.pdf)
- SCAN: Self-and-collaborative attention network for video person re-identification (TIP 2019) [pdf](https://arxiv.org/pdf/1807.05688.pdf)

## Branch attention

- Training very deep networks (NeurIPS 2015) [pdf](https://arxiv.org/pdf/1507.06228.pdf) 🔥
- Selective kernel networks (CVPR 2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Selective_Kernel_Networks_CVPR_2019_paper.pdf) 🔥
- CondConv: Conditionally Parameterized Convolutions for Efficient Inference (NeurIPS 2019) [pdf](https://arxiv.org/pdf/1904.04971.pdf)
- Dynamic convolution: Attention over convolution kernels (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Dynamic_Convolution_Attention_Over_Convolution_Kernels_CVPR_2020_paper.pdf)
- ResNeSt: Split-attention networks (arXiv 2020) [pdf](https://arxiv.org/pdf/2004.08955.pdf) 🔥

## Channel&Spatial attention

- Residual attention network for image classification (CVPR 2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Wang_Residual_Attention_Network_CVPR_2017_paper.pdf) 🔥
- SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning (CVPR 2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Chen_SCA-CNN_Spatial_and_CVPR_2017_paper.pdf) 🔥
- CBAM: convolutional block attention module (ECCV 2018) [pdf](https://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.pdf) 🔥
- Harmonious attention network for person re-identification (CVPR 2018) [pdf](https://arxiv.org/pdf/1802.08122.pdf) 🔥
- Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks (TMI 2018) [pdf](https://arxiv.org/pdf/1808.08127.pdf)
- Mancs: A multi-task attentional network with curriculum sampling for person re-identification (ECCV 2018) [pdf](https://www.ecva.net/papers/eccv_2018/papers_ECCV/papers/Cheng_Wang_Mancs_A_Multi-task_ECCV_2018_paper.pdf) 🔥
- BAM: Bottleneck attention module (BMVC 2018) [pdf](http://bmvc2018.org/contents/papers/0092.pdf) 🔥
- PVNet: A joint convolutional network of point cloud and multi-view for 3D shape recognition (ACM MM 2018) [pdf](https://arxiv.org/pdf/1808.07659.pdf)
- Learning what and where to attend (ICLR 2019) [pdf](https://openreview.net/pdf?id=BJgLg3R9KQ)
- Dual attention network for scene segmentation (CVPR 2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Fu_Dual_Attention_Network_for_Scene_Segmentation_CVPR_2019_paper.pdf) 🔥
- ABD-Net: Attentive but diverse person re-identification (ICCV 2019) [pdf](https://openaccess.thecvf.com/content_ICCV_2019/papers/Chen_ABD-Net_Attentive_but_Diverse_Person_Re-Identification_ICCV_2019_paper.pdf)
- Mixed high-order attention network for person re-identification (ICCV 2019) [pdf](https://arxiv.org/pdf/1908.05819.pdf)
- MLCVNet: Multi-level context VoteNet for 3D object detection (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Xie_MLCVNet_Multi-Level_Context_VoteNet_for_3D_Object_Detection_CVPR_2020_paper.pdf)
- Improving convolutional networks with self-calibrated convolutions (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Liu_Improving_Convolutional_Networks_With_Self-Calibrated_Convolutions_CVPR_2020_paper.pdf)
- Relation-aware global attention for person re-identification (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Relation-Aware_Global_Attention_for_Person_Re-Identification_CVPR_2020_paper.pdf)
- Strip Pooling: Rethinking spatial pooling for scene parsing (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Hou_Strip_Pooling_Rethinking_Spatial_Pooling_for_Scene_Parsing_CVPR_2020_paper.pdf)
- Rotate to attend: Convolutional triplet attention module (WACV 2021) [pdf](https://arxiv.org/pdf/2010.03045.pdf)
- Coordinate attention for efficient mobile network design (CVPR 2021) [pdf](https://openaccess.thecvf.com/content/CVPR2021/papers/Hou_Coordinate_Attention_for_Efficient_Mobile_Network_Design_CVPR_2021_paper.pdf)
- SimAM: A simple, parameter-free attention module for convolutional neural networks (ICML 2021) [pdf](http://proceedings.mlr.press/v139/yang21o/yang21o.pdf)

## Spatial&Temporal attention

- An end-to-end spatio-temporal attention model for human action recognition from skeleton data (AAAI 2017) [pdf](https://arxiv.org/pdf/1611.06067.pdf) 🔥
- Diversity regularized spatiotemporal attention for video-based person re-identification (arXiv 2018) 🔥
- Interpretable spatio-temporal attention for video action recognition (ICCVW 2019) [pdf](https://openaccess.thecvf.com/content_ICCVW_2019/papers/HVU/Meng_Interpretable_Spatio-Temporal_Attention_for_Video_Action_Recognition_ICCVW_2019_paper.pdf)
- A Simple Baseline for Audio-Visual Scene-Aware Dialog (CVPR 2019) [pdf](https://arxiv.org/pdf/1904.05876v1.pdf)
- Hierarchical LSTMs with adaptive attention for visual captioning (TPAMI 2020) [pdf](https://arxiv.org/pdf/1812.11004.pdf)
- STAT: Spatial-temporal attention mechanism for video captioning (TMM 2020) [pdf](https://ieeexplore.ieee.org/abstract/document/8744407)
- GTA: Global temporal attention for video action understanding (arXiv 2020) [pdf](https://arxiv.org/pdf/2012.08510.pdf)
- Multi-granularity reference-aided attentive feature aggregation for video-based person re-identification (CVPR 2020) [pdf](https://arxiv.org/pdf/2003.12224.pdf)
- READ: Reciprocal attention discriminator for image-to-video re-identification (ECCV 2020) [pdf](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590324.pdf)
- Decoupled spatial-temporal transformer for video inpainting (arXiv 2021) [pdf](https://arxiv.org/pdf/2104.06637.pdf)
- Towards Coherent Visual Storytelling with Ordered Image Attention (arXiv 2021) [pdf](https://arxiv.org/pdf/2108.02180)