{"id":21813587,"url":"https://github.com/bobmcdear/attention-in-vision","last_synced_at":"2025-04-13T23:31:55.674Z","repository":{"id":132511380,"uuid":"527593294","full_name":"BobMcDear/attention-in-vision","owner":"BobMcDear","description":"PyTorch implementation of popular attention mechanisms in vision","archived":false,"fork":false,"pushed_at":"2023-04-03T00:04:46.000Z","size":39,"stargazers_count":15,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-27T13:45:29.537Z","etag":null,"topics":["attention","computer-vision","deep-learning","machine-learning","pytorch"],"latest_commit_sha":null,"homepage":"https://bobmcdear.github.io/posts/attention-in-vision/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BobMcDear.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-08-22T14:10:10.000Z","updated_at":"2024-08-12T23:20:42.000Z","dependencies_parsed_at":"2023-06-07T22:30:44.964Z","dependency_job_id":null,"html_url":"https://github.com/BobMcDear/attention-in-vision","commit_stats":{"total_commits":21,"total_committers":1,"mean_commits":21.0,"dds":0.0,"last_synced_commit":"26954dde3cce4f6d961ebeebbf98c89811646a85"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BobMcDear%2Fattention-in-vision","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BobMcDear%2Fattention-in-vision/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BobMcDear%2Fattention-in-vision/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BobMcDear%2Fattention-in-vision/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BobMcDear","download_url":"https://codeload.github.com/BobMcDear/attention-in-vision/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248796330,"owners_count":21162941,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","computer-vision","deep-learning","machine-learning","pytorch"],"created_at":"2024-11-27T14:30:19.155Z","updated_at":"2025-04-13T23:31:55.618Z","avatar_url":"https://github.com/BobMcDear.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Attention in Vision\n\n• \u003cstrong\u003e[Introduction](#introduction)\u003c/strong\u003e\u003cbr\u003e\n• \u003cstrong\u003e[Modules](#modules)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Squeeze-and-Excitation (SE)](#squeeze-and-excitation-se--paper)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Effective Squeeze-and-Excitation (eSE)](#effective-squeeze-and-excitation-ese--paper)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Efficient Channel Attention (ECA)](#efficient-channel-attention-eca--paper)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Convolutional Block Attention Module 
(CBAM)](#convolutional-block-attention-module-cbam--paper)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Bottleneck Attention Module (BAM)](#bottleneck-attention-module-bam--paper)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Gather-Excite (GE)](#gather-excite-ge--paper)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Selective Kernel (SK)](#selective-kernel-sk--paper)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Split Attention (SplAt)](#split-attention-splat--paper)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Conditionally Parameterized Convolution (CondConv)](#conditionally-parameterized-convolution-condconv--paper)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Dynamic convolution](#dynamic-convolution--paper)\u003c/strong\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp; • \u003cstrong\u003e[Multi-Headed Self-Attention (MHSA)](#multi-headed-self-attention-mhsa--paper)\u003c/strong\u003e\u003cbr\u003e\n\n## Introduction\nPyTorch implementations of popular attention mechanisms in computer vision can be found in this repository. The code aims to be lean, usable out of the box, and efficient, but first and foremost readable and instructive for those seeking to explore attention modules in vision. 
This repository is also accompanied by a [blog post](https://bobmcdear.github.io/posts/attention-in-vision/) that studies each layer in detail and elaborates on the code, so the two should be considered complementary material.\n## Modules\nBelow is a list of available attention layers, as well as their sample usage.\n\n### Squeeze-and-Excitation (SE) | [Paper](https://arxiv.org/abs/1709.01507)\nSqueeze-and-excitation (SE) is accessed as follows.\n\n```python\nfrom se import SE\n\nse = SE(\n        in_dim=in_dim, # Number of channels SE receives\n        reduction_factor=reduction_factor, # Reduction factor for the excitation module\n        )\n```\n### Effective Squeeze-and-Excitation (eSE) | [Paper](https://arxiv.org/abs/1911.06667)\nEffective squeeze-and-excitation (eSE) is accessed as follows.\n```python\nfrom ese import eSE\n\n\nese = eSE(\n        in_dim=in_dim, # Number of channels eSE receives\n        )\n```\n### Efficient Channel Attention (ECA) | [Paper](https://arxiv.org/abs/1910.03151)\nEfficient channel attention (ECA) is accessed as follows.\n```python\nfrom eca import ECA\n\n\n# ECA with automatically-calculated kernel size\neca = ECA(\n        beta=beta, # beta value used in calculating the kernel size\n        gamma=gamma, # gamma value used in calculating the kernel size\n        in_dim=in_dim, # Number of channels ECA receives, required when kernel size is None\n        )\n# ECA with custom kernel size\neca = ECA(\n        kernel_size=kernel_size, # Neighbourhood size, i.e., kernel size of the 1D convolution\n        )\n```\n\n### Convolutional Block Attention Module (CBAM) | [Paper](https://arxiv.org/abs/1807.06521)\nConvolutional block attention module (CBAM) is accessed as follows.\n```python\nfrom cbam import CBAM\n\n\ncbam = CBAM(\n        in_dim=in_dim, # Number of channels CBAM receives\n        reduction_factor=reduction_factor, # Reduction factor for channel attention\n        kernel_size=kernel_size, # Kernel size for spatial 
attention\n        )\n```\n\n### Bottleneck Attention Module (BAM) | [Paper](https://arxiv.org/abs/1807.06514)\nBottleneck attention module (BAM) is accessed as follows.\n```python\nfrom bam import BAM\n\n\nbam = BAM(\n        in_dim=in_dim, # Number of channels BAM receives\n        reduction_factor=reduction_factor, # Reduction factor for channel and spatial attention\n        kernel_size=kernel_size, # Dilation for spatial attention\n        )\n```\n\n### Gather-Excite (GE) | [Paper](https://arxiv.org/abs/1810.12348)\nGather-excite (GE) is accessed as follows.\n```python\nfrom ge import GENoParams, GEParams, GEParamsPlus # Assumes the module is named ge, matching the other layers\n\n# GE-θ-\nge_no_params = GENoParams(\n        extent=extent, # Extent factor, 0 for a global extent\n        )\n# GE-θ\nge_params = GEParams(\n        in_dim=in_dim, # Number of channels GE receives\n        extent=extent, # Extent factor, 0 for a global extent\n        spatial_dim=spatial_dim, # Spatial dimension GE receives, required for a global extent\n        )\n# GE-θ+\nge_params_plus = GEParamsPlus(\n        in_dim=in_dim, # Number of channels GE receives\n        extent=extent, # Extent factor, 0 for a global extent\n        spatial_dim=spatial_dim, # Spatial dimension GE receives, required for a global extent\n        )\n```\n\n### Selective Kernel (SK) | [Paper](https://arxiv.org/abs/1903.06586)\nSelective kernel module (SK) is accessed as follows.\n```python\nfrom sk import SK\n\nsk = SK(\n        in_dim=in_dim, # Number of channels SK receives\n        out_dim=out_dim, # Desired number of output channels\n        n_branches=n_branches, # Number of branches\n        stride=stride, # Stride of each branch\n        groups=groups, # Number of groups per branch\n        reduction_factor=reduction_factor, # Reduction factor for the MLP calculating attention values\n        )\n```\n\n### Split Attention (SplAt) | [Paper](https://arxiv.org/abs/2004.08955)\nSplit attention (SplAt) is accessed as follows.\n```python\nfrom splat import SplAt\n\nsplat = SplAt(\n        
in_dim=in_dim, # Number of channels SplAt receives\n        out_dim=out_dim, # Desired number of output channels\n        kernel_size=kernel_size, # Kernel size of SplAt\n        stride=stride, # Stride of SplAt\n        cardinality=cardinality, # Number of cardinal groups\n        radix=radix, # Radix\n        reduction_factor=reduction_factor, # Reduction factor for the MLP calculating attention values\n        )\n```\n\n### Conditionally Parameterized Convolution (CondConv) | [Paper](https://arxiv.org/abs/1904.04971)\nConditionally parameterized convolution (CondConv) is accessed as follows.\n```python\nfrom condconv import CondConv\n\ncondconv = CondConv(\n        in_dim=in_dim, # Number of channels CondConv receives\n        out_dim=out_dim, # Desired number of output channels\n        kernel_size=kernel_size, # Kernel size of each expert/convolution\n        stride=stride, # Stride of each expert/convolution\n        padding=padding, # Padding of each expert/convolution\n        dilation=dilation, # Dilation of each expert/convolution\n        groups=groups, # Number of groups per expert/convolution\n        bias=bias, # Whether the experts/convolutions should contain bias terms\n        n_experts=n_experts, # Number of experts/convolutions\n        )\n```\n\n### Dynamic Convolution | [Paper](https://arxiv.org/abs/1912.03458)\nDynamic convolution is accessed as follows.\n```python\nfrom dynamic_conv import DynamicConv\n\ndynamic_conv = DynamicConv(\n        in_dim=in_dim, # Number of channels dynamic convolution receives\n        out_dim=out_dim, # Desired number of output channels\n        kernel_size=kernel_size, # Kernel size of each expert/convolution\n        stride=stride, # Stride of each expert/convolution\n        padding=padding, # Padding of each expert/convolution\n        dilation=dilation, # Dilation of each expert/convolution\n        groups=groups, # Number of groups per expert/convolution\n        bias=bias, # Whether the 
experts/convolutions should contain bias terms\n        n_experts=n_experts, # Number of experts/convolutions\n        reduction_factor=reduction_factor, # Reduction factor in the MLP of the router\n        temperature=temperature, # Temperature coefficient for softmax in the router\n        )\n```\n\n### Multi-Headed Self-Attention (MHSA) | [Paper](https://arxiv.org/abs/1706.03762)\nMulti-headed self-attention is accessed as follows.\n```python\nfrom mhsa import MHSA\n\n# Input should be of shape [batch_size, n_tokens, token_dim]\nmhsa = MHSA(\n        in_dim=in_dim, # Dimension of input, i.e., embedding or token dimension\n        n_heads=n_heads, # Number of heads\n        )\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbobmcdear%2Fattention-in-vision","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbobmcdear%2Fattention-in-vision","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbobmcdear%2Fattention-in-vision/lists"}