Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome_vision_transformer
Implementation of vision transformer. ⭐⭐⭐
https://github.com/murufeng/awesome_vision_transformer
Last synced: 5 days ago
-
Table of Contents
-
Paper interpretations
- Hierarchical cascaded Transformer! ETH Zurich proposes TransCNN: significantly reduced computational/space complexity!
- Surpassing Swin Transformer! Google proposes NesT, with faster convergence, greater robustness, and stronger performance
- A one-article summary of Microsoft Research's trilogy of leaderboard-topping Transformer models
- Shuicheng Yan's team proposes VOLO, which tops CV task leaderboards: with no extra training data, it is the first to reach 87.1% on ImageNet
- Not all images are worth 16x16 words: Tsinghua & Huawei propose a dynamic ViT with adaptive sequence length
- Reaching new heights! The teams of Shuicheng Yan and Ming-Ming Cheng open-source ViP, introducing a 3D information encoding mechanism with no convolution or attention
- Can attention let MLPs fully replace CNNs? What are the future research directions?
- Jiwen Lu's team at Tsinghua proposes DynamicViT: an efficient ViT with dynamic token sparsification
-
DETR variants
- paper
- UP-DETR
- paper
- code
- paper
- Video Swin Transformer
- MViT: Mask Vision Transformer for Facial Expression Recognition in the wild
- CPTR
- paper
- code
- paper
- code
- paper
- code
- DA-DETR
- code
- Pointformer
- ViT-FRCNN
- Oriented Object Detection with Transformer
- paper
- code
- COTR
- paper
- code
- CAT
- M2TR
- Transformer Transforms Salient Object Detection and Camouflaged Object Detection
- SSTN
- TSP-FCOS
- ACT
- PED
- paper
- code
- paper
- code
- Fully Transformer Networks for Semantic Image Segmentation
- TransVOS
- paper
- code
- VisTR
- paper
- code
- https://fudan-zvg.github.io/SETR/
- https://github.com/fudan-zvg/SETR
- https://arxiv.org/abs/2012.15840
- link
- code
- link
- code
- paper
- code
- STGT
- Transformer Tracking
- TransCenter
- TrackFormer
- paper
- code
- paper
- code
- TimeSformer
- VidTr
- ViViT
- VTN
- paper
- code
- TransPose
- TFPose
- Lifting Transformer for 3D Human Pose Estimation in Video
- paper
- code
- Vision Transformer Architecture Search
- FaceT
- TransReID
- code
- paper
- paper
- code
- SUNETR
- U-Transformer
- paper
- code
- code
- paper
- code
- paper
- IPT
- paper
- code
- paper
- Chasing Sparsity in Vision Transformers: An End-to-End Exploration
- paper
- code
- Deepfake Video Detection Using Convolutional Vision Transformer
- Training Vision Transformers for Image Retrieval
-
Microsoft's leaderboard-topping Transformer models
- code
- paper
- code
- LeViT
- CrossViT
- CeiT
- DeepViT
- paper
- code
- paper
- code
- BoTNet
- paper
- paper
- code
- paper
- code
- paper
- code
- GasHis-Transformer
- paper
- code
- RegionViT
- PVT
- code
- paper
- code
- paper
- code
- paper
- code
- paper
- code
- paper
- code
- paper
- code
- paper
- code
- ConViT
- paper
- code
- paper
- code
- paper
- DVT
- Scaling Vision Transformers
- Shuffle Transformer
- code
- VTs
- paper
- paper
- paper
- code
- paper
- code
- code
- MViT
- code
- paper
-
Paper (latest, most watched)
- CMT: Convolutional Neural Networks Meet Vision Transformers
- Early Convolutions Help Transformers See Better
- https://arxiv.org/abs/2106.13112
- https://github.com/sail-sg/volo
- https://github.com/Andrew-Qibin/VisionPermutator
- CAT: Cross Attention in Vision Transformer
- CoAtNet: Marrying Convolution and Attention for All Data Sizes
- Container: Context Aggregation Network
- Aggregating Nested Transformers
- X-volution: On the unification of convolution and self-attention
- https://arxiv.org/abs/2106.12368
- paper
- paper
- Scaling Vision Transformers
- Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
- Dynamic Head: Unifying Object Detection Heads with Attentions
-
Papers
- RepMLP
- gMLP
- Spatial shift ($S^{2}$) MLP V1
- S^2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision
- GFNet | Another push in the MLP field: Tsinghua University proposes using FFT ideas for spatial information interaction
- CycleMLP: A MLP-like Architecture for Dense Prediction
- AS-MLP
- ViP(Vision Permute-MLP)
-
Awesome Survey
-
MLP
-
Keywords
transformer: 15
pytorch: 10
object-detection: 5
imagenet: 5
segmentation: 5
computer-vision: 4
vision-transformer: 4
deep-learning: 4
image-restoration: 3
attention: 3
image-classification: 3
semantic-segmentation: 3
transformers: 3
ade20k: 3
meta-learning: 2
few-shot-object-detection: 2
image-super-resolution: 2
pvtv2: 2
pvt: 2
detection: 2
backbone: 2
vision-transformer-architectures: 2
twins: 2
downstream-tasks: 2
convolution: 2
vit: 2
t2t-transformer: 2
vision: 2
lv-vit: 2
multi-scale: 2
image-inpainting: 2
high-resolution: 2
codebase: 2
transformer-models: 2
transformer-encoder: 2
gan: 2
scene-generation: 2
image-generation: 2
generative-adversarial-networks: 2
gans: 2
compositionality: 2
cityscapes: 2
image-deraining: 1
image-denoising: 1
image-demoireing: 1
image-deblurring: 1
medical-imaging: 1
medical-image-analysis: 1
self-supervised-learning: 1
neural-architecture-search: 1