https://github.com/lijiaman/awesome-transformer-for-vision



# Awesome Transformer for Vision Resources List [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

> A curated list of papers and resources on Transformer-based research, mainly for vision and graphics tasks.

## Contents

- [Papers](#papers)
  - [Original Paper](#papers-ori)
  - [2D Vision Tasks](#papers-2d)
    - [Classification](#papers-classification)
    - [Detection](#papers-detection)
    - [Segmentation](#papers-segmentation)
    - [Tracking](#papers-tracking)
    - [Image Synthesis](#papers-image-synthesis)
    - [Action Understanding](#papers-action)
  - [3D Vision Tasks](#papers-3d)
    - [Point Cloud Processing](#papers-point-cloud)
    - [Motion Modeling](#papers-motion)
    - [Human Body Modeling](#papers-body)
  - [Others](#papers-others)
    - [Music Modeling](#papers-music)

- [Contributing](#contributing)


# Papers


## Original Paper

[Attention Is All You Need](https://papers.nips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf). Ashish Vaswani*, Noam Shazeer*, Niki Parmar*, Jakob Uszkoreit*, Llion Jones*, Aidan N. Gomez*, Łukasz Kaiser*, Illia Polosukhin*. NIPS 2017.


## 2D Vision Tasks


### Classification

[An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/pdf/2010.11929.pdf). Alexey Dosovitskiy*, Lucas Beyer*, Alexander Kolesnikov*, Dirk Weissenborn*, Xiaohua Zhai*, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. Arxiv 2020.


### Detection

[Fast Convergence of DETR with Spatially Modulated Co-Attention](https://arxiv.org/pdf/2101.07448.pdf). Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li. Arxiv 2021.

[End-to-End Object Detection with Adaptive Clustering Transformer](https://arxiv.org/pdf/2011.09315.pdf). Minghang Zheng, Peng Gao, Xiaogang Wang, Hongsheng Li, Hao Dong. Arxiv 2020.

[Toward Transformer-Based Object Detection](https://arxiv.org/pdf/2012.09958.pdf). Josh Beal*, Eric Kim*, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk. Arxiv 2020.

[Rethinking Transformer-based Set Prediction for Object Detection](https://arxiv.org/pdf/2011.10881.pdf). Zhiqing Sun*, Shengcao Cao*, Yiming Yang, Kris Kitani. Arxiv 2020.

[UP-DETR: Unsupervised Pre-training for Object Detection with Transformers](https://arxiv.org/pdf/2011.09094.pdf). Zhigang Dai, Bolun Cai, Yugeng Lin, Junying Chen. Arxiv 2020.

[Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://arxiv.org/pdf/2010.04159.pdf). Xizhou Zhu*, Weijie Su*, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai. Arxiv 2020.

[End-to-End Object Detection with Transformers](https://arxiv.org/pdf/2005.12872.pdf). Nicolas Carion*, Francisco Massa*, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko. ECCV 2020.


### Segmentation

[Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers](https://arxiv.org/pdf/2012.15840.pdf). Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H.S. Torr, Li Zhang. Arxiv 2020.

[End-to-End Video Instance Segmentation with Transformers](https://arxiv.org/pdf/2011.14503.pdf). Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia. Arxiv 2020.


### Tracking

[TransTrack: Multiple-Object Tracking with Transformer](https://arxiv.org/pdf/2012.15460.pdf). Peize Sun, Yi Jiang, Rufeng Zhang, Enze Xie, Jinkun Cao, Xinting Hu, Tao Kong, Zehuan Yuan, Changhu Wang, Ping Luo. Arxiv 2020.


### Image Synthesis

[Taming Transformers for High-Resolution Image Synthesis](https://arxiv.org/pdf/2012.09841.pdf). Patrick Esser*, Robin Rombach*, Björn Ommer. Arxiv 2020.


### Action Understanding

[Video Action Transformer Network](https://arxiv.org/pdf/1812.02707.pdf). Rohit Girdhar, Joao Carreira, Carl Doersch, Andrew Zisserman. CVPR 2019.


## 3D Vision Tasks


### Point Cloud Processing

[PCT: Point Cloud Transformer](https://arxiv.org/pdf/2012.09688.pdf). Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu. Arxiv 2020.

[Point Transformer](https://arxiv.org/pdf/2012.09164.pdf). Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun. Arxiv 2020.


### Motion Modeling

[Learning to Generate Diverse Dance Motions with Transformer](https://arxiv.org/pdf/2008.08171.pdf). Jiaman Li, Yihang Yin, Hang Chu, Yi Zhou, Tingwu Wang, Sanja Fidler, Hao Li. Arxiv 2020.

[A Spatio-temporal Transformer for 3D Human Motion Prediction](https://arxiv.org/pdf/2004.08692.pdf). Emre Aksan*, Peng Cao*, Manuel Kaufmann, Otmar Hilliges. Arxiv 2020.


### Human Body Modeling

[End-to-End Human Pose and Mesh Reconstruction with Transformers](https://arxiv.org/pdf/2012.09760.pdf). Kevin Lin, Lijuan Wang, Zicheng Liu. Arxiv 2020.


## Others


### Music Modeling

[Music Transformer: Generating Music with Long-Term Structure](https://arxiv.org/pdf/1809.04281.pdf). Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck. Arxiv 2018.

# Contributing
Please see [CONTRIBUTING](https://github.com/openMVG/awesome_3DReconstruction_list/blob/master/contributing.md) for details.