
# This repo supplements our [3D Vision with Transformers Survey](https://arxiv.org/abs/2208.04309)
Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

This repo lists all the Transformer-based 3D computer vision papers presented in our [paper](https://arxiv.org/abs/2208.04309), and we aim to keep it updated with the latest relevant papers.



#### Content
- [Object Classification](#object-classification)

- [3D Object Detection](#3d-object-detection)

- [3D Segmentation](#3d-segmentation)

  - [Complete Scenes Segmentation](#complete-scenes-segmentation)

  - [Point Cloud Video Segmentation](#point-cloud-video-segmentation)

  - [Medical Imaging Segmentation](#medical-imaging-segmentation)

- [3D Point Cloud Completion](#3d-point-cloud-completion)

- [3D Pose Estimation](#3d-pose-estimation)

- [Other Tasks](#other-tasks)

  - [3D Tracking](#3d-tracking)

  - [3D Motion Prediction](#3d-motion-prediction)

  - [3D Reconstruction](#3d-reconstruction)

  - [Point Cloud Registration](#point-cloud-registration)

## Object Classification
Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning [**RS 2022**][[PDF](https://www.mdpi.com/2072-4292/14/7/1563/pdf?version=1648109597 )]

Masked Autoencoders for Point Cloud Self-supervised Learning [**ECCV 2022**][[PDF](https://arxiv.org/pdf/2203.06604)][[Code](https://github.com/Pang-Yatian/Point-MAE )]

3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [**T-ITS 2022**][[PDF](https://arxiv.org/pdf/2203.00828 )]

LFT-Net: Local Feature Transformer Network for Point Clouds Analysis [**T-ITS 2022**][[PDF](https://ieeexplore.ieee.org/document/9700748/ )]

Sewer defect detection from 3D point clouds using a transformer-based deep learning model [**Automation in Construction 2022**][[PDF](https://www.mdpi.com/1424-8220/22/12/4517/pdf?version=1655277701 )]

3d medical point transformer: Introducing convolution to attention networks for medical point cloud analysis [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2112.04863)][[Code](https://github.com/crane-papercode/3dmedpt )]

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Yu_Point-BERT_Pre-Training_3D_Point_Cloud_Transformers_With_Masked_Point_Modeling_CVPR_2022_paper.pdf)][[Code](https://github.com/lulutang0608/Point-BERT )]

CpT: Convolutional Point Transformer for 3D Point Cloud Processing [**ACCVW 2022**][[PDF](https://arxiv.org/pdf/2111.10866 )]

PatchFormer: An Efficient Point Transformer With Patch Attention [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhang_PatchFormer_An_Efficient_Point_Transformer_With_Patch_Attention_CVPR_2022_paper.pdf)]

PVT: Point-Voxel Transformer for Point Cloud Learning [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2108.06076.pdf)][[Code](https://github.com/HaochengWan/PVT )]

Adaptive Wavelet Transformer Network for 3D Shape Representation Learning [**ICLR 2022**][[PDF](https://openreview.net/pdf?id=5MLb3cLCJY )]

Point cloud learning with transformer [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2104.13636 )]

3crossnet: Cross-level cross-scale cross-attention network for point cloud representation [**RA-L 2022**][[PDF](https://arxiv.org/pdf/2104.13053 )]

Dual Transformer for Point Cloud Analysis [**IEEE Trans Multimedia**][[PDF](https://arxiv.org/pdf/2104.13044 )]

Centroid transformers: Learning to abstract with attention [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2102.08606 )]

PCT: Point cloud transformer [**CVM 2021**][[PDF](https://arxiv.org/pdf/2012.09688)][[Code](https://github.com/MenghaoGuo/PCT )]

Point Transformer [**ICCV 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021/papers/Zhao_Point_Transformer_ICCV_2021_paper.pdf)][[Code](https://github.com/POSTECH-CVLab/point-transformer )]

Point Transformer [**IEEE Access 2021**][[PDF](https://arxiv.org/pdf/2011.00931)][[Code](https://github.com/engelnico/point-transformer )]

Modeling point clouds with self-attention and gumbel subset sampling [**CVPR 2019**][[PDF](https://openaccess.thecvf.com/content_CVPR_2019/papers/Yang_Modeling_Point_Clouds_With_Self-Attention_and_Gumbel_Subset_Sampling_CVPR_2019_paper.pdf )]

Attentional shapecontextnet for point cloud recognition [**CVPR 2018**][[PDF](http://openaccess.thecvf.com/content_cvpr_2018/papers/Xie_Attentional_ShapeContextNet_for_CVPR_2018_paper.pdf)][[Code](https://github.com/umyta/A-SCN )]

## 3D Object Detection

Bridged Transformer for Vision and Point Cloud 3D Object Detection [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Bridged_Transformer_for_Vision_and_Point_Cloud_3D_Object_Detection_CVPR_2022_paper.pdf )]

Multimodal Token Fusion for Vision Transformers [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Multimodal_Token_Fusion_for_Vision_Transformers_CVPR_2022_paper.pdf )][[Code](https://github.com/yikaiw/TokenFusion )]

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhang_CAT-Det_Contrastively_Augmented_Transformer_for_Multi-Modal_3D_Object_Detection_CVPR_2022_paper.pdf )]

Focused Decoding Enables 3D Anatomical Detection by Transformers [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2207.10774.pdf)][[Code](https://github.com/bwittmann/transoar )]

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2203.13310)][[Code](https://github.com/ZrrSkywalker/MonoDETR )]

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Bai_TransFusion_Robust_LiDAR-Camera_Fusion_for_3D_Object_Detection_With_Transformers_CVPR_2022_paper.pdf)][[Code](https://github.com/XuyangBai/TransFusion )]

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/He_Voxel_Set_Transformer_A_Set-to-Set_Approach_to_3D_Object_Detection_CVPR_2022_paper.pdf)][[Code](https://github.com/skyhehe123/VoxSeT )]

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Deng_VISTA_Boosting_3D_Object_Detection_via_Dual_Cross-VIew_SpaTial_Attention_CVPR_2022_paper.pdf)][[Code](https://github.com/Gorilla-Lab-SCUT/VISTA )]

Point Density-Aware Voxels for LiDAR 3D Object Detection [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Hu_Point_Density-Aware_Voxels_for_LiDAR_3D_Object_Detection_CVPR_2022_paper.pdf)][[Code](https://github.com/TRAILab/PDV )]

PETR: Position Embedding Transformation for Multi-View 3D Object Detection [**ECCV 2022**][[PDF](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136870523.pdf)][[Code](https://github.com/megvii-research/PETR )]

ARM3D: Attention-based relation module for indoor 3D object detection [**CVM 2022**][[PDF](https://link.springer.com/content/pdf/10.1007/s41095-021-0252-6.pdf)][[Code](https://github.com/lanlan96/arm3d )]

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Huang_MonoDTR_Monocular_3D_Object_Detection_With_Depth-Aware_Transformer_CVPR_2022_paper.pdf)][[Code](https://github.com/KuanchihHuang/MonoDTR )]

Attention-based Proposals Refinement for 3D Object Detection [**IV 2022**][[PDF](https://arxiv.org/pdf/2201.07070)][[Code](https://github.com/quan-dao/APRO3D-Net )]

Embracing Single Stride 3D Object Detector with Sparse Transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Fan_Embracing_Single_Stride_3D_Object_Detector_With_Sparse_Transformer_CVPR_2022_paper.pdf)][[Code](https://github.com/tusen-ai/SST )]

Fast Point Transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Park_Fast_Point_Transformer_CVPR_2022_paper.pdf)][[Code](https://github.com/POSTECH-CVLab/FastPointTransformer )]

BoxeR: Box-Attention for 2D and 3D Transformers [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Nguyen_BoxeR_Box-Attention_for_2D_and_3D_Transformers_CVPR_2022_paper.pdf)][[Code](https://github.com/kienduynguyen/BoxeR )]

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [**CoRL 2022**][[PDF](https://proceedings.mlr.press/v164/wang22b/wang22b.pdf)][[Code](https://github.com/WangYueFt/detr3d )]

An End-to-End Transformer Model for 3D Object Detection [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Misra_An_End-to-End_Transformer_Model_for_3D_Object_Detection_ICCV_2021_paper.pdf)][[Code](https://github.com/facebookresearch/3detr )]

Voxel Transformer for 3D Object Detection [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Mao_Voxel_Transformer_for_3D_Object_Detection_ICCV_2021_paper.pdf)][[Code](https://github.com/PointsCoder/VOTR )]

Improving 3D Object Detection with Channel-wise Transformer [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Sheng_Improving_3D_Object_Detection_With_Channel-Wise_Transformer_ICCV_2021_paper.pdf)][[Code](https://github.com/hlsheng1/CT3D )]

M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [**WACV 2022**][[PDF](https://openaccess.thecvf.com/content/WACV2022/papers/Guan_M3DETR_Multi-Representation_Multi-Scale_Mutual-Relation_3D_Object_Detection_With_Transformers_WACV_2022_paper.pdf)][[Code](https://github.com/rayguan97/M3DeTR )]

Group-Free 3D Object Detection via Transformers [**ICCV 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021/papers/Liu_Group-Free_3D_Object_Detection_via_Transformers_ICCV_2021_paper.pdf)][[Code](https://github.com/zeliu98/Group-Free-3D )]

SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [**ICCVW 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021W/AVVision/papers/Bhattacharyya_SA-Det3D_Self-Attention_Based_Context-Aware_3D_Object_Detection_ICCVW_2021_paper.pdf)][[Code](https://github.com/AutoVision-cloud/SA-Det3D )]

3D object detection with pointformer [**CVPR 2021**][[PDF](http://openaccess.thecvf.com/content/CVPR2021/papers/Pan_3D_Object_Detection_With_Pointformer_CVPR_2021_paper.pdf)][[Code](https://github.com/Vladimir2506/Pointformer )]

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving [**IEEE Trans. Circuits Syst.**][[PDF](https://arxiv.org/pdf/2011.13628 )]

MLCVNet: Multi-Level Context VoteNet for 3D Object Detection [**CVPR 2020**][[PDF](https://openaccess.thecvf.com/content_CVPR_2020/papers/Xie_MLCVNet_Multi-Level_Context_VoteNet_for_3D_Object_Detection_CVPR_2020_paper.pdf)][[Code](https://github.com/NUAAXQ/MLCVNet )]

LiDAR-based online 3d video object detection with graph-based message passing and spatiotemporal transformer attention [**CVPR 2020**][[PDF](http://openaccess.thecvf.com/content_CVPR_2020/papers/Yin_LiDAR-Based_Online_3D_Video_Object_Detection_With_Graph-Based_Message_Passing_CVPR_2020_paper.pdf)][[Code](https://github.com/yinjunbo/3DVID )]

SCANet: Spatial-channel attention network for 3d object detection [**ICASSP 2019**][[PDF](https://ieeexplore.ieee.org/document/8682746)][[Code](https://github.com/zhouruqin/SCANet )]

## 3D Segmentation
For part segmentation, check [Object Classification](#object-classification)

#### Complete Scenes Segmentation

Stratified Transformer for 3D Point Cloud Segmentation [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Lai_Stratified_Transformer_for_3D_Point_Cloud_Segmentation_CVPR_2022_paper.pdf)][[Code](https://github.com/dvlab-research/Stratified-Transformer )]

Multimodal Token Fusion for Vision Transformers [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Multimodal_Token_Fusion_for_Vision_Transformers_CVPR_2022_paper.pdf )][[Code](https://github.com/yikaiw/TokenFusion )]

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation [**AAAI 2022**][[PDF](https://www.aaai.org/AAAI22Papers/AAAI-5976.XuS.pdf )]

Fast Point Transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Park_Fast_Point_Transformer_CVPR_2022_paper.pdf)][[Code](https://github.com/POSTECH-CVLab/FastPointTransformer )]

Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Thyagharajan_Segment-Fusion_Hierarchical_Context_Fusion_for_Robust_3D_Semantic_Segmentation_CVPR_2022_paper.pdf )]

#### Point Cloud Video Segmentation

Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling [**TPAMI 2022**][[PDF](https://ieeexplore.ieee.org/abstract/document/9740525 )]

Spatial-Temporal Transformer for 3D Point Cloud Sequences [**WACV 2022**][[PDF](https://openaccess.thecvf.com/content/WACV2022/papers/Wei_Spatial-Temporal_Transformer_for_3D_Point_Cloud_Sequences_WACV_2022_paper.pdf )]

Point 4D transformer networks for spatio-temporal modeling in point cloud videos [**CVPR 2021**][[PDF](http://openaccess.thecvf.com/content/CVPR2021/papers/Fan_Point_4D_Transformer_Networks_for_Spatio-Temporal_Modeling_in_Point_Cloud_CVPR_2021_paper.pdf)][[Code](https://github.com/hehefan/P4Transformer )]

#### Medical Imaging Segmentation
Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images [**MICCAI BrainLes 2022**][[PDF](https://arxiv.org/pdf/2201.01266)][[Code](https://github.com/Project-MONAI/research-contributions/tree/master/SwinUNETR/BRATS21 )]

D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation [**Neural Comput Appl 2022**][[PDF](https://arxiv.org/pdf/2201.00462 )]

A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation [**MICCAI 2022**][[PDF](https://arxiv.org/pdf/2111.13300)][[Code](https://github.com/himashi92/VT-UNet )]

T-AutoML: Automated Machine Learning for Lesion Segmentation using Transformers in 3D Medical Imaging [**ICCV 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_T-AutoML_Automated_Machine_Learning_for_Lesion_Segmentation_Using_Transformers_in_ICCV_2021_paper.pdf )]

After-unet: Axial fusion transformer unet for medical image segmentation [**WACV 2022**][[PDF](https://openaccess.thecvf.com/content/WACV2022/papers/Yan_AFTer-UNet_Axial_Fusion_Transformer_UNet_for_Medical_Image_Segmentation_WACV_2022_paper.pdf )]

Bitr-unet: a cnn-transformer combined network for mri brain tumor segmentation [**MICCAI BrainLes 2022**][[PDF](https://arxiv.org/pdf/2109.12271 )]

nnformer: Interleaved transformer for volumetric segmentation [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2109.03201)][[Code](https://github.com/282857341/nnFormer )]

UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [**MICCAI 2021**][[PDF](https://arxiv.org/abs/2107.00781)][[Code](https://github.com/yhygao/UTNet )]

Medical image segmentation using squeeze-and-expansion transformers [**IJCAI 2021**][[PDF](https://arxiv.org/pdf/2105.09511)][[Code](https://github.com/askerlee/segtran )]

Unetr: Transformers for 3d medical image segmentation [**WACV 2022**][[PDF](https://openaccess.thecvf.com/content/WACV2022/papers/Hatamizadeh_UNETR_Transformers_for_3D_Medical_Image_Segmentation_WACV_2022_paper.pdf)][[Code](https://github.com/Project-MONAI/research-contributions/tree/master/UNETR/BTCV )]

Transbts: Multimodal brain tumor segmentation using transformer [**MICCAI 2021**][[PDF](https://arxiv.org/pdf/2103.04430)][[Code](https://github.com/Wenxuan-1119/TransBTS )]

Spectr: Spectral transformer for hyperspectral pathology image segmentation [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2103.03604)][[Code](https://github.com/hfut-xc-yun/SpecTr )]

Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [**MICCAI 2021**][[PDF](https://arxiv.org/pdf/2103.03024)][[Code](https://github.com/YtongXie/CoTr )]

Convolution-free medical image segmentation using transformers [**MICCAI 2021**][[PDF](https://arxiv.org/pdf/2102.13645 )]

Transfuse: Fusing transformers and cnns for medical image segmentation [**MICCAI 2021**][[PDF](https://arxiv.org/pdf/2102.08005)][[Code](https://github.com/Rayicer/TransFuse )]

## 3D Point Cloud Completion
Learning Local Displacements for Point Cloud Completion [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Learning_Local_Displacements_for_Point_Cloud_Completion_CVPR_2022_paper.pdf)][[Code](https://github.com/wangyida/disp3d )]

AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Mittal_AutoSDF_Shape_Priors_for_3D_Completion_Reconstruction_and_Generation_CVPR_2022_paper.pdf)][[Code](https://github.com/yccyenchicheng/AutoSDF )]

PointAttN: You Only Need Attention for Point Cloud Completion [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2203.08485)][[Code](https://github.com/ohhhyeahhh/PointAttN )]

Point cloud completion on structured feature map with feedback network [**CVM 2022**][[PDF](https://arxiv.org/pdf/2202.08583 )]

ShapeFormer: Transformer-based Shape Completion via Sparse Representation [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Yan_ShapeFormer_Transformer-Based_Shape_Completion_via_Sparse_Representation_CVPR_2022_paper.pdf)][[Code](https://github.com/QhelDIV/ShapeFormer )]

A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion [**ICLR 2022**][[PDF](https://arxiv.org/pdf/2112.03530)][[Code](https://github.com/ZhaoyangLyu/Point_Diffusion_Refinement )]

MFM-Net: Unpaired Shape Completion Network with Multi-stage Feature Matching [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2111.11976 )]

PCTMA-Net: Point Cloud Transformer with Morphing Atlas-based Point Generation Network for Dense Point Cloud Completion [**IROS 2021**][[PDF](https://www.researchgate.net/profile/Alexander-Perzylo/publication/353955048_PCTMA-Net_Point_Cloud_Transformer_with_Morphing_Atlas-based_Point_Generation_Network_for_Dense_Point_Cloud_Completion/links/611bd6930c2bfa282a50001d/PCTMA-Net-Point-Cloud-Transformer-with-Morphing-Atlas-based-Point-Generation-Network-for-Dense-Point-Cloud-Completion.pdf)][[Code](https://github.com/LinJianjie/PCTMA_Net )]

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Yu_PoinTr_Diverse_Point_Cloud_Completion_With_Geometry-Aware_Transformers_ICCV_2021_paper.pdf)][[Code](https://github.com/yuxumin/PoinTr )]

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Xiang_SnowflakeNet_Point_Cloud_Completion_by_Snowflake_Point_Deconvolution_With_Skip-Transformer_ICCV_2021_paper.pdf)][[Code](https://github.com/AllenXiangX/SnowflakeNet )]

## 3D Pose Estimation
Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2204.04913 )]

Zero-Shot Category-Level Object Pose Estimation [**ECCV 2022**][[PDF](https://arxiv.org/pdf/2204.03635)][[Code](https://github.com/applied-ai-lab/zero-shot-pose )]

Efficient Virtual View Selection for 3D Hand Pose Estimation [**AAAI 2022**][[PDF](https://www.aaai.org/AAAI22Papers/AAAI-1352.ChengJ.pdf)][[Code](https://github.com/iscas3dv/handpose-virtualview )]

Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [**ECCV 2022**][[PDF](https://infoscience.epfl.ch/record/295132/files/ECCV2022_Match_Normalisation_Point_Cloud_Registration__New_.pdf)][[Code](https://github.com/dangzheng/matchnorm )]

CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2203.13387)][[Code](https://github.com/mfawzy/CrossFormer )]

RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers [**ECCV 2022**][[PDF](https://arxiv.org/pdf/2203.13296 )]

P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [**ECCV 2022**][[PDF](https://arxiv.org/pdf/2203.07628)][[Code](https://github.com/paTRICK-swk/P-STMO )]

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhang_MixSTE_Seq2seq_Mixed_Spatio-Temporal_Encoder_for_3D_Human_Pose_Estimation_CVPR_2022_paper.pdf)][[Code](https://github.com/JinluZhang1126/MixSTE )]

6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning [**TIP 2022**][[PDF](https://arxiv.org/pdf/2110.04792 )]

Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Hampali_Keypoint_Transformer_Solving_Joint_Identification_in_Challenging_Hands_and_Object_CVPR_2022_paper.pdf)][[Code](https://github.com/shreyashampali/kypt_transformer )]

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation [**IEEE Trans. Multimed. 2022**][[PDF](https://arxiv.org/pdf/2103.14304)][[Code](https://github.com/Vegetebird/StridedTransformer-Pose3D )]

3D Human Pose Estimation with Spatial and Temporal Transformers [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Zheng_3D_Human_Pose_Estimation_With_Spatial_and_Temporal_Transformers_ICCV_2021_paper.pdf)][[Code](https://github.com/zczcwh/PoseFormer )]

End-to-End Human Pose and Mesh Reconstruction with Transformers [**CVPR 2021**][[PDF](http://openaccess.thecvf.com/content/CVPR2021/papers/Lin_End-to-End_Human_Pose_and_Mesh_Reconstruction_with_Transformers_CVPR_2021_paper.pdf)][[Code](https://github.com/microsoft/MeshTransformer )]

PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation [**WACV 2021**][[PDF](http://openaccess.thecvf.com/content/WACV2021/papers/Guo_PI-Net_Pose_Interacting_Network_for_Multi-Person_Monocular_3D_Pose_Estimation_WACV_2021_paper.pdf)][[Code](https://github.com/GUO-W/PI-Net )]

HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [**ACM MM 2020**][[PDF](https://dl.acm.org/doi/pdf/10.1145/3394171.3413775 )]

Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [**ECCV 2020**][[PDF](https://cse.buffalo.edu/~jsyuan/papers/2020/4836.pdf )]

Epipolar Transformer for Multi-view Human Pose Estimation [**CVPRW 2020**][[PDF](http://openaccess.thecvf.com/content_CVPRW_2020/papers/w70/He_Epipolar_Transformer_for_Multi-View_Human_Pose_Estimation_CVPRW_2020_paper.pdf)][[Code](https://github.com/yihui-he/epipolar-transformers )]

## Other Tasks

#### 3D Tracking
Pttr: Relational 3d point cloud object tracking with transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_PTTR_Relational_3D_Point_Cloud_Object_Tracking_With_Transformer_CVPR_2022_paper.pdf)][[Code](https://github.com/Jasonkks/PTTR )]

3d object tracking with transformer [**BMVC 2021**][[PDF](https://arxiv.org/pdf/2110.14921 )]

#### 3D Motion Prediction
Hr-stan: High-resolution spatio-temporal attention network for 3d human motion prediction [**CVPRW 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022W/Precognition/papers/Medjaouri_HR-STAN_High-Resolution_Spatio-Temporal_Attention_Network_for_3D_Human_Motion_Prediction_CVPRW_2022_paper.pdf )]

Gimo: Gaze-informed human motion prediction in context [**ECCV 2022**][[PDF](https://arxiv.org/pdf/2204.09443)][[Code](https://github.com/y-zheng18/GIMO )]

Pose transformers (potr): Human motion prediction with non-autoregressive transformer [**ICCVW 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021W/SoMoF/papers/Martinez-Gonzalez_Pose_Transformers_POTR_Human_Motion_Prediction_With_Non-Autoregressive_Transformers_ICCVW_2021_paper.pdf)][[Code](https://github.com/idiap/potr)]

Learning progressive joint propagation for human motion prediction [**ECCV 2020**][[PDF](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123520222.pdf )]

History repeats itself: Human motion prediction via motion attention [**ECCV 2020**][[PDF](https://arxiv.org/pdf/2007.11755)][[Code](https://github.com/wei-mao-2019/HisRepItself )]

A spatio-temporal transformer for 3d human motion prediction [**3DV 2021**][[PDF](https://arxiv.org/pdf/2004.08692)][[Code](https://github.com/eth-ait/motion-transformer )]

#### 3D Reconstruction
Vpfusion: Joint 3d volume and pixel-aligned feature fusion for single and multi-view 3d reconstruction [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2203.07553 )]

Thundr: Transformer-based 3d human reconstruction with markers [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Zanfir_THUNDR_Transformer-Based_3D_Human_Reconstruction_With_Markers_ICCV_2021_paper.pdf )]

Multi-view 3d reconstruction with transformers [**ICCV 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021/papers/Wang_Multi-View_3D_Reconstruction_With_Transformers_ICCV_2021_paper.pdf )]

#### Point Cloud Registration
Regtr: End-to-end point cloud correspondences with transformers [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Yew_REGTR_End-to-End_Point_Cloud_Correspondences_With_Transformers_CVPR_2022_paper.pdf)][[Code](https://github.com/yewzijian/RegTR )]

LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction [**CVPR 2021**][[PDF](https://arxiv.org/abs/2106.12102)][[Code](https://github.com/faridyagubbayli/LegoFormer )]

Robust point cloud registration framework based on deep graph matching [**CVPR 2021**][[PDF](https://openaccess.thecvf.com/content/CVPR2021/papers/Fu_Robust_Point_Cloud_Registration_Framework_Based_on_Deep_Graph_Matching_CVPR_2021_paper.pdf)][[Code](https://github.com/fukexue/RGM )]

Deep closest point: Learning representations for point cloud registration [**ICCV 2019**][[PDF](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Deep_Closest_Point_Learning_Representations_for_Point_Cloud_Registration_ICCV_2019_paper.pdf)][[Code](https://github.com/WangYueFt/dcp )]

# Citation

If you find the listing or the survey useful for your work, please cite our paper:

```
@misc{lahoud20223d,
title={3D Vision with Transformers: A Survey},
author={Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
year={2022},
eprint={2208.04309},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```