https://github.com/lahoud/3d-vision-transformers
A list of 3D computer vision papers with Transformers
- Host: GitHub
- URL: https://github.com/lahoud/3d-vision-transformers
- Owner: lahoud
- Created: 2022-08-04T12:07:13.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-04T09:52:24.000Z (10 months ago)
- Last Synced: 2024-08-01T03:32:12.237Z (9 months ago)
- Size: 58.6 KB
- Stars: 388
- Watchers: 13
- Forks: 29
- Open Issues: 1
# This repo supplements our [3D Vision with Transformers Survey](https://arxiv.org/abs/2208.04309)
Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

This repo includes all the 3D computer vision papers with Transformers presented in our [paper](https://arxiv.org/abs/2208.04309), and we aim to update it frequently with the latest relevant papers.
#### Content
- [Object Classification](#object-classification)
- [3D Object Detection](#3d-object-detection)
- [3D Segmentation](#3d-segmentation)
  - [Complete Scenes Segmentation](#complete-scenes-segmentation)
  - [Point Cloud Video Segmentation](#point-cloud-video-segmentation)
  - [Medical Imaging Segmentation](#medical-imaging-segmentation)
- [3D Point Cloud Completion](#3d-point-cloud-completion)
- [3D Pose Estimation](#3d-pose-estimation)
- [Other Tasks](#other-tasks)
  - [3D Tracking](#3d-tracking)
  - [3D Motion Prediction](#3d-motion-prediction)
  - [3D Reconstruction](#3d-reconstruction)
  - [Point Cloud Registration](#point-cloud-registration)

## Object Classification
Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning [**RS 2022**][[PDF](https://www.mdpi.com/2072-4292/14/7/1563/pdf?version=1648109597 )]
Masked Autoencoders for Point Cloud Self-supervised Learning [**ECCV 2022**][[PDF](https://arxiv.org/pdf/2203.06604)][[Code](https://github.com/Pang-Yatian/Point-MAE )]
3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [**T-ITS 2022**][[PDF](https://arxiv.org/pdf/2203.00828 )]
LFT-Net: Local Feature Transformer Network for Point Clouds Analysis [**T-ITS 2022**][[PDF](https://ieeexplore.ieee.org/document/9700748/ )]
Sewer defect detection from 3D point clouds using a transformer-based deep learning model [**Automation in Construction 2022**][[PDF](https://www.mdpi.com/1424-8220/22/12/4517/pdf?version=1655277701 )]
3d medical point transformer: Introducing convolution to attention networks for medical point cloud analysis [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2112.04863)][[Code](https://github.com/crane-papercode/3dmedpt )]
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Yu_Point-BERT_Pre-Training_3D_Point_Cloud_Transformers_With_Masked_Point_Modeling_CVPR_2022_paper.pdf)][[Code](https://github.com/lulutang0608/Point-BERT )]
CpT: Convolutional Point Transformer for 3D Point Cloud Processing [**ACCVW 2022**][[PDF](https://arxiv.org/pdf/2111.10866 )]
PatchFormer: An Efficient Point Transformer With Patch Attention [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhang_PatchFormer_An_Efficient_Point_Transformer_With_Patch_Attention_CVPR_2022_paper.pdf)]
PVT: Point-Voxel Transformer for Point Cloud Learning [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2108.06076.pdf)][[Code](https://github.com/HaochengWan/PVT )]
Adaptive Wavelet Transformer Network for 3D Shape Representation Learning [**ICLR 2021**][[PDF](https://openreview.net/pdf?id=5MLb3cLCJY )]
Point cloud learning with transformer [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2104.13636 )]
3crossnet: Cross-level cross-scale cross-attention network for point cloud representation [**RA-L 2022**][[PDF](https://arxiv.org/pdf/2104.13053 )]
Dual Transformer for Point Cloud Analysis [**IEEE Trans Multimedia**][[PDF](https://arxiv.org/pdf/2104.13044 )]
Centroid transformers: Learning to abstract with attention [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2102.08606 )]
PCT: Point cloud transformer [**CVM 2021**][[PDF](https://arxiv.org/pdf/2012.09688)][[Code](https://github.com/MenghaoGuo/PCT )]
Point Transformer [**ICCV 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021/papers/Zhao_Point_Transformer_ICCV_2021_paper.pdf)][[Code](https://github.com/POSTECH-CVLab/point-transformer )]
Point Transformer [**IEEE Access 2021**][[PDF](https://arxiv.org/pdf/2011.00931)][[Code](https://github.com/engelnico/point-transformer )]
Modeling point clouds with self-attention and gumbel subset sampling [**CVPR 2019**][[PDF](https://openaccess.thecvf.com/content_CVPR_2019/papers/Yang_Modeling_Point_Clouds_With_Self-Attention_and_Gumbel_Subset_Sampling_CVPR_2019_paper.pdf )]
Attentional shapecontextnet for point cloud recognition [**CVPR 2018**][[PDF](http://openaccess.thecvf.com/content_cvpr_2018/papers/Xie_Attentional_ShapeContextNet_for_CVPR_2018_paper.pdf)][[Code](https://github.com/umyta/A-SCN )]
## 3D Object Detection
Bridged Transformer for Vision and Point Cloud 3D Object Detection [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Bridged_Transformer_for_Vision_and_Point_Cloud_3D_Object_Detection_CVPR_2022_paper.pdf )]
Multimodal Token Fusion for Vision Transformers [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Multimodal_Token_Fusion_for_Vision_Transformers_CVPR_2022_paper.pdf )][[Code](https://github.com/yikaiw/TokenFusion )]
CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhang_CAT-Det_Contrastively_Augmented_Transformer_for_Multi-Modal_3D_Object_Detection_CVPR_2022_paper.pdf )]
Focused Decoding Enables 3D Anatomical Detection by Transformers [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2207.10774.pdf)][[Code](https://github.com/bwittmann/transoar )]
MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2203.13310)][[Code](https://github.com/ZrrSkywalker/MonoDETR )]
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Bai_TransFusion_Robust_LiDAR-Camera_Fusion_for_3D_Object_Detection_With_Transformers_CVPR_2022_paper.pdf)][[Code](https://github.com/XuyangBai/TransFusion )]
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/He_Voxel_Set_Transformer_A_Set-to-Set_Approach_to_3D_Object_Detection_CVPR_2022_paper.pdf)][[Code](https://github.com/skyhehe123/VoxSeT )]
VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Deng_VISTA_Boosting_3D_Object_Detection_via_Dual_Cross-VIew_SpaTial_Attention_CVPR_2022_paper.pdf)][[Code](https://github.com/Gorilla-Lab-SCUT/VISTA )]
Point Density-Aware Voxels for LiDAR 3D Object Detection [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Hu_Point_Density-Aware_Voxels_for_LiDAR_3D_Object_Detection_CVPR_2022_paper.pdf)][[Code](https://github.com/TRAILab/PDV )]
PETR: Position Embedding Transformation for Multi-View 3D Object Detection [**ECCV 2022**][[PDF](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136870523.pdf)][[Code](https://github.com/megvii-research/PETR )]
ARM3D: Attention-based relation module for indoor 3D object detection [**Comput. Vis.**][[PDF](https://link.springer.com/content/pdf/10.1007/s41095-021-0252-6.pdf)][[Code](https://github.com/lanlan96/arm3d )]
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Huang_MonoDTR_Monocular_3D_Object_Detection_With_Depth-Aware_Transformer_CVPR_2022_paper.pdf)][[Code](https://github.com/KuanchihHuang/MonoDTR )]
Attention-based Proposals Refinement for 3D Object Detection [**IV 2022**][[PDF](https://arxiv.org/pdf/2201.07070)][[Code](https://github.com/quan-dao/APRO3D-Net )]
Embracing Single Stride 3D Object Detector with Sparse Transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Fan_Embracing_Single_Stride_3D_Object_Detector_With_Sparse_Transformer_CVPR_2022_paper.pdf)][[Code](https://github.com/tusen-ai/SST )]
Fast Point Transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Park_Fast_Point_Transformer_CVPR_2022_paper.pdf)][[Code](https://github.com/POSTECH-CVLab/FastPointTransformer )]
BoxeR: Box-Attention for 2D and 3D Transformers [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Nguyen_BoxeR_Box-Attention_for_2D_and_3D_Transformers_CVPR_2022_paper.pdf)][[Code](https://github.com/kienduynguyen/BoxeR )]
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [**CoRL 2022**][[PDF](https://proceedings.mlr.press/v164/wang22b/wang22b.pdf)][[Code](https://github.com/WangYueFt/detr3d )]
An End-to-End Transformer Model for 3D Object Detection [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Misra_An_End-to-End_Transformer_Model_for_3D_Object_Detection_ICCV_2021_paper.pdf)][[Code](https://github.com/facebookresearch/3detr )]
Voxel Transformer for 3D Object Detection [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Mao_Voxel_Transformer_for_3D_Object_Detection_ICCV_2021_paper.pdf)][[Code](https://github.com/PointsCoder/VOTR )]
Improving 3D Object Detection with Channel-wise Transformer [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Sheng_Improving_3D_Object_Detection_With_Channel-Wise_Transformer_ICCV_2021_paper.pdf)][[Code](https://github.com/hlsheng1/CT3D )]
M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [**WACV 2022**][[PDF](https://openaccess.thecvf.com/content/WACV2022/papers/Guan_M3DETR_Multi-Representation_Multi-Scale_Mutual-Relation_3D_Object_Detection_With_Transformers_WACV_2022_paper.pdf)][[Code](https://github.com/rayguan97/M3DeTR )]
Group-Free 3D Object Detection via Transformers [**ICCV 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021/papers/Liu_Group-Free_3D_Object_Detection_via_Transformers_ICCV_2021_paper.pdf)][[Code](https://github.com/zeliu98/Group-Free-3D )]
SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [**ICCVW 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021W/AVVision/papers/Bhattacharyya_SA-Det3D_Self-Attention_Based_Context-Aware_3D_Object_Detection_ICCVW_2021_paper.pdf)][[Code](https://github.com/AutoVision-cloud/SA-Det3D )]
3D object detection with pointformer [**CVPR 2021**][[PDF](http://openaccess.thecvf.com/content/CVPR2021/papers/Pan_3D_Object_Detection_With_Pointformer_CVPR_2021_paper.pdf)][[Code](https://github.com/Vladimir2506/Pointformer )]
Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving [**IEEE Trans. Circuits Syst.**][[PDF](https://arxiv.org/pdf/2011.13628 )]
MLCVNet: Multi-Level Context VoteNet for 3D Object Detection [**CVPR 2020**][[PDF](https://openaccess.thecvf.com/content_CVPR_2020/papers/Xie_MLCVNet_Multi-Level_Context_VoteNet_for_3D_Object_Detection_CVPR_2020_paper.pdf)][[Code](https://github.com/NUAAXQ/MLCVNet )]
LiDAR-based online 3d video object detection with graph-based message passing and spatiotemporal transformer attention [**CVPR 2020**][[PDF](http://openaccess.thecvf.com/content_CVPR_2020/papers/Yin_LiDAR-Based_Online_3D_Video_Object_Detection_With_Graph-Based_Message_Passing_CVPR_2020_paper.pdf)][[Code](https://github.com/yinjunbo/3DVID )]
SCANet: Spatial-channel attention network for 3d object detection [**ICASSP 2019**][[PDF](https://ieeexplore.ieee.org/document/8682746)][[Code](https://github.com/zhouruqin/SCANet )]
## 3D Segmentation
For part segmentation, check [Object Classification](#object-classification)
#### Complete Scenes Segmentation
Stratified Transformer for 3D Point Cloud Segmentation [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Lai_Stratified_Transformer_for_3D_Point_Cloud_Segmentation_CVPR_2022_paper.pdf)][[Code](https://github.com/dvlab-research/Stratified-Transformer )]
Multimodal Token Fusion for Vision Transformers [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Multimodal_Token_Fusion_for_Vision_Transformers_CVPR_2022_paper.pdf )][[Code](https://github.com/yikaiw/TokenFusion )]
Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation [**AAAI 2022**][[PDF](https://www.aaai.org/AAAI22Papers/AAAI-5976.XuS.pdf )]
Fast Point Transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Park_Fast_Point_Transformer_CVPR_2022_paper.pdf)][[Code](https://github.com/POSTECH-CVLab/FastPointTransformer )]
Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Thyagharajan_Segment-Fusion_Hierarchical_Context_Fusion_for_Robust_3D_Semantic_Segmentation_CVPR_2022_paper.pdf )]
#### Point Cloud Video Segmentation
Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling [**TPAMI 2022**][[PDF](https://ieeexplore.ieee.org/abstract/document/9740525 )]
Spatial-Temporal Transformer for 3D Point Cloud Sequences [**WACV 2022**][[PDF](https://openaccess.thecvf.com/content/WACV2022/papers/Wei_Spatial-Temporal_Transformer_for_3D_Point_Cloud_Sequences_WACV_2022_paper.pdf )]
Point 4D transformer networks for spatio-temporal modeling in point cloud videos [**CVPR 2021**][[PDF](http://openaccess.thecvf.com/content/CVPR2021/papers/Fan_Point_4D_Transformer_Networks_for_Spatio-Temporal_Modeling_in_Point_Cloud_CVPR_2021_paper.pdf)][[Code](https://github.com/hehefan/P4Transformer )]
#### Medical Imaging Segmentation
Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images [**MICCAI BrainLes 2022**][[PDF](https://arxiv.org/pdf/2201.01266)][[Code](https://github.com/Project-MONAI/research-contributions/tree/master/SwinUNETR/BRATS21 )]
D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation [**Neural Comput Appl 2022**][[PDF](https://arxiv.org/pdf/2201.00462 )]
A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation [**MICCAI 2022**][[PDF](https://arxiv.org/pdf/2111.13300)][[Code](https://github.com/himashi92/VT-UNet )]
T-AutoML: Automated Machine Learning for Lesion Segmentation using Transformers in 3D Medical Imaging [**ICCV 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_T-AutoML_Automated_Machine_Learning_for_Lesion_Segmentation_Using_Transformers_in_ICCV_2021_paper.pdf )]
After-unet: Axial fusion transformer unet for medical image segmentation [**WACV 2022**][[PDF](https://openaccess.thecvf.com/content/WACV2022/papers/Yan_AFTer-UNet_Axial_Fusion_Transformer_UNet_for_Medical_Image_Segmentation_WACV_2022_paper.pdf )]
Bitr-unet: a cnn-transformer combined network for mri brain tumor segmentation [**MICCAI BrainLes 2022**][[PDF](https://arxiv.org/pdf/2109.12271 )]
nnformer: Interleaved transformer for volumetric segmentation [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2109.03201)][[Code](https://github.com/282857341/nnFormer )]
UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [**MICCAI 2022**][[PDF](https://arxiv.org/abs/2107.00781)][[Code](https://github.com/yhygao/UTNet )]
Medical image segmentation using squeeze-and-expansion transformers [**IJCAI 2021**][[PDF](https://arxiv.org/pdf/2105.09511)][[Code](https://github.com/askerlee/segtran )]
Unetr: Transformers for 3d medical image segmentation [**WACV 2022**][[PDF](https://openaccess.thecvf.com/content/WACV2022/papers/Hatamizadeh_UNETR_Transformers_for_3D_Medical_Image_Segmentation_WACV_2022_paper.pdf)][[Code](https://github.com/Project-MONAI/research-contributions/tree/master/UNETR/BTCV )]
Transbts: Multimodal brain tumor segmentation using transformer [**MICCAI 2021**][[PDF](https://arxiv.org/pdf/2103.04430)][[Code](https://github.com/Wenxuan-1119/TransBTS )]
Spectr: Spectral transformer for hyperspectral pathology image segmentation [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2103.03604)][[Code](https://github.com/hfut-xc-yun/SpecTr )]
Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [**MICCAI 2021**][[PDF](https://arxiv.org/pdf/2103.03024)][[Code](https://github.com/YtongXie/CoTr )]
Convolution-free medical image segmentation using transformers [**MICCAI 2021**][[PDF](https://arxiv.org/pdf/2102.13645 )]
Transfuse: Fusing transformers and cnns for medical image segmentation [**MICCAI 2021**][[PDF](https://arxiv.org/pdf/2102.08005)][[Code](https://github.com/Rayicer/TransFuse )]
## 3D Point Cloud Completion
Learning Local Displacements for Point Cloud Completion [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Learning_Local_Displacements_for_Point_Cloud_Completion_CVPR_2022_paper.pdf)][[Code](https://github.com/wangyida/disp3d )]
AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Mittal_AutoSDF_Shape_Priors_for_3D_Completion_Reconstruction_and_Generation_CVPR_2022_paper.pdf)][[Code](https://github.com/yccyenchicheng/AutoSDF )]
PointAttN: You Only Need Attention for Point Cloud Completion [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2203.08485)][[Code](https://github.com/ohhhyeahhh/PointAttN )]
Point cloud completion on structured feature map with feedback network [**CVM 2022**][[PDF](https://arxiv.org/pdf/2202.08583 )]
ShapeFormer: Transformer-based Shape Completion via Sparse Representation [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Yan_ShapeFormer_Transformer-Based_Shape_Completion_via_Sparse_Representation_CVPR_2022_paper.pdf)][[Code](https://github.com/QhelDIV/ShapeFormer )]
A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion [**ICLR 2021**][[PDF](https://arxiv.org/pdf/2112.03530)][[Code](https://github.com/ZhaoyangLyu/Point_Diffusion_Refinement )]
MFM-Net: Unpaired Shape Completion Network with Multi-stage Feature Matching [**arXiv 2021**][[PDF](https://arxiv.org/pdf/2111.11976 )]
PCTMA-Net: Point Cloud Transformer with Morphing Atlas-based Point Generation Network for Dense Point Cloud Completion [**IROS 2021**][[PDF](https://www.researchgate.net/profile/Alexander-Perzylo/publication/353955048_PCTMA-Net_Point_Cloud_Transformer_with_Morphing_Atlas-based_Point_Generation_Network_for_Dense_Point_Cloud_Completion/links/611bd6930c2bfa282a50001d/PCTMA-Net-Point-Cloud-Transformer-with-Morphing-Atlas-based-Point-Generation-Network-for-Dense-Point-Cloud-Completion.pdf)][[Code](https://github.com/LinJianjie/PCTMA_Net )]
PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Yu_PoinTr_Diverse_Point_Cloud_Completion_With_Geometry-Aware_Transformers_ICCV_2021_paper.pdf)][[Code](https://github.com/yuxumin/PoinTr )]
SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Xiang_SnowflakeNet_Point_Cloud_Completion_by_Snowflake_Point_Deconvolution_With_Skip-Transformer_ICCV_2021_paper.pdf)][[Code](https://github.com/AllenXiangX/SnowflakeNet )]
## 3D Pose Estimation
Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2204.04913 )]
Zero-Shot Category-Level Object Pose Estimation [**ECCV 2022**][[PDF](https://arxiv.org/pdf/2204.03635)][[Code](https://github.com/applied-ai-lab/zero-shot-pose )]
Efficient Virtual View Selection for 3D Hand Pose Estimation [**AAAI 2022**][[PDF](https://www.aaai.org/AAAI22Papers/AAAI-1352.ChengJ.pdf)][[Code](https://github.com/iscas3dv/handpose-virtualview )]
Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [**ECCV 2022**][[PDF](https://infoscience.epfl.ch/record/295132/files/ECCV2022_Match_Normalisation_Point_Cloud_Registration__New_.pdf)][[Code](https://github.com/dangzheng/matchnorm )]
CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2203.13387)][[Code](https://github.com/mfawzy/CrossFormer )]
RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers [**ECCV 2022**][[PDF](https://arxiv.org/pdf/2203.13296 )]
P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation [**ECCV 2022**][[PDF](https://arxiv.org/pdf/2203.07628)][[Code](https://github.com/paTRICK-swk/P-STMO )]
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhang_MixSTE_Seq2seq_Mixed_Spatio-Temporal_Encoder_for_3D_Human_Pose_Estimation_CVPR_2022_paper.pdf)][[Code](https://github.com/JinluZhang1126/MixSTE )]
6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning [**TIP 2022**][[PDF](https://arxiv.org/pdf/2110.04792 )]
Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Hampali_Keypoint_Transformer_Solving_Joint_Identification_in_Challenging_Hands_and_Object_CVPR_2022_paper.pdf)][[Code](https://github.com/shreyashampali/kypt_transformer )]
Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation [**IEEE Trans. Multimed. 2022**][[PDF](https://arxiv.org/pdf/2103.14304)][[Code](https://github.com/Vegetebird/StridedTransformer-Pose3D )]
3D Human Pose Estimation with Spatial and Temporal Transformers [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Zheng_3D_Human_Pose_Estimation_With_Spatial_and_Temporal_Transformers_ICCV_2021_paper.pdf)][[Code](https://github.com/zczcwh/PoseFormer )]
End-to-End Human Pose and Mesh Reconstruction with Transformers [**CVPR 2021**][[PDF](http://openaccess.thecvf.com/content/CVPR2021/papers/Lin_End-to-End_Human_Pose_and_Mesh_Reconstruction_with_Transformers_CVPR_2021_paper.pdf)][[Code](https://github.com/microsoft/MeshTransformer )]
PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation [**WACV 2021**][[PDF](http://openaccess.thecvf.com/content/WACV2021/papers/Guo_PI-Net_Pose_Interacting_Network_for_Multi-Person_Monocular_3D_Pose_Estimation_WACV_2021_paper.pdf)][[Code](https://github.com/GUO-W/PI-Net )]
HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [**ACM MM 2020**][[PDF](https://dl.acm.org/doi/pdf/10.1145/3394171.3413775 )]
Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [**ECCV 2020**][[PDF](https://cse.buffalo.edu/~jsyuan/papers/2020/4836.pdf )]
Epipolar Transformer for Multi-view Human Pose Estimation [**CVPRW 2020**][[PDF](http://openaccess.thecvf.com/content_CVPRW_2020/papers/w70/He_Epipolar_Transformer_for_Multi-View_Human_Pose_Estimation_CVPRW_2020_paper.pdf)][[Code](https://github.com/yihui-he/epipolar-transformers )]
## Other Tasks
#### 3D Tracking
Pttr: Relational 3d point cloud object tracking with transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_PTTR_Relational_3D_Point_Cloud_Object_Tracking_With_Transformer_CVPR_2022_paper.pdf)][[Code](https://github.com/Jasonkks/PTTR )]
3d object tracking with transformer [**BMVC 2021**][[PDF](https://arxiv.org/pdf/2110.14921 )]
#### 3D Motion Prediction
Hr-stan: High-resolution spatio-temporal attention network for 3d human motion prediction [**CVPRW 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022W/Precognition/papers/Medjaouri_HR-STAN_High-Resolution_Spatio-Temporal_Attention_Network_for_3D_Human_Motion_Prediction_CVPRW_2022_paper.pdf )]
Gimo: Gaze-informed human motion prediction in context [**ECCV 2022**][[PDF](https://arxiv.org/pdf/2204.09443)][[Code](https://github.com/y-zheng18/GIMO )]
Pose transformers (potr): Human motion prediction with non-autoregressive transformer [**ICCVW 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021W/SoMoF/papers/Martinez-Gonzalez_Pose_Transformers_POTR_Human_Motion_Prediction_With_Non-Autoregressive_Transformers_ICCVW_2021_paper.pdf)][[Code](https://github.com/idiap/potr)]
Learning progressive joint propagation for human motion prediction [**ECCV 2020**][[PDF](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123520222.pdf )]
History repeats itself: Human motion prediction via motion attention [**ECCV 2020**][[PDF](https://arxiv.org/pdf/2007.11755)][[Code](https://github.com/wei-mao-2019/HisRepItself )]
A spatio-temporal transformer for 3d human motion prediction [**3DV 2021**][[PDF](https://arxiv.org/pdf/2004.08692)][[Code](https://github.com/eth-ait/motion-transformer )]
#### 3D Reconstruction
Vpfusion: Joint 3d volume and pixel-aligned feature fusion for single and multi-view 3d reconstruction [**arXiv 2022**][[PDF](https://arxiv.org/pdf/2203.07553 )]
Thundr: Transformer-based 3d human reconstruction with marker [**ICCV 2021**][[PDF](http://openaccess.thecvf.com/content/ICCV2021/papers/Zanfir_THUNDR_Transformer-Based_3D_Human_Reconstruction_With_Markers_ICCV_2021_paper.pdf )]
Multi-view 3d reconstruction with transformer [**ICCV 2021**][[PDF](https://openaccess.thecvf.com/content/ICCV2021/papers/Wang_Multi-View_3D_Reconstruction_With_Transformers_ICCV_2021_paper.pdf )]
#### Point Cloud Registration
Regtr: End-to-end point cloud correspondences with transformer [**CVPR 2022**][[PDF](https://openaccess.thecvf.com/content/CVPR2022/papers/Yew_REGTR_End-to-End_Point_Cloud_Correspondences_With_Transformers_CVPR_2022_paper.pdf)][[Code](https://github.com/yewzijian/RegTR )]
LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction [**CVPR 2021**][[PDF](https://arxiv.org/abs/2106.12102)][[Code](https://github.com/faridyagubbayli/LegoFormer )]
Robust point cloud registration framework based on deep graph matching [**CVPR 2021**][[PDF](https://openaccess.thecvf.com/content/CVPR2021/papers/Fu_Robust_Point_Cloud_Registration_Framework_Based_on_Deep_Graph_Matching_CVPR_2021_paper.pdf)][[Code](https://github.com/fukexue/RGM )]
Deep closest point: Learning representations for point cloud registration [**ICCV 2019**][[PDF](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Deep_Closest_Point_Learning_Representations_for_Point_Cloud_Registration_ICCV_2019_paper.pdf)][[Code](https://github.com/WangYueFt/dcp )]
# Citation
If you find the listing or the survey useful for your work, please cite our paper:
```
@misc{lahoud20223d,
title={3D Vision with Transformers: A Survey},
author={Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
year={2022},
eprint={2208.04309},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```