DeepLearning-Paper-with-Code
Papers with code for CV / AIGC / LLM / VLM.
https://github.com/Gojay001/DeepLearning-Paper-with-Code
-
3D Face Reconstruction and Facial Animation
- Displaced Dynamic Expression Regression for Real-time Facial Tracking and Animation
- Bilinear Models for 3-D Face and Facial Expression Recognition
- FaceWarehouse: a 3D Facial Expression Database for Visual Computing
- Face2Face: Real-Time Face Capture and Reenactment of RGB Videos
- Real-time Facial Animation with Image-based Dynamic Avatars
- Learning a model of facial shape and expression from 4D scans - fitting) [PyTorch](https://github.com/soubhiksanyal/FLAME_PyTorch)
- Nonlinear 3D Face Morphable Model
- Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set
- Face It!: A Pipeline for Real-Time Performance-Driven Facial Animation
- Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision
- To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision - gravis/Occlusion-Robust-MoFA)
- Towards Metrical Reconstruction of Human Faces
- A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images
- A Morphable Model For The Synthesis Of 3D Faces
- Stabilized real-time face tracking via a learned dynamic rigidity prior
-
3D Object Detection
- PV-RCNN | [PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection](http://openaccess.thecvf.com/content_CVPR_2020/papers/Shi_PV-RCNN_Point-Voxel_Feature_Set_Abstraction_for_3D_Object_Detection_CVPR_2020_paper.pdf) | CVPR(2020) | [PyTorch](https://github.com/sshaoshuai/PV-RCNN)
-
AIGC-Applications
-
Face Editing
- Towards Real-World Blind Face Restoration with Generative Facial Prior
- HairCLIP: Design Your Hair by Text and Reference Image
- HairMapper: Removing Hair from Portraits Using GANs
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance
- LEDITS++: Limitless Image Editing using Text-to-Image Models - research/ledits_pp)
-
Face Swapping
- FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping
- DeepFaceLab: Integrated, flexible and extensible face-swapping framework
- SimSwap: An Efficient Framework For High Fidelity Face Swapping
- FaceController: Controllable Attribute Editing for Face in the Wild
- HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping - ai/hififace)
- GHOST—A New Face Swap Approach for Image and Video Domains - forever/ghost)
- MobileFaceSwap: A Lightweight Framework for Video Face Swapping
- Fine-Grained Face Swapping via Regional GAN Inversion
- SimSwap++: Towards Faster and High-Quality Identity Swapping
- DiffFace: Diffusion-based Face Swapping with Facial Guidance
- DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion - zhao/DiffSwap)
- DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning - 7/DreamID)
-
-
Attention or Transformer
-
- Learning Deep Features for Discriminative Localization
- Squeeze-and-Excitation Networks - frank/SENet)
- Graph Attention Networks - /GAT)
- Non-local Neural Networks - nonlocal-net)
-
- Attention Is All You Need
- Non-local Neural Networks - nonlocal-net)
- Image Transformer
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - research/vision_transformer)
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows - Transformer)
- ResT: An Efficient Transformer for Visual Recognition
- Dual-stream Network for Visual Recognition
- Transformer in Convolutional Neural Networks - liu/TransCNN)
- Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
-
-
Diffusion Model
- Denoising Diffusion Probabilistic Models - diffusion-pytorch)
- High-Resolution Image Synthesis with Latent Diffusion Models - diffusion)
- Scalable Diffusion Models with Transformers
- Back to Basics: Let Denoising Generative Models Denoise
- PixelDiT: Pixel Diffusion Transformers for Image Generation
- Awesome-Diffusion-Models
-
Few-Shot Learning
- RN | [Learning to Compare: Relation Network for Few-Shot Learning](https://arxiv.org/abs/1711.06025) | CVPR(2018) | [PyTorch](https://github.com/Gojay001/LearningToCompare_FSL)
-
Few-Shot Segmentation
-
-
- OSLSM | [One-Shot Learning for Semantic Segmentation](https://arxiv.org/abs/1709.03410) | BMVC(2017) | [Caffe](https://github.com/lzzcd001/OSLSM)
- co-FCN | [Conditional Networks for Few-Shot Semantic Segmentation](https://openreview.net/pdf?id=SkMjFKJwG) | ICLR(2018) | [code]
- AMP: Adaptive Masked Proxies for Few-Shot Segmentation
- SG-One | [SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation](https://arxiv.org/abs/1810.09091) | arXiv(2018) / TCYB(2020) | [PyTorch](https://github.com/xiaomengyc/SG-One)
- Learning Combinatorial Embedding Networks for Deep Graph Matching - SJTU/PCA-GM)
- CANet | [CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning](https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_CANet_Class-Agnostic_Segmentation_Networks_With_Iterative_Refinement_and_Attentive_Few-Shot_CVPR_2019_paper.pdf) | CVPR(2019) | [PyTorch](https://github.com/icoz69/CaNet)
- PGNet | [Pyramid Graph Networks with Connection Attentions for Region-Based One-Shot Semantic Segmentation](https://openaccess.thecvf.com/content_ICCV_2019/papers/Zhang_Pyramid_Graph_Networks_With_Connection_Attentions_for_Region-Based_One-Shot_Semantic_ICCV_2019_paper.pdf) | ICCV(2019) | [code]
- CRNet | [CRNet: Cross-Reference Networks for Few-Shot Segmentation](https://openaccess.thecvf.com/content_CVPR_2020/papers/Liu_CRNet_Cross-Reference_Networks_for_Few-Shot_Segmentation_CVPR_2020_paper.pdf) | CVPR(2020) | [code]
- FGN: Fully Guided Network for Few-Shot Instance Segmentation
- On the Texture Bias for Few-Shot CNN Segmentation - segmentation)
- LTM | [A New Local Transformation Module for Few-Shot Segmentation](https://arxiv.org/abs/1910.05886) | MMM(2020) | [code]
- SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation
- PPNet | [Part-aware Prototype Network for Few-shot Semantic Segmentation](https://arxiv.org/abs/2007.06309) | ECCV(2020) | [PyTorch](https://github.com/Xiangyi1996/PPNet-PyTorch)
- PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation - Research-Lab/PFENet)
- Prototype Mixture Models for Few-shot Semantic Segmentation - Bob/PMMs)
- Generalized Few-Shot Semantic Segmentation
- Self-Guided and Cross-Guided Learning for Few-Shot Segmentation
- Adaptive Prototype Learning and Allocation for Few-Shot Segmentation
- Hypercorrelation Squeeze for Few-Shot Segmentation
- Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
- Few-Shot-Semantic-Segmentation-Papers
-
-
Generative Adversarial Network
- Generative Adversarial Networks
- BeautyGAN: Instance-level Facial Makeup Transfer with Deep Generative Adversarial Network - group.com/projects/BeautyGAN)
- Image-to-Image Translation with Conditional Adversarial Networks
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks - CycleGAN-and-pix2pix)
- High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
- A Style-Based Generator Architecture for Generative Adversarial Networks
- Analyzing and Improving the Image Quality of StyleGAN
- Training Generative Adversarial Networks with Limited Data - ada-pytorch)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
- MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis - dev/MobileStyleGAN.pytorch)
- Alias-Free Generative Adversarial Networks
- PyTorch-GAN
-
Image Classification
- Gradient-based learning applied to document recognition
- NIN - pwcn/tree/master/Classification/NIN/Code)
- Very Deep Convolutional Networks for Large-Scale Image Recognition
- GoogLeNet - foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf) | CVPR(2015) | [PyTorch](https://github.com/Gojay001/DeepLearning-pwcn/tree/master/Classification/GoogLeNet/Code)
- ResNet - pwcn/tree/master/Classification/ResNet/Code)
- Deep Layer Aggregation
- Awesome - Image Classification
- ImageNet Classification with Deep Convolutional Neural Networks
- Densely Connected Convolutional Networks
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- Searching for MobileNetV3
-
Object Detection
- Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
- Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
- Fast R-CNN
- Faster R-CNN | [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497) | NIPS(2015) | [PyTorch](https://github.com/Gojay001/faster-rcnn.pytorch)
- SSD: Single Shot MultiBox Detector
- You Only Look Once: Unified, Real-Time Object Detection
- YOLO9000: Better, Faster, Stronger
- Focal Loss for Dense Object Detection
- YOLOv3: An Incremental Improvement
- CornerNet: Detecting Objects as Paired Keypoints - vl/CornerNet)
- Objects as Points
- YOLOv4: Optimal Speed and Accuracy of Object Detection
- You Only Look One-level Feature - model/YOLOF)
- awesome-object-detection
- Feature Pyramid Networks for Object Detection
-
Object Segmentation
- Fully convolutional networks for semantic segmentation
- U-Net: Convolutional Networks for Biomedical Image Segmentation
- SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling
- Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
- Pyramid Scene Parsing Network
- DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
- Mask R-CNN | [Mask R-CNN](http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf) | ICCV / TPAMI(2017) | [PyTorch](https://github.com/facebookresearch/detectron2)
- Rethinking Atrous Convolution for Semantic Image Segmentation
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
- PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
- Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
- Dual Graph Convolutional Network for Semantic Segmentation - DGCNet)
- Segmenter: Transformer for Semantic Segmentation
- SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
- Fully Transformer Networks for Semantic Image Segmentation
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers - zvg/SETR)
-
Object Tracking
-
Visual Object Tracking
-
- SORT
- DeepSORT
- Fully-Convolutional Siamese Networks for Object Tracking
- High Performance Visual Tracking with Siamese Region Proposal Network - pytorch)
- SiamRPN++
- SiamMask
- Tracktor
- GlobalTrack | [GlobalTrack: A Simple and Strong Baseline for Long-term Tracking](https://arxiv.org/abs/1912.08531) | AAAI(2020) | [PyTorch](https://github.com/huanglianghua/GlobalTrack)
- SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking
- Siamese Box Adaptive Network for Visual Tracking
- Deformable Siamese Attention Networks for Visual Object Tracking - tech/research-siamattn)
- PAMCC-AOT | [Pose-Assisted Multi-Camera Collaboration for Active Object Tracking](https://arxiv.org/abs/2001.05161) | AAAI(2020) | [code]
- FFT
- JRMOT | [JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset](https://arxiv.org/abs/2002.08397) | arXiv(2020) | [code]
- Tracklet | [Multi-object Tracking via End-to-end Tracklet Searching and Ranking](https://arxiv.org/abs/2003.02795) | arXiv(2020) | [code]
- Real-time 3D Deep Multi-Camera Tracking
- FairMOT | [A Simple Baseline for Multi-Object Tracking](https://arxiv.org/abs/2004.01888) | arXiv(2020) | [PyTorch](https://github.com/Gojay001/FairMOT)
- TSDM | [TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator](https://arxiv.org/abs/2005.04063) | arXiv(2020) | [PyTorch](https://github.com/Gojay001/TSDM)
- Graph Attention Tracking
- Rotation Equivariant Siamese Networks for Tracking - siamnet)
- Center-based 3D Object Detection and Tracking
-
-
Optimization
-
-
- Deep Sparse Rectifier Neural Networks
- On the importance of initialization and momentum in deep learning
- Adam: A Method for Stochastic Optimization
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- An overview of gradient descent optimization algorithms
- Dropout: a simple way to prevent neural networks from overfitting
-
-
Salient Object Detection
- UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders
- JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection - scu/JL-DCF-pytorch)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation
- Bilateral Attention Network for RGB-D Salient Object Detection
- Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
-
Survey
- A Survey on 3D Object Detection Methods for Autonomous Driving Applications
- FSL-Survey-2019 | [Generalizing from a Few Examples: A Survey on Few-Shot Learning](https://arxiv.org/abs/1904.05046) | CSUR(2019)
- Deep Learning in Video Multi-Object Tracking: A Survey
- A Survey of Transformers
-
Unsupervised Learning
-
Variational Auto-Encoder
-
Vision Transformer