awesome-masked-autoencoder
A curated list of awesome masked autoencoder papers in self-supervised learning
https://github.com/chaoningzhang/awesome-masked-autoencoder
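Most entries below share one core recipe, popularized by "Masked Autoencoders Are Scalable Vision Learners": mask a large random fraction of the input patches, encode only the visible ones, and train a lightweight decoder to reconstruct the masked content. As a rough orientation, here is a minimal sketch of the random-masking step in PyTorch; the function name, shapes, and the 75% mask ratio are illustrative assumptions, not any paper's reference code.

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """Keep a random subset of patches; return kept patches and their indices.

    Illustrative sketch only: `patches` is a (batch, num_patches, dim)
    sequence of embedded image patches, as in a ViT-style model.
    """
    batch, num_patches, dim = patches.shape
    num_keep = int(num_patches * (1 - mask_ratio))
    # Random per-patch scores decide which patches survive.
    noise = torch.rand(batch, num_patches)
    keep_idx = noise.argsort(dim=1)[:, :num_keep]
    # Gather only the visible patches; the encoder never sees the rest.
    kept = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))
    return kept, keep_idx

# Toy usage: 196 patches of dim 768, as for ViT-B/16 on a 224x224 image.
patches = torch.randn(2, 196, 768)
visible, keep_idx = random_masking(patches)
print(visible.shape)  # torch.Size([2, 49, 768]) -- only 25% reaches the encoder
```

The same recipe transfers across the sections below by changing what a "patch" is: spatiotemporal tubes for video, point groups for point clouds, nodes or edges for graphs, and spectrogram patches for audio.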
Beyond Images
Videos
- BEVT: BERT Pretraining of Video Transformers
- VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
- MaskViT: Masked Visual Pre-Training for Video Prediction
- Masked Autoencoders as Spatiotemporal Learners
Vision and Language
- VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
- An Empirical Study of Training End-to-End Vision-and-Language Transformers
- Data Efficient Masked Language Modeling for Vision and Language
- Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
- VL-BEiT: Generative Vision-Language Pretraining
Point Clouds
- Masked Autoencoders for Point Cloud Self-supervised Learning
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
- Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds
Graph
- MGAE: Masked Autoencoders for Self-Supervised Learning on Graphs
- Graph Masked Autoencoders with Transformers
- GraphMAE: Self-Supervised Masked Graph Autoencoders
- MaskGAE: Masked Graph Modeling Meets Graph Autoencoders
Audio
- Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
- MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
- Masked Autoencoders that Listen
- Group Masked Autoencoder Based Density Estimator for Audio Anomaly Detection
Reinforcement Learning
Others
Masked Image Modeling
- Masked Autoencoders Pre-training in Multiple Instance Learning for Whole Slide Image Classification
- Masked Siamese ConvNets
- Masked Autoencoders Are Scalable Vision Learners
- SimMIM: A Simple Framework for Masked Image Modeling
- mc-BEiT: Multi-Choice Discretization for Image BERT Pre-Training
- PeCo: Perceptual Codebook for BERT Pre-Training of Vision Transformers
- Context Autoencoder for Self-Supervised Representation Learning
- Green Hierarchical Vision Transformer for Masked Image Modeling
- HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
- Efficient Self-Supervised Vision Pretraining with Local Masked Reconstruction
- Object-wise Masked Autoencoders for Fast Pre-training
- MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning
- ConvMAE: Masked Convolution Meets Masked Autoencoders
- Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
- On Data Scaling in Masked Image Modeling
- Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains
- MultiMAE: Multi-modal Multi-task Masked Autoencoders
- Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers
- Architecture-Agnostic Masked Image Modeling -- From ViT Back to CNN
- Corrupted Image Modeling for Self-Supervised Visual Pre-Training
- Masked Frequency Modeling for Self-Supervised Visual Pre-Training
- How to Understand Masked Autoencoders?
- Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks
- Siamese Image Modeling for Self-Supervised Vision Representation Learning
- iBOT: Image BERT Pre-Training with Online Tokenizer
- Masked Autoencoders are Robust Data Augmentors
- Self Pre-training with Masked Autoencoders for Medical Image Analysis
- Masked Image Modeling Advances 3D Medical Image Analysis
- Student Collaboration Improves Self-Supervised Learning: Dual-Loss Adaptive Masked Autoencoder for Brain Cell Image Analysis
- Global Contrast Masked Autoencoders Are Powerful Pathological Representation Learners
- BEiT: BERT Pre-Training of Image Transformers