# awesome-MIM

A reading list for research topics in Masked Image Modeling.

https://github.com/ucasligang/awesome-MIM
## Backbone models

- Generative Pretraining from Pixels
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- SiT: Self-supervised vIsion Transformer
- MST: Masked Self-Supervised Transformer for Visual Representation
- BEiT: BERT Pre-Training of Image Transformers
- Masked Autoencoders Are Scalable Vision Learners
- iBOT: Image BERT Pre-Training with Online Tokenizer
- SimMIM: A Simple Framework for Masked Image Modeling
- PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
- MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
- Masked Feature Prediction for Self-Supervised Visual Pre-Training
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
- Adversarial Masking for Self-Supervised Learning
- Context Autoencoder for Self-Supervised Representation Learning
- Corrupted Image Modeling for Self-Supervised Visual Pre-Training
- MVP: Multimodality-guided Visual Pre-training
- What to Hide from Your Students: Attention-Guided Masked Image Modeling
- mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
- The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training
- MCMAE: Masked Convolution Meets Masked Autoencoders
- Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality [UM-MAE](https://github.com/implus/UM-MAE)
- Green Hierarchical Vision Transformer for Masked Image Modeling
- MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning
- SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners
- HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
- Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
- SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders
- MILAN: Masked Image Pretraining on Language Assisted Representation
- EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
- Good helper is around you: Attention-driven Masked Image Modeling
- TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
- PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling
- Masked Image Modeling with Local Multi-Scale Reconstruction
- Improving Masked Autoencoders by Learning Where to Mask
- DeepMIM: Deep Supervision for Masked Image Modeling
- Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders
- Masked Image Modeling via Dynamic Token Morphing
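Most of the backbone methods above (MAE, SimMIM, and their variants) share one recipe: split the image into patches, hide a large random subset, and train a model to reconstruct the hidden content. Below is a minimal NumPy sketch of the data side of that recipe only (the encoder/decoder is omitted); the patch size of 16 and mask ratio of 0.75 follow MAE's defaults, and the function names are illustrative, not from any of the repos listed here.

```python
import numpy as np

def patchify(img, p=16):
    # Split an HxWxC image into (H//p * W//p) flattened p*p*C patches.
    h, w, c = img.shape
    patches = img.reshape(h // p, p, w // p, p, c)
    return patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * c)

def random_masking(patches, mask_ratio=0.75, rng=None):
    # Keep a random subset of patches; return the visible patches,
    # their indices, and a boolean mask (True = hidden from the encoder).
    rng = rng or np.random.default_rng(0)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask

img = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = patchify(img)                       # (196, 768) for 224x224 RGB
visible, keep_idx, mask = random_masking(patches)
print(visible.shape, int(mask.sum()))         # (49, 768) 147
```

The reconstruction loss (in MAE, mean squared error on normalized pixels) is then computed only on the masked positions, i.e. where `mask` is `True`.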
## Object detection

## 3D
- Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling [Point-BERT](https://github.com/lulutang0608/Point-BERT)
- Masked Autoencoders for Point Cloud Self-supervised Learning [Point-MAE](https://github.com/Pang-Yatian/Point-MAE)
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [Point-M2AE](https://github.com/ZrrSkywalker/Point-M2AE)
- Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders [I2P-MAE](https://github.com/ZrrSkywalker/I2P-MAE)
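The 3D methods above transfer the same masking idea to point clouds: points are first grouped into local patches (centers plus their nearest neighbours), then a random subset of groups is hidden and reconstructed. A simplified NumPy sketch of the grouping and masking steps, using random center sampling where Point-BERT and Point-MAE use farthest point sampling, and with illustrative names and sizes:

```python
import numpy as np

def group_points(points, n_groups=8, group_size=32, rng=None):
    # Pick group centers at random (the papers use farthest point sampling),
    # then take each center's k nearest neighbours as one local patch.
    rng = rng or np.random.default_rng(0)
    centers = points[rng.choice(len(points), n_groups, replace=False)]
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :group_size]   # (n_groups, group_size)
    return points[idx]                            # (n_groups, group_size, 3)

def mask_groups(groups, mask_ratio=0.6, rng=None):
    # Hide a random subset of groups; the model reconstructs their points.
    rng = rng or np.random.default_rng(1)
    n = len(groups)
    masked = np.zeros(n, dtype=bool)
    masked[rng.choice(n, int(n * mask_ratio), replace=False)] = True
    return groups[~masked], masked

cloud = np.random.default_rng(42).normal(size=(1024, 3)).astype(np.float32)
groups = group_points(cloud)                      # (8, 32, 3)
visible, masked = mask_groups(groups)
print(visible.shape, int(masked.sum()))           # (4, 32, 3) 4
```

The reconstruction target differs by method: Point-MAE regresses the masked points directly (via a Chamfer-style distance), while Point-BERT predicts discrete tokens from a pre-trained point tokenizer.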
## Image generation

## Unsupervised Domain Adaptation

## Video

## Multi-modal
- MultiMAE: Multi-modal Multi-task Masked Autoencoders
- Multimodal Masked Autoencoders Learn Transferable Representations
- Masked Vision and Language Modeling for Multi-modal Representation Learning
- Scaling Language-Image Pre-training via Masking
## Medical

## Analysis

## Survey