https://github.com/ucasligang/awesome-MIM

Reading list for research topics in Masked Image Modeling
https://github.com/ucasligang/awesome-MIM
Last synced: about 10 hours ago
JSON representation
Reading list for research topics in Masked Image Modeling
Host: GitHub
URL: https://github.com/ucasligang/awesome-MIM
Owner: ucasligang
Created: 2022-02-07T09:14:32.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2024-12-03T12:51:24.000Z (5 months ago)
Last Synced: 2025-05-08T01:01:59.944Z (4 days ago)
Size: 109 KB
Stars: 333
Watchers: 18
Forks: 32
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

Awesome-Computer-Vision - 25
ultimate-awesome - awesome-MIM - Reading list for research topics in Masked Image Modeling. (Other Lists / Julia Lists)
README

        # awesome-MIM

We have listed the most popular methods in the field of Masked Image Modeling (MIM). If there are any omissions, please feel free to submit a request for additions.

(Note: The dates shown correspond to the first submission of the papers to arXiv, but the provided links may point to the latest versions.)

Additionally, we encourage you to cite our work, [SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders](https://arxiv.org/pdf/2206.10207.pdf).

## Backbone models.

Date|Method|Conference|Title|Code

-----|----|-----|-----|-----

2020-xx-xx(maybe 2019)|iGPT|ICML 2020|[Generative Pretraining from Pixels](http://proceedings.mlr.press/v119/chen20s/chen20s.pdf)|[iGPT](https://github.com/openai/image-gpt)

2020-10-22|ViT|ICLR 2021 (Oral)|[An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)|[ViT](https://github.com/google-research/vision_transformer)

2021-04-08|SiT|Arxiv 2021|[SiT: Self-supervised vIsion Transformer](https://arxiv.org/pdf/2104.03602.pdf)|None

2021-06-10|MST|NeurIPS 2021|[MST: Masked Self-Supervised Transformer for Visual Representation](https://arxiv.org/pdf/2106.05656.pdf)|None

2021-06-14|BEiT|ICLR 2022 (Oral)|[BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254)|[BEiT](https://github.com/microsoft/unilm/tree/master/beit)

2021-11-11|MAE|Arxiv 2021|[Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/pdf/2111.06377.pdf)|[MAE](https://github.com/facebookresearch/mae)

2021-11-15|iBoT|ICLR 2022|[iBOT: Image BERT Pre-Training with Online Tokenizer](https://arxiv.org/pdf/2111.07832.pdf)|[iBoT](https://github.com/bytedance/ibot)

2021-11-18|SimMIM|Arxiv 2021|[SimMIM: A Simple Framework for Masked Image Modeling](https://arxiv.org/pdf/2111.09886.pdf)|[SimMIM](https://github.com/microsoft/SimMIM)

2021-11-24|PeCo|Arxiv 2021|[PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers](https://arxiv.org/pdf/2111.12710.pdf)|None

2021-11-30|MC-SSL0.0|Arxiv 2021|[MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning](https://arxiv.org/pdf/2111.15340.pdf)|None

2021-12-16|MaskFeat|Arxiv 2021|[Masked Feature Prediction for Self-Supervised Visual Pre-Training](https://arxiv.org/pdf/2112.09133.pdf)|None

2021-12-20|SplitMask|Arxiv 2021|[Are Large-scale Datasets Necessary for Self-Supervised Pre-training?](https://arxiv.org/pdf/2112.10740.pdf)|None

2022-01-31|ADIOS|Arxiv 2022|[Adversarial Masking for Self-Supervised Learning](https://arxiv.org/pdf/2201.13100.pdf)|None

2022-02-07|CAE|Arxiv 2022|[Context Autoencoder for Self-Supervised Representation Learning](https://arxiv.org/pdf/2202.03026.pdf)|[CAE](https://github.com/lxtGH/CAE)

2022-02-07|CIM|Arxiv 2022|[Corrupted Image Modeling for Self-Supervised Visual Pre-Training](https://arxiv.org/pdf/2202.03382.pdf)|None

2022-03-10|MVP|Arxiv 2022|[MVP: Multimodality-guided Visual Pre-training](https://arxiv.org/pdf/2203.05175.pdf)|None

2022-03-23|AttMask|ECCV 2022|[What to Hide from Your Students: Attention-Guided Masked Image Modeling](https://arxiv.org/pdf/2203.12719.pdf)|[AttMask](https://github.com/gkakogeorgiou/attmask)

2022-03-29|mc-BEiT|Arxiv 2022|[mc-BEiT: Multi-choice Discretization for Image BERT Pre-training](https://arxiv.org/pdf/2203.15371.pdf)|None

2022-04-18|Ge2-AE|Arxiv 2022|[The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training](https://arxiv.org/pdf/2204.08227.pdf)|None

2022-05-08|MCMAE|NeurIPS 2022|[MCMAE: Masked Convolution Meets Masked Autoencoders](https://arxiv.org/pdf/2205.03892.pdf)|[MCMAE](https://github.com/alpha-vl/convmae)

2022-05-20|UM-MAE|Arxiv 2022|[Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality](https://arxiv.org/pdf/2205.10063.pdf)|[UM-MAE](https://github.com/implus/UM-MAE)

2022-05-26|GreenMIM|Arxiv 2022|[Green Hierarchical Vision Transformer for Masked Image Modeling](https://arxiv.org/pdf/2205.13515.pdf)|[GreenMIM](https://github.com/LayneH/GreenMIM)

2022-05-26|MixMIM|Arxiv 2022|[MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning](https://arxiv.org/pdf/2205.13137.pdf)|[Code is Opening](https://github.com/Sense-X/MixMIM)

2022-05-28|SupMAE|Arxiv 2022|[SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners](https://arxiv.org/pdf/2205.14540.pdf)|[SupMAE](https://github.com/cmu-enyac/supmae)

2022-05-30|HiViT|Arxiv 2022|[HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling](https://arxiv.org/abs/2205.14949.pdf)|None

2022-06-01|LoMaR|Arxiv 2022|[Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction](https://arxiv.org/pdf/2206.00790.pdf)|[LoMaR](https://github.com/junchen14/LoMaR)

2022-06-22|SemMAE|NeurIPS 2022|[SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders](https://arxiv.org/pdf/2206.10207.pdf)|[SemMAE](https://github.com/ucasligang/SemMAE)

2022-08-11|MILAN|Arxiv 2022|[MILAN: Masked Image Pretraining on Language Assisted Representation](https://arxiv.org/pdf/2208.06049.pdf)|[MILAN](https://github.com/zejiangh/MILAN)

2022-11-14|EVA|Arxiv 2022|[EVA: Exploring the Limits of Masked Visual Representation Learning at Scale](https://arxiv.org/pdf/2211.07636.pdf)|[EVA](https://github.com/baaivision/EVA)

2022-11-28|AMT|AAAI 2023|[Good helper is around you: Attention-driven Masked Image Modeling](https://arxiv.org/pdf/2211.15362.pdf)|[AMT](https://github.com/guijiejie/AMT)

2023-01-03|TinyMIM|CVPR 2023|[TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models](https://arxiv.org/abs/2301.01296)|[TinyMIM](https://github.com/OliverRensu/TinyMIM)

2023-03-04|PixMIM|Arxiv 2023|[PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling](https://arxiv.org/pdf/2303.02416.pdf)|[PixMIM](https://github.com/open-mmlab/mmselfsup)

2023-03-09|LocalMIM|CVPR 2023|[Masked Image Modeling with Local Multi-Scale Reconstruction](https://arxiv.org/abs/2303.05251)|[LocalMIM](https://github.com/huawei-noah/Efficient-Computing/tree/master/Self-supervised/LocalMIM)

2023-03-12|AutoMAE|Arxiv 2023|[Improving Masked Autoencoders by Learning Where to Mask](https://arxiv.org/abs/2303.06583)|AutoMAE

2023-03-15|DeepMIM|Arxiv 2023|[DeepMIM: Deep Supervision for Masked Image Modeling](https://arxiv.org/abs/2303.08817)|[DeepMIM](https://github.com/OliverRensu/DeepMIM)

2023-04-25|Img2Vec|Arxiv 2023|[Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders](https://arxiv.org/pdf/2304.12535.pdf)|None

2023-12-30|DTM|Arxiv 2023|[Masked Image Modeling via Dynamic Token Morphing](https://arxiv.org/pdf/2401.00254.pdf)|None

2024-11-24|PR-MIM|Arxiv 2024|[PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling](https://arxiv.org/pdf/2411.15746)|None

# Others:

## Object detection.

Date|Method|Conference|Title|Code

-----|----|-----|-----|-----

2022-04-06|MIMDet|Arxiv 2022|[Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection](https://arxiv.org/pdf/2204.02964.pdf)|[MIMDet](https://github.com/hustvl/MIMDet)

## 3D.

Date|Method|Conference|Title|Code

-----|----|-----|-----|-----

2021-11-29|Point-BERT|CVPR 2022|[Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling](https://arxiv.org/pdf/2111.14819.pdf)|[Point-BERT](https://github.com/lulutang0608/Point-BERT)

2022-03-28|Point-MAE|ECCV 2022|[Masked Autoencoders for Point Cloud Self-supervised Learning](https://arxiv.org/pdf/2203.06604.pdf)|[Point-MAE](https://github.com/Pang-Yatian/Point-MAE)

2022-05-28|Point-M2AE|NeurIPS 2022|[Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training](https://arxiv.org/pdf/2205.14401.pdf)|[Point-M2AE](https://github.com/ZrrSkywalker/Point-M2AE)

2022-12-13|I2P-MAE|CVPR 2023|[Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders](https://arxiv.org/pdf/2212.06785.pdf)|[I2P-MAE](https://github.com/ZrrSkywalker/I2P-MAE)

2024-04-01|NeRF-MAE|ECCV 2024|[NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields](https://arxiv.org/pdf/2404.01300)|[NeRF-MAE](https://github.com/zubair-irshad/NeRF-MAE)

## Image generation.

Date|Method|Conference|Title|Code

-----|----|-----|-----|-----

2022-02-08|MaskGIT|Arxiv 2022|[MaskGIT: Masked Generative Image Transformer](https://arxiv.org/pdf/2202.04200.pdf)|None

## Unsupervised Domain Adaptation.

Date|Method|Conference|Title|Code

-----|----|-----|-----|-----

2023-06-18|MIC|CVPR 2023|[MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation](https://arxiv.org/pdf/2212.01322.pdf)|None

## Video.

Date|Method|Conference|Title|Code

-----|----|-----|-----|-----

2021-12-02|BEVT|Arxiv 2021|[BEVT: BERT Pretraining of Video Transformers](https://arxiv.org/pdf/2112.01529.pdf)|[BEVT](https://github.com/xyzforever/BEVT)

2022-03-23|VideoMAE|NeurIPS 2022|[VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602)|[VideoMAE](https://github.com/MCG-NJU/VideoMAE)

2022-05-18|MAE_ST|NeurIPS 2022|[Masked Autoencoders As Spatiotemporal Learners](https://arxiv.org/pdf/2205.09113.pdf)|[MAE_ST](https://github.com/facebookresearch/mae_st)

2023-03-29|VideoMAE v2|CVPR 2023|[VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking](https://arxiv.org/pdf/2303.16727)|None

## Multi-modal.

Date|Method|Conference|Title|Code

-----|----|-----|-----|-----

2022-04-04|MultiMAE|Arxiv 2022|[MultiMAE: Multi-modal Multi-task Masked Autoencoders](https://arxiv.org/pdf/2204.01678.pdf)|[MultiMAE](https://github.com/EPFL-VILAB/MultiMAE)

2022-05-27|M3AE|Arxiv 2022|[Multimodal Masked Autoencoders Learn Transferable Representations](https://arxiv.org/pdf/2205.14204.pdf)| None

2022-08-03|xxx|Arxiv 2022|[Masked Vision and Language Modeling for Multi-modal Representation Learning](https://arxiv.org/pdf/2208.02131.pdf)| None

2022-12-01|FLIP|Arxiv 2022|[Scaling Language-Image Pre-training via Masking](https://arxiv.org/pdf/2212.00794.pdf)| None

## Medical.

Date|Method|Conference|Title|Code

-----|----|-----|-----|-----

2022-03-10|MedMAE|Arxiv 2022|[Self Pre-training with Masked Autoencoders for Medical Image Analysis](https://arxiv.org/pdf/2203.05573.pdf)|None

## Analysis.

Date|Method|Conference|Title

-----|-----|-----|-----

2022-08-08|RelaxMIM|Arxiv 2022|[Understanding Masked Image Modeling via Learning Occlusion Invariant Feature](https://arxiv.org/pdf/2208.04164.pdf)

## Survey.

Date|Conference|Title

-----|-----|-----

2022-07-30|Arxiv 2022|[A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond](https://arxiv.org/pdf/2208.00173.pdf)

2023-12-31|Arxiv 2023|[Masked Modeling for Self-supervised Representation Learning on Vision and Beyond](https://arxiv.org/pdf/2401.00897.pdf)
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ucasligang/awesome-MIM

Awesome Lists containing this project

README