## Masked Autoencoders: A PyTorch Implementation



This is a PyTorch/GPU re-implementation of the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377):
```
@Article{MaskedAutoencoders2021,
author = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
journal = {arXiv:2111.06377},
title = {Masked Autoencoders Are Scalable Vision Learners},
year = {2021},
}
```

* The original implementation was in TensorFlow+TPU. This re-implementation is in PyTorch+GPU.

* This repo is a modification on the [DeiT repo](https://github.com/facebookresearch/deit). Installation and preparation follow that repo.

* This repo is based on [`timm==0.3.2`](https://github.com/rwightman/pytorch-image-models), for which a [fix](https://github.com/rwightman/pytorch-image-models/issues/420#issuecomment-776459842) is needed to work with PyTorch 1.8.1+.
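For reference, the linked fix guards the `torch._six` import in `timm/models/layers/helpers.py`; a sketch of the patched import, following the issue comment:

```python
# Patched import in timm/models/layers/helpers.py: timm==0.3.2 imports
# container_abcs from torch._six, which was removed in PyTorch 1.8+.
import torch

TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])

if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs
```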

### Catalog

- [x] Visualization demo
- [x] Pre-trained checkpoints + fine-tuning code
- [x] Pre-training code

### Visualization demo

Run our interactive visualization demo using the [Colab notebook](https://colab.research.google.com/github/facebookresearch/mae/blob/main/demo/mae_visualize.ipynb) (no GPU needed).


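If you prefer a local run, the demo boils down to loading a visualization checkpoint into the MAE model and reconstructing a masked image. A minimal sketch, assuming the `models_mae` module from this repo is on the path and a visualization checkpoint has been downloaded (the filename below is a placeholder):

```python
# Sketch: reconstruct a masked image locally, mirroring the Colab demo.
import torch
import models_mae  # from this repo

model = models_mae.mae_vit_large_patch16()
ckpt = torch.load('mae_visualize_vit_large.pth', map_location='cpu')  # placeholder path
model.load_state_dict(ckpt['model'], strict=False)
model.eval()

img = torch.randn(1, 3, 224, 224)  # stand-in for a normalized 224x224 image
with torch.no_grad():
    loss, pred, mask = model(img, mask_ratio=0.75)  # returns (loss, pred, mask)
print(pred.shape)  # (1, num_patches, patch_size**2 * 3): reconstructed patches
```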

### Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:

|  | ViT-Base | ViT-Large | ViT-Huge |
| --- | --- | --- | --- |
| pre-trained checkpoint | download | download | download |
| md5 | `8cad7c` | `b8b06e` | `9bdbb0` |

The fine-tuning instruction is in [FINETUNE.md](FINETUNE.md).
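As a quick sanity check before following those instructions, a pre-trained checkpoint can be loaded into the classification ViT non-strictly, since the head is trained from scratch during fine-tuning. A minimal sketch, assuming the `models_vit` module from this repo and a downloaded checkpoint (the filename below is a placeholder):

```python
# Sketch: load a pre-trained MAE encoder into the ViT used for fine-tuning.
import torch
import models_vit  # from this repo

model = models_vit.vit_base_patch16(num_classes=1000, global_pool=True)
ckpt = torch.load('mae_pretrain_vit_base.pth', map_location='cpu')  # placeholder path

# The classification head is newly initialized, so load non-strictly.
msg = model.load_state_dict(ckpt['model'], strict=False)
print(msg.missing_keys)  # expect head (and global-pool norm) parameters here
```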

By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):

|  | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
| --- | --- | --- | --- | --- | --- |
| ImageNet-1K (no external data) | 83.6 | 85.9 | 86.9 | 87.8 | 87.1 |

The following are evaluations of the same model weights (fine-tuned on the original ImageNet-1K):

|  | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
| --- | --- | --- | --- | --- | --- |
| ImageNet-Corruption (error rate) | 51.7 | 41.8 | 33.8 | 36.8 | 42.5 |
| ImageNet-Adversarial | 35.9 | 57.1 | 68.2 | 76.7 | 35.8 |
| ImageNet-Rendition | 48.3 | 59.9 | 64.4 | 66.5 | 48.7 |
| ImageNet-Sketch | 34.5 | 45.3 | 49.6 | 50.9 | 36.0 |

The following are transfer learning results, obtained by fine-tuning the pre-trained MAE on the target dataset:

|  | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
| --- | --- | --- | --- | --- | --- |
| iNaturalist 2017 | 70.5 | 75.7 | 79.3 | 83.4 | 75.4 |
| iNaturalist 2018 | 75.4 | 80.1 | 83.0 | 86.8 | 81.2 |
| iNaturalist 2019 | 80.5 | 83.4 | 85.7 | 88.3 | 84.1 |
| Places205 | 63.9 | 65.8 | 65.9 | 66.8 | 66.0 |
| Places365 | 57.9 | 59.4 | 59.8 | 60.3 | 58.0 |

### Pre-training

The pre-training instruction is in [PRETRAIN.md](PRETRAIN.md).
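For orientation, the heart of pre-training is per-sample random masking of patches followed by a mean-squared error computed on the masked patches only. A compact sketch of these two pieces, simplified from `models_mae.py` in this repo:

```python
# Sketch of MAE pre-training internals: random masking + masked-patch MSE.
import torch

def random_masking(x, mask_ratio=0.75):
    """x: (N, L, D) patch embeddings. Keep a random (1 - mask_ratio)
    subset per sample via argsort of per-patch noise."""
    N, L, D = x.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(N, L, device=x.device)        # per-sample random scores
    ids_shuffle = torch.argsort(noise, dim=1)        # ascending: low noise = keep
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # to undo the shuffle
    ids_keep = ids_shuffle[:, :len_keep]
    x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).repeat(1, 1, D))
    mask = torch.ones(N, L, device=x.device)         # 1 = masked (removed)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)        # back to original order
    return x_visible, mask

def reconstruction_loss(target, pred, mask):
    """target, pred: (N, L, D) patch pixels; mask: (N, L), 1 for masked.
    MSE averaged over masked patches only, as in the paper."""
    loss = ((pred - target) ** 2).mean(dim=-1)       # per-patch loss, (N, L)
    return (loss * mask).sum() / mask.sum()
```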

### License

This project is under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for details.