## Masked Autoencoders: A PyTorch Implementation



This is a PyTorch/GPU re-implementation of the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377):
```
@Article{MaskedAutoencoders2021,
author = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
journal = {arXiv:2111.06377},
title = {Masked Autoencoders Are Scalable Vision Learners},
year = {2021},
}
```

* The original implementation was in TensorFlow+TPU. This re-implementation is in PyTorch+GPU.

* This repo is a modification on the [DeiT repo](https://github.com/facebookresearch/deit). Installation and preparation follow that repo.

* This repo is based on [`timm==0.3.2`](https://github.com/rwightman/pytorch-image-models), for which a [fix](https://github.com/rwightman/pytorch-image-models/issues/420#issuecomment-776459842) is needed to work with PyTorch 1.8.1+.
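For reference, the linked fix guards the `torch._six` import in `timm/models/layers/helpers.py`; a sketch of the patched import, following the issue comment:

```python
# Patched import in timm/models/layers/helpers.py: timm==0.3.2 imports
# container_abcs from torch._six, which was removed in PyTorch 1.8+.
import torch

TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])

if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs
```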

### Catalog

- [x] Visualization demo
- [x] Pre-trained checkpoints + fine-tuning code
- [x] Pre-training code

### Visualization demo

Run our interactive visualization demo using the [Colab notebook](https://colab.research.google.com/github/facebookresearch/mae/blob/main/demo/mae_visualize.ipynb) (no GPU needed).


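If you prefer a local run, the demo boils down to loading a visualization checkpoint into the MAE model and reconstructing a masked image. A minimal sketch, assuming the `models_mae` module from this repo is on the path and a visualization checkpoint has been downloaded (the filename below is a placeholder):

```python
# Sketch: reconstruct a masked image locally, mirroring the Colab demo.
import torch
import models_mae  # from this repo

model = models_mae.mae_vit_large_patch16()
ckpt = torch.load('mae_visualize_vit_large.pth', map_location='cpu')  # placeholder path
model.load_state_dict(ckpt['model'], strict=False)
model.eval()

img = torch.randn(1, 3, 224, 224)  # stand-in for a normalized 224x224 image
with torch.no_grad():
    loss, pred, mask = model(img, mask_ratio=0.75)  # returns (loss, pred, mask)
print(pred.shape)  # (1, num_patches, patch_size**2 * 3): reconstructed patches
```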

### Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:

|  | ViT-Base | ViT-Large | ViT-Huge |
| --- | --- | --- | --- |
| pre-trained checkpoint | download | download | download |
| md5 | `8cad7c` | `b8b06e` | `9bdbb0` |

The fine-tuning instruction is in [FINETUNE.md](FINETUNE.md).
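As a quick sanity check before following those instructions, a pre-trained checkpoint can be loaded into the classification ViT non-strictly, since the head is trained from scratch during fine-tuning. A minimal sketch, assuming the `models_vit` module from this repo and a downloaded checkpoint (the filename below is a placeholder):

```python
# Sketch: load a pre-trained MAE encoder into the ViT used for fine-tuning.
import torch
import models_vit  # from this repo

model = models_vit.vit_base_patch16(num_classes=1000, global_pool=True)
ckpt = torch.load('mae_pretrain_vit_base.pth', map_location='cpu')  # placeholder path

# The classification head is newly initialized, so load non-strictly.
msg = model.load_state_dict(ckpt['model'], strict=False)
print(msg.missing_keys)  # expect head (and global-pool norm) parameters here
```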

By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):

|  | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
| --- | --- | --- | --- | --- | --- |
| ImageNet-1K (no external data) | 83.6 | 85.9 | 86.9 | 87.8 | 87.1 |

The following are evaluations of the same model weights (fine-tuned on the original ImageNet-1K):

|  | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
| --- | --- | --- | --- | --- | --- |
| ImageNet-Corruption (error rate) | 51.7 | 41.8 | 33.8 | 36.8 | 42.5 |
| ImageNet-Adversarial | 35.9 | 57.1 | 68.2 | 76.7 | 35.8 |
| ImageNet-Rendition | 48.3 | 59.9 | 64.4 | 66.5 | 48.7 |
| ImageNet-Sketch | 34.5 | 45.3 | 49.6 | 50.9 | 36.0 |

The following are transfer learning results, obtained by fine-tuning the pre-trained MAE on the target dataset:

|  | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
| --- | --- | --- | --- | --- | --- |
| iNaturalist 2017 | 70.5 | 75.7 | 79.3 | 83.4 | 75.4 |
| iNaturalist 2018 | 75.4 | 80.1 | 83.0 | 86.8 | 81.2 |
| iNaturalist 2019 | 80.5 | 83.4 | 85.7 | 88.3 | 84.1 |
| Places205 | 63.9 | 65.8 | 65.9 | 66.8 | 66.0 |
| Places365 | 57.9 | 59.4 | 59.8 | 60.3 | 58.0 |

### Pre-training

The pre-training instruction is in [PRETRAIN.md](PRETRAIN.md).
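For orientation, the heart of pre-training is per-sample random masking of patches followed by a mean-squared error computed on the masked patches only. A compact sketch of these two pieces, simplified from `models_mae.py` in this repo:

```python
# Sketch of MAE pre-training internals: random masking + masked-patch MSE.
import torch

def random_masking(x, mask_ratio=0.75):
    """x: (N, L, D) patch embeddings. Keep a random (1 - mask_ratio)
    subset per sample via argsort of per-patch noise."""
    N, L, D = x.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(N, L, device=x.device)        # per-sample random scores
    ids_shuffle = torch.argsort(noise, dim=1)        # ascending: low noise = keep
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # to undo the shuffle
    ids_keep = ids_shuffle[:, :len_keep]
    x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).repeat(1, 1, D))
    mask = torch.ones(N, L, device=x.device)         # 1 = masked (removed)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)        # back to original order
    return x_visible, mask

def reconstruction_loss(target, pred, mask):
    """target, pred: (N, L, D) patch pixels; mask: (N, L), 1 for masked.
    MSE averaged over masked patches only, as in the paper."""
    loss = ((pred - target) ** 2).mean(dim=-1)       # per-patch loss, (N, L)
    return (loss * mask).sum() / mask.sum()
```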

### License

This project is under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for details.