https://github.com/facebookresearch/mae
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
- Host: GitHub
- URL: https://github.com/facebookresearch/mae
- Owner: facebookresearch
- License: other
- Created: 2021-12-06T21:29:09.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-07-23T18:15:40.000Z (6 months ago)
- Last Synced: 2024-12-31T08:03:26.488Z (12 days ago)
- Language: Python
- Homepage:
- Size: 787 KB
- Stars: 7,464
- Watchers: 57
- Forks: 1,230
- Open Issues: 126
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-multi-modal - https://github.com/facebookresearch/mae
README
## Masked Autoencoders: A PyTorch Implementation
This is a PyTorch/GPU re-implementation of the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377):
```
@Article{MaskedAutoencoders2021,
author = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
journal = {arXiv:2111.06377},
title = {Masked Autoencoders Are Scalable Vision Learners},
year = {2021},
}
```

* The original implementation was in TensorFlow+TPU. This re-implementation is in PyTorch+GPU.
* This repo is a modification of the [DeiT repo](https://github.com/facebookresearch/deit). Installation and preparation follow that repo.
* This repo is based on [`timm==0.3.2`](https://github.com/rwightman/pytorch-image-models), for which a [fix](https://github.com/rwightman/pytorch-image-models/issues/420#issuecomment-776459842) is needed to work with PyTorch 1.8.1+.
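
The linked fix appears to concern the `container_abcs` import that older `timm` releases take from the private `torch._six` module, which newer PyTorch versions no longer provide. Assuming that is the case, the idea can be sketched as a version-tolerant import. This is an illustrative sketch, not the exact upstream patch, and the helper `is_iterable` is a hypothetical name used only to show the import in use.

```python
# Sketch of a version-tolerant import (assumption: the incompatibility is the
# missing torch._six.container_abcs in newer PyTorch releases).
try:
    # Older PyTorch releases expose the container ABCs via the private torch._six module.
    from torch._six import container_abcs
except ImportError:
    # Newer PyTorch (or no PyTorch installed): use the standard library instead.
    import collections.abc as container_abcs


def is_iterable(x):
    """Hypothetical helper: True if x is an iterable container such as a list or tuple."""
    return isinstance(x, container_abcs.Iterable)
```

In practice the same `try`/`except` would replace the failing import inside the installed `timm` package rather than live in user code.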
### Catalog
- [x] Visualization demo
- [x] Pre-trained checkpoints + fine-tuning code
- [x] Pre-training code

### Visualization demo
Run our interactive visualization demo in a [Colab notebook](https://colab.research.google.com/github/facebookresearch/mae/blob/main/demo/mae_visualize.ipynb) (no GPU needed).
### Fine-tuning with pre-trained checkpoints
The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:

| | ViT-Base | ViT-Large | ViT-Huge |
|---|---|---|---|
| pre-trained checkpoint | download | download | download |
| md5 | 8cad7c | b8b06e | 9bdbb0 |

Fine-tuning instructions are in [FINETUNE.md](FINETUNE.md).
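
A downloaded checkpoint can be checked against the md5 values listed above. The sketch below assumes those six-character values are the leading hex digits of each file's full MD5 digest. The function names `md5_hex` and `md5_matches` are illustrative and not part of this repo.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Return the full MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large checkpoints need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, expected_prefix):
    """True if the file's MD5 digest starts with the expected hex prefix."""
    return md5_hex(path).startswith(expected_prefix.lower())
```

For example, `md5_matches("vit_base_checkpoint.pth", "8cad7c")` would check a ViT-Base download, where the file name is a placeholder for wherever the checkpoint was saved.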
By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):
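
| | ViT-B | ViT-L | ViT-H | ViT-H<sub>448</sub> | prev best |
|---|---|---|---|---|---|
| ImageNet-1K (no external data) | 83.6 | 85.9 | 86.9 | 87.8 | 87.1 |

The following are evaluations of the same model weights (fine-tuned on the original ImageNet-1K):

| | ViT-B | ViT-L | ViT-H | ViT-H<sub>448</sub> | prev best |
|---|---|---|---|---|---|
| ImageNet-Corruption (error rate) | 51.7 | 41.8 | 33.8 | 36.8 | 42.5 |
| ImageNet-Adversarial | 35.9 | 57.1 | 68.2 | 76.7 | 35.8 |
| ImageNet-Rendition | 48.3 | 59.9 | 64.4 | 66.5 | 48.7 |
| ImageNet-Sketch | 34.5 | 45.3 | 49.6 | 50.9 | 36.0 |

The following are transfer learning results, obtained by fine-tuning the pre-trained MAE on the target dataset:

| | ViT-B | ViT-L | ViT-H | ViT-H<sub>448</sub> | prev best |
|---|---|---|---|---|---|
| iNaturalist 2017 | 70.5 | 75.7 | 79.3 | 83.4 | 75.4 |
| iNaturalist 2018 | 75.4 | 80.1 | 83.0 | 86.8 | 81.2 |
| iNaturalist 2019 | 80.5 | 83.4 | 85.7 | 88.3 | 84.1 |
| Places205 | 63.9 | 65.8 | 65.9 | 66.8 | 66.0 |
| Places365 | 57.9 | 59.4 | 59.8 | 60.3 | 58.0 |

### Pre-training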
Pre-training instructions are in [PRETRAIN.md](PRETRAIN.md).
### License
This project is under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for details.