Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/arig23498/mae-scalable-vision-learners
A TensorFlow 2.x implementation of Masked Autoencoders Are Scalable Vision Learners
autoencoder keras masked-image-modeling self-supervised-learning tensorflow2
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/arig23498/mae-scalable-vision-learners
- Owner: ariG23498
- License: mit
- Created: 2021-11-16T05:52:40.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2022-05-10T07:27:34.000Z (over 2 years ago)
- Last Synced: 2024-10-24T13:44:29.815Z (2 months ago)
- Topics: autoencoder, keras, masked-image-modeling, self-supervised-learning, tensorflow2
- Language: Jupyter Notebook
- Homepage: https://keras.io/examples/vision/masked_image_modeling/
- Size: 38.7 MB
- Stars: 76
- Watchers: 5
- Forks: 15
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Masked Autoencoders Are Scalable Vision Learners
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ariG23498/mae-scalable-vision-learners/blob/master/mae-pretraining.ipynb) [![](https://img.shields.io/badge/blog-keras.io-%23d00000)](https://keras.io/examples/vision/masked_image_modeling/)
A TensorFlow implementation of Masked Autoencoders Are Scalable Vision Learners [1]. Our implementation of the proposed method is available in the
[`mae-pretraining.ipynb`](https://github.com/ariG23498/mae-scalable-vision-learners/blob/master/mae-pretraining.ipynb) notebook. It includes evaluation with **linear probing** as well. Furthermore, the notebook can be fully executed on Google Colab. Our main objective is to present the core idea of the proposed method in a minimal and readable manner. We have also [prepared a blog post](https://keras.io/examples/vision/masked_image_modeling/) to help you get started with Masked Autoencoders easily.
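The heart of the method is hiding a large fraction of image patches and asking the autoencoder to reconstruct them from the visible remainder. Below is a rough NumPy sketch of the random-masking step, purely for illustration; the notebook implements this differently (as Keras layers on batched tensors), and the name `random_masking` is ours:

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, rng=None):
    """Randomly split patches into a visible and a masked set.

    patches: (num_patches, patch_dim) array of flattened patches.
    Returns (visible_patches, visible_idx, masked_idx); the index
    arrays let the decoder restore the original patch order.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1.0 - mask_ratio))

    # Shuffle all patch indices; the first `num_keep` stay visible,
    # the rest are hidden from the encoder.
    perm = rng.permutation(num_patches)
    visible_idx, masked_idx = perm[:num_keep], perm[num_keep:]
    return patches[visible_idx], visible_idx, masked_idx

# Example: a 32x32 RGB image cut into 4x4 patches -> 64 patches of dim 48.
patches = np.random.default_rng(1).random((64, 4 * 4 * 3))
visible, vis_idx, mask_idx = random_masking(patches, mask_ratio=0.75)
print(visible.shape)  # (16, 48): only 25% of the patches reach the encoder
```

Because the encoder only ever sees the small visible subset, pre-training stays cheap even with a large masking proportion; the lightweight decoder receives the full sequence (visible tokens plus mask tokens) for reconstruction.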
With just **100 epochs** of pre-training and a fairly lightweight, asymmetric autoencoder architecture, we achieve **49.33%** accuracy
with linear probing on the **CIFAR-10** dataset. Our training logs and encoder weights are released in [`Weights and Logs`](https://github.com/ariG23498/mae-scalable-vision-learners/releases/tag/v1.0.0).
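Linear probing freezes the pre-trained encoder and trains only a linear classifier on its features. As a toy illustration (not the notebook's code, which trains a dense layer with SGD), here is a closed-form ridge-regression probe on stand-in features; all names are our own:

```python
import numpy as np

def fit_linear_probe(features, labels, num_classes, l2=1e-3):
    """Fit a linear classifier on frozen encoder features via
    ridge regression on one-hot targets."""
    # Append a bias column so the classifier has an intercept.
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    y = np.eye(num_classes)[labels]
    return np.linalg.solve(x.T @ x + l2 * np.eye(x.shape[1]), x.T @ y)

def probe_predict(w, features):
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    return (x @ w).argmax(axis=1)

# Toy stand-in for encoder features: two well-separated clusters.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0.0, 1.0, (50, 8)),
                        rng.normal(4.0, 1.0, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)

w = fit_linear_probe(feats, labels, num_classes=2)
acc = (probe_predict(w, feats) == labels).mean()
print(acc)  # near-perfect on this cleanly separable toy data
```

The point of the protocol is that the encoder's weights never change during probing, so the probe's accuracy measures how linearly separable the self-supervised features already are.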
For comparison, we took the encoder architecture and trained it from scratch (refer to [`regular-classification.ipynb`](https://github.com/ariG23498/mae-scalable-vision-learners/blob/master/regular-classification.ipynb)) in a fully supervised manner. This gave us ~76% test top-1 accuracy.

_We note that with further hyperparameter tuning and more epochs of pre-training, we can achieve better performance with linear probing._

Below we present some more results:

| Config | Masking proportion | LP performance | Encoder weights & logs |
|:---:|:---:|:---:|:---:|
| Encoder & decoder layers: 3 & 1<br>Batch size: 256 | 0.6 | 44.25% | [Link](https://github.com/ariG23498/mae-scalable-vision-learners/releases/download/v1.0.0/44_25.zip) |
| Ditto | 0.75 | 46.84% | [Link](https://github.com/ariG23498/mae-scalable-vision-learners/releases/download/v1.0.0/46_84.zip) |
| Encoder & decoder layers: 6 & 2<br>Batch size: 256 | 0.75 | 48.16% | [Link](https://github.com/ariG23498/mae-scalable-vision-learners/releases/download/v1.0.0/48_16.zip) |
| Encoder & decoder layers: 9 & 3<br>Batch size: 256<br>Weight decay: 1e-5 | 0.75 | 49.33% | [Link](https://github.com/ariG23498/mae-scalable-vision-learners/releases/download/v1.0.0/49_33.zip) |

LP denotes linear probing. The configs are mostly based on what we define in the hyperparameters section of the [`mae-pretraining.ipynb`](https://github.com/ariG23498/mae-scalable-vision-learners/blob/master/mae-pretraining.ipynb) notebook.

## Notes
* This project received the [Google OSS Expert Prize](https://www.kaggle.com/general/316181) (March 2022).
## Acknowledgements
* [Xinlei Chen](http://xinleic.xyz/) (one of the authors of the original paper)
* [Google Developers Experts Program](https://developers.google.com/programs/experts/) and [JarvisLabs](https://jarvislabs.ai/) for providing credits to perform extensive experimentation on A100 GPUs.

## References
[1] Masked Autoencoders Are Scalable Vision Learners; He et al.; arXiv 2021; https://arxiv.org/abs/2111.06377.