MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning
https://github.com/Sense-X/MixMIM
- Host: GitHub
- URL: https://github.com/Sense-X/MixMIM
- Owner: Sense-X
- License: MIT
- Created: 2022-05-19T07:59:57.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-07-02T11:28:41.000Z (over 1 year ago)
- Last Synced: 2024-05-29T17:34:44.185Z (5 months ago)
- Topics: masked-image-modeling, transformer
- Language: Python
- Homepage:
- Size: 649 KB
- Stars: 123
- Watchers: 8
- Forks: 6
- Open Issues: 19
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- Awesome-Mixup
README
## PyTorch implementation of [MixMAE](https://arxiv.org/abs/2205.13137) (CVPR 2023)
![teaser](figures/mixmae.png)
This repo is the official implementation of the paper [MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers](https://arxiv.org/abs/2205.13137).
```
@article{MixMAE,
  author  = {Jihao Liu and Xin Huang and Jinliang Zheng and Yu Liu and Hongsheng Li},
  journal = {arXiv:2205.13137},
  title   = {MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers},
  year    = {2022},
}
```

### Available pretrained models
| Model | Params (M) | FLOPs (G) | Pretrain Epochs | Top-1 Acc. (%) | Pretrain ckpt | Finetune ckpt |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-B/W14 | 88 | 16.3 | 600 | 85.1 | [base_600ep](https://drive.google.com/file/d/1pZYmTv08xK_kOe2kk6ahuvgJVkHm-ZIa/view?usp=sharing) | [base_600ep_ft](https://drive.google.com/file/d/1zkOyh8jnFW7iYG3sOfp6LLG5wu4VbiXb/view?usp=sharing)|
| Swin-B/W16-384x384 | 89.6 | 52.6 | 600 | 86.3 | [base_600ep](https://drive.google.com/file/d/1pZYmTv08xK_kOe2kk6ahuvgJVkHm-ZIa/view?usp=sharing) | [base_600ep_ft_384x384](https://drive.google.com/file/d/1MIng19USn5T770YZ6mFfqTgNCCz_kEGL/view?usp=sharing)|
| Swin-L/W14 | 197 | 35.9 | 600 | 85.9 | [large_600ep](https://drive.google.com/file/d/1dM8Lu2nVEukxPwn7PLmDmRAYwQV59ttx/view?usp=sharing) | [large_600ep_ft](https://drive.google.com/file/d/1b1BxGAewK1ICxxCEwF24YEDSjlQ9Ts9n/view?usp=sharing) |
| Swin-L/W16-384x384 | 199 | 112 | 600 | 86.9 | [large_600ep](https://drive.google.com/file/d/1dM8Lu2nVEukxPwn7PLmDmRAYwQV59ttx/view?usp=sharing) | [large_600ep_ft_384x384](https://drive.google.com/file/d/1_IfqoQvAe2Z2jC7HBKi3umKD6c8qOu0P/view?usp=sharing) |
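The checkpoints above are hosted on Google Drive. Below is a minimal sketch of inspecting a downloaded pretrain checkpoint, assuming it is a standard PyTorch checkpoint file; the local filename and the `model` key are assumptions rather than something documented in this repo.

```
# Sketch: inspect a downloaded MixMAE checkpoint (assumed to be a torch file).
# The filename and the 'model' key are assumptions, not part of this repo.
import torch

ckpt = torch.load("mixmae_base_600ep.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # fall back to a raw state dict

# Print a few parameter names/shapes and the total parameter count.
for name, tensor in list(state_dict.items())[:5]:
    print(f"{name}: {tuple(tensor.shape)}")
print(f"params: {sum(t.numel() for t in state_dict.values()) / 1e6:.1f}M")
```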
### Training and evaluation
We use [Slurm](https://slurm.schedmd.com/documentation.html) for multi-node distributed pretraining and finetuning.
#### Pretrain
```
sh exp/base_600ep/pretrain.sh partition 16 /path/to/imagenet
```
- Trains with 16 GPUs on your Slurm partition.
- Effective batch size is 128 * 16 = 2048.
- Default setting trains for 600 epochs with a mask ratio of 0.5; the mixing step behind this ratio is sketched below.
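The 0.5 mask ratio reflects the core idea of MixMAE: the visible patches of two images are mixed with complementary masks, and the model reconstructs both images from the mixed input. The following is a toy sketch of that mixing step with illustrative shapes; it is not the repo's actual implementation.

```
# Toy sketch of mixed masking: tokens of two images are combined with
# complementary random masks at a 0.5 ratio. Shapes are illustrative.
import torch

def mix_tokens(x1, x2, mask_ratio=0.5):
    """x1, x2: (B, N, D) patch tokens of two different images."""
    B, N, _ = x1.shape
    noise = torch.rand(B, N)              # random score per token
    ids = noise.argsort(dim=1)            # random permutation of token indices
    keep = int(N * (1 - mask_ratio))
    mask = torch.zeros(B, N)
    mask.scatter_(1, ids[:, :keep], 1.0)  # 1 -> token from x1, 0 -> token from x2
    mixed = torch.where(mask.bool().unsqueeze(-1), x1, x2)
    return mixed, mask

x1 = torch.randn(4, 196, 768)             # e.g. 14x14 patch tokens, dim 768
x2 = torch.randn(4, 196, 768)
mixed, mask = mix_tokens(x1, x2)
print(mixed.shape, mask.mean().item())    # mask mean is 0.5
```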
#### Finetune
```
sh exp/base_600ep/finetune.sh partition 8 /path/to/imagenet
```
- Trains with 8 GPUs on your Slurm partition.
- Effective batch size is 128 * 8 = 1024.
- Default setting finetunes for 100 epochs.
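To sanity-check a finetuned checkpoint against the Top-1 numbers above, a generic PyTorch evaluation loop such as the following can be used. Here `model` is assumed to be built from this repo's finetune configuration and loaded with one of the *_ft checkpoints; the transforms and paths are illustrative, not taken from the repo's scripts.

```
# Generic top-1 evaluation sketch (not from this repo's scripts).
# `model` is assumed to be the finetuned network, loaded elsewhere.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

@torch.no_grad()
def top1_accuracy(model, loader, device="cuda"):
    model.eval().to(device)
    correct = total = 0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        correct += (model(images).argmax(dim=1) == targets).sum().item()
        total += targets.numel()
    return correct / total

val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=val_tf)
val_loader = DataLoader(val_set, batch_size=128, num_workers=8)
# acc = top1_accuracy(model, val_loader)  # model: finetuned Swin loaded from a *_ft checkpoint
```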