Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/LayneH/GreenMIM
[NeurIPS2022] Official implementation of the paper 'Green Hierarchical Vision Transformer for Masked Image Modeling'.
efficient-deep-learning masked-image-modeling pytorch self-supervised-learning vision-transformer
Last synced: about 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/LayneH/GreenMIM
- Owner: LayneH
- License: other
- Created: 2022-05-26T15:06:28.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-16T08:17:25.000Z (over 1 year ago)
- Last Synced: 2023-11-07T17:06:33.784Z (8 months ago)
- Topics: efficient-deep-learning, masked-image-modeling, pytorch, self-supervised-learning, vision-transformer
- Language: Python
- Homepage:
- Size: 1.39 MB
- Stars: 152
- Watchers: 3
- Forks: 6
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- Awesome-MIM
README
# GreenMIM
This is the official PyTorch implementation of the NeurIPS 2022 paper [Green Hierarchical Vision Transformer for Masked Image Modeling](https://arxiv.org/abs/2205.13515). GreenMIM consists of two key designs, `Group Window Attention` and `Sparse Convolution`. It offers 2.7x faster pre-training and competitive performance on hierarchical vision transformers, e.g., Swin/Twins Transformers.
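The core idea behind `Group Window Attention` — computing attention only over the visible (unmasked) patches by packing them into equal-size groups — can be illustrated with a minimal sketch. The helper name and sentinel-padding scheme below are hypothetical and do not match the repo's actual API:

```python
import random

def group_visible_patches(num_patches, mask_ratio=0.75, group_size=49, seed=0):
    """Pack the indices of visible (unmasked) patches into equal-size
    groups, so attention can be computed within groups of visible
    tokens only. Hypothetical helper; names do not match the repo."""
    rng = random.Random(seed)
    # Randomly mask out ~mask_ratio of the patches; keep the rest.
    visible = [i for i in range(num_patches) if rng.random() > mask_ratio]
    # Pad with a sentinel (-1) so every group has exactly group_size
    # entries; the real method handles uneven groups more carefully.
    pad = (-len(visible)) % group_size
    padded = visible + [-1] * pad
    return [padded[i:i + group_size] for i in range(0, len(padded), group_size)]
```

Because attention cost is quadratic in sequence length, running it on small groups of visible tokens (rather than full windows that are mostly masked) is where the pre-training speedup comes from.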
*Figure: Group Attention Scheme.*

*Figure: Method Overview.*

## Citation
If you find our work interesting or use our code/models, please cite:

```bibtex
@article{huang2022green,
title={Green Hierarchical Vision Transformer for Masked Image Modeling},
author={Huang, Lang and You, Shan and Zheng, Mingkai and Wang, Fei and Qian, Chen and Yamasaki, Toshihiko},
journal={Thirty-Sixth Conference on Neural Information Processing Systems},
year={2022}
}
```

## News
- 2023.01: We have refactored the structure of this codebase, supporting *most*, if not all, vision transformer backbones with various input resolutions. Check out our implementation of GreenMIM with Twins Transformer [here](modeling/green_twins_models.py).

## Catalogs
- [x] Pre-trained checkpoints
- [x] Pre-training code for `Swin Transformer` and `Twins Transformer`
- [x] Fine-tuning code

## Pre-trained Models

| Model | Pre-trained Checkpoint |
| :---- | :--------------------- |
| Swin-Base (Window 7x7) | Download |
| Swin-Base (Window 14x14) | Download |
| Swin-Large (Window 14x14) | Download |

## Pre-training
The pre-training scripts are given in the `scripts/` folder. Scripts whose names start with `run` are for non-Slurm users, while the others are for Slurm users.
#### For Non-Slurm Users
To train a Swin-B on a single node with 8 GPUs:
```bash
PORT=23456 NPROC=8 bash scripts/run_greenmim_swin_base.sh
```

#### For Slurm Users
To train a Swin-B on a single node with 8 GPUs:
```bash
bash scripts/srun_greenmim_swin_base.sh [Partition] [NUM_GPUS]
```

## Fine-tuning on ImageNet-1K
| Model | #Params | Pre-train Resolution | Fine-tune Resolution | Config | Acc@1 (%) |
| :---- | ------- | -------------------- | -------------------- | ------ | --------- |
| Swin-B (Window 7x7) | 88M | 224x224 | 224x224 | [Config](ft_configs/greenmim_finetune_swin_base_img224_win7.yaml) | 83.8 |
| Swin-L (Window 14x14) | 197M | 224x224 | 224x224 | [Config](ft_configs/greenmim_finetune_swin_large_img224_win14.yaml) | 85.1 |

Currently, we directly use the code of [SimMIM](https://github.com/microsoft/SimMIM) for fine-tuning; please follow [their instructions](https://github.com/microsoft/SimMIM#fine-tuning-pre-trained-models) to use the configs. NOTE that, due to limited computing resources, we use a batch size of 768 (48 x 16) for fine-tuning.
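The batch-size note can be sanity-checked with simple arithmetic. Assuming the 48 x 16 factorization means 48 samples per GPU across 16 GPUs (an assumption; the README does not spell this out):

```python
def effective_batch_size(per_gpu_batch, num_gpus, accum_steps=1):
    # Effective batch = per-GPU batch x number of GPUs x gradient-accumulation steps.
    return per_gpu_batch * num_gpus * accum_steps

print(effective_batch_size(48, 16))  # 768
```

If fewer GPUs are available, the same effective batch can be recovered by raising `accum_steps` proportionally.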
# Acknowledgement
This code is based on the implementations of [MAE](https://github.com/facebookresearch/mae), [SimMIM](https://github.com/microsoft/SimMIM), [BEiT](https://github.com/microsoft/unilm/tree/master/beit), [SwinTransformer](https://github.com/microsoft/Swin-Transformer), [Twins Transformer](https://github.com/Meituan-AutoML/Twins), and [DeiT](https://github.com/facebookresearch/deit).

# License
This project is under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for details.