# GreenMIM

This is the official PyTorch implementation of the NeurIPS 2022 paper [Green Hierarchical Vision Transformer for Masked Image Modeling](https://arxiv.org/abs/2205.13515). GreenMIM consists of two key designs, `Group Window Attention` and `Sparse Convolution`. It offers 2.7x faster pre-training and competitive performance on hierarchical vision transformers, e.g., Swin and Twins Transformers.
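As a rough illustration of the `Group Window Attention` idea, here is a minimal sketch (a hypothetical simplification, not the repository's implementation): after random masking, fixed windows contain unequal numbers of visible tokens, so visible tokens are instead gathered into equal-size groups and standard multi-head attention runs within each group. The paper additionally describes how to choose the grouping and mask out cross-group interactions, which this sketch omits.

```python
# Hypothetical, simplified sketch of intra-group attention; the class and
# names below are illustrative and not taken from the GreenMIM codebase.
import torch
import torch.nn as nn

class GroupedSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, visible: torch.Tensor, group_size: int) -> torch.Tensor:
        # visible: (B, N, C) where N is the number of visible tokens,
        # assumed divisible by group_size.
        B, N, C = visible.shape
        groups = visible.reshape(B * N // group_size, group_size, C)
        out, _ = self.attn(groups, groups, groups)  # attention within each group only
        return out.reshape(B, N, C)

tokens = torch.randn(2, 64, 96)            # 2 images, 64 visible tokens of dim 96
layer = GroupedSelfAttention(dim=96, num_heads=4)
print(layer(tokens, group_size=16).shape)  # torch.Size([2, 64, 96])
```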





*Figure: Group Attention Scheme.*

*Figure: Method Overview.*

## Citation
If you find our work interesting or use our code/models, please cite:

```bibtex
@article{huang2022green,
  title={Green Hierarchical Vision Transformer for Masked Image Modeling},
  author={Huang, Lang and You, Shan and Zheng, Mingkai and Wang, Fei and Qian, Chen and Yamasaki, Toshihiko},
  journal={Thirty-Sixth Conference on Neural Information Processing Systems},
  year={2022}
}
```

## News
- 2023.01: We have refactored the structure of this codebase, supporting *most*, if not all, vision transformer backbones with various input resolutions. Check out our implementation of GreenMIM with the Twins Transformer [here](modeling/green_twins_models.py).

## Catalog

- [x] Pre-trained checkpoints
- [x] Pre-training code for `Swin Transformer` and `Twins Transformer`
- [x] Fine-tuning code

## Pre-trained Models

| Model | Pre-trained checkpoint |
| :---- | :--------------------- |
| Swin-Base (Window 7x7) | Download |
| Swin-Base (Window 14x14) | Download |
| Swin-Large (Window 14x14) | Download |
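If you want to inspect or load a downloaded checkpoint manually, a hedged sketch follows; the file name and the `'model'` key are assumptions based on common MAE/SimMIM-style checkpoints, not a documented interface of this repo.

```python
# Hypothetical loading sketch: the path and the 'model' key are assumptions
# (MAE/SimMIM-style checkpoints often nest weights under 'model').
import torch

ckpt = torch.load("greenmim_swin_base_win7.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # fall back to the raw dict if not nested
print(sorted(state_dict.keys())[:5])  # peek at the first few parameter names

# Build the matching Swin backbone first (constructor is illustrative), then
# load non-strictly, since pre-training-only heads are usually dropped:
# model = build_swin_base_window7()
# missing, unexpected = model.load_state_dict(state_dict, strict=False)
```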

## Pre-training

The pre-training scripts are given in the `scripts/` folder. Scripts whose names start with `run` are for non-Slurm users, while the `srun` scripts are for Slurm users.

#### For Non-Slurm Users

To train a Swin-B on a single node with 8 GPUs:
```bash
PORT=23456 NPROC=8 bash scripts/run_greenmim_swin_base.sh
```

#### For Slurm Users

To train a Swin-B on a single node with 8 GPUs:
```bash
bash scripts/srun_greenmim_swin_base.sh [Partition] [NUM_GPUS]
```

## Fine-tuning on ImageNet-1K

| Model | #Params | Pre-train Resolution | Fine-tune Resolution | Config | Acc@1 (%) |
| :---- | ------- | -------------------- | -------------------- | ------ | --------- |
| Swin-B (Window 7x7) | 88M | 224x224 | 224x224 | [Config](ft_configs/greenmim_finetune_swin_base_img224_win7.yaml) | 83.8 |
| Swin-L (Window 14x14) | 197M | 224x224 | 224x224 | [Config](ft_configs/greenmim_finetune_swin_large_img224_win14.yaml) | 85.1 |

Currently, we directly use the code of [SimMIM](https://github.com/microsoft/SimMIM) for fine-tuning; please follow [their instructions](https://github.com/microsoft/SimMIM#fine-tuning-pre-trained-models) to use the configs. Note that, due to limited computing resources, we use a batch size of 768 (48 × 16) for fine-tuning.

## Acknowledgement
This code is based on the implementations of [MAE](https://github.com/facebookresearch/mae), [SimMIM](https://github.com/microsoft/SimMIM), [BEiT](https://github.com/microsoft/unilm/tree/master/beit), [SwinTransformer](https://github.com/microsoft/Swin-Transformer), [Twins Transformer](https://github.com/Meituan-AutoML/Twins), and [DeiT](https://github.com/facebookresearch/deit).

## License

This project is under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for details.