
# Feature-Distillation

By [Yixuan Wei](https://scholar.google.com/citations?user=xwudKb4AAAAJ&hl=en)\*, [Han Hu](https://ancientmooner.github.io/)\*, [Zhenda Xie](https://zdaxie.github.io), [Zheng Zhang](https://stupidzz.github.io/), [Yue Cao](http://yue-cao.me), [Jianmin Bao](https://jianminbao.github.io/), [Dong Chen](http://www.dongchen.pro) and [Baining Guo](https://scholar.google.com/citations?user=h4kYmRYAAAAJ&hl=en&oi=ao).

This repo is the official implementation of ["Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation"](https://arxiv.org/abs/2205.14141).

## Updates
***11/30/2022***

1. Distilled and fine-tuned models on ImageNet-1K (`ViT Large`) are provided.

***11/28/2022***

Initial commits:

1. Distilled and fine-tuned models on ImageNet-1K (`Swin Base`, and `ViT Base`) are provided.
2. Code supporting ImageNet-1K distillation and fine-tuning is provided.

## Introduction

**FD** is initially described in [arxiv](https://arxiv.org/abs/2205.14141). It is a simple framework that converts traditional pre-trained models, such as image classification (DeiT), instance contrastive learning (DINO) and image-text alignment (CLIP), into new models with better fine-tuning performance. Through a set of diagnosing tools, we find that models distilled with feature maps are endowed with the following good properties, which are also revealed in masked image modeling models: 1) more diverse attention heads; 2) more diagonal attention patterns; 3) flatter loss landscapes.
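At its core, FD trains a student network to regress the teacher's feature map rather than its logits. The PyTorch sketch below illustrates that objective as described in the paper (whitened teacher features as the regression target, a smooth-L1 loss); the function name, tensor shapes, and toy inputs are illustrative only and are not taken from this repository's code.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(student_feats, teacher_feats):
    """Regress student feature maps onto whitened teacher feature maps.

    Both tensors are assumed to have shape (batch, tokens, channels).
    The whitening step and the smooth-L1 objective follow the paper's
    description; the repo's exact implementation may differ.
    """
    # Whiten the teacher features: per-token normalization without affine parameters.
    teacher_feats = F.layer_norm(teacher_feats, teacher_feats.shape[-1:])
    return F.smooth_l1_loss(student_feats, teacher_feats)

# Toy usage with random tensors standing in for real backbone outputs.
student = torch.randn(2, 197, 768)
teacher = torch.randn(2, 197, 768)
loss = feature_distillation_loss(student, teacher)
```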



## Main Results on ImageNet

### Swin Transformer

**ImageNet-1K Distilled and Fine-tuned Models**

| name | distillation epochs | teacher model | image resolution | acc@1 | distilled model | fine-tuned model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Base | 300 | [EsViT-Base](https://github.com/microsoft/esvit) | 224x224 | 85.1 | [google](https://drive.google.com/file/d/11_GQUHgcrUO8PMzl73eJmLSa7f3c5dZY/view?usp=sharing)/[config](configs/pretrain/fd_pretrain__esvit_swin_base__img224__300ep.yaml) | [google](https://drive.google.com/file/d/1criliGcjpEJxqlsYRGBERBAMYrFYFW--/view?usp=sharing)/[config](configs/finetune/fd_finetune__esvit_swin_base__img224__300ep.yaml) |

### Vision Transformer

**ImageNet-1K Distilled and Fine-tuned Models**

| name | distillation epochs | teacher model | image resolution | acc@1 | distilled model | fine-tuned model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViT-Base | 300 | [CLIP-Base](https://github.com/openai/CLIP) | 224x224 | 84.9 | [google](https://drive.google.com/file/d/1XFOZ6rJkv5X08Bu5d04_Xy3iJOj6SLc7/view?usp=sharing)/[config](configs/pretrain/fd_pretrain__clip_vit_base__img224__300ep.yaml) | [google](https://drive.google.com/file/d/1mP_JESmcdFeIkpB4aYyFzALtkydy_9qN/view?usp=sharing)/[config](configs/finetune/fd_finetune__clip_vit_base__img224__300ep.yaml) |
| ViT-Base | 300 | [DINO-Base](https://github.com/facebookresearch/dino) | 224x224 | 83.8 | [google](https://drive.google.com/file/d/1fwBINMxpv5zFOI7Ye6l9msI8GzocpA3z/view?usp=sharing)/[config](configs/pretrain/fd_pretrain__dino_vit_base__img224__300ep.yaml) | [google](https://drive.google.com/file/d/1Mn_GgepfZXOe7W0UqEQMFo5MjJpMwM_i/view?usp=sharing)/[config](configs/finetune/fd_finetune__dino_vit_base__img224__300ep.yaml) |
| ViT-Base | 300 | [DeiT-Base](https://github.com/facebookresearch/deit) | 224x224 | 83.0 | [google](https://drive.google.com/file/d/1yPezioDc4O6hdfD6VSAIU9DvJiXG4ZSJ/view?usp=sharing)/[config](configs/pretrain/fd_pretrain__deit_vit_base__img224__300ep.yaml) | [google](https://drive.google.com/file/d/1pb0KUlVcCaEGT-xnx6ookrqcC-88Ori5/view?usp=sharing)/[config](configs/finetune/fd_finetune__deit_vit_base__img224__300ep.yaml) |
| ViT-Large | 300 | [CLIP-Large](https://github.com/openai/CLIP) | 224x224 | 87.7 | [google](https://drive.google.com/file/d/1H5USyzqwoS31JHDX874q8a70LdVD9zNY/view?usp=sharing)/[config](configs/pretrain/fd_pretrain__clip_vit_large__img224__300ep.yaml) | [google](https://drive.google.com/file/d/1XDDbDl9jzt8H2Fy6iZNfNA7Yjepf_MGx/view?usp=sharing)/[config](configs/finetune/fd_finetune__clip_vit_large__img224__300ep.yaml) |

## Citation

If you find our work useful in your research, please cite:

```
@article{wei2022FD,
title={Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation},
author={Yixuan Wei and Han Hu and Zhenda Xie and Zheng Zhang and Yue Cao and Jianmin Bao and Dong Chen and Baining Guo},
journal={Tech Report},
year={2022}
}
```

## Getting Started

### Installation

- Install `CUDA 11.3` with `cuDNN 8` following the official installation guide of [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) and [cuDNN](https://developer.nvidia.com/rdp/cudnn-archive).

- Setup conda environment:
```bash
# Create environment
conda create -n FD python=3.8 -y
conda activate FD

# Install requirements
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113

# Clone codes
git clone https://github.com/SwinTransformer/Feature-Distillation
cd Feature-Distillation

# Install other requirements
pip install -r requirements.txt
```
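As a quick sanity check after installation, the following snippet (using only standard `torch`/`torchvision` attributes) confirms that the CUDA 11.3 build of PyTorch was installed and that a GPU is visible:

```python
import torch
import torchvision

# The pinned wheels above should report torch 1.12.0+cu113 / torchvision 0.13.0+cu113.
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA build:", torch.version.cuda)           # expected: 11.3
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```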

### Feature-Distillation
To distill models, run:
```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_fd.py \
--cfg <config-file> --data-path <imagenet-path>/train [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```

For example, to distill `CLIP-Base` for 300 epochs on one DGX-2 server, run:
```bash
python -m torch.distributed.launch --nproc_per_node=16 main_fd.py --cfg configs/pretrain/fd_pretrain__clip_vit_base__img224__300ep.yaml --batch-size 128 --data-path <imagenet-path>/train [--output <output-directory> --tag <job-tag>]
```

To reduce GPU memory consumption, add `--use-checkpoint`.
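Before fine-tuning, you may want to sanity-check a downloaded distilled checkpoint. The sketch below only assumes the checkpoint is a standard PyTorch `.pth` file; the local filename is hypothetical, and the `'model'` key is an assumption common to Swin-style repositories that may not hold here.

```python
import torch

# Hypothetical local path to a distilled checkpoint downloaded from the tables above.
ckpt_path = "fd_pretrain__clip_vit_base__img224__300ep.pth"

ckpt = torch.load(ckpt_path, map_location="cpu")
print("top-level keys:", list(ckpt.keys()))

# Swin-style repos usually store weights under a 'model' key; adjust if not.
state_dict = ckpt.get("model", ckpt)
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```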

### Fine-tuning distilled models
To fine-tune distilled models, run:
```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_finetune.py \
--cfg <config-file> --data-path <imagenet-path> --pretrained <distilled-model-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```

For example, to fine-tune `Distilled-CLIP-Base` on one DGX-2 server, run:
```bash
python -m torch.distributed.launch --nproc_per_node 16 main_finetune.py \
--cfg configs/finetune/fd_finetune__clip_vit_base__img224__300ep.yaml --batch-size 128 --data-path <imagenet-path> --pretrained <distilled-model-path> [--output <output-directory> --tag <job-tag>]
```
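If you want to reuse a fine-tuned checkpoint outside the training scripts, non-strict state-dict loading is a convenient way to see how well its keys line up with your model definition. Everything below is illustrative: the `timm` model name, the local filename, and the `'model'` checkpoint key are assumptions rather than something this repository guarantees.

```python
import torch
import timm  # assumed available in the environment; not pinned by this snippet

# Hypothetical path to a fine-tuned ViT-Base checkpoint from the table above.
ckpt = torch.load("fd_finetune__clip_vit_base__img224__300ep.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)

# Build a plain ViT-Base/16 at 224x224 and load non-strictly so that any
# naming mismatches are reported instead of raising an error.
model = timm.create_model("vit_base_patch16_224", num_classes=1000)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", len(missing), "unexpected keys:", len(unexpected))
model.eval()
```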