Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/SwinTransformer/Feature-Distillation
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/SwinTransformer/Feature-Distillation
- Owner: SwinTransformer
- License: mit
- Created: 2022-05-27T07:57:57.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-11-30T08:49:53.000Z (about 2 years ago)
- Last Synced: 2024-08-01T18:26:18.666Z (4 months ago)
- Language: Python
- Size: 175 KB
- Stars: 231
- Watchers: 9
- Forks: 11
- Open Issues: 14
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-colab-project - Feature-Distillation
README
# Feature-Distillation
By [Yixuan Wei](https://scholar.google.com/citations?user=xwudKb4AAAAJ&hl=en)\*, [Han Hu](https://ancientmooner.github.io/)\*, [Zhenda Xie](https://zdaxie.github.io), [Zheng Zhang](https://stupidzz.github.io/), [Yue Cao](http://yue-cao.me), [Jianmin Bao](https://jianminbao.github.io/), [Dong Chen](http://www.dongchen.pro) and [Baining Guo](https://scholar.google.com/citations?user=h4kYmRYAAAAJ&hl=en&oi=ao).
This repo is the official implementation of ["Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation"](https://arxiv.org/abs/2205.14141).
## Updates
***11/30/2022***

1. Distilled and fine-tuned models on ImageNet-1K (`ViT Large`) are provided.
***11/28/2022***
Initial commits:
1. Distilled and fine-tuned models on ImageNet-1K (`Swin Base`, and `ViT Base`) are provided.
2. The supported code for ImageNet-1K distillation and fine-tuning is provided.

## Introduction
**FD** is initially described in [arXiv](https://arxiv.org/abs/2205.14141). It is a simple framework that converts traditional pre-trained models, such as image classification (DeiT), instance contrastive learning (DINO), and image-text alignment (CLIP) models, into new models with better fine-tuning performance. Through a set of diagnosing tools, we find that models distilled with feature maps are endowed with the following good properties, which are also revealed in masked image modeling models: 1) more diverse attention heads; 2) more diagonal attention patterns; 3) flatter loss landscapes.
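As a rough illustration of the idea (not the repo's actual code), the sketch below distills a frozen teacher's feature map into a student: the student's features are linearly projected to the teacher's dimension, the teacher's features are whitened with a non-affine LayerNorm, and a smooth-L1 loss matches the two. All class, argument, and shape conventions here are assumptions.

```python
# Minimal feature-distillation sketch (illustrative; not the official implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    def __init__(self, student: nn.Module, teacher: nn.Module, s_dim: int, t_dim: int):
        super().__init__()
        self.student = student            # trainable model being distilled
        self.teacher = teacher.eval()     # frozen pre-trained teacher (e.g. CLIP, DINO, DeiT)
        for p in self.teacher.parameters():
            p.requires_grad = False
        self.proj = nn.Linear(s_dim, t_dim)                          # map student dim -> teacher dim
        self.whiten = nn.LayerNorm(t_dim, elementwise_affine=False)  # whiten teacher targets

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Both models are assumed to return token-level feature maps of shape (B, N, dim).
        s_feat = self.proj(self.student(images))
        with torch.no_grad():
            t_feat = self.whiten(self.teacher(images))
        return F.smooth_l1_loss(s_feat, t_feat)  # distillation loss on the feature map
```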
## Main Results on ImageNet
### Swin Transformer
**ImageNet-1K Distilled and Fine-tuned Models**
| name | distillation epochs | teacher model | image resolution | acc@1 | distilled model | fine-tuned model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-Base | 300 | [EsViT-Base](https://github.com/microsoft/esvit) | 224x224 | 85.1 | [google](https://drive.google.com/file/d/11_GQUHgcrUO8PMzl73eJmLSa7f3c5dZY/view?usp=sharing)/[config](configs/pretrain/fd_pretrain__esvit_swin_base__img224__300ep.yaml) | [google](https://drive.google.com/file/d/1criliGcjpEJxqlsYRGBERBAMYrFYFW--/view?usp=sharing)/[config](configs/finetune/fd_finetune__esvit_swin_base__img224__300ep.yaml) |

### Vision Transformer
**ImageNet-1K Distilled and Fine-tuned Models**
| name | distillation epochs | teacher model | image resolution | acc@1 | distilled model | fine-tuned model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViT-Base | 300 | [CLIP-Base](https://github.com/openai/CLIP) | 224x224 | 84.9 | [google](https://drive.google.com/file/d/1XFOZ6rJkv5X08Bu5d04_Xy3iJOj6SLc7/view?usp=sharing)/[config](configs/pretrain/fd_pretrain__clip_vit_base__img224__300ep.yaml) | [google](https://drive.google.com/file/d/1mP_JESmcdFeIkpB4aYyFzALtkydy_9qN/view?usp=sharing)/[config](configs/finetune/fd_finetune__clip_vit_base__img224__300ep.yaml) |
| ViT-Base | 300 | [DINO-Base](https://github.com/facebookresearch/dino) | 224x224 | 83.8 | [google](https://drive.google.com/file/d/1fwBINMxpv5zFOI7Ye6l9msI8GzocpA3z/view?usp=sharing)/[config](configs/pretrain/fd_pretrain__dino_vit_base__img224__300ep.yaml) | [google](https://drive.google.com/file/d/1Mn_GgepfZXOe7W0UqEQMFo5MjJpMwM_i/view?usp=sharing)/[config](configs/finetune/fd_finetune__dino_vit_base__img224__300ep.yaml) |
| ViT-Base | 300 | [DeiT-Base](https://github.com/facebookresearch/deit) | 224x224 | 83.0 | [google](https://drive.google.com/file/d/1yPezioDc4O6hdfD6VSAIU9DvJiXG4ZSJ/view?usp=sharing)/[config](configs/pretrain/fd_pretrain__deit_vit_base__img224__300ep.yaml) | [google](https://drive.google.com/file/d/1pb0KUlVcCaEGT-xnx6ookrqcC-88Ori5/view?usp=sharing)/[config](configs/finetune/fd_finetune__deit_vit_base__img224__300ep.yaml) |
| ViT-Large | 300 | [CLIP-Large](https://github.com/openai/CLIP) | 224x224 | 87.7 | [google](https://drive.google.com/file/d/1H5USyzqwoS31JHDX874q8a70LdVD9zNY/view?usp=sharing)/[config](configs/pretrain/fd_pretrain__clip_vit_large__img224__300ep.yaml) | [google](https://drive.google.com/file/d/1XDDbDl9jzt8H2Fy6iZNfNA7Yjepf_MGx/view?usp=sharing)/[config](configs/finetune/fd_finetune__clip_vit_large__img224__300ep.yaml) |
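The distilled checkpoints above can be passed to the fine-tuning script through `--pretrained` (see Getting Started below). The snippet that follows is a small, hypothetical way to inspect a downloaded checkpoint before fine-tuning; the file name and the `'model'` key are assumptions and may not match the repo's actual checkpoint layout.

```python
# Hypothetical inspection of a downloaded distilled checkpoint (file name and
# 'model' key are assumptions, not guaranteed by this repo).
import torch

ckpt = torch.load("fd_pretrain__clip_vit_base__img224__300ep.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # fall back to the raw dict if there is no 'model' key
print(f"{len(state_dict)} parameter tensors")
print(sorted(state_dict)[:5])         # peek at the first few parameter names
```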
## Citation

If you find our work useful in your research, please cite:
```
@article{wei2022FD,
title={Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation},
author={Yixuan Wei and Han Hu and Zhenda Xie and Zheng Zhang and Yue Cao and Jianmin Bao and Dong Chen and Baining Guo},
journal={Tech Report},
year={2022}
}
```

## Getting Started
### Installation
- Install `CUDA 11.3` with `cuDNN 8` following the official installation guide of [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) and [cuDNN](https://developer.nvidia.com/rdp/cudnn-archive).
- Set up the conda environment:
```bash
# Create environment
conda create -n FD python=3.8 -y
conda activate FD

# Install requirements
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113

# Clone codes
git clone https://github.com/SwinTransformer/Feature-Distillation
cd Feature-Distillation

# Install other requirements
pip install -r requirements.txt
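
# Optional sanity check (not part of the original instructions): confirm the
# installed PyTorch build matches the versions above and sees a CUDA device
python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"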
```

### Feature-Distillation
To distill models, run:
```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_fd.py \
--cfg <config-file> --data-path <imagenet-path>/train [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```

For example, to distill `CLIP-Base` for 300 epochs on one DGX-2 server, run:
```bash
python -m torch.distributed.launch --nproc_per_node=16 main_fd.py --cfg configs/pretrain/fd_pretrain__clip_vit_base__img224__300ep.yaml --batch-size 128 --data-path <imagenet-path>/train [--output <output-directory> --tag <job-tag>]
```

If you want to reduce GPU memory consumption, add `--use-checkpoint`.
### Fine-tuning distilled models
To fine-tune distilled models, run:
```bash
python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_finetune.py \
--cfg <config-file> --data-path <imagenet-path> --pretrained <distilled-checkpoint> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]
```

For example, to fine-tune `Distilled-CLIP-Base` on one DGX-2 server, run:
```bash
python -m torch.distributed.launch --nproc_per_node 16 main_finetune.py \
--cfg configs/finetune/fd_finetune__clip_vit_base__img224__300ep.yaml --batch-size 128 --data-path <imagenet-path> --pretrained <distilled-checkpoint> [--output <output-directory> --tag <job-tag>]
```