https://github.com/tencentarc/dtn

Official code for "Dynamic Token Normalization Improves Vision Transformer", ICLR 2022.
https://github.com/tencentarc/dtn

Last synced: about 1 year ago
JSON representation

Official code for "Dynamic Token Normalization Improves Vision Transformer", ICLR 2022.

Host: GitHub
URL: https://github.com/tencentarc/dtn
Owner: TencentARC
License: other
Created: 2021-12-04T09:21:11.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2022-05-22T06:11:36.000Z (about 4 years ago)
Last Synced: 2025-03-21T13:23:19.668Z (over 1 year ago)
Language: Python
Homepage:
Size: 366 KB
Stars: 28
Watchers: 4
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: License

Awesome Lists containing this project

README

          # Dynamic Token Normalization Improves Vision Transfromers, ICLR 2022

This is the PyTorch implementation of the paper [Dynamic Token Normalization Improves Vision Transformers](https://arxiv.org/abs/2112.02624) 

in ICLR 2022.

## Dynamic Token Normalization

We design a novel normalization method, termed Dynamic Token Normalization (DTN), which inherits the advantages from LayerNorm and InstanceNorm. DTN can be seamlessly plugged into various transformer models, consistenly improving the performance.



## News

**2022-5-20** We release the code of DTN in training ViT and PVT. More models with DTN will be released soon.

## Main Results

**1. Performance** on ImageNet with ViT and its variants in terms of FLOPs, Parameters, Top-1, and Top-5 accuracies. H and C denote head number and embedding.

| Model | Norm | H | C | FLOPs | Params | Top-1 | Top-5 | 

| :-----| :----: | :----: | :----: | :----: | :----: | :----: | :----: |

| ViT-T | LN | 3 | 192| 1.26G| 5.7M| 72.2|91.3|

| ViT-T* | LN | 4 | 192| 1.26G| 5.7M| 72.3|91.4|

| ViT-T* | **DTN** | 4 | 192| 1.26G| 5.7M| 73.2|91.7|

| ViT-S* | LN | 6 | 384| 4.60G| 22.1M| 79.9|95.0|

| ViT-S* | **DTN** | 6 | 384| 4.88G| 22.1M| 80.6|95.3|

| ViT-B* | LN | 16 | 768| 17.58G| 86.5M| 81.7|95.0|

| ViT-B* | **DTN** | 16 | 768| 18.13G| 86.5M| 82.5|96.1|

**2. Comparison** between various normalizers in terms of Top-1 accuracy on ImageNet. ScN and PN denote ScaleNorm and PowerNorm, respectively.

| Model | LN | BN | IN | GN | SN | ScN| PN | **DTN**|

| :-----| :----: | :----: | :----: | :----: | :----: | :----: | :----: |:----: 

| ViT-S | 79.9 | 77.3 | 77.7| 78.3| 80.1| 80.0|79.8|**80.6**|

| ViT-S* | 80.6 | 77.2 | 77.6| 79.5| 81.0| 80.6|80.4|**81.7**|

**3. Visualization** of attention distance for each head in ViT-S. Many heads in ViT-S with DTN have a small mean

attention distance. Hence, DTN can capture local context well.



## Getting Started

* Install [PyTorch](http://pytorch.org/)

### Requirements

- Install `CUDA==10.1` with `cudnn7` following

  the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)

- Install `PyTorch==1.7.1` and `torchvision==0.8.2` with `CUDA==10.1`:

```bash

conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

```

- Install `timm==0.4.9`:

```bash

pip install timm==0.4.9

```

### Data Preparation

- Download the ImageNet dataset which should contain train and val directionary and the txt file for correspondings between images and labels.

### Training a model from scratch

An example to train our DTN is given in DTN/scripts/train.sh. To train ViT-S* with our DTN, 

```

cd DTN/scripts   

sh train.sh layer vit_norm_s_star configs/ViT/vit.yaml

```

Number of GPUs and configuration file to use can be modified in train.sh

## License

DTN is released under BSD 3-Clause License.

## Acknowledgement

Our code is based on the implementation of timm package in PyTorch Image Models, https://github.com/rwightman/pytorch-image-models.

## Citation

If our code is helpful to your work, please cite:

```

@article{shao2021dynamic,

  title={Dynamic Token Normalization Improves Vision Transformer},

  author={Shao, Wenqi and Ge, Yixiao and Zhang, Zhaoyang and Xu, Xuyuan and Wang, Xiaogang and Shan, Ying and Luo, Ping},

  journal={arXiv preprint arXiv:2112.02624},

  year={2021}

}

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tencentarc/dtn

Awesome Lists containing this project

README