{"id":18322526,"url":"https://github.com/tencentarc/dtn","last_synced_at":"2025-04-05T23:31:02.444Z","repository":{"id":109180316,"uuid":"434857130","full_name":"TencentARC/DTN","owner":"TencentARC","description":"Official code for \"Dynamic Token Normalization Improves Vision Transformer\", ICLR 2022.","archived":false,"fork":false,"pushed_at":"2022-05-22T06:11:36.000Z","size":375,"stargazers_count":28,"open_issues_count":1,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-21T13:23:19.668Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TencentARC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-04T09:21:11.000Z","updated_at":"2024-09-01T11:54:26.000Z","dependencies_parsed_at":"2023-04-06T16:39:26.315Z","dependency_job_id":null,"html_url":"https://github.com/TencentARC/DTN","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FDTN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FDTN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FDTN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FDTN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TencentARC","download_url":"https://codeload.github.com/TencentARC/DTN/tar.gz/ref
s/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247415783,"owners_count":20935383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T18:25:01.328Z","updated_at":"2025-04-05T23:31:02.439Z","avatar_url":"https://github.com/TencentARC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dynamic Token Normalization Improves Vision Transformers, ICLR 2022\r\n\r\nThis is the official PyTorch implementation of the ICLR 2022 paper [Dynamic Token Normalization Improves Vision Transformers](https://arxiv.org/abs/2112.02624).\r\n\r\n\r\n## Dynamic Token Normalization\r\nWe design a novel normalization method, termed Dynamic Token Normalization (DTN), which inherits the advantages of both LayerNorm and InstanceNorm. DTN can be seamlessly plugged into various transformer models and consistently improves their performance.\r\n\u003cdiv align=center\u003e\u003cimg src=\"DTN_token.png\" width=\"1080\" height=\"210\"\u003e\u003c/div\u003e\r\n\r\n\r\n## News\r\n**2022-5-20** We released the code for training ViT and PVT with DTN. More models with DTN will be released soon.\r\n\r\n## Main Results\r\n**1. Performance** on ImageNet of ViT and its variants in terms of FLOPs, parameters, Top-1 accuracy, and Top-5 accuracy. H and C denote the number of heads and the embedding dimension, respectively.\r\n\r\n| Model | Norm | H | C | FLOPs | Params | Top-1 (%) | Top-5 (%) |\r\n| :-----| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\r\n| ViT-T | LN | 3 | 192| 1.26G| 5.7M| 72.2|91.3|\r\n| ViT-T* | LN | 4 | 192| 1.26G| 5.7M| 72.3|91.4|\r\n| ViT-T* | **DTN** | 4 | 192| 1.26G| 5.7M| 73.2|91.7|\r\n| ViT-S* | LN | 6 | 384| 4.60G| 22.1M| 79.9|95.0|\r\n| ViT-S* | **DTN** | 6 | 384| 4.88G| 22.1M| 80.6|95.3|\r\n| ViT-B* | LN | 16 | 768| 17.58G| 86.5M| 81.7|95.0|\r\n| ViT-B* | **DTN** | 16 | 768| 18.13G| 86.5M| 82.5|96.1|\r\n\r\n\r\n\r\n**2. Comparison** of various normalizers in terms of Top-1 accuracy (%) on ImageNet. ScN and PN denote ScaleNorm and PowerNorm, respectively.\r\n\r\n| Model | LN | BN | IN | GN | SN | ScN| PN | **DTN**|\r\n| :-----| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |\r\n| ViT-S | 79.9 | 77.3 | 77.7| 78.3| 80.1| 80.0|79.8|**80.6**|\r\n| ViT-S* | 80.6 | 77.2 | 77.6| 79.5| 81.0| 80.6|80.4|**81.7**|\r\n\r\n**3. Visualization** of the mean attention distance of each head in ViT-S. Many heads in ViT-S with DTN have a small mean attention distance, indicating that DTN captures local context well.\r\n\r\n\u003cdiv align=center\u003e\u003cimg src=\"DTN_Head.png\" width=\"1080\" height=\"210\"\u003e\u003c/div\u003e\r\n\r\n## Getting Started\r\n* Install [PyTorch](http://pytorch.org/)\r\n\r\n### Requirements\r\n\r\n- Install `CUDA==10.1` with `cudnn7` following\r\n  the [official installation instructions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)\r\n- Install `PyTorch==1.7.1` and `torchvision==0.8.2` with `CUDA==10.1`:\r\n\r\n```bash\r\nconda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch\r\n```\r\n\r\n- Install `timm==0.4.9`:\r\n\r\n```bash\r\npip install timm==0.4.9\r\n```\r\n\r\n### Data Preparation\r\n- Download the ImageNet dataset. It should contain the `train` and `val` directories and the txt file that maps images to their labels.\r\n\r\n### Training a model from scratch\r\nAn example training script for DTN is given in `DTN/scripts/train.sh`. To train ViT-S* with DTN, run:\r\n```bash\r\ncd DTN/scripts\r\nsh train.sh layer vit_norm_s_star configs/ViT/vit.yaml\r\n```\r\nThe number of GPUs and the configuration file can be modified in `train.sh`.\r\n\r\n## License\r\nDTN is released under the BSD 3-Clause License.\r\n\r\n## Acknowledgement\r\nOur code is based on the timm package from PyTorch Image Models, https://github.com/rwightman/pytorch-image-models.\r\n\r\n## Citation\r\nIf our code is helpful to your work, please cite:\r\n```bibtex\r\n@article{shao2021dynamic,\r\n  title={Dynamic Token Normalization Improves Vision Transformer},\r\n  author={Shao, Wenqi and Ge, Yixiao and Zhang, Zhaoyang and Xu, Xuyuan and Wang, Xiaogang and Shan, Ying and Luo, Ping},\r\n  journal={arXiv preprint arXiv:2112.02624},\r\n  year={2021}\r\n}\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fdtn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftencentarc%2Fdtn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fdtn/lists"}