{"id":20663696,"url":"https://github.com/vita-group/diverse-vit","last_synced_at":"2025-04-19T15:56:00.577Z","repository":{"id":50559866,"uuid":"467290575","full_name":"VITA-Group/Diverse-ViT","owner":"VITA-Group","description":"[CVPR 2022] \"The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy\" by Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang","archived":false,"fork":false,"pushed_at":"2022-03-09T16:27:20.000Z","size":177,"stargazers_count":25,"open_issues_count":2,"forks_count":3,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-03-29T09:42:02.659Z","etag":null,"topics":["diversity","oversmoothing","regularization","training-techniques","transformer","vision-transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VITA-Group.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-07T23:17:08.000Z","updated_at":"2024-04-27T11:04:50.000Z","dependencies_parsed_at":"2022-08-31T21:21:07.846Z","dependency_job_id":null,"html_url":"https://github.com/VITA-Group/Diverse-ViT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FDiverse-ViT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FDiverse-ViT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FDiverse-ViT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2
FDiverse-ViT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VITA-Group","download_url":"https://codeload.github.com/VITA-Group/Diverse-ViT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249731218,"owners_count":21317341,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diversity","oversmoothing","regularization","training-techniques","transformer","vision-transformer"],"created_at":"2024-11-16T19:19:20.628Z","updated_at":"2025-04-19T15:56:00.560Z","avatar_url":"https://github.com/VITA-Group.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n\nCode for this paper: [CVPR 2022] [The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy](). \n\nTianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang.\n\n\n\n## Overview\n\nVision transformers (ViTs) have gained increasing popularity, as they are commonly believed to have higher modeling capacity and representation flexibility than traditional convolutional networks. However, it is questionable whether such potential has been fully unleashed in practice, as the learned ViTs often suffer from over-smoothing, yielding likely redundant models. 
\n\nRecent works made preliminary attempts to identify and alleviate such redundancy, e.g., via regularizing embedding similarity or re-injecting convolution-like structures. However, a “head-to-toe assessment” of the extent of redundancy in ViTs, and of how much could be gained by thoroughly mitigating it, has been absent in this field. \n\nThis paper, for the first time, systematically studies the ubiquitous existence of redundancy at all three levels: patch embedding, attention map, and weight space. In view of these, we advocate a principle of diversity for training ViTs, presenting corresponding regularizers that encourage representation diversity and coverage at each of those levels, thereby enabling the capture of more discriminative information. \n\nExtensive experiments on ImageNet with a number of ViT backbones validate the effectiveness of our proposals, largely eliminating the observed ViT redundancy and significantly boosting model generalization. For example, our diversified DeiT obtains 0.70% ∼ 1.76% accuracy boosts on ImageNet with highly reduced similarity.\n\n\u003cimg src = \"Figs/Diversity_overview.png\" align = \"center\" width=\"100%\" height=\"60%\"\u003e\n\n\n\n## Prerequisites\n\nInstall PyTorch 1.7.0+ and torchvision 0.8.1+ and [pytorch-image-models 0.3.2](https://github.com/rwightman/pytorch-image-models):\n\n```\nconda install -c pytorch torchvision\npip install timm==0.3.2\n```\n\n\n\n## Training on ImageNet\n\n```\n./script/run_deit_small_diverse.sh [data/imagenet] (Deit-Small-12layers)\n./script/run_deit_small_24layer_diverse.sh [data/imagenet] (Deit-Small-24layers)\n```\n\n\n\n## Citation\n\n```\nTBD\n```\n\n\n\n## 
Acknowledgement\n\nhttps://github.com/facebookresearch/deit\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvita-group%2Fdiverse-vit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvita-group%2Fdiverse-vit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvita-group%2Fdiverse-vit/lists"}