{"id":24293557,"url":"https://github.com/zihangjiang/tokenlabeling","last_synced_at":"2025-04-04T10:04:08.364Z","repository":{"id":40378811,"uuid":"359763441","full_name":"zihangJiang/TokenLabeling","owner":"zihangJiang","description":"Pytorch implementation of \"All Tokens Matter: Token Labeling for Training Better Vision Transformers\"","archived":false,"fork":false,"pushed_at":"2023-09-05T07:44:23.000Z","size":809,"stargazers_count":427,"open_issues_count":7,"forks_count":35,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-04-04T09:43:38.828Z","etag":null,"topics":["imagenet","lv-vit","pytorch","segmentation","transformer","vision"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zihangJiang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-04-20T09:47:19.000Z","updated_at":"2025-04-02T00:45:34.000Z","dependencies_parsed_at":"2024-01-11T12:27:56.065Z","dependency_job_id":null,"html_url":"https://github.com/zihangJiang/TokenLabeling","commit_stats":{"total_commits":41,"total_committers":5,"mean_commits":8.2,"dds":0.4390243902439024,"last_synced_commit":"9dbfd59aedecfe83f6f3253db4e99b82359d48ac"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zihangJiang%2FTokenLabeling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zihangJiang%2FTokenLabeling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zihangJiang%2FTokenLabe
ling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zihangJiang%2FTokenLabeling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zihangJiang","download_url":"https://codeload.github.com/zihangJiang/TokenLabeling/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247157280,"owners_count":20893220,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["imagenet","lv-vit","pytorch","segmentation","transformer","vision"],"created_at":"2025-01-16T16:30:07.384Z","updated_at":"2025-04-04T10:04:08.333Z","avatar_url":"https://github.com/zihangJiang.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# All Tokens Matter: Token Labeling for Training Better Vision Transformers ([arxiv](https://arxiv.org/abs/2104.10858))\n\nThis is a Pytorch implementation of our paper. \n\n![Compare](figures/Compare.png)\n\nComparison between the proposed LV-ViT and other recent works based on transformers. 
Note that we only show models with fewer than 100M parameters.\n\nOur code is based on the [pytorch-image-models](https://github.com/rwightman/pytorch-image-models) by [Ross Wightman](https://github.com/rwightman).\n\n### Update\n**2021.7: Add script to generate label data.**\n\n**2021.6: Support `pip install tlt` to use our Token Labeling Toolbox for image models.**\n\n**2021.6: Release training code and segmentation model.**\n\n**2021.4: Release LV-ViT models.**\n\n#### LV-ViT Models\n\n| Model                           | layer | dim  | Image resolution |  Param  | Top 1 |Download |\n| :------------------------------ | :---- | :--- | :--------------: |-------: | ----: |   ----: |\n| LV-ViT-T                        | 12    | 240  |       224        |  8.53M |  79.1 |[link](https://github.com/zihangJiang/TokenLabeling/releases/download/v0.2.0/lvvit_t.pth) |\n| LV-ViT-S                        | 16    | 384  |       224        |  26.15M |  83.3 |[link](https://github.com/zihangJiang/TokenLabeling/releases/download/1.0/lvvit_s-26M-224-83.3.pth.tar) |\n| LV-ViT-S                        | 16    | 384  |       384        |  26.30M |  84.4 |[link](https://github.com/zihangJiang/TokenLabeling/releases/download/1.0/lvvit_s-26M-384-84.4.pth.tar) |\n| LV-ViT-M                        | 20    | 512  |       224        |  55.83M |  84.0 |[link](https://github.com/zihangJiang/TokenLabeling/releases/download/1.0/lvvit_m-56M-224-84.0.pth.tar) |\n| LV-ViT-M                        | 20    | 512  |       384        |  56.03M |  85.4 |[link](https://github.com/zihangJiang/TokenLabeling/releases/download/1.0/lvvit_m-56M-384-85.4.pth.tar) |\n| LV-ViT-M                        | 20    | 512  |       448        |  56.13M |  85.5 |[link](https://github.com/zihangJiang/TokenLabeling/releases/download/1.0/lvvit_m-56M-448-85.5.pth.tar) |\n| LV-ViT-L                        | 24    | 768  |       448        | 150.47M |  86.2 
|[link](https://github.com/zihangJiang/TokenLabeling/releases/download/1.0/lvvit_l-150M-448-86.2.pth.tar) |\n| LV-ViT-L                        | 24    | 768  |       512        | 150.66M |  86.4 |[link](https://github.com/zihangJiang/TokenLabeling/releases/download/1.0/lvvit_l-150M-512-86.4.pth.tar) |\n\n#### Requirements\n\ntorch\u003e=1.4.0\ntorchvision\u003e=0.5.0\npyyaml\nscipy\ntimm==0.4.5\n\nData preparation: ImageNet with the following folder structure. You can extract ImageNet with this [script](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4).\n\n```\n│imagenet/\n├──train/\n│  ├── n01440764\n│  │   ├── n01440764_10026.JPEG\n│  │   ├── n01440764_10027.JPEG\n│  │   ├── ......\n│  ├── ......\n├──val/\n│  ├── n01440764\n│  │   ├── ILSVRC2012_val_00000293.JPEG\n│  │   ├── ILSVRC2012_val_00002138.JPEG\n│  │   ├── ......\n│  ├── ......\n```\n\n#### Validation\nReplace `/path/to/imagenet/val` with your ImageNet validation set path and `/path/to/checkpoint` with the checkpoint path:\n```\nCUDA_VISIBLE_DEVICES=0 bash eval.sh /path/to/imagenet/val /path/to/checkpoint\n```\n\n#### Label data\n\nWe provide dense label maps generated by NFNet-F6 on [Google Drive](https://drive.google.com/file/d/1Cat8HQPSRVJFPnBLlfzVE0Exe65a_4zh/view?usp=sharing) and [BaiDu Yun](https://pan.baidu.com/s/1YBqiNN9dAzhEXtPl61bZJw) (password: y6j2). 
As NFNet-F6 is based purely on ImageNet data, no extra training data is involved.\n\n\n#### Training\n\nTrain the LV-ViT-S: \n\nIf only 4 GPUs are available:\n\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3 ./distributed_train.sh 4 /path/to/imagenet --model lvvit_s -b 256 --apex-amp --img-size 224 --drop-path 0.1 --token-label --token-label-data /path/to/label_data --token-label-size 14 --model-ema\n```\n\nIf 8 GPUs are available: \n```\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model lvvit_s -b 128 --apex-amp --img-size 224 --drop-path 0.1 --token-label --token-label-data /path/to/label_data --token-label-size 14 --model-ema\n```\n\n\nTrain the LV-ViT-M and LV-ViT-L (run on 8 GPUs):\n\n\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model lvvit_m -b 128 --apex-amp --img-size 224 --drop-path 0.2 --token-label --token-label-data /path/to/label_data --token-label-size 14 --model-ema\n```\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model lvvit_l -b 128 --lr 1.e-3 --aa rand-n3-m9-mstd0.5-inc1 --apex-amp --img-size 224 --drop-path 0.3 --token-label --token-label-data /path/to/label_data --token-label-size 14 --model-ema\n```\nTo train LV-ViT on images with 384x384 resolution, use `--img-size 384 --token-label-size 24`.\n\n#### Fine-tuning\n\nTo fine-tune the pre-trained LV-ViT-S on images with 384x384 resolution:\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet --model lvvit_s -b 64 --apex-amp --img-size 384 --drop-path 0.1 --token-label --token-label-data /path/to/label_data --token-label-size 24 --lr 5.e-6 --min-lr 5.e-6 --weight-decay 1.e-8 --finetune /path/to/checkpoint\n```\n\nTo fine-tune the pre-trained LV-ViT-S on other datasets without token labeling:\n```\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/dataset --model lvvit_s -b 64 --apex-amp --img-size 224 
--drop-path 0.1 --token-label --token-label-size 14 --dense-weight 0.0 --num-classes $NUM_CLASSES --finetune /path/to/checkpoint\n```\n\n### Segmentation\n\nOur segmentation models are built entirely on the [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) toolkit. The model and config files are under the `seg/` folder, which follows the same folder structure. You can simply drop in these files to get started.\n\n```shell\ngit clone https://github.com/open-mmlab/mmsegmentation # and install\n\ncp seg/mmseg/models/backbones/vit.py mmsegmentation/mmseg/models/backbones/\ncp -r seg/configs/lvvit mmsegmentation/configs/\n\n# test upernet+lvvit_s (add --aug-test to test on multi scale)\ncd mmsegmentation\n./tools/dist_test.sh configs/lvvit/upernet_lvvit_s_512x512_160k_ade20k.py /path/to/checkpoint 8 --eval mIoU [--aug-test]\n```\n\n| Backbone                        | Method  | Crop size | Lr Schd |  mIoU   |  mIoU(ms) | Pixel Acc.| Param |Download |\n| :------------------------------ | :------ | :-------- | :------ |:------- |:--------- | :-------- | :---- | :------ |\n| LV-ViT-S                        | UperNet |  512x512  |   160k  |  47.9   |    48.6   |   83.1    |  44M  |[link](https://github.com/zihangJiang/TokenLabeling/releases/download/v1.1-seg/upernet_lvvit_s.pth) |\n| LV-ViT-M                        | UperNet |  512x512  |   160k  |  49.4   |    50.6   |   83.5    |  77M  |[link](https://github.com/zihangJiang/TokenLabeling/releases/download/v1.1-seg/upernet_lvvit_m.pth) |\n| LV-ViT-L                        | UperNet |  512x512  |   160k  |  50.9   |    51.8   |   84.1    |  209M |[link](https://github.com/zihangJiang/TokenLabeling/releases/download/v1.1-seg/upernet_lvvit_l.pth) |\n\n\n### Visualization\n\nWe apply the visualization method in this [repo](https://github.com/hila-chefer/Transformer-Explainability) to visualize the parts of the image that led to a certain classification for DeiT-Base and our LV-ViT-S. 
The parts of the image that are used by the network to make the decision are highlighted in red.\n\n![Compare](figures/Top1.jpg)\n\n### Label generation\nTo generate token label data for training:\n```bash\npython3 generate_label.py /path/to/imagenet/train /path/to/save/label_top5_train_nfnet --model dm_nfnet_f6 --pretrained --img-size 576 -b 32 --crop-pct 1.0\n```\n\n#### Reference\nIf you use this repo or find it useful, please consider citing:\n```\n@inproceedings{NEURIPS2021_9a49a25d,\n author = {Jiang, Zi-Hang and Hou, Qibin and Yuan, Li and Zhou, Daquan and Shi, Yujun and Jin, Xiaojie and Wang, Anran and Feng, Jiashi},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {M. Ranzato and A. Beygelzimer and Y. Dauphin and P.S. Liang and J. Wortman Vaughan},\n pages = {18590--18602},\n publisher = {Curran Associates, Inc.},\n title = {All Tokens Matter: Token Labeling for Training Better Vision Transformers},\n url = {https://proceedings.neurips.cc/paper/2021/file/9a49a25d845a483fae4be7e341368e36-Paper.pdf},\n volume = {34},\n year = {2021}\n}\n```\n\n#### Related projects\n[T2T-ViT](https://github.com/yitu-opensource/T2T-ViT/), [Re-labeling ImageNet](https://github.com/naver-ai/relabel_imagenet), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformer Explainability](https://github.com/hila-chefer/Transformer-Explainability).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzihangjiang%2Ftokenlabeling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzihangjiang%2Ftokenlabeling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzihangjiang%2Ftokenlabeling/lists"}