{"id":20291604,"url":"https://github.com/hila-chefer/robustvit","last_synced_at":"2025-10-13T20:14:20.644Z","repository":{"id":40555765,"uuid":"496588142","full_name":"hila-chefer/RobustViT","owner":"hila-chefer","description":"[NeurIPS 2022] Official PyTorch implementation of Optimizing Relevance Maps of Vision Transformers Improves Robustness. This code allows to finetune the explainability maps of Vision Transformers to enhance robustness.","archived":false,"fork":false,"pushed_at":"2022-11-22T08:57:48.000Z","size":17385,"stargazers_count":131,"open_issues_count":1,"forks_count":13,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-31T11:36:35.146Z","etag":null,"topics":["explainability","neurips","neurips-2022","robustness","vision-transformer"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hila-chefer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-05-26T11:11:08.000Z","updated_at":"2025-06-25T09:39:41.000Z","dependencies_parsed_at":"2023-01-22T21:30:28.365Z","dependency_job_id":null,"html_url":"https://github.com/hila-chefer/RobustViT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hila-chefer/RobustViT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hila-chefer%2FRobustViT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hila-chefer%2FRobustViT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hila-chefer%2FRobustViT/releases","manifests_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/hila-chefer%2FRobustViT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hila-chefer","download_url":"https://codeload.github.com/hila-chefer/RobustViT/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hila-chefer%2FRobustViT/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279016939,"owners_count":26085906,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["explainability","neurips","neurips-2022","robustness","vision-transformer"],"created_at":"2024-11-14T15:13:07.171Z","updated_at":"2025-10-13T20:14:20.584Z","avatar_url":"https://github.com/hila-chefer.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Official PyTorch implementation of **[Optimizing Relevance Maps of Vision Transformers Improves Robustness](https://arxiv.org/abs/2206.01161) [NeurIPS 2022]**\n\nThis code allows you to finetune the explainability maps of Vision Transformers to enhance robustness.\n\n## HuggingFace space + Colab notebook to run examples of the finetuned vs the original models:\n[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hila-chefer/RobustViT/blob/master/RobustViT.ipynb)[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Hila/RobustViT)\n[![Open In YouTube](https://img.shields.io/static/v1?label=NeurIPS2022\u0026message=5MinuteVideo\u0026color=red)](https://www.youtube.com/watch?v=i_bY-IDyPD8)\n## Updates:\n06/05/2022 **Added a [HuggingFace Spaces demo](https://huggingface.co/spaces/Hila/RobustViT)**:\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"hf_spaces.png\"\u003e\n\u003c/p\u003e\n\n## Method overview:\nThe method applies loss functions directly to the explainability maps so that the model focuses mostly on the foreground of the image:\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"500\" height=\"400\" src=\"teaser.png\"\u003e\n\u003c/p\u003e\nUsing a short finetuning process with only 3 labeled examples from 500 classes, our method improves the robustness of ViT models across different model sizes and training techniques, even when data augmentation or regularization is applied.\n\n## Model zoo\nBelow are links to download finetuned models for the base models of [ViT AugReg](https://arxiv.org/abs/2106.10270) (this is also the model that appears on [timm](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py)), vanilla ViT, and DeiT. 
\nThese are also the weights used in our [Colab notebook](https://colab.research.google.com/github/hila-chefer/RobustViT/blob/master/RobustViT.ipynb).\n\n| Path | Description |\n| :--- | :---------- |\n| [AugReg-B](https://drive.google.com/file/d/1jbWiuBrL4sKpAjG3x4oGbs3WOC2UdbIb/view?usp=sharing) | Finetuned ViT AugReg base model. |\n| [ViT-B](https://drive.google.com/file/d/1vDmuvbdLbYVAqWz6yVM4vT1Wdzt8KV-g/view?usp=sharing) | Finetuned vanilla ViT base model. |\n| [DeiT-B](https://drive.google.com/file/d/1DHKX_s8rVCDiX4pwnuCCZdGWsOl4SFMn/view?usp=sharing) | Finetuned DeiT base model. |\n\n## Requirements\n* `pytorch==1.7.1`\n* `torchvision==0.8.2`\n* `timm==0.4.12`\n\n## Producing Segmentation Data\n### Using ImageNet-S\nTo use the ImageNet-S labeled data, [download the `ImageNetS919` dataset](https://github.com/UnsupervisedSemanticSegmentation/ImageNet-S).\n\n### Using TokenCut for unsupervised segmentation\n1.  Clone the TokenCut project:\n    ```\n    git clone https://github.com/YangtaoWANG95/TokenCut.git\n    ```\n2.  Install the dependencies:\n    Python 3.7, PyTorch 1.7.1, and CUDA 11.2. Please refer to the official installation instructions. If CUDA 10.2 has been properly installed:\n    ```\n    pip install torch==1.7.1 torchvision==0.8.2\n    ```\n    Followed by:\n    ```\n    pip install -r TokenCut/requirements.txt\n    ```\n3.  Use the following command to extract the segmentation maps:\n    ```\n    python tokencut_generate_segmentation.py --img_path \u003cPATH_TO_IMAGE\u003e --out_dir \u003cPATH_TO_OUTPUT_DIRECTORY\u003e\n    ```\n\n\n## Finetuning ViT models\n\nTo finetune a pretrained ViT model, use the `imagenet_finetune.py` script. 
Make sure to uncomment the import line for the pretrained model you wish to finetune.\n\nUsage example:\n\n```bash\npython imagenet_finetune.py --seg_data \u003cPATH_TO_SEGMENTATION_DATA\u003e --data \u003cPATH_TO_IMAGENET\u003e --gpu 0 --lr \u003cLR\u003e --lambda_seg \u003cSEG\u003e --lambda_acc \u003cACC\u003e --lambda_background \u003cBACK\u003e --lambda_foreground \u003cFORE\u003e\n```\n\nNotes:\n\n* For all models we use:\n    * `lambda_seg=0.8`\n    * `lambda_acc=0.2`\n    * `lambda_background=2`\n    * `lambda_foreground=0.3`\n* For **DeiT** models, a temperature is also required:\n    * `temperature=0.65` for DeiT-B\n    * `temperature=0.55` for DeiT-S\n* The learning rates per model are:\n    * ViT-B: 3e-6\n    * ViT-L: 9e-7\n    * AR-S: 2e-6\n    * AR-B: 6e-7\n    * AR-L: 9e-7\n    * DeiT-S: 1e-6\n    * DeiT-B: 8e-7\n\n## Baseline methods\nMake sure to uncomment the import line in the code for the pretrained model you wish to finetune.\n\n### GradMask\nRun the following command:\n```bash\npython imagenet_finetune_gradmask.py --seg_data \u003cPATH_TO_SEGMENTATION_DATA\u003e --data \u003cPATH_TO_IMAGENET\u003e --gpu 0 --lr \u003cLR\u003e --lambda_seg \u003cSEG\u003e --lambda_acc \u003cACC\u003e\n```\nAll hyperparameters for the different models can be found in section D of the supplementary material.\n\n### Right for the Right Reasons\nRun the following command:\n```bash\npython imagenet_finetune_rrr.py --seg_data \u003cPATH_TO_SEGMENTATION_DATA\u003e --data \u003cPATH_TO_IMAGENET\u003e --gpu 0 --lr \u003cLR\u003e --lambda_seg \u003cSEG\u003e --lambda_acc \u003cACC\u003e\n```\nAll hyperparameters for the different models can be found in section D of the supplementary material.\n\n## Evaluation\n\n### Robustness Evaluation\n\n1. 
Download the evaluation datasets:\n    * [INet-A](https://github.com/hendrycks/natural-adv-examples)\n    * [INet-R](https://github.com/hendrycks/imagenet-r)\n    * [INet-v2](https://github.com/modestyachts/ImageNetV2)\n    * [ObjectNet](https://objectnet.dev/)\n    * [SI-Score](https://github.com/google-research/si-score)\n\n2. Run the following script to evaluate:\n\n```bash\npython imagenet_eval_robustness.py --data \u003cPATH_TO_ROBUSTNESS_DATASET\u003e --batch-size \u003cBATCH_SIZE\u003e --evaluate --checkpoint \u003cPATH_TO_FINETUNED_CHECKPOINT\u003e\n```\n* Make sure to uncomment the import line in the code for the pretrained model you wish to evaluate.\n* To evaluate the original model, simply omit the `--checkpoint` parameter.\n* For the INet-v2 dataset add `--isV2`.\n* For the ObjectNet dataset add `--isObjectNet`.\n* For the SI datasets add `--isSI`.\n\n### Segmentation Evaluation\nOur segmentation tests are based on the test in the official implementation of [Transformer Interpretability Beyond Attention Visualization](https://github.com/hila-chefer/Transformer-Explainability).\n1. [Download the ImageNet segmentation test set](https://github.com/hila-chefer/Transformer-Explainability#section-a-segmentation-results).\n2. Run the following script to evaluate:\n\n```bash\nPYTHONPATH=./:$PYTHONPATH python SegmentationTest/imagenet_seg_eval.py --imagenet-seg-path \u003cPATH_TO_gtsegs_ijcv.mat\u003e\n```\n* Make sure to uncomment the import line in the code for the pretrained model you wish to evaluate.\n\n## Credits\n* The TokenCut code is built on top of [LOST](https://github.com/valeoai/LOST), [DINO](https://github.com/facebookresearch/dino), [SegSwap](https://github.com/XiSHEN0220/SegSwap), and [Bilateral_Solver](https://github.com/poolio/bilateral_solver). 
\n* Our ViT code is based on the [pytorch-image-models](https://github.com/rwightman/pytorch-image-models) repository.\n* Our ImageNet finetuning code is based on [code from the official PyTorch repo](https://github.com/pytorch/examples/blob/main/imagenet/main.py).\n* The code to convert ObjectNet classes to ImageNet classes was taken from [the torchprune repo](https://github.com/lucaslie/torchprune/blob/b753745b773c3ed259bf819d193ce8573d89efbb/src/torchprune/torchprune/util/datasets/objectnet.py).\n* The code to convert SI-Score classes to ImageNet classes was taken from [the official implementation](https://github.com/google-research/si-score).\n\nWe would like to sincerely thank the authors for their great work.\n\n## Citing our paper\nIf you make use of our work, please cite our paper:\n```\n@inproceedings{chefer2022optimizing,\ntitle={Optimizing Relevance Maps of Vision Transformers Improves Robustness},\nauthor={Hila Chefer and Idan Schwartz and Lior Wolf},\nbooktitle={Thirty-Sixth Conference on Neural Information Processing Systems},\nyear={2022},\nurl={https://openreview.net/forum?id=upuYKQiyxa_}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhila-chefer%2Frobustvit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhila-chefer%2Frobustvit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhila-chefer%2Frobustvit/lists"}