https://github.com/sayakpaul/robustness-vit

Contains code for the paper "Vision Transformers are Robust Learners" (AAAI 2022).
computer-vision jax pytorch robustness self-attention tensorflow transformers

Host: GitHub
URL: https://github.com/sayakpaul/robustness-vit
Owner: sayakpaul
License: mit
Created: 2021-03-12T08:59:12.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2022-12-03T04:54:01.000Z (almost 3 years ago)
Last Synced: 2025-07-11T00:21:16.360Z (3 months ago)
Topics: computer-vision, jax, pytorch, robustness, self-attention, tensorflow, transformers
Language: Jupyter Notebook
Homepage: https://arxiv.org/abs/2105.07581
Size: 4.22 MB
Stars: 125
Watchers: 4
Forks: 19
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Vision Transformers are Robust Learners

This repository contains the code for the paper [Vision Transformers are Robust Learners](https://arxiv.org/abs/2105.07581) by
Sayak Paul\* and Pin-Yu Chen\* (AAAI 2022).

\*Equal contribution.

**Update December 2022**: We won the [ML Research Spotlight from Kaggle](https://www.kaggle.com/discussions/general/370095).

**Update July 2022**: The publication is now available as a part of the [AAAI-22 proceedings](https://ojs.aaai.org/index.php/AAAI/article/view/20103). It's also archived in the [IBM Research repository](https://research.ibm.com/publications/vision-transformers-are-robust-learners).

### Abstract

Transformers, composed of multiple self-attention layers, hold strong promises toward a generic learning primitive applicable to
different data modalities, including the recent breakthroughs in computer vision achieving state-of-the-art (SOTA) standard accuracy with better
parameter efficiency. Since self-attention helps a model systematically align different components present inside the input data, it leaves grounds
to investigate its performance under model robustness benchmarks. In this work, we study the robustness of the Vision Transformer (ViT) against common
corruptions and perturbations, distribution shifts, and natural adversarial examples. We use six different diverse ImageNet datasets concerning robust
classification to conduct a comprehensive performance comparison of ViT models and SOTA convolutional neural networks (CNNs), Big-Transfer. Through a
series of six systematically designed experiments, we then present analyses that provide both quantitative and qualitative indications to explain why
ViTs are indeed more robust learners. For example, with fewer parameters and similar dataset and pre-training combinations, ViT gives a top-1 accuracy
of 28.10% on ImageNet-A which is 4.3x higher than a comparable variant of BiT. Our analyses on image masking, Fourier spectrum sensitivity, and spread
on discrete cosine energy spectrum reveal intriguing properties of ViT attributing to improved robustness.
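
To make the image-masking idea mentioned in the abstract concrete, here is a minimal sketch of randomly masking square patches of an image with NumPy. It is only an illustration under stated assumptions, not the procedure used in the paper or in this repository: the function name `mask_random_patches`, the 16-pixel patch size, the 50% mask fraction, and the choice to fill masked patches with zeros are all hypothetical.

```python
import numpy as np


def mask_random_patches(image, patch_size=16, mask_fraction=0.5, seed=0):
    """Zero out a random subset of non-overlapping square patches.

    `image` is an (H, W, C) array whose height and width are multiples of
    `patch_size`. All names and defaults are illustrative assumptions.
    """
    height, width = image.shape[:2]
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")

    # Lay a regular grid of patches over the image.
    rows, cols = height // patch_size, width // patch_size
    num_patches = rows * cols
    num_masked = int(round(mask_fraction * num_patches))

    # Pick distinct patch indices to mask; the seed keeps the example reproducible.
    rng = np.random.default_rng(seed)
    masked_ids = rng.choice(num_patches, size=num_masked, replace=False)

    # Work on a copy so the caller's image is left untouched.
    masked = image.copy()
    for patch_id in masked_ids:
        r, c = divmod(int(patch_id), cols)
        masked[r * patch_size:(r + 1) * patch_size,
               c * patch_size:(c + 1) * patch_size] = 0
    return masked, np.sort(masked_ids)


# A dummy all-ones image stands in for a real 224x224 RGB input.
image = np.ones((224, 224, 3), dtype=np.float32)
masked_image, masked_ids = mask_random_patches(image, patch_size=16, mask_fraction=0.5)
```

In a robustness study, the masked image would then be passed to each model and the predictions compared against those on the unmasked input.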

## Structure and Navigation

All the results for the ImageNet datasets (ImageNet-C, ImageNet-P, ImageNet-R, ImageNet-A, ImageNet-O, and ImageNet-9)
can be derived from the notebooks in the [`imagenet_results/`](https://github.com/sayakpaul/robustness-vit/tree/master/imagenet_results)
directory. Many of those notebooks can be run on [Google Colab](https://colab.research.google.com/). When that is not the
case, we provide explicit execution instructions. The same convention applies to the other directories in this repository.
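
The abstract reports top-1 accuracy (for example, on ImageNet-A). As a rough sketch of how that metric is computed from model outputs, the snippet below uses plain NumPy. The function name `top1_accuracy` and the toy scores are assumptions made for illustration; the notebooks may compute this differently and may report other metrics for some of the datasets.

```python
import numpy as np


def top1_accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    logits = np.asarray(logits)
    labels = np.asarray(labels)
    predictions = logits.argmax(axis=1)
    return float((predictions == labels).mean())


# Toy scores for four images over three classes (made-up numbers).
logits = np.array([
    [2.0, 0.1, 0.3],  # predicted class 0
    [0.2, 1.5, 0.1],  # predicted class 1
    [0.9, 0.8, 0.1],  # predicted class 0
    [0.1, 0.2, 3.0],  # predicted class 2
])
labels = np.array([0, 1, 2, 2])

accuracy = top1_accuracy(logits, labels)
print(f"top-1 accuracy: {accuracy:.2%}")  # prints "top-1 accuracy: 75.00%"
```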

The [`analysis/`](https://github.com/sayakpaul/robustness-vit/tree/master/analysis) directory contains the code used to generate the results for Section 4 of the paper.

The [`misc/`](https://github.com/sayakpaul/robustness-vit/tree/master/misc) directory contains code for various utilities.

For any questions, please open an issue and tag @sayakpaul.

## About our dev environment

We use Python 3.8. For hardware (when not using Colab), we use [GCP Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench) with
four V100 GPUs, 60 GB of RAM, and 16 vCPUs (`n1-standard-16` [machine type](https://cloud.google.com/compute/docs/machine-types)).

## Citation

```
@article{paul2021vision,
title={Vision Transformers are Robust Learners},
author={Sayak Paul and Pin-Yu Chen},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2022}
}
```

## Acknowledgements

We are thankful to the [Google Developers Experts program](https://developers.google.com/programs/experts/) (specifically
Soonson Kwon and Karl Weinmeister) for providing Google Cloud Platform credits to support the experiments. We also
thank Justin Gilmer (of Google), Guillermo Ortiz-Jimenez (of EPFL, Switzerland), and Dan Hendrycks (of UC Berkeley)
for fruitful discussions.
