Image Classification With Vision Transformer
https://github.com/reshalfahsi/image-classification-vit
- Host: GitHub
- URL: https://github.com/reshalfahsi/image-classification-vit
- Owner: reshalfahsi
- Created: 2024-03-02T04:57:13.000Z
- Default Branch: master
- Last Pushed: 2024-04-14T03:59:41.000Z
- Last Synced: 2024-04-14T18:59:23.163Z
- Topics: cifar-100, image-classification, pytorch, pytorch-lightning, vision-transformer
- Language: Jupyter Notebook
- Homepage:
- Size: 379 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Image Classification With Vision Transformer
Vision Transformer, or ViT, approaches image recognition as a sequence-modeling problem. By dividing an image into patches and feeding them to the Transformer, the model revered in language modeling, ViT shows that a strong spatial inductive bias is not strictly necessary for vision. However, a study shows that giving the model more spatial cues, i.e., passing the image through a few consecutive convolutions before funneling it to the Transformer, helps ViT learn better. Because ViT is built from Transformer blocks, we can readily extract attention maps that _explain_ what the network attends to. In this project, we examine ViT's performance on the CIFAR-100 dataset. Here, the validation set is fixed to be the same as the CIFAR-100 test set. Online data augmentations, e.g., RandAugment, CutMix, and MixUp, are applied during training, and the learning rate follows the triangular cyclical policy.
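To make the patch-to-sequence step concrete, here is a minimal sketch of a ViT-style patch embedding in PyTorch. It is illustrative only (not the notebook's code) and assumes 32×32 CIFAR-100 inputs with a hypothetical patch size of 4 and embedding width of 128:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and linearly embed them.
    A Conv2d with kernel_size == stride == patch_size does both in one step."""

    def __init__(self, img_size=32, patch_size=4, in_channels=3, dim=128):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, dim,
                              kernel_size=patch_size, stride=patch_size)
        num_patches = (img_size // patch_size) ** 2  # 64 patches for 32x32 / 4x4
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))

    def forward(self, x):                    # x: (B, 3, 32, 32)
        x = self.proj(x)                     # (B, dim, 8, 8)
        x = x.flatten(2).transpose(1, 2)     # (B, 64, dim) -- one token per patch
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)       # prepend the [CLS] token
        return x + self.pos_embed            # add learned position embeddings

tokens = PatchEmbedding()(torch.randn(2, 3, 32, 32))
print(tokens.shape)                          # torch.Size([2, 65, 128])
```

A single strided convolution performs both the patch split and the linear projection; TorchVision's ViT implements its patch embedding the same way.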
## Experiment
Walk through training, testing, and inference for image classification with ViT in this [notebook](https://github.com/reshalfahsi/image-classification-vit/blob/master/Image_Classification_With_Vision_Transformer.ipynb).
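As a rough guide to how the training-time pieces fit together, the sketch below wires up RandAugment, batch-level CutMix/MixUp, and a triangular cyclical learning rate using `torchvision.transforms.v2` and `torch.optim.lr_scheduler.CyclicLR`. The tiny linear model is a hypothetical stand-in for the ViT, and the hyperparameters are illustrative, not the notebook's:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import v2

NUM_CLASSES = 100

# Per-sample augmentation (RandAugment) runs inside the Dataset.
train_tf = v2.Compose([
    v2.RandAugment(),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
])
train_set = datasets.CIFAR100("data", train=True, download=True, transform=train_tf)
loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Batch-level augmentation: CutMix/MixUp mix images *and* labels,
# so they run after the DataLoader has collated a batch.
cutmix_or_mixup = v2.RandomChoice([
    v2.CutMix(num_classes=NUM_CLASSES),
    v2.MixUp(num_classes=NUM_CLASSES),
])

# Hypothetical stand-in for the ViT, just to keep the sketch runnable.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, NUM_CLASSES))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2,
    step_size_up=2000, mode="triangular",  # the triangular cyclical policy
)

for images, labels in loader:
    images, labels = cutmix_or_mixup(images, labels)  # labels become soft (B, 100)
    loss = F.cross_entropy(model(images), labels)     # cross_entropy accepts soft targets
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                  # cyclical LR steps every batch
    break                                             # one illustrative step
```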
## Result
### Quantitative Result
Here are the quantitative results of ViT's performance on the CIFAR-100 test set:
Test Metric | Score
------------ | -------------
Loss | 1.353
Top1-Acc. | 64.92%
Top5-Acc. | 87.29%
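Top-1 accuracy counts a prediction as correct only when the highest-scoring class is the true label; Top-5 counts it as correct when the true label appears among the five highest scores. A minimal sketch of how these can be computed in PyTorch (illustrative, not the notebook's code):

```python
import torch

def topk_accuracy(logits, targets, k=5):
    """Fraction of samples whose true class is among the k highest logits."""
    topk = logits.topk(k, dim=1).indices             # (B, k) predicted classes
    hits = (topk == targets.unsqueeze(1)).any(dim=1) # true label among top-k?
    return hits.float().mean().item()

logits = torch.randn(8, 100)            # fake CIFAR-100 logits
targets = torch.randint(0, 100, (8,))
print(topk_accuracy(logits, targets, k=1), topk_accuracy(logits, targets, k=5))
```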
### Accuracy and Loss Curve
Accuracy curves of ViT on the CIFAR-100 train and validation sets.
Loss curves of ViT on the CIFAR-100 train and validation sets.

### Qualitative Result
The predictions and the corresponding attention maps are collated in the image below.

Several prediction results of ViT and their attention maps.
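To reproduce such a map, one common recipe is: take the [CLS] token's attention over the patch tokens, reshape it onto the patch grid, and upsample it to image resolution. The sketch below assumes the per-head query/key activations have already been captured (e.g., with forward hooks); the helper and shapes are hypothetical, sized for 32×32 inputs with 4×4 patches:

```python
import torch
import torch.nn.functional as F

def cls_attention_map(q, k, img_size=32, patch_size=4):
    """Given one head's query/key activations (shape (B, num_tokens, head_dim),
    token 0 = [CLS]), return [CLS]-to-patch attention as a (B, H, W) heat map."""
    attn = torch.softmax(q @ k.transpose(-2, -1) / k.size(-1) ** 0.5, dim=-1)
    cls_to_patches = attn[:, 0, 1:]        # attention from [CLS] to each patch
    side = img_size // patch_size          # 8 patches per side
    heat = cls_to_patches.reshape(-1, 1, side, side)
    # Upsample to pixel resolution for overlaying on the input image.
    return F.interpolate(heat, size=(img_size, img_size),
                         mode="bilinear", align_corners=False).squeeze(1)

q, k = torch.randn(2, 65, 32), torch.randn(2, 65, 32)
print(cls_attention_map(q, k).shape)       # torch.Size([2, 32, 32])
```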
## Credit
- [An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/pdf/2010.11929.pdf)
- [TorchVision's ViT](https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py)
- [Image classification with Vision Transformer](https://keras.io/examples/vision/image_classification_with_vision_transformer/)
- [Train a Vision Transformer on small datasets](https://keras.io/examples/vision/vit_small_ds/)
- [Learning Multiple Layers of Features from Tiny Images](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf)
- [The CIFAR-100 dataset](https://www.cs.toronto.edu/~kriz/cifar.html)
- [RandAugment: Practical automated data augmentation with a reduced search space](https://arxiv.org/pdf/1909.13719.pdf)
- [RandAugment for Image Classification for Improved Robustness](https://keras.io/examples/vision/randaugment/)
- [CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features](https://arxiv.org/pdf/1905.04899.pdf)
- [CutMix data augmentation for image classification](https://keras.io/examples/vision/cutmix/)
- [mixup: Beyond Empirical Risk Minimization](https://arxiv.org/pdf/1710.09412.pdf)
- [MixUp augmentation for image classification](https://keras.io/examples/vision/mixup/)
- [Early Convolutions Help Transformers See Better](https://arxiv.org/pdf/2106.14881.pdf)
- [Cyclical Learning Rates for Training Neural Networks](https://arxiv.org/pdf/1506.01186.pdf)
- [How to use CutMix and MixUp](https://pytorch.org/vision/main/auto_examples/transforms/plot_cutmix_mixup.html)
- [PyTorch Lightning](https://lightning.ai/docs/pytorch/latest/)