Build and fine-tune your Image Classifier using a Vision Transformer Model from TensorFlow Hub
- Host: GitHub
- URL: https://github.com/sayannath/vit-tf-hub-application
- Owner: sayannath
- License: MIT
- Created: 2021-09-26T12:12:30.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2021-09-27T13:32:33.000Z (about 3 years ago)
- Last Synced: 2023-03-04T05:15:10.852Z (over 1 year ago)
- Topics: fine-tuning, jax, keras, tensorflow, tf2, tfhub, transformers, vision-transformer, vit
- Language: Jupyter Notebook
- Homepage: https://tfhub.dev/sayakpaul/collections/vision_transformer/1
- Size: 13.3 MB
- Stars: 9
- Watchers: 2
- Forks: 1
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
README
![GitHub forks](https://img.shields.io/github/forks/sayannath/ViT-TF-Hub-Application?style=for-the-badge)
![GitHub Repo stars](https://img.shields.io/github/stars/sayannath/ViT-TF-Hub-Application?style=for-the-badge)
![GitHub last commit](https://img.shields.io/github/last-commit/sayannath/ViT-TF-Hub-Application?style=for-the-badge)
![Twitter Follow](https://img.shields.io/twitter/follow/sayannath2350?style=for-the-badge)
[![Ask Me Anything !](https://img.shields.io/badge/Ask%20me-anything-1abc9c.svg?style=for-the-badge)](https://gitHub.com/sayannath)

# Vision Transformer TF-Hub Application
![PngItem_3011351 (1)](https://user-images.githubusercontent.com/72073401/134901679-918e04ca-2e70-4847-8e15-98003ff878ae.png)

## Description
This repository shows how to fine-tune a Vision Transformer model from [TensorFlow Hub](https://www.tfhub.dev) on the Image Scene Detection dataset.

## Dataset Used
A newly collected Camera Scene Classification dataset consisting of images belonging to 30 different classes. The dataset is part of the [Mobile AI Workshop @ CVPR 2021](https://competitions.codalab.org/competitions/28113) competition.
You can find the dataset details [here](https://competitions.codalab.org/competitions/28113#participate).
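The competition data is not bundled with this repository. Assuming you extract the images into one folder per class, a minimal input pipeline could look like the sketch below; the directory names, image size, and batch size are illustrative assumptions, not values taken from the notebooks.

```python
import tensorflow as tf

IMAGE_SIZE = 224  # assumed input resolution for the ViT models used below
BATCH_SIZE = 32   # illustrative batch size

# Assumes the 30-class dataset was extracted as <split>/<class_name>/<image>.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "camera_scenes/train",
    image_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "camera_scenes/val",
    image_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
)

# These TF-Hub ViT models generally expect pixel values scaled to [-1, 1];
# check the individual model page to confirm the expected range.
rescale = tf.keras.layers.Rescaling(scale=1.0 / 127.5, offset=-1.0)
train_ds = train_ds.map(lambda x, y: (rescale(x), y)).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.map(lambda x, y: (rescale(x), y)).prefetch(tf.data.AUTOTUNE)
```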
## Models

The following Vision Transformer models are available on [TensorFlow Hub](https://www.tfhub.dev), both as image classifiers and as feature extractors.
### Image Classifiers
* [ViT-S16](https://tfhub.dev/sayakpaul/vit_s16_classification/1)
* [ViT-B8](https://tfhub.dev/sayakpaul/vit_b8_classification/1)
* [ViT-B16](https://tfhub.dev/sayakpaul/vit_b16_classification/1)
* [ViT-B32](https://tfhub.dev/sayakpaul/vit_b32_classification/1)
* [ViT-L16](https://tfhub.dev/sayakpaul/vit_l16_classification/1)
* [ViT-R26-S32 (light augmentation)](https://tfhub.dev/sayakpaul/vit_r26_s32_lightaug_classification/1)
* [ViT-R26-S32 (medium augmentation)](https://tfhub.dev/sayakpaul/vit_r26_s32_medaug_classification/1)
* [ViT-R50-L32](https://tfhub.dev/sayakpaul/vit_r50_l32_classification/1)

### Feature Extractors
* [ViT-S16](https://tfhub.dev/sayakpaul/vit_s16_fe/1)
* [ViT-B8](https://tfhub.dev/sayakpaul/vit_b8_fe/1)
* [ViT-B16](https://tfhub.dev/sayakpaul/vit_b16_fe/1)
* [ViT-B32](https://tfhub.dev/sayakpaul/vit_b32_fe/1)
* [ViT-L16](https://tfhub.dev/sayakpaul/vit_l16_fe/1)
* [ViT-R26-S32 (light augmentation)](https://tfhub.dev/sayakpaul/vit_r26_s32_lightaug_fe/1)
* [ViT-R26-S32 (medium augmentation)](https://tfhub.dev/sayakpaul/vit_r26_s32_medaug_fe/1)
* [ViT-R50-L32](https://tfhub.dev/sayakpaul/vit_r50_l32_fe/1)

> Note: Since we want to fine-tune the model, we use the feature-extractor variants and build the image classifier on top of them.
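As a rough illustration of that workflow (not the exact code from the notebooks), a feature-extractor handle can be wrapped in `hub.KerasLayer` and topped with a dense softmax head; the learning rate and other hyperparameters here are assumptions.

```python
import tensorflow as tf
import tensorflow_hub as hub

VIT_HANDLE = "https://tfhub.dev/sayakpaul/vit_s16_fe/1"  # any *_fe handle above works the same way
IMAGE_SIZE = 224   # assumed input resolution
NUM_CLASSES = 30   # camera scene classes in the competition dataset

def build_classifier() -> tf.keras.Model:
    """Build an image classifier on top of the ViT feature extractor."""
    inputs = tf.keras.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
    # trainable=True makes the backbone weights update during training (fine-tuning).
    features = hub.KerasLayer(VIT_HANDLE, trainable=True)(inputs)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(features)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed learning rate
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

model = build_classifier()
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```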
## Benchmark Results
| Sl. No. | Model | Parameters | Training Accuracy | Validation Accuracy |
|:-----:|:------------------------:|:----------------:|:--------:|:-------------------:|
| 1 | ViT-S/16 | 21,677,214 | 99.73% | 96.87% |
| 2 | ViT R26-S/32(light aug) | 36,058,462 | 99.70% | 96.67% |
| 3 | ViT R26-S/32(medium aug) | 36,058,462 | 99.80% | 97.17% |
| 4 | ViT B/32 | 87,478,302 | 99.43% | 96.87% |
| 5 | MobileNetV3Small | 2,070,158 | 95.20% | 92.73% |
| 6 | MobileNetV2 | 2,929,246 | 95.06% | 88.89% |
| 7 | BigTransfer (BiT) | | 99.53% | 96.97% |

> Note: The last three results were benchmarked during the CVPR competition. You can find that repository [here](https://github.com/sayannath/Image-Scene-Classification).
## Notebooks
:white_check_mark: **ViT** **S/16**
:white_check_mark: **ViT** **R26-S/32 (Light Augmentation)**
:white_check_mark: **ViT** **R26-S/32 (Medium Augmentation)**
:white_check_mark: **ViT** **B/32**
:white_large_square: **ViT R50-L/32**
:white_large_square: **ViT B/16**
:white_large_square: **ViT L/16**
:white_large_square: **ViT B/8**

## Links
| Sl. No. | Model | Colab Notebook | TensorBoard |
|----|--------------------------|----------------|-------------|
| 1 | ViT-S/16 | [Link](https://colab.research.google.com/drive/1ISB3E5_wjojRjhbCjRLaKLCPxUHqtxd1?usp=sharing) | [Link](https://tensorboard.dev/experiment/m9OMnYIzTw66LWXvyXCYgg/) |
| 2 | ViT R26-S/32(light aug) | [Link](https://colab.research.google.com/drive/14Ms__eAJOD0jdDLlHxmIawcQET_GyQjz?usp=sharing) | [Link](https://tensorboard.dev/experiment/myd5IEZtRjWEmAQQ9lSolA/) |
| 3 | ViT R26-S/32(medium aug) | [Link](https://colab.research.google.com/drive/1xuQTvl5lYqR3tn_17d7_WeDrdj76ieIl?usp=sharing) | [Link](https://tensorboard.dev/experiment/35bwOLWxQLqO0E11sdveDQ/) |
| 4 | ViT B/32 | [Link](https://colab.research.google.com/drive/1-9mo1H8tOHOjqunF317a-I1B4vbX6yeC?usp=sharing) | [Link](https://tensorboard.dev/experiment/H2QSxurmQt6YNVVWSlTaUA/) |

> Each model directory contains the corresponding notebook, Python script, metric graphs, training logs (in .csv), and TensorBoard callbacks.
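Those logs and traces are standard Keras artifacts; to produce equivalent ones in your own run, a sketch like the following works (the output paths are illustrative assumptions, not the ones used in this repository).

```python
import os
import tensorflow as tf

os.makedirs("train_logs", exist_ok=True)

# Illustrative output locations; the notebooks may use different paths.
callbacks = [
    tf.keras.callbacks.TensorBoard(log_dir="logs/vit_s16"),  # TensorBoard event files
    tf.keras.callbacks.CSVLogger("train_logs/vit_s16.csv"),  # per-epoch metrics as .csv
]

# history = model.fit(train_ds, validation_data=val_ds, epochs=20, callbacks=callbacks)
```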
## References
[1] [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al.](https://arxiv.org/abs/2010.11929)
[2] [How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers by Steiner et al.](https://arxiv.org/abs/2106.10270)
[3] [Vision Transformer GitHub](https://github.com/google-research/vision_transformer)
[4] [jax2tf tool](https://github.com/google/jax/tree/main/jax/experimental/jax2tf/)
[5] [Image Classification with Vision Transformer in Keras](https://keras.io/examples/vision/image_classification_with_vision_transformer/)
[6] [ViT-jax2tf](https://github.com/sayakpaul/ViT-jax2tf)
[7] [Vision Transformers are Robust Learners](https://arxiv.org/abs/2105.07581), [Repository](https://github.com/sayakpaul/robustness-vit)
[8] [Vision Transformer TF-Hub Model Collection](https://tfhub.dev/sayakpaul/collections/vision_transformer/1)
## Acknowledgements
* Thanks to [Sayak Paul](https://sayak.dev) for porting the ViT models to TensorFlow Hub so that we can use Vision Transformers in a straightforward way.
* Thanks to the authors of Vision Transformers for their efforts in open-sourcing the models.

## Contributors