Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lim-calculus/implementation-of-vit-from-scratch-for-mnist-classification
Implemetation of Vision Transformer from Scratch for MNIST Classification
https://github.com/lim-calculus/implementation-of-vit-from-scratch-for-mnist-classification
Last synced: 3 days ago
JSON representation
Implemetation of Vision Transformer from Scratch for MNIST Classification
- Host: GitHub
- URL: https://github.com/lim-calculus/implementation-of-vit-from-scratch-for-mnist-classification
- Owner: Lim-Calculus
- License: apache-2.0
- Created: 2023-04-20T10:00:55.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-04-25T17:03:21.000Z (almost 2 years ago)
- Last Synced: 2024-11-21T06:34:49.243Z (2 months ago)
- Language: Jupyter Notebook
- Size: 840 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Implementation-of-ViT-from-Scratch-for-MNIST-Classification
The implementation of the Vision Transformer for MNIST classification uses an image size of 224 x 224 x 3, with a patch size of 16 x 16, a hidden size (latent vector) of 128, 12 attention heads, and 12 transformer layers. The model is trained using the Adam optimizer with a learning rate of 1e-3, weight decay of 1e-5, and a batch size of 32 for 20 epochs. The resulting **test accuracy is 95.86%, with a loss of 0.1337**.1. `Image Size : 224 x 224 x 3`
2. `Patch Size : 16 x 16`
3. `Hidden Size : 128`
4. `Number of attention heads : 12`
5. `Number of transformer layers : 12`While Convolutional Neural Networks (CNNs) have been proven to achieve high accuracy on the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits, it is worth noting that Vision Transformers have shown impressive performance on other image classification tasks, particularly on large-scale datasets where they have outperformed traditional CNNs. **Therefore, while it is unlikely that Vision Transformers will significantly outperform CNNs on MNIST due to the dataset's simplicity and small size, it's still an interesting approach to explore**.
### Reference
1. [Image classification with Vision Transformer]( https://keras.io/examples/vision/image_classification_with_vision_transformer/)
2. [Vision Transformer in Tensorflow](https://dzlab.github.io/notebooks/tensorflow/vision/classification/2021/10/01/vision_transformer.html)
3. [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929)