https://github.com/mjahmadee/vision_transformers
Vision Transformers
cait image-classification transformer vision-transformer vision-transformers vit
- Host: GitHub
- URL: https://github.com/mjahmadee/vision_transformers
- Owner: MJAHMADEE
- License: mit
- Created: 2023-07-17T05:16:18.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-03-16T13:17:16.000Z (about 2 years ago)
- Last Synced: 2025-01-11T08:51:39.020Z (over 1 year ago)
- Topics: cait, image-classification, transformer, vision-transformer, vision-transformers, vit
- Language: Jupyter Notebook
- Homepage:
- Size: 905 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Vision Transformers with PyTorch



This project implements Vision Transformers (ViT) using PyTorch to classify images from the CIFAR-10 dataset. It includes pre-trained models like ViT and CaiT, fine-tuned on CIFAR-10, demonstrating how transformers can be adapted for image classification.
## Features
- Utilizes pre-trained Vision Transformer (ViT) and Class-Attention in Image Transformers (CaiT) models.
- Supports fine-tuning of transformer models on the CIFAR-10 dataset.
- Visualizes training and validation loss, accuracy, and confusion matrices.
- Demonstrates data preprocessing and augmentation techniques for image data.
- Evaluates model performance with metrics such as F1-score, recall, accuracy, and precision.
## Setup and Installation
1. Clone the repository from GitHub.
2. Navigate to the project directory.
3. Install the required dependencies listed in the `requirements.txt` file.
## Dataset
The project uses the CIFAR-10 dataset, which consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class (50,000 images for training and 10,000 for testing). The dataset is downloaded automatically and preprocessed for training and testing.
## Training the Model
The training process involves fine-tuning the pre-trained Vision Transformer models on the CIFAR-10 dataset. The models are adjusted to work with the smaller image size and class count of CIFAR-10.
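The README does not say how the models are adjusted for 32x32 inputs. Two common options are upsampling the images to the resolution the checkpoint was pre-trained at, or running the transformer on the small images with a smaller patch size. The sketch below only illustrates the arithmetic behind that choice: how many patch tokens a ViT sees for a given image and patch size. The helper name `num_patch_tokens` is hypothetical, and the 224-pixel and 16-pixel values are assumptions that match common ViT checkpoints, not settings confirmed by this repository.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches a ViT extracts from a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


# Assumed values: many pre-trained ViT checkpoints expect 224x224 inputs
# split into 16x16 patches.
pretrained_tokens = num_patch_tokens(224, 16)

# CIFAR-10 images at native 32x32 resolution with the same patch size
# leave only a handful of tokens for self-attention.
native_tokens = num_patch_tokens(32, 16)

# A smaller patch size (hypothetical choice) restores a longer token sequence.
small_patch_tokens = num_patch_tokens(32, 4)

print(pretrained_tokens, native_tokens, small_patch_tokens)
```

The class-count adjustment is a separate step: typically the pre-trained classification head is replaced with a new layer that has 10 outputs, one per CIFAR-10 class.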
## Testing the Model
After training, the model's performance is evaluated on the test set of CIFAR-10. Metrics like accuracy, F1-score, recall, and precision are computed to assess the model.
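As a rough illustration of how these metrics relate to each other, the following plain-Python sketch computes accuracy together with macro-averaged precision, recall, and F1-score from lists of true and predicted labels. The function name `classification_metrics` and the toy labels are hypothetical, and macro averaging is an assumption; the README does not state which averaging the notebooks use.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, num_classes):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


# Toy labels for three classes (not results from this project).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = classification_metrics(y_true, y_pred, num_classes=3)
print(metrics)
```

Macro averaging weights every class equally, which suits CIFAR-10 because its classes are balanced.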
## Results and Evaluation
Results are documented through confusion matrices, loss, and accuracy plots. These visualizations help in understanding the model's performance and areas of improvement.
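To show what a confusion matrix reveals about areas of improvement, here is a minimal plain-Python sketch that builds one and finds the most frequently confused class pair. The helper names and the toy labels are hypothetical and are not taken from the project's notebooks.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build a matrix where rows index the true class and columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def most_confused_pair(matrix):
    """Return (true_class, predicted_class, count) for the largest off-diagonal cell."""
    best = (None, None, 0)
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            if t != p and count > best[2]:
                best = (t, p, count)
    return best


# Toy example using three of the CIFAR-10 class names (made-up predictions).
labels = ["cat", "dog", "ship"]
y_true = [0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2]

cm = confusion_matrix(y_true, y_pred, len(labels))
true_idx, pred_idx, count = most_confused_pair(cm)
print(cm)
print(f"{labels[true_idx]} misclassified as {labels[pred_idx]} {count} times")
```

Diagonal cells count correct predictions, so large off-diagonal cells point to the class pairs the model mixes up most often.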
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgements
- Thanks to the creators of the CIFAR-10 dataset for providing the resources necessary for training and testing the model.
- Thanks to the PyTorch and timm library documentation for providing comprehensive guides and tutorials.
@misc{MJVisionTransformers2023,
author = {Mohammad Javad (MJ) Ahmadi},
title = {Vision Transformers},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/MJAHMADEE/Vision_Transformers}}
}
---
For more information, please refer to the [official repository](https://github.com/MJAHMADEE/Vision_Transformers).