# Vision Transformers with PyTorch 🖼️🤖

![Python](https://img.shields.io/badge/Python-3.8-blue.svg)
![PyTorch](https://img.shields.io/badge/PyTorch-1.8.1-orange.svg)
![License](https://img.shields.io/badge/License-MIT-green.svg)

This project implements Vision Transformers (ViT) using PyTorch to classify images from the CIFAR-10 dataset. It includes pre-trained models like ViT and CaiT, fine-tuned on CIFAR-10, demonstrating how transformers can be adapted for image classification.

## Features 🌟
- Utilizes pre-trained Vision Transformer (ViT) and Class-Attention in Image Transformers (CaiT) models.
- Supports fine-tuning of transformer models on the CIFAR-10 dataset.
- Visualizes training and validation loss, accuracy, and confusion matrices.
- Demonstrates data preprocessing and augmentation techniques for image data.
- Evaluates model performance with metrics such as F1-score, recall, accuracy, and precision.

## Setup and Installation 🛠️
1. Clone the repository from GitHub.
2. Navigate to the project directory.
3. Install the required dependencies listed in the `requirements.txt` file.

## Dataset 📁
The project uses the CIFAR-10 dataset: 60,000 32x32 color images in 10 classes, with 6,000 images per class, split into 50,000 training images and 10,000 test images. The dataset is downloaded automatically and preprocessed for training and testing.

## Training the Model 🚀
Training fine-tunes the pre-trained Vision Transformer models on CIFAR-10. The models are adapted to CIFAR-10's smaller 32x32 images and its 10 output classes.
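
A minimal fine-tuning sketch is shown below, assuming the models come from the `timm` library mentioned in the acknowledgements. The helper names `build_model` and `train_one_epoch`, the default model name, and the cross-entropy loss are assumptions made for illustration; the README does not specify which model variants or hyperparameters the project uses.

```python
import torch
from torch import nn


def build_model(name="vit_base_patch16_224", num_classes=10, pretrained=True):
    """Create a pre-trained model with a fresh classification head.

    Assumption: the model name is only an example. Passing num_classes
    asks timm to size the classifier for the 10 CIFAR-10 classes.
    """
    import timm  # imported lazily so the loop below also runs without timm
    return timm.create_model(name, pretrained=pretrained, num_classes=num_classes)


def train_one_epoch(model, loader, optimizer, device="cpu"):
    """Run one pass over the loader and return (mean loss, accuracy)."""
    model.to(device)
    model.train()
    criterion = nn.CrossEntropyLoss()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        # Accumulate sample-weighted loss and the number of correct predictions.
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen
```

A CaiT variant such as `cait_s24_224` could be passed to `build_model` in the same way. The optimizer is left to the caller, so learning rate and weight decay can be tuned per model.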

## Testing the Model 🧪
After training, the model's performance is evaluated on the test set of CIFAR-10. Metrics like accuracy, F1-score, recall, and precision are computed to assess the model.
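
To make the metrics concrete, the sketch below derives them from a confusion matrix in plain Python. The function names and the macro averaging (an unweighted mean over classes) are assumptions; the README does not say which averaging scheme the project uses.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def macro_metrics(matrix):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

In practice, `y_true` and `y_pred` would be the integer labels and the arg-max predictions collected over the CIFAR-10 test set.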

## Results and Evaluation 📊
Results are documented through confusion matrices and plots of loss and accuracy. These visualizations help in understanding the model's performance and areas for improvement.
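
The loss and accuracy plots could be produced along the lines of the sketch below. The `plot_history` helper and the layout of the `history` dictionary (one list of per-epoch values under each of the keys `train_loss`, `val_loss`, `train_acc`, and `val_acc`) are assumptions made for this example, not the project's actual code.

```python
import matplotlib

matplotlib.use("Agg")  # render to files, so no display is required
import matplotlib.pyplot as plt


def plot_history(history, path="training_curves.png"):
    """Save side-by-side loss and accuracy curves and return the file path."""
    epochs = range(1, len(history["train_loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(epochs, history["train_loss"], label="train")
    ax_loss.plot(epochs, history["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    ax_acc.plot(epochs, history["train_acc"], label="train")
    ax_acc.plot(epochs, history["val_acc"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # free the figure once it has been written
    return path
```

A widening gap between the training and validation curves is a common sign of overfitting, which is one of the "areas for improvement" such plots can reveal.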

## License 📜
This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgements 🙌
- Thanks to the creators of the CIFAR-10 dataset for providing the resources necessary for training and testing the model.
- Thanks to the PyTorch and timm library documentation for providing comprehensive guides and tutorials.

## Notebook and Copyright
The notebook can be opened in Google Colab; see the official repository for the link. To cite this project:

```bibtex
@misc{MJVisionTransformers2023,
  author = {Mohammad Javad (MJ) Ahmadi},
  title = {Vision Transformers},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/MJAHMADEE/Vision_Transformers}}
}
```

---
For more information, please refer to the [official repository](https://github.com/MJAHMADEE/Vision_Transformers).