https://github.com/mjahmadee/vision_transformers

Vision Transformers

cait image-classification transformer vision-transformer vision-transformers vit


Host: GitHub
URL: https://github.com/mjahmadee/vision_transformers
Owner: MJAHMADEE
License: mit
Created: 2023-07-17T05:16:18.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-03-16T13:17:16.000Z (over 2 years ago)
Last Synced: 2025-01-11T08:51:39.020Z (over 1 year ago)
Topics: cait, image-classification, transformer, vision-transformer, vision-transformers, vit
Language: Jupyter Notebook
Homepage:
Size: 905 KB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# Vision Transformers with PyTorch 🖼️🤖

![Python](https://img.shields.io/badge/Python-3.8-blue.svg)
![PyTorch](https://img.shields.io/badge/PyTorch-1.8.1-orange.svg)
![License](https://img.shields.io/badge/License-MIT-green.svg)

This project uses PyTorch to classify images from the CIFAR-10 dataset with Vision Transformers (ViT). It fine-tunes pre-trained ViT and CaiT models on CIFAR-10, demonstrating how transformers can be adapted for image classification.
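To make the idea of adapting a transformer to images concrete, here is a minimal ViT-style sketch in plain PyTorch (1.9 or newer, because it uses `batch_first`). The image is cut into non-overlapping patches, each patch is linearly projected to a token, a learnable `[CLS]` token and position embeddings are added, and a standard transformer encoder processes the sequence before a linear head classifies the `[CLS]` output. The class names `PatchEmbedding` and `TinyViT` and the small sizes (4x4 patches on 32x32 inputs, 64-dimensional tokens, 2 layers) are illustrative assumptions; the project itself fine-tunes pre-trained ViT and CaiT models rather than training a toy model like this one.

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Turn an image into a sequence of patch tokens plus a [CLS] token (ViT-style)."""

    def __init__(self, img_size=32, patch_size=4, in_chans=3, embed_dim=64):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel_size == stride == patch_size applies the same linear
        # projection to every non-overlapping patch.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))
        nn.init.trunc_normal_(self.cls_token, std=0.02)
        nn.init.trunc_normal_(self.pos_embed, std=0.02)

    def forward(self, x):
        x = self.proj(x)                   # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)   # (B, N, D): one token per patch
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)     # (B, N + 1, D)
        return x + self.pos_embed          # add learned position information


class TinyViT(nn.Module):
    """Hypothetical toy classifier: patch embedding -> transformer encoder -> linear head."""

    def __init__(self, num_classes=10, img_size=32, patch_size=4, embed_dim=64, depth=2, heads=4):
        super().__init__()
        self.embed = PatchEmbedding(img_size, patch_size, 3, embed_dim)
        layer = nn.TransformerEncoderLayer(
            embed_dim, heads, dim_feedforward=4 * embed_dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        tokens = self.encoder(self.embed(x))
        return self.head(self.norm(tokens[:, 0]))  # classify from the [CLS] token
```

For a batch of CIFAR-10-sized images, `TinyViT()(torch.randn(8, 3, 32, 32))` returns logits of shape `(8, 10)`. Pre-trained models such as ViT-B/16 follow the same pattern with 16x16 patches on 224x224 inputs and much larger encoders.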

## Features 🌟
- Utilizes pre-trained Vision Transformer (ViT) and Class-Attention in Image Transformers (CaiT) models.
- Supports fine-tuning of transformer models on the CIFAR-10 dataset.
- Visualizes training and validation loss, accuracy, and confusion matrices.
- Demonstrates data preprocessing and augmentation techniques for image data.
- Evaluates model performance with metrics such as F1-score, recall, accuracy, and precision.

## Setup and Installation 🛠️
1. Clone the repository: `git clone https://github.com/MJAHMADEE/Vision_Transformers`.
2. Navigate to the project directory: `cd Vision_Transformers`.
3. Install the dependencies listed in `requirements.txt`: `pip install -r requirements.txt`.

## Dataset 📁
The project uses the CIFAR-10 dataset: 60,000 32x32 color images in 10 classes (6,000 images per class), split into 50,000 training images and 10,000 test images. The dataset is downloaded automatically and preprocessed for training and testing.

## Training the Model 🚀
Training fine-tunes the pre-trained ViT and CaiT models on CIFAR-10, adapting them to the dataset's small 32x32 images and its 10 classes.
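A minimal sketch of this fine-tuning step, assuming the models come from `timm` (which the acknowledgements mention) and that the adaptation to 10 classes is done by passing `num_classes=10` to `timm.create_model`, which replaces the ImageNet classification head with a freshly initialized 10-way head. The model names, the helpers `build_model` and `train_one_epoch`, and the loop itself are illustrative assumptions rather than the notebook's code.

```python
import torch
from torch import nn


def build_model(name="vit_base_patch16_224", num_classes=10):
    """Create a pre-trained timm model with a new `num_classes`-way head.

    Assumption: timm model names such as "vit_base_patch16_224" or
    "cait_s24_224" are examples; the notebook may use other variants.
    """
    import timm  # imported lazily so train_one_epoch below works without timm

    return timm.create_model(name, pretrained=True, num_classes=num_classes)


def train_one_epoch(model, loader, optimizer, criterion, device="cpu"):
    """Run one fine-tuning epoch and return (mean loss, accuracy) over the epoch."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        # Weight the batch loss by batch size so the mean is per sample.
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen
```

In use, `loader` would be a CIFAR-10 `DataLoader` whose images are resized to the model's input resolution, `criterion` would be `nn.CrossEntropyLoss()`, and the optimizer could be, for example, `torch.optim.AdamW(model.parameters(), lr=1e-4)`; these hyperparameters are illustrative, not taken from the notebook. A small learning rate is a common choice when fine-tuning so that the pre-trained weights are not overwritten too quickly.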

## Testing the Model 🧪
After training, the model's performance is evaluated on the test set of CIFAR-10. Metrics like accuracy, F1-score, recall, and precision are computed to assess the model.
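One way to compute these metrics, sketched under the assumption that precision, recall, and F1-score are macro-averaged over the 10 classes (the README does not say which averaging is used). `predict`, `confusion_matrix`, and `classification_metrics` are hypothetical helper names; the code uses only PyTorch, so no separate metrics library is required.

```python
import torch


@torch.no_grad()
def predict(model, loader, device="cpu"):
    """Return (predictions, targets) for every sample yielded by `loader`."""
    model.eval()
    preds, targets = [], []
    for images, labels in loader:
        logits = model(images.to(device))
        preds.append(logits.argmax(dim=1).cpu())
        targets.append(labels.cpu())
    return torch.cat(preds), torch.cat(targets)


def confusion_matrix(preds, targets, num_classes=10):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    idx = targets * num_classes + preds
    counts = torch.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def classification_metrics(cm):
    """Accuracy plus macro-averaged precision, recall, and F1 from a confusion matrix."""
    cm = cm.double()
    tp = cm.diag()
    predicted = cm.sum(dim=0)  # column sums: how often each class was predicted
    actual = cm.sum(dim=1)     # row sums: how many samples truly belong to each class
    precision = tp / predicted.clamp(min=1)
    recall = tp / actual.clamp(min=1)
    f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-12)
    return {
        "accuracy": (tp.sum() / cm.sum()).item(),
        "precision": precision.mean().item(),
        "recall": recall.mean().item(),
        "f1": f1.mean().item(),
    }
```

If scikit-learn is available, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.precision_recall_fscore_support(..., average="macro")` give equivalent results.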

## Results and Evaluation 📊
Results are documented with confusion matrices and with loss and accuracy plots. These visualizations show how well the model performs and where it can be improved.
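The plots could be produced along these lines with Matplotlib's object-oriented API. The `history` dictionary layout and the helper names `plot_history` and `plot_confusion_matrix` are hypothetical; the notebook may log and draw its figures differently.

```python
from matplotlib.figure import Figure


def plot_history(history):
    """Plot per-epoch loss and accuracy curves.

    Assumption: `history` has keys "train_loss", "val_loss", "train_acc",
    and "val_acc", each holding one value per epoch.
    """
    epochs = range(1, len(history["train_loss"]) + 1)
    fig = Figure(figsize=(10, 4))
    ax_loss, ax_acc = fig.subplots(1, 2)
    ax_loss.plot(epochs, history["train_loss"], label="train")
    ax_loss.plot(epochs, history["val_loss"], label="validation")
    ax_loss.set(title="Loss", xlabel="epoch", ylabel="cross-entropy")
    ax_acc.plot(epochs, history["train_acc"], label="train")
    ax_acc.plot(epochs, history["val_acc"], label="validation")
    ax_acc.set(title="Accuracy", xlabel="epoch", ylabel="accuracy")
    for ax in (ax_loss, ax_acc):
        ax.legend()
    return fig


def plot_confusion_matrix(cm, class_names):
    """Render a confusion matrix (rows = true class, columns = predicted class)."""
    n = len(class_names)
    fig = Figure(figsize=(7, 6))
    ax = fig.subplots()
    im = ax.imshow(cm, cmap="Blues")
    fig.colorbar(im, ax=ax)
    ax.set_xticks(range(n))
    ax.set_xticklabels(class_names, rotation=45, ha="right")
    ax.set_yticks(range(n))
    ax.set_yticklabels(class_names)
    ax.set(xlabel="predicted", ylabel="true")
    # Write the count into every cell so the matrix is readable without the colorbar.
    for i in range(n):
        for j in range(n):
            ax.text(j, i, int(cm[i][j]), ha="center", va="center")
    return fig
```

For CIFAR-10 the class names are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck; a figure can then be written to disk with `fig.savefig("confusion_matrix.png")`. Creating `Figure` objects directly, rather than through `pyplot`, avoids depending on a display backend.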

## License 📜
This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgements 🙌
- Thanks to the creators of the CIFAR-10 dataset for providing the resources necessary for training and testing the model.
- Thanks to the PyTorch and timm documentation for their comprehensive guides and tutorials.

## Notebook and Copyright

@misc{MJVisionTransformers2023,
author = {Mohammad Javad (MJ) Ahmadi},
title = {Vision Transformers},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/MJAHMADEE/Vision_Transformers}}
}

---
For more information, please refer to the [official repository](https://github.com/MJAHMADEE/Vision_Transformers).
