https://github.com/mehrab-kalantari/vision-transformer
Training a ViT on CIFAR-10 dataset and then comparing a pre-trained ViT and CNN
- Host: GitHub
- URL: https://github.com/mehrab-kalantari/vision-transformer
- Owner: Mehrab-Kalantari
- Created: 2024-07-07T11:48:42.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-07-07T12:06:15.000Z (11 months ago)
- Last Synced: 2025-01-16T09:42:00.400Z (4 months ago)
- Topics: cnn, resnet18, vision-transformer, vit
- Language: Jupyter Notebook
- Homepage:
- Size: 94.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# CIFAR-10 Vision Transformer
The notebook contains:
* Building a vision transformer from scratch
* Fine-tuning a pre-trained ViT
* Comparing ViTs and CNNs

## Vision Transformer
In the first part, a ViT is built and trained from scratch for 20 epochs, using 4 attention heads, 4 transformer layers, and an embedding dimension of 64.
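A minimal sketch of a ViT with these hyperparameters is shown below. It assumes PyTorch (the notebook's actual framework and layer choices are not reproduced here); only the head count, layer count, and embedding dimension come from the description above.

```python
# Minimal ViT sketch (hypothetical; assumes PyTorch, 32x32 CIFAR-10 inputs,
# 4 heads, 4 layers, embedding dim 64 as described above).
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, image_size=32, patch_size=4, embed_dim=64,
                 num_heads=4, num_layers=4, num_classes=10):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Patch embedding: a strided conv cuts the image into patches
        # and projects each patch to the embedding dimension.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size,
                                     stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=4 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)                      # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)             # (B, N, D)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                    # classify on the CLS token

model = TinyViT()
print(model(torch.randn(2, 3, 32, 32)).shape)        # torch.Size([2, 10])
```

A standard training loop (cross-entropy loss, an Adam-style optimizer, 20 epochs over the CIFAR-10 train split) would then be run on top of this; those training details are assumptions, not taken from the notebook.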
## Pre-trained Vision Transformer
The notebook also fine-tunes an off-the-shelf ViT that was pre-trained on ImageNet-21k at 224 × 224 resolution, training it for 3 epochs on CIFAR-10.
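The README does not name the library; one checkpoint matching "ImageNet-21k at 224 × 224" is Hugging Face's `google/vit-base-patch16-224-in21k`, used here purely as an illustrative assumption. The optimizer and learning rate are likewise illustrative.

```python
# Hypothetical fine-tuning sketch (assumes the Hugging Face `transformers`
# checkpoint google/vit-base-patch16-224-in21k; the notebook may use a
# different library or checkpoint).
import torch
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=10)  # fresh 10-class head
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
criterion = torch.nn.CrossEntropyLoss()

# CIFAR-10 images (32x32) must be resized and normalized to 224x224
# before being passed in as pixel_values.
def train_step(pixel_values, labels):
    optimizer.zero_grad()
    logits = model(pixel_values=pixel_values).logits
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```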
## Pre-trained CNN
A pre-trained ResNet18 is fine-tuned on CIFAR-10 for 3 epochs.
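A sketch of the comparable CNN baseline, assuming torchvision's ResNet18 with ImageNet weights; the optimizer and learning rate are illustrative and not taken from the notebook.

```python
# Hypothetical ResNet18 fine-tuning sketch (assumes torchvision >= 0.13).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 10)   # replace the 1000-class head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# Train for 3 epochs over a CIFAR-10 DataLoader, same loop shape as above.
```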
## Results