https://github.com/tuvovan/vision_transformer_keras
Keras Implementation of Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
computer-vision image-recognition keras tensorflow transformer vision
- Host: GitHub
- URL: https://github.com/tuvovan/vision_transformer_keras
- Owner: tuvovan
- Created: 2020-10-12T05:35:44.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-11-13T07:48:02.000Z (over 4 years ago)
- Last Synced: 2025-04-03T08:05:02.174Z (about 2 months ago)
- Topics: computer-vision, image-recognition, keras, tensorflow, transformer, vision
- Language: Python
- Homepage: https://openreview.net/pdf?id=YicbFdNTTy
- Size: 665 KB
- Stars: 96
- Watchers: 2
- Forks: 26
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Vision Transformer

## Content
- [Vision Transformer](#vision-transformer)
- [Getting Started](#getting-started)
- [Running](#running)
- [References](#references)
- [Citations](#citation)

## Getting Started
- Clone the repository
### Prerequisites
- TensorFlow 2.2.0+
- TensorFlow Addons (`tensorflow_addons`)
- Python 3.6+
- Keras 2.3.0
- Pillow (PIL)
- NumPy

```
pip install -r requirements.txt
```

## Running
### Training
```
python train.py
```
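
Before launching a run, it can help to check that the chosen sizes fit together. The sketch below is not part of this repository. It assumes, as in the ViT paper, that a square image is cut into non-overlapping square patches, and that the embedding dimension is split evenly across attention heads, as is usual for multi-head attention. The function names `patch_grid` and `head_dim` are hypothetical, and the example values are illustrative, not the defaults of `train.py`.

```python
def patch_grid(image_size: int, patch_size: int, channels: int = 3):
    """Return (num_patches, patch_dim) for a square image.

    Assumption: non-overlapping square patches, so image_size must be
    a multiple of patch_size.
    """
    if image_size <= 0 or patch_size <= 0 or channels <= 0:
        raise ValueError("image_size, patch_size and channels must be positive")
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side * patches_per_side
    # Each patch is flattened to patch_size * patch_size * channels values.
    patch_dim = patch_size * patch_size * channels
    return num_patches, patch_dim


def head_dim(d_model: int, num_heads: int) -> int:
    """Return the per-head dimension.

    Assumption: d_model is split evenly across the attention heads.
    """
    if d_model <= 0 or num_heads <= 0:
        raise ValueError("d_model and num_heads must be positive")
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    return d_model // num_heads


if __name__ == "__main__":
    # Illustrative values only, not the repository's defaults.
    print(patch_grid(image_size=32, patch_size=4))  # (64, 48)
    print(head_dim(d_model=64, num_heads=4))        # 16
```

If either check raises, adjusting `--image-size`, `--patch-size`, `--d-model` or `--num-heads` so the divisions are exact is the likely fix, though the actual constraints enforced by `train.py` may differ.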
## Usage
### Training
```
usage: train.py [-h] [--logdir LOGDIR] [--image-size IMAGE_SIZE]
                [--patch-size PATCH_SIZE] [--num-layers NUM_LAYERS]
                [--d-model D_MODEL] [--num-heads NUM_HEADS]
                [--mlp-dim MLP_DIM] [--lr LR] [--weight-decay WEIGHT_DECAY]
                [--batch-size BATCH_SIZE] [--epochs EPOCHS]
```

```
optional arguments:
  -h, --help      show this help message and exit
  --logdir        folder to save weights
  --image-size    size of input image
  --patch-size    size of patch to encode
  --num-layers    number of transformer layers
  --d-model       embedding dimension
  --num-heads     number of attention heads
  --mlp-dim       hidden layer dimension
  --lr            learning rate
  --weight-decay  weight decay
  --batch-size    batch size
  --epochs        number of epochs
```

## License
This project is licensed under the MIT License. See the [LICENSE](https://github.com/tuvovan/ANL-HDRI/blob/master/LICENSE) file for details.
## References
[1] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - [link](https://openreview.net/pdf?id=YicbFdNTTy)

[2] Text classification with Transformer - [link](https://keras.io/examples/nlp/text_classification_with_transformer/)
## Acknowledgments
- This work is heavily based on the [Keras](https://keras.io/examples/nlp/text_classification_with_transformer/) Transformer example.
- If you have suggestions for improvements or spot a mistake, please send me an email.
- If you find this repo helpful, please give it a star.