https://github.com/tuvovan/vision_transformer_keras

Keras Implementation of Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)

Host: GitHub
URL: https://github.com/tuvovan/vision_transformer_keras
Owner: tuvovan
Created: 2020-10-12T05:35:44.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2020-11-13T07:48:02.000Z (over 4 years ago)
Last Synced: 2025-04-03T08:05:02.174Z (3 months ago)
Topics: computer-vision, image-recognition, keras, tensorflow, transformer, vision
Language: Python
Homepage: https://openreview.net/pdf?id=YicbFdNTTy
Size: 665 KB
Stars: 96
Watchers: 2
Forks: 26
Open Issues: 1
Metadata Files:
- Readme: README.md

# Vision Transformer

![teaser](vit.png)

## Contents

- [Vision Transformer](#vision-transformer)
- [Getting Started](#getting-started)
- [Running](#running)
- [Usage](#usage)
- [License](#license)
- [References](#references)
- [Acknowledgments](#acknowledgments)

## Getting Started

- Clone the repository: `git clone https://github.com/tuvovan/vision_transformer_keras`

### Prerequisites

- TensorFlow 2.2.0+
- TensorFlow Addons (`tensorflow_addons`)
- Python 3.6+
- Keras 2.3.0
- Pillow (PIL)
- NumPy

Install the Python dependencies with pip:

```bash
pip install -r requirements.txt
```

## Running

### Training

```
python train.py
```
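Hyperparameters can also be passed explicitly on the command line. As a minimal sketch, assuming the flags shown in the `train.py` usage string in this README, the helper below builds such a command from Python. `build_train_command` is a hypothetical name, not part of the repository, and the values are illustrative rather than recommended settings.

```python
import sys


def build_train_command(script="train.py", **options):
    """Build an argv list for train.py from keyword options.

    Hypothetical helper, not part of the repository. Keyword names use
    underscores and are converted to the dashed flags from the usage
    string, e.g. patch_size=4 becomes --patch-size 4.
    """
    command = [sys.executable, script]
    for name, value in options.items():
        command += ["--" + name.replace("_", "-"), str(value)]
    return command


# Illustrative values only; they are not the repository's defaults.
command = build_train_command(image_size=32, patch_size=4, epochs=10)
print(" ".join(command[1:]))
# To actually start training: subprocess.run(command, check=True)
```

Flags without a dash, such as `--logdir` and `--lr`, pass through unchanged, so `logdir="logs"` becomes `--logdir logs`.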

## Usage

### Training

```
usage: train.py [-h] [--logdir LOGDIR] [--image-size IMAGE_SIZE]
                [--patch-size PATCH_SIZE] [--num-layers NUM_LAYERS]
                [--d-model D_MODEL] [--num-heads NUM_HEADS]
                [--mlp-dim MLP_DIM] [--lr LR] [--weight-decay WEIGHT_DECAY]
                [--batch-size BATCH_SIZE] [--epochs EPOCHS]
```

```
optional arguments:
  -h, --help        show this help message and exit
  --logdir          folder to save weights
  --image-size      size of input image
  --patch-size      size of patch to encode
  --num-layers      number of transformer layers
  --d-model         embedding dimension
  --num-heads       number of attention heads
  --mlp-dim         hidden layer dimension
  --lr              learning rate
  --weight-decay    weight decay
  --batch-size      batch size
  --epochs          number of training epochs
```
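`--image-size` and `--patch-size` together determine the length of the token sequence the transformer sees. In the ViT paper [1], an H×W image with C channels is cut into non-overlapping P×P patches, giving N = HW/P² patches, each flattened to P²·C values before being projected to the embedding dimension (`--d-model`). The sketch below works through that arithmetic. The function name and the square, 3-channel input are assumptions for illustration, and it does not count any extra class token.

```python
def patch_stats(image_size, patch_size, channels=3):
    """Return (num_patches, patch_dim) for a square image.

    Assumes non-overlapping square patches, as in the ViT paper.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    patch_dim = channels * patch_size * patch_size
    return num_patches, patch_dim


# The paper's headline case: a 224x224 RGB image with 16x16 patches.
print(patch_stats(224, 16))  # (196, 768)
# A small-image setting, e.g. 32x32 inputs with 4x4 patches.
print(patch_stats(32, 4))    # (64, 48)
```

Halving the patch size quadruples the number of patches, which matters because self-attention cost grows quadratically with sequence length.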

## License

This project is licensed under the MIT License; see the [LICENSE](https://github.com/tuvovan/ANL-HDRI/blob/master/LICENSE) file for details.

## References

[1] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - [link](https://openreview.net/pdf?id=YicbFdNTTy)

[2] Text classification with Transformer - [link](https://keras.io/examples/nlp/text_classification_with_transformer/)

## Acknowledgments

- This work is heavily based on the [Keras](https://keras.io/examples/nlp/text_classification_with_transformer/) example of a Transformer for text classification.

- If you have suggestions for updates or spot a misunderstanding, please send me an email: 

- If you find this repo helpful, kindly give me a star.
