Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/faustomorales/vit-keras
Keras implementation of ViT (Vision Transformer)
https://github.com/faustomorales/vit-keras
computer-vision keras
Last synced: 4 days ago
JSON representation
Keras implementation of ViT (Vision Transformer)
- Host: GitHub
- URL: https://github.com/faustomorales/vit-keras
- Owner: faustomorales
- License: apache-2.0
- Created: 2020-11-07T23:20:16.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2024-05-28T22:18:14.000Z (8 months ago)
- Last Synced: 2025-01-12T10:04:25.952Z (11 days ago)
- Topics: computer-vision, keras
- Language: Python
- Homepage:
- Size: 160 KB
- Stars: 339
- Watchers: 8
- Forks: 82
- Open Issues: 8
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# vit-keras
This is a Keras implementation of the models described in [An Image is Worth 16x16 Words:
Transformes For Image Recognition at Scale](https://arxiv.org/pdf/2010.11929.pdf). It is based on an earlier implementation from [tuvovan](https://github.com/tuvovan/Vision_Transformer_Keras), modified to match the Flax implementation in the [official repository](https://github.com/google-research/vision_transformer).The weights here are ported over from the weights provided in the official repository. See `utils.load_weights_numpy` to see how this is done (it's not pretty, but it does the job).
## Usage
Install this package using `pip install vit-keras`You can use the model out-of-the-box with ImageNet 2012 classes using
something like the following. The weights will be downloaded automatically.```python
from vit_keras import vit, utilsimage_size = 384
classes = utils.get_imagenet_classes()
model = vit.vit_b16(
image_size=image_size,
activation='sigmoid',
pretrained=True,
include_top=True,
pretrained_top=True
)
url = 'https://upload.wikimedia.org/wikipedia/commons/d/d7/Granny_smith_and_cross_section.jpg'
image = utils.read(url, image_size)
X = vit.preprocess_inputs(image).reshape(1, image_size, image_size, 3)
y = model.predict(X)
print(classes[y[0].argmax()]) # Granny smith
```You can fine-tune using a model loaded as follows.
```python
image_size = 224
model = vit.vit_l32(
image_size=image_size,
activation='sigmoid',
pretrained=True,
include_top=True,
pretrained_top=False,
classes=200
)
# Train this model on your data as desired.
```## Visualizing Attention Maps
There's some functionality for plotting attention maps for a given image and model. See example below. I'm not sure I'm doing this correctly (the official repository didn't have example code). Feedback /corrections welcome!```python
import numpy as np
import matplotlib.pyplot as plt
from vit_keras import vit, utils, visualize# Load a model
image_size = 384
classes = utils.get_imagenet_classes()
model = vit.vit_b16(
image_size=image_size,
activation='sigmoid',
pretrained=True,
include_top=True,
pretrained_top=True
)
classes = utils.get_imagenet_classes()# Get an image and compute the attention map
url = 'https://upload.wikimedia.org/wikipedia/commons/b/bc/Free%21_%283987584939%29.jpg'
image = utils.read(url, image_size)
attention_map = visualize.attention_map(model=model, image=image)
print('Prediction:', classes[
model.predict(vit.preprocess_inputs(image)[np.newaxis])[0].argmax()]
) # Prediction: Eskimo dog, husky# Plot results
fig, (ax1, ax2) = plt.subplots(ncols=2)
ax1.axis('off')
ax2.axis('off')
ax1.set_title('Original')
ax2.set_title('Attention Map')
_ = ax1.imshow(image)
_ = ax2.imshow(attention_map)
```![example of attention map](https://raw.githubusercontent.com/faustomorales/vit-keras/master/docs/attention_map_example.jpg)