# Vision-Transformers
## The Classic ViT
![alt text](images/image1.png)
* To handle 2D images, we reshape an image of shape $(H, W, C)$ into a sequence of flattened patches of shape $(N, P^2 \cdot C)$, where $(P, P)$ is the resolution of each image patch and $N = HW/P^2$ is the resulting number of patches, which also serves as the effective input sequence length for the Transformer (a runnable sketch follows this list)
* The last dimension is linearly projected to $d_{model} = D$
![alt text](images/image2.png)
* The patch sequence $x_p$, of shape $(N, P^2 \cdot C)$, is linearly projected to shape $(N, D)$ using a weight matrix of shape $(P^2 \cdot C, D)$; the patch index $i$ in $x_p^i$ runs from $1$ to $N$
* Similar to BERT’s `[class]` token, we prepend a learnable embedding to the sequence of embedded patches ($z_0^0 = x_{class}$), whose state at the output of the Transformer encoder ($z_L^0$) serves as the image representation $y$ (Eq. 4). During both pre-training and fine-tuning, a classification head is attached to $z_L^0$
* $x_{class}$ is of shape $(1, D)$
* $z_0$ is of shape $(N+1, D)$
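
A minimal JAX sketch of the pipeline above, from a raw image to the encoder input $z_0$. The sizes ($H$, $W$, $C$, $P$, $D$) are example values assumed for illustration, not taken from this repo, and the random tensors stand in for learned parameters:

```python
import jax
import jax.numpy as jnp

# Assumed example sizes (not from the repo).
H = W = 32                # image height/width
C = 3                     # channels
P = 8                     # patch resolution (P, P)
D = 64                    # model dimension d_model
N = (H * W) // (P * P)    # number of patches = 16

key = jax.random.PRNGKey(0)
k_img, k_proj, k_cls = jax.random.split(key, 3)

image = jax.random.normal(k_img, (H, W, C))  # stand-in input image

# (H, W, C) -> (N, P^2 * C): cut into non-overlapping P x P patches,
# then flatten each patch into a single vector.
patches = image.reshape(H // P, P, W // P, P, C)   # (H/P, P, W/P, P, C)
patches = patches.transpose(0, 2, 1, 3, 4)         # (H/P, W/P, P, P, C)
x_p = patches.reshape(N, P * P * C)                # (N, P^2 * C)

# Linear projection with weights of shape (P^2 * C, D):
# (N, P^2 * C) @ (P^2 * C, D) -> (N, D)
E = jax.random.normal(k_proj, (P * P * C, D)) * 0.02  # stand-in for learned weights
patch_embeddings = x_p @ E                            # (N, D)

# Prepend the learnable [class] token x_class of shape (1, D).
x_class = jax.random.normal(k_cls, (1, D)) * 0.02     # stand-in for learned embedding
z_0 = jnp.concatenate([x_class, patch_embeddings], axis=0)  # (N+1, D)

print(z_0.shape)  # (17, 64) == (N+1, D)
```

In practice the reshape-and-project step is commonly implemented as a single strided convolution with kernel size and stride $P$, and $z_0$ additionally receives learned position embeddings before entering the Transformer encoder.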