https://github.com/arig23498/tokenlearner
TensorFlow implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"
- Host: GitHub
- URL: https://github.com/arig23498/tokenlearner
- Owner: ariG23498
- License: MIT
- Created: 2021-12-08T13:18:04.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2021-12-17T02:44:50.000Z (about 3 years ago)
- Last Synced: 2024-10-24T13:44:18.124Z (2 months ago)
- Topics: keras, tensorflow2, token-learner, vision-transformer
- Language: Jupyter Notebook
- Homepage: https://keras.io/examples/vision/token_learner/
- Size: 44.9 KB
- Stars: 33
- Watchers: 2
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ariG23498/TokenLearner/blob/master/TokenLearner.ipynb) [![](https://img.shields.io/badge/blog-keras.io-%23d00000)](https://keras.io/examples/vision/token_learner/)
A TensorFlow implementation of TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? [1]. In this paper, an earlier version of which was presented at NeurIPS 2021 [2], the authors propose an adaptive token learning algorithm that makes ViT computationally much more efficient (in terms of FLOPs) and also improves downstream accuracy (here, classification accuracy). Experimenting on CIFAR-10, we reduce the number of patches from **64** to **4** adaptively learned tokens and also observe a boost in accuracy. We experiment with different hyperparameters and report results that align with the literature.
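The core operation can be sketched framework-agnostically: a small learned network produces one spatial attention map per output token, and each token is an attention-weighted spatial pooling of the feature map. Below is a minimal NumPy sketch, where `token_learner` and the single linear projection `attn_weights` are illustrative stand-ins for the small conv/MLP that the Keras notebooks learn end to end:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_learner(feature_map, attn_weights):
    """Reduce an (H*W, C) feature map to S learned tokens.

    attn_weights: (C, S) linear projection, standing in for the
    small conv/MLP the paper learns jointly with the rest of ViT.
    """
    # (H*W, S): one attention map per output token, normalized
    # over the spatial axis so each map sums to 1 over space
    attn = softmax(feature_map @ attn_weights, axis=0)
    # (S, C): attention-weighted spatial pooling of the features
    return attn.T @ feature_map

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 128))   # 8x8 grid of patches, 128-dim features
w = rng.normal(size=(128, 8))    # projection producing 8 tokens
tokens = token_learner(x, w)
print(tokens.shape)              # → (8, 128)
```

All Transformer layers after this module then operate on 8 (or 4) tokens instead of 64 patches, which is where the FLOP savings reported below come from.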
## With and Without TokenLearner
We report results training our mini ViT with and without the vanilla TokenLearner module here.
You can find the vanilla TokenLearner module in the [`TokenLearner.ipynb`](https://github.com/ariG23498/TokenLearner/blob/master/TokenLearner.ipynb) notebook.

| **TokenLearner** | **# Tokens in TokenLearner** | **Top-1 Acc (Averaged across 5 runs)** | **GFLOPs** | **TensorBoard** |
|:---:|:---:|:---:|:---:|:---:|
| N | - | 56.112% | 0.0184 | [Link](https://tensorboard.dev/experiment/vkCwM49dQZ2RiK0ZT4mj7w/) |
| Y | 8 | **56.55%** | **0.0153** | [Link](https://tensorboard.dev/experiment/vkCwM49dQZ2RiK0ZT4mj7w/) |
| N | - | 56.37% | 0.0184 | [Link](https://tensorboard.dev/experiment/hdyJ4wznQROwqZTgbtmztQ/) |
| Y | 4 | **56.4980%** | **0.0147** | [Link](https://tensorboard.dev/experiment/hdyJ4wznQROwqZTgbtmztQ/) |
| N | - (# Transformer layers: 8) | 55.36% | 0.0359 | [Link](https://tensorboard.dev/experiment/sepBK5zNSaOtdCeEG6SV9w/) |

## TokenLearner v1.1
We have also implemented the TokenLearner v1.1 module, which aligns with the [official implementation](https://github.com/google-research/scenic/blob/main/scenic/projects/token_learner/model.py). The TokenLearner v1.1 module can be found in the [`TokenLearner-V1.1.ipynb`](https://github.com/ariG23498/TokenLearner/blob/master/TokenLearner-V1.1.ipynb) notebook. The results of training with this module are as follows:
| **# Groups** | **# Tokens** | **Top-1 Acc** | **GFLOPs** | **TensorBoard** |
|:---:|:---:|:---:|:---:|:---:|
| 4 | 4 | 54.638% | 0.0149 | [Link](https://tensorboard.dev/experiment/KmfkGqAGQjikEw85phySmw/) |
| 8 | 8 | 54.898% | 0.0146 | [Link](https://tensorboard.dev/experiment/0PpgYOq9RFWV9njX6NJQ2w/) |
| 4 | 8 | 55.196% | 0.0149 | [Link](https://tensorboard.dev/experiment/WUkrHbZASdu3zrfmY4ETZg/) |

We acknowledge that the results with this new TokenLearner module are slightly lower than expected; hyperparameter tuning might mitigate this.

*Note*: To compute the FLOPs of our models, we use [this utility](https://github.com/AdityaKane2001/regnety/blob/main/regnety/utils/model_utils.py#L27) from [this repository](https://github.com/AdityaKane2001/regnety).
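As a back-of-the-envelope illustration of where the savings come from (this is not the FLOP counter linked above; `attn_flops` is a hypothetical helper that counts only the two attention matmuls), self-attention cost grows quadratically with the number of tokens, so replacing 64 patches with 8 learned tokens shrinks the dominant term of every subsequent Transformer layer:

```python
def attn_flops(n_tokens, dim):
    # rough count: the QK^T and attn@V matmuls only,
    # ignoring the linear projections
    return 2 * n_tokens * n_tokens * dim

before = attn_flops(64, 128)   # attention over 64 patches
after = attn_flops(8, 128)     # attention over 8 learned tokens
print(before // after)         # → 64, i.e. 64x fewer attention FLOPs per layer
```

The end-to-end GFLOP reductions in the tables above are smaller than this, since projections, MLP blocks, and the TokenLearner module itself still scale linearly with tokens or are token-count independent.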
# Acknowledgements
- [Michael S. Ryoo](http://michaelryoo.com/): The first author of the paper.
- [Google Developers Experts Program](https://developers.google.com/programs/experts/) and [JarvisLabs.ai](https://jarvislabs.ai/) for providing credits to perform extensive experimentation on A100 GPUs.

# References
[1] TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?; Ryoo et al.; arXiv 2021; https://arxiv.org/abs/2106.11297
[2] TokenLearner: Adaptive Space-Time Tokenization for Videos; Ryoo et al., NeurIPS 2021; https://openreview.net/forum?id=z-l1kpDXs88