# TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ariG23498/TokenLearner/blob/master/TokenLearner.ipynb) [![](https://img.shields.io/badge/blog-keras.io-%23d00000)](https://keras.io/examples/vision/token_learner/)




*Figure: the TokenLearner module. Source: Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize*

A TensorFlow implementation of TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? [1]. In this paper (an earlier version of which was presented at NeurIPS 2021 [2]), the authors propose an adaptive token-learning algorithm that makes ViT computationally much more efficient (in terms of FLOPs) and also improves downstream accuracy (here, classification accuracy). Experimenting on CIFAR-10, we reduce the number of patches from **64** to **4** (the number of adaptively learned tokens) and also report a boost in accuracy. We experiment with different hyperparameters and report results that align with the literature.
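Concretely, the vanilla module predicts a handful of spatial attention maps from the feature map, weights the features by each map, and spatially pools the result into one token per map. The sketch below captures that idea in TensorFlow/Keras; it follows the vanilla module in our notebook, but `token_learner`, the layer counts, and the default token count are illustrative rather than the exact notebook code:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def token_learner(inputs, number_of_tokens=8):
    # inputs: (batch, h, w, channels) feature map.
    # A small conv stack predicts `number_of_tokens` spatial attention maps;
    # the final sigmoid squashes each map into a soft spatial weighting.
    attention_maps = keras.Sequential(
        [
            layers.Conv2D(number_of_tokens, (3, 3), activation=tf.nn.gelu,
                          padding="same", use_bias=False),
            layers.Conv2D(number_of_tokens, (3, 3), activation=tf.nn.gelu,
                          padding="same", use_bias=False),
            layers.Conv2D(number_of_tokens, (3, 3), activation=tf.nn.gelu,
                          padding="same", use_bias=False),
            layers.Conv2D(number_of_tokens, (3, 3), activation="sigmoid",
                          padding="same", use_bias=False),
        ]
    )(inputs)  # (batch, h, w, number_of_tokens)

    # Flatten the spatial dimensions of each attention map.
    attention_maps = layers.Reshape((-1, number_of_tokens))(attention_maps)  # (batch, h*w, n)
    attention_maps = tf.transpose(attention_maps, perm=[0, 2, 1])            # (batch, n, h*w)

    num_filters = inputs.shape[-1]
    inputs = layers.Reshape((1, -1, num_filters))(inputs)  # (batch, 1, h*w, c)

    # Weight the features by each attention map, then spatially pool:
    # the result is `number_of_tokens` adaptively learned tokens.
    attended = attention_maps[..., tf.newaxis] * inputs    # (batch, n, h*w, c)
    outputs = tf.reduce_mean(attended, axis=2)             # (batch, n, c)
    return outputs
```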

## With and Without TokenLearner

We report results from training our mini ViT with and without the vanilla TokenLearner module.
You can find the vanilla TokenLearner module in the [`TokenLearner.ipynb`](https://github.com/ariG23498/TokenLearner/blob/master/TokenLearner.ipynb) notebook.

| **TokenLearner** | **# Tokens in TokenLearner** | **Top-1 Acc (averaged across 5 runs)** | **GFLOPs** | **TensorBoard** |
|:---:|:---:|:---:|:---:|:---:|
| N | - | 56.112% | 0.0184 | [Link](https://tensorboard.dev/experiment/vkCwM49dQZ2RiK0ZT4mj7w/) |
| Y | 8 | **56.55%** | **0.0153** | [Link](https://tensorboard.dev/experiment/vkCwM49dQZ2RiK0ZT4mj7w/) |
| N | - | 56.37% | 0.0184 | [Link](https://tensorboard.dev/experiment/hdyJ4wznQROwqZTgbtmztQ/) |
| Y | 4 | **56.4980%** | **0.0147** | [Link](https://tensorboard.dev/experiment/hdyJ4wznQROwqZTgbtmztQ/) |
| N (# Transformer layers: 8) | - | 55.36% | 0.0359 | [Link](https://tensorboard.dev/experiment/sepBK5zNSaOtdCeEG6SV9w/) |

## TokenLearner v1.1

We have also implemented the TokenLearner v1.1 module, which aligns with the [official implementation](https://github.com/google-research/scenic/blob/main/scenic/projects/token_learner/model.py). The TokenLearner v1.1 module can be found in the [`TokenLearner-V1.1.ipynb`](https://github.com/ariG23498/TokenLearner/blob/master/TokenLearner-V1.1.ipynb) notebook. The results of training with this module are as follows (a short sketch of the module appears after the table):

| **# Groups** | **# Tokens** | **Top-1 Acc** | **GFLOPs** | **TensorBoard** |
|:---:|:---:|:---:|:---:|:---:|
| 4 | 4 | 54.638% | 0.0149 | [Link](https://tensorboard.dev/experiment/KmfkGqAGQjikEw85phySmw/) |
| 8 | 8 | 54.898% | 0.0146 | [Link](https://tensorboard.dev/experiment/0PpgYOq9RFWV9njX6NJQ2w/) |
| 4 | 8 | 55.196% | 0.0149 | [Link](https://tensorboard.dev/experiment/WUkrHbZASdu3zrfmY4ETZg/) |
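In v1.1, the spatial attention maps come from a LayerNorm + MLP rather than a conv stack, and they are normalized with a softmax over positions, so each learned token is a convex combination of the input positions. Below is a minimal, illustrative sketch of that core idea as we read the official code; the grouping mechanism behind the **# Groups** column is omitted for brevity, and `token_learner_v11` plus its dimensions are illustrative names, not the exact notebook code:

```python
import tensorflow as tf
from tensorflow.keras import layers


def token_learner_v11(inputs, number_of_tokens=8, bottleneck_dim=64):
    # inputs: (batch, h*w, channels), the flattened patch embeddings.
    # An MLP (instead of the conv stack of the vanilla module) predicts
    # one spatial attention vector per learned token.
    x = layers.LayerNormalization()(inputs)
    x = layers.Dense(bottleneck_dim, activation=tf.nn.gelu)(x)
    x = layers.Dense(number_of_tokens)(x)           # (batch, h*w, n)

    attention = tf.transpose(x, perm=[0, 2, 1])     # (batch, n, h*w)
    # Softmax over spatial positions: each token's weights sum to 1.
    attention = tf.nn.softmax(attention, axis=-1)

    # Each learned token is a weighted average of the input positions.
    return tf.einsum("bns,bsc->bnc", attention, inputs)  # (batch, n, c)
```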

We acknowledge that the results with this new TokenLearner module are slightly worse than expected; this might be mitigated with hyperparameter tuning.

*Note*: To compute the FLOPs of our models, we use [this utility](https://github.com/AdityaKane2001/regnety/blob/main/regnety/utils/model_utils.py#L27) from [this repository](https://github.com/AdityaKane2001/regnety).
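The linked utility is based on the TF v1 profiler. A minimal, self-contained snippet in the same spirit (not the linked utility itself; `get_flops` and the input shape are illustrative) looks like this:

```python
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2,
)


def get_flops(model, input_shape=(1, 32, 32, 3)):
    # Wrap the Keras model as a concrete function on a fixed input signature,
    # then freeze its variables into constants so the graph can be profiled.
    concrete = tf.function(lambda x: model(x)).get_concrete_function(
        tf.TensorSpec(input_shape, tf.float32)
    )
    frozen = convert_variables_to_constants_v2(concrete)

    # Profile the frozen graph for floating-point operations.
    opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()
    info = tf.compat.v1.profiler.profile(graph=frozen.graph, options=opts)

    # total_float_ops counts multiplies and adds separately;
    # divide by 1e9 to report GFLOPs.
    return info.total_float_ops / 1e9
```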

## Acknowledgements

- [Michael S. Ryoo](http://michaelryoo.com/): The first author of the paper.
- [Google Developers Experts Program](https://developers.google.com/programs/experts/) and [JarvisLabs.ai](https://jarvislabs.ai/) for providing credits to perform extensive experimentation on A100 GPUs.

## References

[1] TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?; Ryoo et al.; arXiv 2021; https://arxiv.org/abs/2106.11297

[2] TokenLearner: Adaptive Space-Time Tokenization for Videos; Ryoo et al.; NeurIPS 2021; https://openreview.net/forum?id=z-l1kpDXs88