Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lucidrains/TimeSformer-pytorch
Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
- Host: GitHub
- URL: https://github.com/lucidrains/TimeSformer-pytorch
- Owner: lucidrains
- License: mit
- Created: 2021-02-11T04:01:17.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-08-25T00:52:32.000Z (about 3 years ago)
- Last Synced: 2024-07-05T13:49:45.283Z (4 months ago)
- Topics: artificial-intelligence, attention-mechanism, deep-learning, transformers, video-classification
- Language: Python
- Homepage:
- Size: 181 KB
- Stars: 676
- Watchers: 17
- Forks: 85
- Open Issues: 14
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
Awesome Lists containing this project
README
## TimeSformer - Pytorch
Implementation of TimeSformer, from Facebook AI, a pure attention-based solution for reaching SOTA on video classification. This repository houses only the best-performing variant, 'Divided Space-Time Attention', which is nothing more than attention along the time axis followed by attention along the spatial axes.
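To make the divided scheme concrete, here is a minimal sketch of one such block in PyTorch with `einops`: temporal attention across frames at each spatial location, then spatial attention within each frame. The `DividedSpaceTimeBlock` class, its argument names, and the use of `nn.MultiheadAttention` are illustrative assumptions, not the modules this repository actually exports.

```python
# Illustrative sketch of divided space-time attention (not this repository's internals).
import torch
import torch.nn as nn
from einops import rearrange

class DividedSpaceTimeBlock(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        self.time_attn  = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.time_norm  = nn.LayerNorm(dim)
        self.space_norm = nn.LayerNorm(dim)

    def forward(self, x, f, n):
        # x: (batch, frames * patches, dim), f = frames, n = patches per frame
        # 1. temporal attention: each patch location attends across the f frames
        t = rearrange(x, 'b (f n) d -> (b n) f d', f = f, n = n)
        tn = self.time_norm(t)
        t = self.time_attn(tn, tn, tn)[0] + t
        x = rearrange(t, '(b n) f d -> b (f n) d', n = n)
        # 2. spatial attention: the n patches of each frame attend to one another
        s = rearrange(x, 'b (f n) d -> (b f) n d', f = f, n = n)
        sn = self.space_norm(s)
        s = self.space_attn(sn, sn, sn)[0] + s
        return rearrange(s, '(b f) n d -> b (f n) d', f = f)

tokens = torch.randn(2, 8 * 196, 512)                        # 8 frames of 14x14 = 196 patches, dim 512
out = DividedSpaceTimeBlock(512, 8)(tokens, f = 8, n = 196)  # (2, 1568, 512)
```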
## Install
``` bash
$ pip install timesformer-pytorch
```

## Usage

```python
import torch
from timesformer_pytorch import TimeSformer

model = TimeSformer(
    dim = 512,            # token dimension
    image_size = 224,     # height / width of each frame
    patch_size = 16,      # patch side length
    num_frames = 8,       # frames per clip
    num_classes = 10,     # output classes
    depth = 12,           # number of transformer layers
    heads = 8,            # attention heads
    dim_head = 64,        # dimension per head
    attn_dropout = 0.1,   # attention dropout
    ff_dropout = 0.1      # feedforward dropout
)

video = torch.randn(2, 8, 3, 224, 224)  # (batch x frames x channels x height x width)
mask = torch.ones(2, 8).bool()          # (batch x frame) - use a mask if there are variable length videos in the same batch

pred = model(video, mask = mask)        # (2, 10)
```
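If clips in the same batch have different numbers of frames, the boolean mask can be derived from per-clip frame counts (assuming shorter clips are zero-padded along the frame axis). The snippet below continues the usage example above; `frame_counts` is a made-up illustration, not something from the README.

```python
# hypothetical true lengths of two clips padded to 8 frames
frame_counts = torch.tensor([8, 5])
mask = torch.arange(8)[None, :] < frame_counts[:, None]  # (batch x frame) - True for real frames, False for padding
pred = model(video, mask = mask)                          # same call as above, now with padding masked out
```
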
## Citations

```bibtex
@misc{bertasius2021spacetime,
title = {Is Space-Time Attention All You Need for Video Understanding?},
author = {Gedas Bertasius and Heng Wang and Lorenzo Torresani},
year = {2021},
eprint = {2102.05095},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}
```

```bibtex
@misc{su2021roformer,
title = {RoFormer: Enhanced Transformer with Rotary Position Embedding},
author = {Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
year = {2021},
eprint = {2104.09864},
archivePrefix = {arXiv},
primaryClass = {cs.CL}
}
```

```bibtex
@article{tokshift2021,
title = {Token Shift Transformer for Video Classification},
    author = {Hao Zhang and Yanbin Hao and Chong-Wah Ngo},
journal = {ACM Multimedia 2021},
}
```