Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lucidrains/stam-pytorch
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification
- Host: GitHub
- URL: https://github.com/lucidrains/stam-pytorch
- Owner: lucidrains
- License: MIT
- Created: 2021-03-28T21:46:28.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-04-01T12:02:19.000Z (over 3 years ago)
- Last Synced: 2024-11-07T20:51:52.512Z (13 days ago)
- Topics: artificial-intelligence, attention-mechanism, deep-learning, transformers, video-classification
- Language: Python
- Homepage:
- Size: 64.5 KB
- Stars: 129
- Watchers: 5
- Forks: 15
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## STAM - Pytorch
Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in video classification. This corroborates the finding of TimeSformer. Attention is all we need.
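For intuition, here is a minimal sketch of the factorized space-then-time attention idea: a spatial transformer attends over patch tokens within each frame, and a temporal transformer then attends over the resulting per-frame embeddings. This is not the repository's implementation; it uses standard PyTorch transformer blocks, and all names (`FactorizedSpaceTimeSketch`, `frame_cls`, `video_cls`) are illustrative.

```python
import torch
import torch.nn as nn

class FactorizedSpaceTimeSketch(nn.Module):
    # Illustrative only: spatial attention within each frame,
    # then temporal attention across per-frame CLS embeddings.
    def __init__(self, dim = 512, heads = 8, space_depth = 2, time_depth = 2, num_classes = 100):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(d_model = dim, nhead = heads, batch_first = True)
        self.space = nn.TransformerEncoder(layer(), num_layers = space_depth)
        self.time = nn.TransformerEncoder(layer(), num_layers = time_depth)
        self.frame_cls = nn.Parameter(torch.randn(1, 1, dim))  # per-frame summary token
        self.video_cls = nn.Parameter(torch.randn(1, 1, dim))  # whole-video summary token
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):
        # tokens: (batch, frames, patches, dim) - patch embeddings per frame
        b, f, p, d = tokens.shape
        x = tokens.reshape(b * f, p, d)                   # fold frames into the batch
        cls = self.frame_cls.expand(b * f, -1, -1)
        x = self.space(torch.cat([cls, x], dim = 1))[:, 0]  # spatial attention, keep CLS
        x = x.reshape(b, f, d)                            # one embedding per frame
        cls = self.video_cls.expand(b, -1, -1)
        x = self.time(torch.cat([cls, x], dim = 1))[:, 0]   # temporal attention over frames
        return self.head(x)

model = FactorizedSpaceTimeSketch()
logits = model(torch.randn(2, 5, 64, 512))  # 2 videos, 5 frames, 64 patch tokens each
print(logits.shape)                          # torch.Size([2, 100])
```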
## Install
```bash
$ pip install stam-pytorch
```

## Usage
```python
import torch
from stam_pytorch import STAM

model = STAM(
    dim = 512,
    image_size = 256,     # size of image
    patch_size = 32,      # patch size
    num_frames = 5,       # number of image frames, selected out of video
    space_depth = 12,     # depth of vision transformer
    space_heads = 8,      # heads of vision transformer
    space_mlp_dim = 2048, # feedforward hidden dimension of vision transformer
    time_depth = 6,       # depth of time transformer (in paper, it was shallower, 6)
    time_heads = 8,       # heads of time transformer
    time_mlp_dim = 2048,  # feedforward hidden dimension of time transformer
    num_classes = 100,    # number of output classes
    space_dim_head = 64,  # space transformer head dimension
    time_dim_head = 64,   # time transformer head dimension
    dropout = 0.,         # dropout
    emb_dropout = 0.      # embedding dropout
)

frames = torch.randn(2, 5, 3, 256, 256) # (batch x frames x channels x height x width)
pred = model(frames) # (2, 100)
```
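As a hedged sketch of how the model might slot into a standard classification training step, continuing from the usage example above; the `labels`, the choice of optimizer, and the learning rate are illustrative, not from the repository:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr = 3e-4)  # illustrative optimizer settings

frames = torch.randn(2, 5, 3, 256, 256)  # a toy batch of videos
labels = torch.randint(0, 100, (2,))     # hypothetical class targets

logits = model(frames)                   # (2, 100)
loss = F.cross_entropy(logits, labels)   # standard classification loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```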
## Citations
```bibtex
@misc{sharir2021image,
    title         = {An Image is Worth 16x16 Words, What is a Video Worth?},
    author        = {Gilad Sharir and Asaf Noy and Lihi Zelnik-Manor},
    year          = {2021},
    eprint        = {2103.13915},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV}
}
```