Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lucidrains/TimeSformer-pytorch
Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
- Host: GitHub
- URL: https://github.com/lucidrains/TimeSformer-pytorch
- Owner: lucidrains
- License: mit
- Created: 2021-02-11T04:01:17.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-08-25T00:52:32.000Z (about 3 years ago)
- Last Synced: 2024-07-05T13:49:45.283Z (4 months ago)
- Topics: artificial-intelligence, attention-mechanism, deep-learning, transformers, video-classification
- Language: Python
- Homepage:
- Size: 181 KB
- Stars: 676
- Watchers: 17
- Forks: 85
- Open Issues: 14
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
Awesome Lists containing this project
README
## TimeSformer - Pytorch
Implementation of TimeSformer, from Facebook AI, a pure attention-based solution for reaching SOTA on video classification. This repository houses only the best-performing variant, 'Divided Space-Time Attention', which is nothing more than attention along the time axis followed by attention along the spatial axes.
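To make the divided scheme concrete, here is a minimal sketch of one such block in PyTorch with `einops`: temporal attention across frames at each spatial location, then spatial attention within each frame. The `DividedSpaceTimeBlock` class, its argument names, and the use of `nn.MultiheadAttention` are illustrative assumptions, not the modules this repository actually exports.

```python
# Illustrative sketch of divided space-time attention (not this repository's internals).
import torch
import torch.nn as nn
from einops import rearrange

class DividedSpaceTimeBlock(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        self.time_attn  = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.time_norm  = nn.LayerNorm(dim)
        self.space_norm = nn.LayerNorm(dim)

    def forward(self, x, f, n):
        # x: (batch, frames * patches, dim), f = frames, n = patches per frame
        # 1. temporal attention: each patch location attends across the f frames
        t = rearrange(x, 'b (f n) d -> (b n) f d', f = f, n = n)
        tn = self.time_norm(t)
        t = self.time_attn(tn, tn, tn)[0] + t
        x = rearrange(t, '(b n) f d -> b (f n) d', n = n)
        # 2. spatial attention: the n patches of each frame attend to one another
        s = rearrange(x, 'b (f n) d -> (b f) n d', f = f, n = n)
        sn = self.space_norm(s)
        s = self.space_attn(sn, sn, sn)[0] + s
        return rearrange(s, '(b f) n d -> b (f n) d', f = f)

tokens = torch.randn(2, 8 * 196, 512)                        # 8 frames of 14x14 = 196 patches, dim 512
out = DividedSpaceTimeBlock(512, 8)(tokens, f = 8, n = 196)  # (2, 1568, 512)
```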
## Install
``` bash
$ pip install timesformer-pytorch
```

## Usage

```python
import torch
from timesformer_pytorch import TimeSformer

model = TimeSformer(
    dim = 512,            # token dimension
    image_size = 224,     # height / width of each frame
    patch_size = 16,      # patch side length
    num_frames = 8,       # frames per clip
    num_classes = 10,     # output classes
    depth = 12,           # number of transformer layers
    heads = 8,            # attention heads
    dim_head = 64,        # dimension per head
    attn_dropout = 0.1,   # attention dropout
    ff_dropout = 0.1      # feedforward dropout
)

video = torch.randn(2, 8, 3, 224, 224)  # (batch x frames x channels x height x width)
mask = torch.ones(2, 8).bool()          # (batch x frame) - use a mask if there are variable length videos in the same batch

pred = model(video, mask = mask)        # (2, 10)
```
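If clips in the same batch have different numbers of frames, the boolean mask can be derived from per-clip frame counts (assuming shorter clips are zero-padded along the frame axis). The snippet below continues the usage example above; `frame_counts` is a made-up illustration, not something from the README.

```python
# hypothetical true lengths of two clips padded to 8 frames
frame_counts = torch.tensor([8, 5])
mask = torch.arange(8)[None, :] < frame_counts[:, None]  # (batch x frame) - True for real frames, False for padding
pred = model(video, mask = mask)                          # same call as above, now with padding masked out
```
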
## Citations

```bibtex
@misc{bertasius2021spacetime,
title = {Is Space-Time Attention All You Need for Video Understanding?},
author = {Gedas Bertasius and Heng Wang and Lorenzo Torresani},
year = {2021},
eprint = {2102.05095},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}
```

```bibtex
@misc{su2021roformer,
title = {RoFormer: Enhanced Transformer with Rotary Position Embedding},
author = {Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
year = {2021},
eprint = {2104.09864},
archivePrefix = {arXiv},
primaryClass = {cs.CL}
}
```

```bibtex
@article{tokshift2021,
title = {Token Shift Transformer for Video Classification},
    author = {Hao Zhang and Yanbin Hao and Chong-Wah Ngo},
journal = {ACM Multimedia 2021},
}
```